User:Manetta/graduation-proposals/proposal-0.5

From XPUB & Lens-Based wiki

graduation proposal +0.5

title: "i could have written that"

alternatives:

Introduction

For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away; it stands revealed as a mere collection of procedures, each quite comprehensible. The observer says to himself "I could have written that". With that thought he moves the program in question from the shelf marked "intelligent" to that reserved for curios, fit to be discussed only with people less enlightened than he. (Joseph Weizenbaum, 1966)

abstract

'i-could-have-written-that' will be a publishing project around technologies that process natural language. The project puts linguistic tools and techniques centre stage, in order to report on their constraints and possibilities. How do these techniques mediate the written word? What effect do they have on the information they transmit and on the people who use them? How is natural language processed into data? Hopefully this leads to conversations that are not limited to technological aspects, but that also reflect on cultural & political implications.

The computer is a linguistic device, a chain of multiple linguistic systems that can communicate with each other. The computer uses a large set of languages on various levels, for example: digits, logic, and semantic interface languages.

[description of #0 issue]

'#0'-issue:

intro

  • WordNet as a dataset that 'maps' language
  • Not 'mapping' as a tool to understand (as a primary aim) (as Julie speaks about mapping the physicality of the Internet), but rather 'mapping' in the sense of 'modeling', in order to automate 'natural language processes'.

→ 'automation' is key here? (natural language processing techniques or automatic reading systems)

→ western urge to simplify / structure / archive knowledge, as sharing knowledge is regarded as something that will bring development in society for the future

(...)

(...)

(...)

elements

the following elements could be part of this issue: (now collected on this 'i-could-have-written-that' webpage)

→ WordNet as structure
i-could-have-written-that/ WordNet skeleton
→ a historical list of information processing systems
i-could-have-written-that/ historical list of information systems
→ text on automatic reading machines, 
placing automation in an optical process (1967) in contrast with an algorithmic process (2015)
i-could-have-written-that/ Automatic Reading Machines

(...)

(...)

(...)


publishing platform?

alternative conversations (alternative to what?)

Language as the main communication system ... which makes it possible to create information, documents, friends. Not a human-only system, but also a computer system, or a human-computer system.

... A computer system or computer process is easily regarded as an objective act.

... When using a computer, language is used (as interface to control computer systems), executed (as scripts) and processed (natural language as data) ...

... both to reveal the fascinating systems that have been developed, the attempts, the dreams, but also to present a critical take on the way that these systems construct their 'truths'.

The publications will report on techniques that can be called 'reading-writing(-executing?) systems'. They touch on the issues of systemization & automation, which work through simplification & modeling processes. Examples are: WordNet (a lexical database), Pattern (text-mining software), programming languages (high-level languages like Python or markup languages like HTML), text-parsing (turning text into numbers), and ngrams (common word combinations).
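The simplification these systems rely on can be made concrete with a minimal sketch (plain Python, no external libraries, invented example sentence): extracting word-level ngrams already reduces a sentence to a bag of overlapping word combinations, discarding most of its structure.

```python
from collections import Counter

def ngrams(text, n=2):
    """Split a text into word-level ngrams: overlapping tuples of n words."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

text = "the observer says to himself i could have written that"
bigrams = ngrams(text, 2)

# counting the combinations is the first step toward treating text as data
counts = Counter(bigrams)
print(bigrams[:3])
```

Running this shows the sentence dissolved into pairs like `('the', 'observer')`: the 'reading' the machine does is counting, not comprehension.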

By departing from a very technical point of view, I hope to develop a stage for alternative perspectives on these issues (making 'it-just-not-works' tutorials, for example).

how?

'i-could-have-written-that' will be a publishing experiment that formulates design questions on a 'workflow' level. The publications will not only question how information can be structured, visualized and/or related to a context through semiotic processes, but also how 'design' can construct information processes, in order to create (human-)readable documents, datasets, bug-reports, and tutorials.

These aims are related to cultural principles present in the field of open source: take for example the aim for distribution instead of centralized sources (for example: Wikipedia), the aim of making information available to everyone (in the sense that it should not only be available but also legible), and the aim for collaborative work (as opposed to ownership). These principles will influence my design choices, for example: considering an infrastructure that enables collaborative work.

from designing information, to designing information processes

Coming from a background in graphic design, I was educated in a traditional way (with a focus on typography and aesthetics), in combination with courses in 'design strategy' and 'meaning-making' (which was not defined in such clear terms, by the way). I became interested in semiotics, and in systems that use symbols/icons/indexes to gain meaning.

After my first year at the Piet Zwart, I feel that my interest has shifted from designing information on an interface level to designing information processes. Being fascinated by the inner workings of techniques, and being affected by open-source principles, brings up a whole set of new design questions. For example: How can an interface reveal its inner system? How can infrastructural decisions be design actions? And how could a workflow affect the information it is processing?

Existing techniques that already give answers to these questions are Git and MediaWiki. I would like to include these questions in my graduation work, and to work with these software packages.

Relation to a larger context

natural language?

Natural language could be considered as the language that evolves naturally in the human mind through repetition, a process that starts for many people at a young age. For this project I would like to look at 'natural language' from a perspective grounded in computer science, computational linguistics and artificial intelligence (AI), where natural language is mostly used in the context of 'natural language processing' (NLP), a field of study that researches the interactions between human language and the computer.

systemizing natural language?

It is debatable whether language itself can be regarded as a technology or not. For my project I will follow James Gleick's statement in his book 'The Information: A History, a Theory, a Flood'[1], where he states: "Language is not a technology, (...) it is not best seen as something separate from the mind; it is what the mind does. (...) but when the word is instantiated in paper or stone, it takes on a separate existence as artifice. It is a product of tools and it is a tool." From that moment on, 'language' is turned into 'written language'.

A very primary writing technology is the Latin alphabet. Its main set of 26 characters is a toolbox that enables us to systemize language into characters, into words, into sentences. When considering these tools as technologies, it becomes possible to follow a line from natural language to a language that computers can take as input, via various forms of mediation.
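That line of mediation can be sketched in a few lines of Python (a naive illustration with an invented example sentence, not a serious tokenizer): written language is successively systemized into sentences, words and characters before any computation happens.

```python
text = "Language is a tool. It is also a product of tools."

# systemizing written language downwards, layer by layer:
sentences = [s.strip() for s in text.split(".") if s.strip()]        # sentence layer
words = text.replace(".", "").lower().split()                        # word layer
characters = sorted(set(c for c in text.lower() if c.isalpha()))     # character layer

print(len(sentences), "sentences,", len(words), "words,", len(characters), "distinct letters")
```

Each layer is a further reduction: two sentences become eleven words become a handful of letters, and only at that last, most impoverished layer does the alphabet meet the machine's input.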

technologies that systemize natural language?

By working closely with software that is in actual use (for example in the fields of machine learning & text-mining), I hope to reveal the inner workings of such mediating techniques through a practical approach. Elements to work with include dictionaries, lexicons, lexical databases (WordNet), other datasets (ConceptNet), ngrams, and other elements that are implemented in such software.
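As a rough illustration of what such a lexical database encodes, here is a hand-made miniature, loosely modeled on WordNet's hypernym relation (the entries below are invented for illustration and are not taken from WordNet itself, which links well over 100,000 synsets): each word points to a more general term, and 'meaning' becomes a path through the structure.

```python
# a toy hypernym map: word -> more general term
# (entries invented for illustration, not WordNet data)
HYPERNYMS = {
    "poodle": "dog",
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
    "animal": "entity",
}

def hypernym_path(word):
    """Walk up the hypernym chain until the most general term is reached."""
    path = [word]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path

print(" -> ".join(hypernym_path("poodle")))
```

The interesting (and contestable) decisions live in the dictionary itself: who decided that a dog is first of all a 'canine', and that everything bottoms out in 'entity'? That is exactly the kind of 'truth-construction' the project wants to surface.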


Relation to previous practice

Over the last year, I've been looking at different tools that process natural language. From speech-to-text software to text-mining tools, they all systemize language in various ways.

training common sense, work track at Relearn 2015

As a continuation of that, I took part in the Relearn summer school in Brussels last August (2015), where I proposed a work track in collaboration with Femke Snelting on the subject of 'training common sense'. With a group of people we tried to deconstruct the 'truth-construction' in algorithmic cultures: looking at data-mining processes, deconstructing the mathematical models that are used, finding moments where semantics are mixed with mathematics, and trying to grasp what kind of cultural context is created around this field. We worked with a text-mining software package called 'Pattern'. The workshop during Relearn transformed into a project that we called '#!Pattern+', which will be strongly collaborative and ongoing over a longer time span. #!Pattern+ will be a critical fork of the latest version of Pattern, including reflections and notes on the software and the culture that surrounds it. The README file that has been written for #!PATTERN+ is online here, and more information is collected on this wiki page.

i will tell you everything (my truth is a constructed truth), catalog for "Encyclopedia of Media Object" in V2, June 2015

Another entrance to understanding what happens in algorithmic practices such as machine learning is to look at the training sets that are used to teach algorithms to recognize certain patterns in a set of data. These training sets can contain a large set of images, texts, 3D models, or videos. By looking at such datasets, and more specifically at the choices that have been made in terms of structure and hierarchy, the steps in the construction of a certain 'truth' are revealed. For the exhibition "Encyclopedia of Media Object" in V2 last June, I created a catalog, voice-over and booklet, which placed the objects from the exhibition within the framework of the SUN database, a resource of images for image-recognition purposes. (link to the 'i-will-tell-you-everything (my truth is a constructed truth)' interface)

There are a few datasets in the academic world that seem to be the basic resources these training sets are built upon. In the field they are called 'knowledge bases'. They live on a more abstract level than the training sets do, as they try to create a 'knowledge system' that could function as a universal structure. Examples are WordNet (a lexical database), ConceptNet, and OpenCyc (an ontology dataset). In the last months I've been looking into WordNet, worked on a WordNet Tour (still ongoing), and made an alternative browser interface (with CGI) for WordNet. It's all a process that has not yet been transformed into an object/product, but until now it is documented here and here on the Piet Zwart wiki.

Thesis intention

I would like to integrate my thesis in my graduation project, to let it be the content of the publication(s). This could take multiple forms, for example:

  • interview with creators of datasets or lexicons like WordNet
  • close reading of a piece of software, like we did during the workshop at Relearn. Options could be: text-mining software Pattern (Relearn), or Weka 3; or WordNet, ConceptNet, OpenCyc


Practical steps

how?

  • starting a series of reading/writing exercises, in continuation of the way of working in the prototype classes and during Relearn.
    • mapping WordNet's structure
    • using WordNet as a writing filter?
    • WordNet as structure for a collection (similar to the way i've used the SUN database)

while using open-source software, in order to be able to have a conversation with the tools that are discussed, and to open them up.
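One of the exercises above, 'using WordNet as a writing filter', could start as small as this sketch (the lexicon here is an invented stand-in of a few words; in the actual exercise it would be drawn from WordNet's word list):

```python
# stand-in lexicon for illustration; the real exercise would load WordNet's vocabulary
LEXICON = {"language", "machine", "written", "words"}

def writing_filter(text, lexicon):
    """Keep only the words the lexicon 'knows'; blank out the rest."""
    return " ".join(w if w.lower() in lexicon else "___" for w in text.split())

print(writing_filter("Only the machine decides which written words survive", LEXICON))
```

Even this tiny version makes the editorial power of the dataset visible: whatever falls outside the lexicon simply cannot be written.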

questions of research

  • How can an interface reveal its inner system? How can structural decisions be design actions? And how could a workflow affect the information it is processing?
  • how to communicate an alternative view on algorithmic reading-writing machines?
  • how to build and maintain a (collaborative) publishing project?
    • technically: what kind of system to use to collect? wiki? mailinglist interface?
    • what kind of system to use to publish?
    • publishing: online + print --> inter-relation
    • in what context ?

references

  1. James Gleick, The Information: A History, a Theory, a Flood (2011); see also James Gleick's personal webpage

current or former (related) magazines:

other publishing platforms:

publications:

datasets

* WordNet (Princeton)
* ConceptNet 5 (MIT Media)
* OpenCyc

people

algorithmic culture

Luciana Parisi
Matteo Pasquinelli
Antoinette Roivoy
Seda Gurses 

other

Software Studies. A lexicon. by Matthew Fuller (2008)

reading list

notes and related projects

BAK lecture: Matthew Fuller, on the discourse of the powerpoint (Jun. 2015) - annotations

project: WordNet

project: i will tell you everything (my truth is a constructed truth)

project: serving simulations