User:Manetta/graduation-proposals/proposal-0.3


graduation proposal +0.3

title: "i could have written that"

alternatives:

  • typographic technologies / typographic systems
  • turning words into numbers

Introduction

For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away; it stands revealed as a mere collection of procedures, each quite comprehensible. The observer says to himself "I could have written that". With that thought he moves the program in question from the shelf marked "intelligent" to that reserved for curios, fit to be discussed only with people less enlightened than he. (Joseph Weizenbaum, 1966)

→ react on the quote: 'less enlightened than he'? ... 'its magic crumbles away'? (for me the magic starts to reveal itself, but is definitely not crumbling away), 'intelligent' to 'curios' → that fits

what do you want to do?

"i-could-have-written-that" will be a publishing platform operating from an aim of revealing inner workings of technologies that systemize natural language, through tools that function as natural language interfaces (for both the human and machines), while regarding such technologies as (contemporary) typographic systems.

This publishing platform will attempt to reveal where such typographic systems touch on the issues of systemization / automation / algorithmic 'truth', issues that contain elements of simplification / probability / modeling processes.

... by looking closely at the material (technical) elements that are used to construct certain systems ... (in order to look for alternative perspectives)




a first prototype to start collecting related material, a historical list of information processing systems, is online here:
http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/



Relation to a larger context

natural language?

Natural language could be considered as the language that evolves naturally in the human mind through use and repetition, a process that starts for many people at a young age. For this project i would like to look at 'natural language' from a perspective grounded in computer science, computational linguistics and artificial intelligence (AI), where natural language is mostly used in the context of 'natural language processing' (NLP), a field of study that researches the interactions between human language and computers.

systemizing natural language?

We could consider the latin alphabet as a very basic piece of technology for natural language. It is a tool that we use to relate to spoken language. The main set of 26 characters is a toolbox that enables us to systemize sounds into characters, into words, into sentences. Writing is a technology that collaborates in close friendship with the alphabet, to become a technique for communication. Considering these tools as technologies makes it possible to follow a line from natural language to a language that computers can work with, via various forms of mediation.
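
To make this line concrete: for the computer, the characters of the alphabet are already numbers, as a minimal Python sketch shows (the example word is arbitrary):

    # characters are numbers to the computer: the ASCII/Unicode standard
    # assigns every character of the alphabet a numeric codepoint
    word = "language"
    codepoints = [ord(character) for character in word]
    print(codepoints)            # [108, 97, 110, 103, 117, 97, 103, 101]
    print(word.encode("ascii"))  # b'language'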

technologies that systemize natural language?

By working closely with software that is used in the fields of machine learning & text-mining, i hope to reveal the inner workings of such mediating techniques through a practical approach. Elements to work with include, for example, dictionaries, lexicons, lexical databases (WordNet), other datasets (ConceptNet), ngrams, and other elements that are implemented in such software.
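
As a first sketch of what working closely with such elements could look like: a query to WordNet through NLTK's interface (assuming nltk and its wordnet corpus are installed; the query word is just an example):

    from nltk.corpus import wordnet  # requires: nltk.download('wordnet')

    # WordNet systemizes a word into numbered senses ('synsets'),
    # each with a dictionary-style definition
    for synset in wordnet.synsets("letter"):
        print(synset.name(), "-", synset.definition())

    # it also encodes hierarchy: a hypernym is the more general term
    print(wordnet.synset("letter.n.01").hypernyms())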

typography?

Typography is a type of language mediation that adds a semantic dimension to a text at an aesthetic level: it is present in the choice for a certain font, the style and connotation of the letterform, size, kerning, line-height, (non)capitalization, the whitespace present on a page, layout, text density, and many other forms that could be thought of. But typographic choices never stand alone; they always intertwingle with the content of the text itself, how it is written, where it is published, or even the availability of the text. Typography thus mediates the meaning of language, and could therefore be considered a reading or writing technique.

typography for both the computer & the human eye

An enthusiastic attempt to create typography for both the human and the computer's 'eye' is published in 'The Journal of Typographic Research' (V1N2, 1967). The article 'OCR-B: A Standardized Character for Optical Recognition' presents OCR-B, a typeface optimized for machinic reading, designed by Adrian Frutiger. The author ends the article by (techno-optimistically) stating the hope that one day "reading machines" will have reached perfection and will be able to distinguish without any error the symbols of our alphabets, in whatever style they may be written.

(contemporary) typographic systems?

In this same article, the author foretold a future wherein "[a]utomatic optical reading is likely to widen the bounds of the field of data processing". The term 'data-processing' probably refers here to typed or printed information in documents, but today 'data-processing' could be read in a different way. It now rather refers to techniques that 'read' natural language not through an optical process, but by perceiving language as 'data'. In the field of data-mining, computers 'read' counted words and most common word combinations (the so-called bag-of-words model). And as data-mining techniques are often referred to as being in direct connection with the data (as it is the data that speaks! or because no humans were even involved!), they seem to come very close to what was predicted in the article from 1967: is this the perfect machinic reading situation that was aimed for, without the presence of any errors? If so, i think we could regard technologies that deal with language as 'data' as typographic systems, as they mediate reading/writing processes as much as typography does, only at a much more infrastructural level.
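
A minimal bag-of-words sketch in plain Python shows how little 'reading' remains in this model (the sentence is my own example):

    from collections import Counter

    # 'reading' a text the data-mining way: reduce it to counted words,
    # throwing away word order, grammar and typography
    text = "the machine reads the text that the machine itself has written"
    bag_of_words = Counter(text.split())
    print(bag_of_words.most_common(3))  # [('the', 3), ('machine', 2), ...]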

In order to work with natural language processes, regarding them as typographic systems, i would like to work closely with such techniques: data/text-mining (text parsing, text simplification, vector-space models, algorithmic recognition analyses, algorithmic culture), machine learning (training sets, taxonomies, categories), logic (simplification, universal representative systems).
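
One of these techniques, the vector-space model, can be sketched in a few lines: two texts become word-count vectors that are compared by the cosine of their angle (the example sentences are mine):

    import math
    from collections import Counter

    def vectorize(text):
        # a text becomes a vector of word counts
        return Counter(text.lower().split())

    def cosine_similarity(a, b):
        # the angle between two word-count vectors, from 0.0 to 1.0
        dot = sum(a[word] * b[word] for word in set(a) & set(b))
        norm_a = math.sqrt(sum(count * count for count in a.values()))
        norm_b = math.sqrt(sum(count * count for count in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    doc_a = vectorize("typography mediates the meaning of language")
    doc_b = vectorize("typographic systems mediate the reading of language")
    print(cosine_similarity(doc_a, doc_b))  # ~0.46: only 'the', 'of', 'language' overlap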

publishing platform?

As a magazine / blog / or newsletter is a periodical format that evolves over time, it captures and reflects a certain time and location, including the ideals and concerns of the present. Setting up such a publishing platform also partly comes from an archival aim. Looking back now at the issues of Radical Software published in the 1970s, for example, gives me an urge to capture the concerns of today about (algorithmic) technologies (data monopolies, a strong belief in algorithms and an objectification of mathematics, for example). By departing from a very technical point of view, i hope to develop a stage for alternative perspectives on these issues (making 'it-just-not-works' tutorials, for example). But also, by regarding machine-learning processes as typography, and therefore as reading/writing machines or processes, the platform could create a playground where i can not only collect information about such systems, but also put them into practice.

It would be great if technology would be as visible as possible again, opened up and deconstructed, at a time when the invisibility of technique is key, and computers or phones 'just' work. These ideals come from a certain set of cultural principles present in the field of open source: take for example the importance of distribution instead of centralization, the aim of making information available for everyone (in the sense that it should not only be available but also legible), and the openness of software packages, which makes it possible to dig into the files a piece of software uses to function. These principles will influence the choices to be made for, for example, the distributive side of the publishing platform.

Coming from a graphic design background, working with typography and systems is closely related to 'make'-situations. A publishing platform could function as a framework for me to reflect on the topic while working with the techniques directly. Also, being closer to the techniques and the open-source principles brings up a whole set of new design questions. For example: How can an interface reveal its inner system? How can structural decisions be design actions?


Relation to previous practice

In the last year, i've been looking at different tools that contain linguistic systems. From speech-to-text software to text-mining tools, they all systemize language in various ways in order to understand natural language, as human language is called in computer science. These tools fall under the term 'natural language processing' (NLP), a field of computer science that is closely related to artificial intelligence (AI).

As a continuation of that, i took part in the Relearn summerschool in Brussels last August, proposing a working track in collaboration with Femke Snelting on the subject of 'training common sense'. With a group of people we have been trying to deconstruct the truth-construction process in algorithmic cultures: looking at data-mining processes, deconstructing the mathematical models that are used, finding moments where semantics are mixed with mathematical models, and understanding which cultural context is created around this field. These steps were taken in close relation to a text-mining software package called 'Pattern'. The workshop during Relearn transformed into a project that we called #!Pattern+, which will be strongly collaborative and ongoing over a longer time span. #!Pattern+ will be a critical fork of the latest version of Pattern, including reflections and notes on the software and the culture that surrounds it. The README file that has been written for #!PATTERN+ is online here, and more information is collected on this wiki page.
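
For readers unfamiliar with Pattern: the package bundles exactly these kinds of mediating elements behind a few short function calls. A small taste of its documented API (Pattern 2.6 runs on Python 2; the example sentences are mine):

    from pattern.en import parse, sentiment

    # parse() tags each word with a part-of-speech and chunk label
    print(parse("The machine reads the sentence."))

    # sentiment() reduces a text to two numbers, a polarity and a
    # subjectivity score, looked up in a lexicon of scored adjectives
    print(sentiment("A wondrous, dazzling program."))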

Another entrance to understanding what happens in algorithmic practices such as machine learning is to look at the training sets that are used to train software to recognize certain patterns in a set of data. These training sets can contain a large collection of images, texts, 3d models, or videos. By looking at such datasets, and more specifically at the choices that have been made in terms of structure and hierarchy, steps in the construction of a certain 'truth' are revealed. For the exhibition "Encyclopedia of Media Objects" in V2 last June, i created a catalog, voice-over and booklet, which placed the objects from the exhibition within the framework of the SUN database, a resource of images for image-recognition purposes. (link to the "i-will-tell-you-everything (my truth is a constructed truth)" interface)

There are a few datasets in the academic world that seem to be basic resources to build these training sets upon. In the field they are called 'knowledge bases'. They live on a more abstract level than the training sets do, as they try to create a 'knowledge system' that could function as a universal structure. Examples are WordNet (a lexical dataset), ConceptNet, and OpenCyc (an ontology dataset). In the last months i've been looking into WordNet, worked on a WordNet Tour (still ongoing), and made an alternative browser interface (with cgi) for WordNet. It is all a process that has not yet been transformed into an object/product, but it is documented here and here on the Piet Zwart wiki.
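
The cgi interface itself is documented on the wiki pages linked above; to indicate how small such a browser can be, this is a hypothetical minimal version (assuming a web server with a cgi-bin directory and NLTK with the wordnet corpus installed):

    #!/usr/bin/env python
    # a hypothetical, minimal cgi 'browser' for WordNet:
    # /cgi-bin/wordnet.cgi?word=letter prints every sense of the word
    import cgi
    from nltk.corpus import wordnet

    form = cgi.FieldStorage()
    query = form.getvalue("word", "letter")

    print("Content-Type: text/html\n")
    print("<h1>%s</h1>" % query)
    for synset in wordnet.synsets(query):
        print("<p>%s: %s</p>" % (synset.name(), synset.definition()))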

Thesis intention

Practical steps

how?

  • reflecting on such typographic systems by writing/collecting while staying close to the technologies.

referring to the type of writing done by:

- Matthew Fuller, powerpoint (+ in 'software studies, a lexicon')
- Constant, pipelines
- Steve Rushton, feedback
- Angie Keefer, Octopus
  • starting a series of reading/writing exercises, in continuation of the way of working in the prototype classes and during Relearn.
- using WordNet as a writing filter? (see the sketch after this list)
- WordNet as structure for a collection (similar to the way i've used the SUN database)
  • touching the following issues around the systemization of language:
- automation (of tasks / human labour; algorithmic culture, machine learning, ...)
- simplification (as a step in the process; turning text into numbers)
- the aim for a universal system (taxonomy structures, categorization, ascii/unicode, logic)
- it works? (revealing the inner workings and non-workings of technologies)
- cultural context (algorithmic agreeability, belief in technology, AI, the aim for invisibility / naturalization)
  • while using open-source software, in order to be able to have a conversation with the tools that will be discussed, and to open them up.
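
The 'writing filter' idea from the list above could start as small as this sketch, in which each word is replaced by a more general term, the first WordNet hypernym of its first sense (a hypothetical filter, assuming NLTK and its wordnet corpus):

    from nltk.corpus import wordnet

    def hypernym_filter(sentence):
        # push each word one step up WordNet's hierarchy;
        # words WordNet does not know (e.g. 'the') pass through unchanged
        rewritten = []
        for word in sentence.split():
            synsets = wordnet.synsets(word)
            if synsets and synsets[0].hypernyms():
                rewritten.append(synsets[0].hypernyms()[0].lemma_names()[0])
            else:
                rewritten.append(word)
        return " ".join(rewritten)

    print(hypernym_filter("the observer moves the program to the shelf"))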

questions of research

  • if natural language systems could be regarded as typography, what reading/writing options does that bring?
  • how to build and maintain a (collaborative) publishing project?
    • technically: what kind of system to use to collect? wiki? mailinglist interface?
    • what kind of system to use to publish?
    • publishing: online + print --> interrelation
    • in what context?

References

current or former (related) magazines:

other publishing platforms:

publications:

datasets

* WordNet (Princeton)
* ConceptNet 5 (MIT Media)
* OpenCyc

people

algorithmic culture

Luciana Parisi
Matteo Pasquinelli
Antoinette Rouvroy
Seda Gürses

other

Software Studies: A Lexicon, edited by Matthew Fuller (2008)

reading list

notes and related projects

BAK lecture: Matthew Fuller, on the discourse of the powerpoint (Jun. 2015) - annotations

project: Wordnet

project: i will tell you everything (my truth is a constructed truth)

project: serving simulations