User:Manetta/graduation-proposals/proposal-0.5

From XPUB & Lens-Based wiki
<div style="width:100%;max-width:850px;">
=<span style="color:blue;">graduation proposal +0.5</span>=


== title: "i could have written that" ==
[[File:I-could-have-written-that-webpage.png|thumbnail|right|200px|[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/ i could have written that (webpage / blog), a webpage to start collecting related material. working with git, make, pandoc, markdown, and experimental microdata] [[User:Manetta/i-could-have-written-that/filesystem-interface-related-material|(page about workflow)]]]]
[[File:I-could-have-written-that-wiki.png|right|thumbnail|200px|[http://i-could-have-written-that.manettaberends.nl/ i could have written that (wiki), a first prototype (running on DokuWiki software) to work collaboratively on material that could be integrated in the publication(s)] ]]


alternatives:


===abstract===
[[File:World-Well-Being-Project-wordclouds.gif|thumbnail|right|200px|wordclouds visualizing text-mining research results; from [http://www.wwbp.org/ the World Well Being Project] ]]
[[File:Sentiment-analyses_wwbp.png|thumbnail|right|200px|typography analysis as sentiment analysis; from [http://www.wwbp.org/ the World Well Being Project]]]
[[File:Antropomorphic-reading-terms.gif|thumbnail|right|200px|anthropomorphism used as a tool to relate to computer processes]]
'i-could-have-written-that' will be a publishing project around technologies that process natural language. The project will place ''tools'' and ''techniques'' at its centre. The first issue will look at the lexical dataset 'WordNet', a primary resource in the field of Knowledge Discovery in Data (also known as the field of data-mining and big-data). WordNet is built out of word-'synsets' (organizing a vocabulary by word-meaning instead of by term), which are structured by various types of relations, such as word-type or word-category. The dataset has been developed since 1985 and is basically a norm in the field, used during the training of data-mining algorithms. Although the focus on word-synsets is an attempt to create a nuanced model of a human language, the dataset is still a model, and will always be 'imperfect'. But today, as written language is regarded as 'data', data-mining techniques make it possible to 'analyse' such data easily and in big quantities. They return 'knowledge' in a human-readable form, which is presented as an objective truth.
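To make the synset idea concrete, here is a minimal sketch in plain Python. It is not the actual WordNet database format, and the `Synset` class and sample words are hypothetical; it only illustrates how words are grouped by shared meaning and linked by relations such as hypernymy ('is a kind of').

```python
# A minimal, hypothetical model of WordNet-style synsets:
# words are grouped by shared meaning, and synsets are linked
# by relations such as hypernymy ('dog' is a kind of 'animal').

class Synset:
    def __init__(self, words, gloss):
        self.words = words          # synonyms sharing one meaning
        self.gloss = gloss          # short definition
        self.hypernyms = []         # 'is a kind of' relations

entity = Synset(["entity"], "that which exists")
animal = Synset(["animal", "animate being"], "a living organism")
dog    = Synset(["dog", "domestic dog"], "a domesticated canine")

animal.hypernyms.append(entity)
dog.hypernyms.append(animal)

def hypernym_path(synset):
    """Walk up the 'is a kind of' chain to the root."""
    path = [synset.words[0]]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        path.append(synset.words[0])
    return path

print(hypernym_path(dog))  # ['dog', 'animal', 'entity']
```

Even in this toy form it shows where the modeling decisions sit: which words share a synset, and which relation counts as the 'correct' one, are editorial choices written into the dataset.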


What is gained by continuing to work on the illusion of creating a model of human language? (Is it techno-positivism?) Why is information presented as objective? (''It is the data that speaks.'') What does this objectification process look like? (How is data created?)


These questions will hopefully lead to conversations that are not limited to technological aspects, but also reflect on the cultural & political implications of WordNet.
== personal interest ==


Coming from a background in graphic design, I became interested in mediation and communication techniques. My interest lies specifically in language, our primary communication system. Language does not only mediate the contact between humans, but also the relation of a human to objects and other entities in the world around us. It is a tool to understand, categorize and give meaning to the world. This is an obvious process that happens to everyone, but the semiotic writings of Saussure and Peirce made such invisible systems a topic of research: they introduced me to systems that describe such mediating processes. I became enthusiastic about learning how signs signify certain meanings within a culture, as the color 'gold' can signify richness, or the color 'red' can stand for either aggression or socialism. Fascinating systems, useful as analysis tools, but described already in the 60s and mainly applied in the field of advertisement. It was therefore difficult to implement them in my practice.


Since my time here at the Piet Zwart, I have started to look at software that processes human language. I was curious to see how software is designed to understand us and our human language system. These software packages contain dictionaries, datasets, wordlists, scripts (and much more), which suddenly materialized the meaning-making process. The physicality of those wordlists and the structure of such datasets reflect aims to understand the human world (written by humans, of course). To create a digital system for our linguistic system is, I think, both poetic and revealing of current aims to systemize and automate.


As my design tools and materials change, and the field along with them, I try to understand how software mediates, how programming languages mediate. They are my new material to design with, my new languages. I don't 'master' them, and they are tied to the fast pace of change in new media. This is both exciting and terrifying at the same time.


===from designing information, to designing information processes===
I was educated in a traditional way (with a focus on typography and visual language) in combination with courses in editorial design. I became interested in semiotics, and in systems that use symbols/icons/indexes to gain meaning.


After my first year at the Piet Zwart, I feel that my interest is shifting from designing information on an interface level to designing information processes. Being fascinated by the inner workings of techniques, and being affected by the 'free and open software' principles, brings up a whole set of new design questions. For example:


 How can an interface ''reveal'' its inner system?
 How can infrastructural decisions ''be'' design actions?
 And how could a workflow affect the information it is processing?
 Would you like to be dependent on an online service like WordPress, to publish your material?
 Would you like to be able to work on your blog online only?
 How do you preserve the material you will publish?
 When could a document be called 'a document'; when it is readable for the user or for the computer?
 How could you implement the inner workings of an online document in the interface?
 How can we work together on the same file at the same time?


Existing techniques that already give answers to some of these questions are Git and MediaWiki. I would like to include these questions in my graduation work, and work with these software packages.


('''In the field of''' graphic design '''I recognize a pattern''' of 'designing' on a high level. I call this 'high-level design' '''because of''' the tradition of formulating design questions on the level of the interface and the interpretation of the reader. This is different in the field of computed type design, where fonts are built out of outlines, skeletons or dots, depending on their construction. The field of experimental book design could also be regarded as an exception. In the field of web design, working on the level of the interface is very common, but it '''could be avoided by''' regarding 'design' as designing a 'workflow'. That will lead to a design practice where current general questions could be included as design questions. Design questions are then related to your tools (software), social workstructure, principles, etc.)


==context==
'''In current reactions to''' the effects of an optimistic belief in computation, '''I recognize a pattern''' of requests to focus on computation and algorithmic culture as 'synthetic' processes, '''because''' there is a convention of recognizing anthropomorphic qualities in computers (like 'thinking' or 'intelligence') that obscures their syntactical nature. '''Also, because of''' the ideological aim of big data to have direct access to information, there is a belief that 'truth' can be gained from data without any mediation, as ''it is the data that speaks'' or ''no humans are ever involved''. '''This could be avoided by''' a collection of alternative perspectives on these techniques, made accessible and legible for people with an interest in regarding software as a cultural product. The choice to focus specifically on software that processes natural language is both a poetic choice and a way to speak directly with the techniques that create information by ''using'', ''executing'' & ''processing'' language*; the main communication system for both humans & computers. (Though stating this feels dangerous here: they differ!) '''Then it would be possible''' to formulate an opinion or understanding about computational (syntactical) techniques without relying only on the main (and overpowering) perspective.
--------------------------------------------------------
* ''Critical thinking about computers is not possible without an informed understanding.'' (...) ''Software as a whole is not only 'code' but a symbolic form involving cultural practices of its employment and appropriation.'' (...) ''It's up to critics to reflect on the constraints that computer control languages write into culture.'' — from: Language, by Florian Cramer; published in Software Studies, edited by Matthew Fuller (2008)
* ''in many cases these debates [about artificial intelligence] may be missing the real point of what it means to live and think with forms of '''synthetic intelligence''' very different from our own.'' (...) ''To regard A.I. as inhuman and machinic should enable a more reality-based understanding of ourselves, our situation, and a fuller and more complex understanding of what 'intelligence' is and is not.'' — from Benjamin Bratton, Outing A.I. Beyond the Turing Test (2015)
* ''Raw-data is like nature, it is the idea that nature will speak by itself. It is the idea that thanks to big data, the world speaks by itself without any transcription, symbolization, institutional mediation, political mediation or legal mediation'' — from: Antoinette Rouvroy; during her lecture as part of the panel 'All Watched Over By Algorithms', held during the Transmediale festival (2015)
* ''The issue we are dealing with now in machine learning and big data is that a heuristic approach is being replaced by a certainty of mathematics. And underneath those logics of mathematics, we know there is no certainty, there is no causality. Correlation doesn't mean causality.'' — from: Colossal Data and Black Futures, lecture by Ramon Amaro; as part of the Impakt festival (2015)


<small>* A computer is a linguistic device. When using a computer, language is ''used'' (as interface to control computer systems), ''executed'' (as scripts written in (a set of) programming languages) and ''processed'' (turning natural language into data).</small>


-------------------------------------------
==how?==
[[File:Babelnet-wordnet.png|thumbnail|right|200px|BabelNet, an encyclopedic dictionary using e.g. WordNet & Wikipedia; [http://www.babelnet.org/ link to BabelNet] ]]
* I'm currently looking for a 'case study': a project, tool or software package that uses WordNet [[User:Manetta/i-could-have-written-that/wordnet-case-studies | documented here]]
* documenting the investigative element of the project: a central collection of material? (wiki / dokuwiki / git + gitweb / ...)
* a first publication focusing on the case study, regarding it as a publishing experiment;
* a publication could later return periodically; the format will not be fixed: it could be printed, digital, a combination, offline or online; this is the publishing-experiment part; the public aimed at is interested in regarding software as a cultural product (hence not only a technological tone is needed)


====publishing platform?====
'i-could-have-written-that' will be a publishing experiment that formulates design questions on a 'workflow' level. The publications will not only question how information can be structured, visualized and/or related to a context through semiotic processes, but rather how 'design' can construct information processes, to create (human-)readable documents, datasets, bug-reports, and tutorials.
These aims are related to cultural principles present in the field of 'free and open software': take for example ''the aim for distribution instead of centralized sources'' (for example: Wikipedia or Git), the aim of ''making information available'' (in the sense that it should not only be available but also legible), and the aim for ''collaboration'' (in the sense of learning together). These principles will influence my design choices, for example: considering an infrastructure that enables collaborative work.


==#0 issue==
=== WordNet ===
* WordNet case-study


* starting a series of reading/writing exercises, in continuation of the way of working in the prototype classes and during Relearn:
** mapping WordNet's structure
** using WordNet as a writing filter?
** WordNet as structure for a collection (similar to the way I've used the SUN database)

The publications will report on techniques that can be called 'reading-writing(-executing?) systems'. They touch on issues of systemization & automation, which rely on simplification & modeling processes. Examples are: WordNet (a lexical dataset), Pattern (text-mining software), programming languages (high-level languages like Python or markup languages like HTML), text-parsing (turning text into numbers), and ngrams (common word combinations).

By departing from a very technical point of view, I hope to develop ''a stage for alternative perspectives'' on these issues (making 'it-just-not-works' tutorials for example).
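Two of the techniques named above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Pattern or WordNet tooling; the sample sentence is made up: text-parsing that turns a text into numbers (word counts), and the extraction of ngrams (runs of consecutive words).

```python
# A minimal sketch of two techniques named above (not the actual
# Pattern or WordNet tooling): turning text into numbers, and
# extracting ngrams (common word combinations).
from collections import Counter

text = "the cat sat on the mat and the cat slept"
words = text.split()

# text-parsing: the text becomes a table of numbers (word counts)
counts = Counter(words)

# ngrams: every run of n consecutive words
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = Counter(ngrams(words, 2))

print(counts.most_common(2))   # [('the', 3), ('cat', 2)]
print(bigrams.most_common(1))  # [(('the', 'cat'), 2)]
```

Once the text is reduced to counts like these, the 'reading' has already happened: everything the algorithm 'knows' about the text is in the numbers, which is exactly the simplification these publications want to examine.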
===other elements I'm currently working on===


* [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/historical-list-of-information-systems/historical-list-of-information-systems.html historical context (of code-systems/...?) (list document)]


* [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/automatic-reading-machines/automatic-reading-machines.html automatic reading machines; from encoding-decoding to constructed-truths (video/slideshow?)]


* [[User:Manetta/i-could-have-written-that/syntactic-view | call for a syntactic view; Florian Cramer & Benjamin Bratton (text)]]


=====from designing information, to designing information processes=====
* [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/transcription_Presentation-by-Antoinette-Rouvroy/transcription_Presentation-by-Antoinette-Rouvroy.html Antoinette Rouvroy; All Watched Over by Algorithms, Transmediale 2015 (transcription)]


* [[User:Manetta/i-could-have-written-that/anthropomorphic-qualities | anthropomorphic qualities of a computer ('poster' visualizations?)]]


* [[User:Manetta/i-could-have-written-that/the-data-apparatus | the photographic apparatus &rarr; the data apparatus (voice-over?)]]


&rarr; [[User:Manetta/i-could-have-written-that | i document these elements on this page]]<br>
&rarr; [[User:Manetta/i-could-have-written-that/little-glossary | little glossary of topics]]

== Relation to a larger context ==

====natural language?====
Natural language could be considered the language that evolves naturally in the human mind through repetition, a process that starts for most people at a young age. For this project I would like to look at 'natural language' from a perspective grounded in ''computer science'', ''computational linguistics'' and ''artificial intelligence'' (AI), where natural language mostly appears in the context of 'natural language processing' (NLP), a field of study that researches the interactions between human language and the computer.

====systemizing natural language?====
It is debatable whether language itself can be regarded as a technology or not. For my project I will follow James Gleick's statement in his book 'The Information: A History, a Theory, a Flood'<ref name="gleick">[http://around.com/the-information James Gleick's personal webpage], The Information: A History, a Theory, a Flood - James Gleick (2011)</ref>, where he states: ''Language is not a technology, (...) it is not best seen as something separate from the mind; it is what the mind does. (...) but when the word is instantiated in paper or stone, it takes on a separate existence as artifice. It is a product of tools and it is a tool.'' From that moment on, 'language' is turned into 'written language'.
 
A very primary writing technology is the Latin alphabet. Its main set of 26 characters is a toolbox that enables us to systemize language into characters, words, and sentences. When considering these tools as technologies, it becomes possible to follow a line from natural language to a language that computers can take as input, via various forms of mediation.
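That line, from characters to words to sentences to computer input, can be sketched as follows. A minimal illustration in Python with a made-up sentence; real NLP tokenizers and encoders are far more elaborate:

```python
# A minimal sketch of systemizing written language step by step:
# characters -> words -> sentences, until it becomes computer input.
text = "Language is a tool. It is also a technology."

characters = list(text)                  # the alphabet level
sentences = [s.strip() for s in text.split(".") if s.strip()]
words = text.replace(".", "").lower().split()

# as computer input: each word becomes a number (its position in a
# simple vocabulary), a first step toward 'language as data'
vocabulary = sorted(set(words))
encoded = [vocabulary.index(w) for w in words]

print(sentences)  # ['Language is a tool', 'It is also a technology']
print(encoded)
```

Each step discards something (capitalization, punctuation, word order in the vocabulary), which is exactly the kind of mediation the paragraph above points at.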
 
====technologies that systemize natural language?====
By working closely with software as it is used (for example in the fields of ''machine learning & text-mining''), I hope to reveal the inner workings of such mediating techniques through a practical approach. Elements to work with include dictionaries, lexicons, lexical databases (WordNet), other datasets (ConceptNet), ngrams, and other components implemented in such software.




* interview with creators of datasets or lexicons like WordNet
* close reading of a piece of software, like we did during the workshop at Relearn. Options could be: text-mining software ''Pattern'' (Relearn), or ''Weka 3.0''; or WordNet, ConceptNet, OpenCyc
== Practical steps ==
* creating a historical context, a list of information processing systems, started here: [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/historical-list-of-information-systems/historical-overview-of-information-systems.html i-could-have-written-that/ historical list of information systems]
* creating a context of automatic 'reading' machines, started here: [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/automatic-reading-machines/automatic-reading-machines.html i-could-have-written-that/ Automatic Reading Machines]
All of this while using open-source software, in order to be able to have a conversation with the tools that will be discussed, and to open them up.
===questions of research===
* How can an interface ''reveal'' its inner system? How can structural decisions ''be'' design actions? And how could a workflow change the information it is processing?
* how to communicate an alternative view on algorithmic reading-writing machines?
* how to build and maintain a (collaborative) publishing project?
** technically: what kind of system to use to collect? wiki? mailinglist interface?
** what kind of system to use to publish?
** publishing: online + print --> inter-relation
** in what context?


== references ==
<references />
===publishing experiments===
* [http://networkcultures.org/digitalpublishing/ digital publishing toolkit]
* [http://activearchives.org/aaa/ Active Archives VideoWiki - Constant's VideoWiki]
* [http://post-inter.net/ art post-internet (2014), a PDF + webpage catalogue]<br>
* [https://mcluhan.consortium.io/ Hybrid Lecture Player (to be viewed in Chrome/Chromium)]<br>


===current or former (related) magazines:===
* [https://s3-us-west-2.amazonaws.com/visiblelanguage/pdf/V1N1_1967_E.pdf The Journal of Typographic Research (1967-1971)] (now: [http://visiblelanguagejournal.com/about Visible Language])
* [http://www.radicalsoftware.org/e/index.html Radical Software (1970-1974, NY)]
* [http://ds.ccc.de/download.html die Datenschleuder, Chaos Computer Club publication (1984-ongoing, DE)]
* [http://www.dot-dot-dot.us/ Dot Dot Dot (2000-2011, USA)]
* [http://www.servinglibrary.org/ the Serving Library (2011-ongoing, USA)]
* OASE, on architecture (NL)
* [http://libregraphicsmag.com/ Libre Graphics Magazine (2010-ongoing, PT)]
* [https://worksthatwork.com/ Works that Work (2013-ongoing, NL)]
* [http://neural.it/ Neural (IT)]
* [http://www.aprja.net/ Aprja (DK)]


===other publishing platforms:===
* [http://monoskop.org/Monoskop Monoskop]
* [http://unfold.thevolumeproject.com/ unfold.thevolumeproject.org]
* mailinglist interface: lurk.org
* mailinglist interface: nettime --> discussions in public
* [http://p-dpa.net/ archive of publications closely related to technology: P-DPA (Silvio Lorusso)]


===datasets===


===people===
====algorithmic culture====
Luciana Parisi<br>
Matteo Pasquinelli<br>
Antoinette Rouvroy<br>
Seda Gürses<br>
Ramon Amaro

====computational culture====
Florian Cramer<br>
Benjamin Bratton


==== other ====
[https://mitpress.mit.edu/books/software-studies Software Studies. A lexicon. by Matthew Fuller (2008)]
===reading list===


===notes and related projects===
[http://pzwart1.wdka.hro.nl/~manetta/annotations/html/events%2btalks/bak-algorithmic-cultures-2015.html BAK lecture: Matthew Fuller, on the discourse of the powerpoint (Jun. 2015) - annotations]<br>


[[User:Manetta/semantic-systems/knowledge-bases/wordnet | project: Wordnet]]

''Latest revision as of 15:08, 11 November 2015''

==introduction==
''For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away; it stands revealed as a mere collection of procedures, each quite comprehensible. The observer says to himself "I could have written that". With that thought he moves the program in question from the shelf marked "intelligent" to that reserved for curios, fit to be discussed only with people less enlightened than he.'' (Joseph Weizenbaum, 1966)
abstract

wordclouds visualzing text-mining research results; from the World Well Being Project
typography analysis as sentiment analysis; from the World Well Being Project
antropomorphism used as a tool to relate to computer processes

'i-could-have-written-that' will be a publishing project around technologies that process natural language. The project will put tools and techniques central. The first issue will look at the lexical dataset 'WordNet', a primary resource in the field of Knowlegde Discovery in Data processes (also known as the field of data-mining and big-data). WordNet is built with word-'synsets' (organizing a vocabulary on word-meanings in stead of terms), which are structured by various types of relations like word-type or word-categorie. This dataset has been developed since 1985, and is basically a norm in the field, used during training processes of data-mining algorithms. Although the focus on word-synsets is an attempt to create a nuanced model of a human language, the dataset is still a model, and will always be 'imperfect'. But today, as written language is regarded as 'data', data-mining techniques make it possible to 'analyse' such data easily and in big quantity. They return 'knowledge' in a human readable form, which is presented as an objective truth.

What is it that is gained, to keep on working on the illusion to create a model of human language? (Is it techno-positivism?) Why is information been presented as objective? (It is the data that speaks.) How does this objectivication process look like? (How is data created?)

These questions will hopefully lead to conversations that are not limited to technological aspects, but also reflect on cultural & political implications of WordNet.

personal interest

As coming from a background in graphic design i became interested in mediation and communication techniques. My interest lies specifically in language, our primary communication system. Language does not only mediate the contact between human to human, but also the relation from a human to objects and other entities in the world around us. It is a tool to understand, categorize and give meaning to the world. This is an obvious process that happens to everyone, but the semiotic writings of Saussure and Pierce made such invisible systems a topic of research: they introduced me to systems that describe such mediating processes. I became enthusiastic to learn about signs that are signifying certain meaning within culture, as the color 'gold' can signify richness, or the color 'red' can either stand for aggresion or socialism. Fascinating systems, useful as analysis tools, but described already in the 60s and mainly applied in the field of advertisement. It was therefore difficult to implement in my practise.

Since my time here at the Piet Zwart, i have started to look at software that processes human language. I was curious to see how software is designed to understand us and our human language system. These software packages contain dictionaries, datasets, wordlists, scripts (and much more), which suddenly materialized the meaning-making process. The physicality of those wordlists and the structure of such datasets reflect an ambition to understand the human world (written by humans, of course). Creating a digital system for our linguistic system is, i think, both poetic and revealing of current aims to systemize and automate.

As my design tools and materials change, and the field changes with them, i try to understand how software mediates, and how programming languages mediate. They are my new material to design with, my new languages. I don't 'master' them, and they are tied to the fast pace of change in new media. This is exciting and terrifying at the same time.

===from designing information, to designing information processes===

I was educated in a traditional way (with a focus on typography and visual language), in combination with courses in editorial design. I became interested in semiotics, and in systems that use symbols, icons and indexes to carry meaning.

After my first year at the Piet Zwart, i feel that my interest has shifted from designing information on an interface level to designing information processes. Being fascinated by the inner workings of techniques, and being affected by 'free and open software' principles, brings up a whole set of new design questions. For example:

* How can an interface reveal its inner system?
* How can infrastructural decisions be design actions?
* How could a workflow affect the information it is processing?
* Would you like to be dependent on an online service like WordPress to publish your material?
* Would you like to be able to work on your blog only when online?
* How do you preserve the material you publish?
* When can a document be called 'a document': when it is readable by the user, or by the computer?
* How could the inner workings of an online document be implemented in its interface?
* How can we work together on the same file at the same time?

Git and MediaWiki are existing techniques that already give answers to some of these questions. I would like to include these questions in my graduation work, and to work with these software packages.


(In the field of graphic design, i recognize a pattern of 'designing' on a high level. I call this 'high level design' because of the tradition of formulating design questions on the level of the interface and the interpretation of the reader. This is different in the field of computed type design, where fonts are built out of outlines, skeletons or dots, depending on their construction. The field of experimental book design could also be regarded as an exception. In the field of web design, working on the level of the interface is very common, but it could be avoided by regarding 'design' as designing a 'workflow'. That would lead to a design practice in which current general questions can be included as design questions. Design questions are then related to your tools (software), social workstructure, principles, etc.)

===context===

In current reactions to the effects of an optimistic belief in computation, i recognize a pattern of requests to regard computation and algorithmic culture as 'synthetic' processes: there is a convention of recognizing anthropomorphic qualities in computers (like 'thinking' or 'intelligence') that obscures their syntactical nature. Also, because of the ideological aim of big data to give direct access to information, there is a belief that 'truth' can be gained from data without any mediation, as if it is the data that speaks and no humans are ever involved. This could be countered by a collection of alternative perspectives on these techniques, made accessible and legible for people with an interest in regarding software as a cultural product. The choice to focus specifically on software that processes natural language is both a poetic choice and a way to speak directly with the techniques that create information by using, executing & processing language*; the main communication system for both humans & computers. (Though stating this feels dangerous here: they differ!) It would then be possible to formulate an opinion or understanding about computational (syntactical) techniques without relying only on the main (and overpowering) perspective.


* "Critical thinking about computers is not possible without an informed understanding. (...) Software as a whole is not only 'code' but a symbolic form involving cultural practices of its employment and appropriation. (...) It's upon critics to reflect on the constraints that computer control languages write into culture." — from: Language, by Florian Cramer; published in Software Studies, edited by Matthew Fuller (2008)
* "In many cases these debates [about artificial intelligence] may be missing the real point of what it means to live and think with forms of synthetic intelligence very different from our own. (...) To regard A.I. as inhuman and machinic should enable a more reality-based understanding of ourselves, our situation, and a fuller and more complex understanding of what 'intelligence' is and is not." — from: Benjamin Bratton, Outing A.I.: Beyond the Turing Test (2015)
* "Raw data is like nature: it is the idea that nature will speak by itself. It is the idea that thanks to big data, the world speaks by itself without any transcription, symbolization, institutional mediation, political mediation or legal mediation." — from: Antoinette Rouvroy; during her lecture as part of the panel 'All Watched Over By Algorithms', held during the Transmediale festival (2015)
* "The issue we are dealing with now in machine learning and big data is that a heuristic approach is being replaced by a certainty of mathematics. And underneath those logics of mathematics, we know there is no certainty, there is no causality. Correlation doesn't mean causality." — from: Colossal Data and Black Futures, lecture by Ramon Amaro; as part of the Impakt festival (2015)

* A computer is a linguistic device. When using a computer, language is used (as an interface to control computer systems), executed (as scripts written in (a set of) programming languages) and processed (turning natural language into data).

===how?===

BabelNet, an encyclopedic dictionary built on e.g. WordNet & Wikipedia; link to BabelNet

* i'm currently looking for a 'case-study': a project, tool or software package that uses WordNet; documented here
* documenting the investigative element of the project: a central collection of material? (wiki / DokuWiki / git + gitweb / ...)
* a first publication focusing on the case-study, regarding it as a publishing experiment;
* a publication could later return periodically; the format will not be fixed: it could be printed, digital, a combination, offline, online; this is the publishing-experiment part; the public aimed at is interested in regarding software as a cultural product (hence not only a technological tone is needed)

'i-could-have-written-that' will be a publishing experiment that formulates design questions on a 'workflow' level. The publications will not only question how information can be structured, visualized and/or related to a context through semiotic processes, but rather how 'design' can construct information processes, to create (human-)readable documents, datasets, bug-reports, and tutorials.

These aims are related to cultural principles present in the field of 'free and open software': take for example the aim of distribution instead of centralized sources (for example: Wikipedia or Git), the aim of making information available (in the sense that it should not only be available but also legible), and the aim of collaboration (in the sense of learning together). These principles will influence my design choices, for example: to consider an infrastructure that enables collaborative work.

====#0 issue====

WordNet

* WordNet case-study
* starting a series of reading/writing exercises, in continuation of the way of working in the prototype classes and during Relearn:
** mapping WordNet's structure
** using WordNet as a writing filter?
** WordNet as a structure for a collection (similar to the way i've used the SUN database)

====other elements i'm currently working on====

* historical context (of code-systems/...?) (list document)
* automatic reading machines; from encoding-decoding to constructed truths (video/slideshow?)
* call for a syntactic view; Florian Cramer & Benjamin Bratton (text)
* Antoinette Rouvroy; All Watched Over by Algorithms, Transmediale 2015 (transcription)
* anthropomorphic qualities of a computer ('poster' visualizations?)
* the photographic apparatus → the data apparatus (voice-over?)
* WordNet case-studies

→ i document these elements on this page

→ little glossary of topics


===Relation to previous practice===

In the last year, i've been looking at different tools that process natural language. From speech-to-text software to text-mining tools, they all systemize language in various ways.

====training common sense, work track at Relearn 2015====

As a continuation of that, i took part in the Relearn summerschool in Brussels last August (2015), where i proposed a work track in collaboration with Femke Snelting on the subject of 'training common sense'. With a group of people we have been trying to deconstruct the 'truth-construction' in algorithmic cultures: looking at data-mining processes, deconstructing the mathematical models that are used, finding moments where semantics are mixed with mathematics, and trying to grasp what kind of cultural context is created around this field. We worked with a text-mining software package called 'Pattern'. The workshop during Relearn transformed into a project that we called '#!PATTERN+', which will be strongly collaborative and ongoing over a longer time span. #!PATTERN+ will be a critical fork of the latest version of Pattern, including reflections and notes on the software and the culture that surrounds it. The README file that has been written for #!PATTERN+ is online here, and more information is collected on this wiki page.

====i will tell you everything (my truth is a constructed truth), catalog of "Encyclopedia of Media Object" in V2, June 2015====

Another entrance to understanding what happens in algorithmic practices such as machine learning is to look at the training sets that are used to train algorithms to recognize certain patterns in a set of data. These training sets can contain large sets of images, texts, 3D models, or videos. By looking at such datasets, and more specifically at the choices that have been made in terms of structure and hierarchy, the steps in the construction of a certain 'truth' are revealed. For the exhibition "Encyclopedia of Media Object" in V2 last June, i created a catalog, voice-over and booklet, which placed the objects from the exhibition within the framework of the SUN database, a resource of images for image-recognition purposes. (link to the "i will tell you everything (my truth is a constructed truth)" interface)

There are a few datasets in the academic world that seem to be the basic resources these training sets are built upon. In the field they are called 'knowledge bases'. They live on a more abstract level than the training sets do, as they try to create a 'knowledge system' that could function as a universal structure. Examples are WordNet (a lexical dataset), ConceptNet, and OpenCyc (an ontology dataset). In the last months i've been looking into WordNet, worked on a WordNet Tour (still ongoing), and made an alternative browser interface (with CGI) for WordNet. It is all a process that has not yet been transformed into an object/product, but is until now documented here and here on the Piet Zwart wiki.

===Thesis intention===

I would like to integrate my thesis into my graduation project, letting it be the content of the publication(s). This could take multiple forms, for example:

* an interview with creators of datasets or lexicons like WordNet
* a close reading of a piece of software, as we did during the workshop at Relearn. Options could be: the text-mining software Pattern (Relearn), Weka 3, or WordNet, ConceptNet, OpenCyc

===references===

====publishing experiments====

current or former (related) magazines:

other publishing platforms:

====datasets====

* WordNet (Princeton)
* ConceptNet 5 (MIT Media)
* OpenCyc

====people====

=====algorithmic culture=====

Antoinette Rouvroy
Seda Gurses
Ramon Amaro

=====computational culture=====

Florian Cramer
Benjamin Bratton

=====other=====

Software Studies: A Lexicon, edited by Matthew Fuller (2008)

===notes and related projects===

BAK lecture: Matthew Fuller, on the discourse of the powerpoint (Jun. 2015) - annotations

project: WordNet

project: i will tell you everything (my truth is a constructed truth)

project: serving simulations