User:Manetta/graduation-proposals/proposal-1.0

<div style="width:100%;max-width:900px;">
=<span style="color:black;">graduation proposal +1.0</span>=
__NOTOC__
 
== title: "i could have written that" ==  
[[File:I-could-have-written-that-webpage.png|thumbnail|right|200px|[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/ i could have written that (webpage / blog), a webpage to start collecting related material. working with git, make, pandoc, markdown, and experimental microdata] [[User:Manetta/i-could-have-written-that/filesystem-interface-related-material|(page about workflow)]]]]
===Introduction===
''For in those realms machines are made to behave in wondrous ways, often sufficient to dazzle even the most experienced observer. But once a particular program is unmasked, once its inner workings are explained in language sufficiently plain to induce understanding, its magic crumbles away; it stands revealed as a mere collection of procedures, each quite comprehensible. The observer says to himself "I could have written that". With that thought he moves the program in question from the shelf marked "intelligent" to that reserved for curios, fit to be discussed only with people less enlightened than he.'' (Joseph Weizenbaum, 1966)


===abstract===
'i-could-have-written-that' will be a publishing series around technologies that process natural language (NLP). The project places ''tools'' and ''techniques'' at its centre, in order to reflect on computer processes that 'read' natural language. By regarding NLP software as cultural objects, i would like to look at the inner workings of their technologies: how do they systemize our natural language?
 
The first issue (#0) will focus on the lexical dataset 'WordNet', a main resource in NLP projects & research.
 
The choice to focus specifically on software that processes language is both a poetic one, and a way to speak directly with the techniques that ''use'', ''execute'' & ''process'' language*.


<small>* The computer-interface is a linguistic device. When using a computer, language is ''used'' (as interface to control computer systems), ''executed'' (as scripts written in (a set of) programming languages) and ''processed'' (turning natural language into data).</small><br>


==context==
===importance?===
[[File:World-Well-Being-Project-wordclouds.gif|thumbnail|right|200px|wordclouds visualizing text-mining research results; from [http://www.wwbp.org/ the World Well Being Project] ]]
[[File:Meta-metaphors_data-mining.gif|thumbnail|right|200px|data-mining (part of [[User:Manetta/metaphors-at-the-internet#metaphors_at_the_internet|meta-metaphors]], a collaborative project with Julie, listing metaphors for the Internet)]]
[[File:Antropomorphic-reading-terms.gif|thumbnail|right|200px|anthropomorphism used as a tool to relate to computer processes]]
 
Today we read and produce a large amount of written material, accompanied by computers and the Internet. Written language appears in the form of emails, webpages, interfaces, and much more. Because such a high quantity of this material is produced and published 'publicly' online, it can be used as 'data': mathematically processable, simplified text. This is where the field of ''natural language processing'' (NLP), and more specifically ''text-mining'', comes in: it tries to 'read' meaning in the data. How exactly these techniques do that is mostly hidden, and sometimes complex. It is difficult to relate to their processes, and to see how they 'read' information out of meaningless data; not to mention how difficult it is to be aware of them, or to disagree with them. How do we relate to these information processes? In this part of NLP systems (relating to their processes as a reader) i recognize two patterns:
 
* Because of big-data's ideological aim to be in direct contact with information, there is a belief that 'truth' can be gained from data without any mediation: ''it is the data that speaks''<ref name="data">Using twitter to predict heart disease, Lyle Ungar at TEDxPenn; https://www.youtube.com/watch?v=FjibavNwOUI (June 2015)</ref> or ''no humans are ever involved''<ref name="humans">Introducing the automatic Flickr tagging bot, on the Flickr Forums; https://www.flickr.com/help/forum/en-us/72157652019487118/ (May 2015)</ref>. The format of the 'wordcloud' is regularly used as a way to visualize such 'truths' and is a common way to relate to the practice of text-mining (a minimal word-count sketch follows below).
 
''Raw-data is like nature, it is the idea that nature will speak by itself. It is the idea that thanks to big data, the world speaks by itself without any transcription, symbolization, institutional mediation, political mediation or legal mediation'' (Antoinette Rouvroy, 2015)<ref name="antoinette">''Raw-data is like nature, it is the idea that nature will speak by itself. It is the idea that thanks to big data, the world speaks by itself without any transcription, symbolization, institutional mediation, political mediation or legal mediation'' — Antoinette Rouvroy; during her lecture as part of the panel 'All Watched Over By Algorithms', held during the Transmediale festival (2015); https://www.youtube.com/watch?v=4nPJbC1cPTE</ref> Data is seen as a natural resource that only has to be 'mined', like in the act of mining for gold. These assumptions are ideological because such aims are invisible, and part of a 'control-system' that presents data as meaningful information.


* On the other hand, i recognize a habit of applying anthropomorphic qualities to software (like 'thinking' or 'machine learning'), which can easily obscure its syntactical nature and mislead. Benjamin Bratton calls for a habit of regarding computers as inhuman and machinic: ''in many cases these debates may be missing the real point of what it means to live and think with forms of synthetic intelligence very different from our own.'' (Benjamin Bratton, 2015)<ref name="benjamin">''in many cases these debates may be missing the real point of what it means to live and think with forms of synthetic intelligence very different from our own.'' (...) ''To regard A.I. as inhuman and machinic should enable a more reality-based understanding of ourselves, our situation, and a fuller and more complex understanding of what 'intelligence' is and is not.'' — Benjamin Bratton, Outing A.I.: Beyond the Turing Test (2015)</ref>


The field of NLP is a field of designers: designing meaningful information out of meaningless data. These two patterns function as its typography: they both create a readable format that makes it possible for a reader to relate to the information; information that is otherwise regarded as complex, and therefore rather hidden away. I would like to offer an alternative entrance to these information design processes, by starting to publish a collection of alternative reflections on these techniques, made accessible and legible for people with an interest in regarding software as a cultural product.


This could hopefully trigger the formulation of an opinion or understanding about computational (syntactical) techniques, without relying only on the dominant (and overpowering) perspective.
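
To make the first pattern concrete: a wordcloud 'truth' is little more than typography applied to a word-frequency table. A minimal sketch of that reduction (the input text is hypothetical):

<source lang="python">
# A wordcloud reduced to its mechanics: simplify text into bare tokens,
# count them, and let frequency set the 'size'. Nothing speaks here
# except the counting.
from collections import Counter
import re

text = "data speaks, data is mined, and the data that speaks is designed"
tokens = re.findall(r"[a-z]+", text.lower())   # 'simplified' processable text

for word, n in Counter(tokens).most_common(5):
    print(word.ljust(10), "#" * n)             # crude wordcloud: size = count
</source>
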
--------------------------------------------------------
* ''Critical thinking about computers is not possible without an informed understanding.'' (...) ''Software as a whole is not only 'code' but a symbolic form involving cultural practices of its employment and appropriation.'' (...) ''It's upon critics to reflect on the constraints that computer control languages write into culture.'' (Florian Cramer, 2008)<ref name="florian">''Critical thinking about computers is not possible without an informed understanding.'' (...) ''Software as a whole is not only 'code' but a symbolic form involving cultural practices of its employment and appropriation.'' (...) ''It's upon critics to reflect on the constraints that computer control languages write into culture.'' — Florian Cramer, Language (essay); published in Software Studies, edited by Matthew Fuller (2008)</ref>
* ''The issue we are dealing now with in machine learning and big data is that a heuristic approach is being replaced by a certainty of mathematics. And underneath those logics of mathematics, we know there is no certainty, there is no causality. Correlation doesn't mean causality.'' (Ramon Amaro, 2015)<ref name="ramon">''The issue we are dealing now with in machine learning and big data is that a heuristic approach is being replaced by a certainty of mathematics. And underneath those logics of mathematics, we know there is no certainty, there is no causality. Correlation doesn't mean causality.'' — Ramon Amaro, Colossal Data and Black Futures (lecture); as part of the Impakt festival (2015)</ref>


===publishing?===
[[File:Sentiment-analyses_wwbp.png|thumbnail|right|200px|typography analysis as sentiment analysis; from [http://www.wwbp.org/ the World Well Being Project]]]
Coming from a background in graphic design, i was educated in typography and visual language, combined with courses in editorial design. During this bachelor, i became interested in semiotics, and in systems that use symbols/icons/indexes to construct meaning.
 
My interest focused specifically on language, our primary communication system. Language does not only mediate the contact between humans, but also the relation of a human to objects and other entities in the world around us. It is a tool to understand, categorize and give meaning to the world. This is a self-evident process that happens to everyone, but the semiotic writings of Saussure and Peirce made such invisible systems a topic of research: they introduced me to systems that describe such mediating processes. I became enthusiastic to learn about signs that signify certain meanings within a culture, as the color 'gold' can signify richness, or the color 'red' can stand for either aggression or socialism. Fascinating systems, useful as analysis tools, but described already in the 60s and mainly applied in the field of advertising. They were therefore difficult to implement in my practice.


During my first year at the Piet Zwart, i looked at software that processes human language. I was curious to see how software is designed to understand us and our human language system. These software packages contain dictionaries, datasets, wordlists, scripts (and much more), which suddenly materialized the meaning-making process. Those wordlists and the structures of such datasets reflect aims to understand the human world (while being written by humans themselves).


Now i feel that 'designing' shifts from designing information on an interface level to '''designing information processes'''. To 'design' now also means working with programming languages & software: designing information processes, and making them workable (workflows, collaborative work situations) & readable (typography). These elements bring new questions to the table, which i would like to treat as design questions, e.g. regarding software culture and its syntactical nature (a small workflow sketch follows the list):


  * How can an interface ''reveal'' its inner system?  
  * And how could a workflow ''affect'' the information it is processing?
  * What to do with being dependent on an online service like WordPress, to publish?
  * Should we reconsider workflows that can only happen online?
  * How can documents not only be published, but also conserved as readable documents?
    (as opposed to database entries that are stored in binary?)
  * When could a document be called 'a document'?
   When it is readable for the user or for the computer?  
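
One direction these questions point to, already present in the git/make/pandoc workflow of this project: keep the document itself plain and readable, and treat the rendering as a replaceable step. A minimal sketch of one such step (the file names are hypothetical; it assumes pandoc is installed locally):

<source lang="python">
# One offline publishing step: a plain-text markdown file in, a readable
# HTML document out, via pandoc. The markdown source stays a legible
# document in itself, with or without the tool.
import subprocess
from pathlib import Path

source = Path("issue-0.md")
source.write_text("# i could have written that\n\nissue #0: WordNet\n")

# pandoc infers input and output formats from the file extensions
subprocess.run(["pandoc", str(source), "-o", "issue-0.html"], check=True)

print(source.read_text())   # the source remains readable as-is
</source>
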


==framework==
 
Each publication takes one NLP technique as a study object.
 
===for who?===
people with an interest:
 
* to regard software as a cultural product
* in the inner workings of NLP techniques
* to bridge typography and information processes (designers?)
 


===#0 issue===
* WordNet case-study
* ontologies / taxonomies / vocabularies like RDF, OWL, OpenCyc; (aiming at a semantic web & linked data)
* historical categorization work (Leibniz statistics, Roget's thesaurus)


For the #0 issue i would like to look at 'WordNet'. WordNet is a lexical dataset and a primary resource in the field of Knowledge Discovery in Databases (also known as the field of data-mining and big-data). WordNet is built out of word-'synsets' (where a word can have multiple entries according to its multiple meanings), which are related to each other by various relations, such as word-type, category or synonymy. The dataset has been developed since 1985, and is effectively a norm in the field, used during the training processes of data-mining algorithms. Although the focus on word-synsets is an attempt to create a nuanced model of a human language, the dataset is still a model, and will always be 'imperfect'.
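
To make the synset structure concrete, a minimal sketch using NLTK's WordNet interface (the words chosen here are illustrative; it assumes the nltk package and its wordnet corpus are installed):

<source lang="python">
# WordNet's basic units, browsed through NLTK's corpus reader
# (assumes: pip install nltk, then nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

# one word, multiple entries: every synset is one sense of 'mine'
for synset in wn.synsets('mine'):
    print(synset.name(), '-', synset.definition())

# synsets are linked to each other by relations such as
# hypernymy (category) and synonymy
gold = wn.synset('gold.n.01')
print(gold.hypernyms())     # the more general categories 'gold' belongs to
print(gold.lemma_names())   # the synonyms grouped into the same synset
</source>
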


As written language is regarded as 'data' today, data-mining techniques 'read' written text in order to return 'information'. What is returned is a constructed truth, and datasets like WordNet function as the 'norm' of such truths.
 
By looking at WordNet and a related case-study, i hope to reflect on the methods by which meaningless data is transformed into semantic data, according to WordNet's norms. How can this transformation be a design tool? And what would we like to apply it to? And would it be possible to reflect on the question why there is still such an aim to build universal systems?
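
As a sketch of what such a transformation can look like (the sentence, and the choice to take a word's first synset, are my own illustrative assumptions):

<source lang="python">
# Tag every word with the lexicographer category of its first synset.
# Note the normative decision hidden in 'first synset': it means the
# most common sense, an ordering designed by WordNet itself.
from nltk.corpus import wordnet as wn

sentence = "the miner reads gold in raw data"
for word in sentence.split():
    synsets = wn.synsets(word)
    category = synsets[0].lexname() if synsets else '(not in WordNet)'
    print(word.ljust(8), '->', category)   # e.g. miner -> noun.person
</source>
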


[[User:Manetta/i-could-have-written-that | &rarr; other elements i have been working on are documented on this page]]
[[User:Manetta/i-could-have-written-that/little-glossary | &rarr; little glossary of topics]]


===practical steps===
* selecting a WordNet case study (to speak about a specific context)
* set the specific topic of #0; (approach towards WordNet)
* collecting related material
* ask around / contact specific researchers or writers that are working in the field
* prototype publishing formats (related to the structure of WordNet)
* mapping WordNet's structure
* using WordNet as a writing filter?
* WordNet as structure for a collection (similar to the way i've used the SUN database)
* start a collective platform (wiki?) to document the publications & related material

== Thesis intention ==
I would like to integrate my thesis into my graduation project, to let my research become my thesis, as well as the content of the publication(s). This could take multiple forms, for example:
* interview with creators of datasets or lexicons like WordNet
* close reading of a piece of software, like we did during the workshop at Relearn. Options could be: text-mining software ''Pattern'' (Relearn), or ''Weka 3.0''; or WordNet, ConceptNet, OpenCyc


== Relation to previous practice ==
In the last year, i've been looking at different '''tools''' that process natural language. From ''speech-to-text software'' to ''text-mining tools'', they all '''systemize language''' in various ways.


As a continuation of that, i took part in the Relearn summer school in Brussels last August (2015), proposing a work track in collaboration with Femke Snelting on the field of text-mining. With a group of people we tried to deconstruct the 'truth-construction' in a text-mining software package called 'Pattern': deconstructing the mathematical models that are used, finding moments where semantics are mixed with mathematics, and trying to grasp what kind of cultural context is created around this field. The workshop during Relearn transformed into a project that we called '#!PATTERN+', which will be a critical fork of the latest version of Pattern, including reflections and notes on the software and the culture that surrounds it. The README file that has been written for #!PATTERN+ is online, and more information is collected on a wiki page.
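
One concrete spot where semantics and mathematics meet in Pattern is its sentiment() function, which looks words up in a hand-annotated lexicon and averages their scores into two numbers. A minimal sketch (it assumes the pattern library, Python 2-era software, is installed; the sentence is hypothetical):

<source lang="python">
# Pattern's sentiment(): human annotation (semantics) flattened into
# arithmetic (mathematics), returned as a (polarity, subjectivity) pair.
from pattern.en import sentiment

polarity, subjectivity = sentiment("The results are wondrous, almost magic.")
print(polarity)       # between -1.0 (negative) and +1.0 (positive)
print(subjectivity)   # between 0.0 (objective) and 1.0 (subjective)
</source>
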
Another entrance to understanding what happens in algorithmic practices such as machine learning is to look at the training sets that are used to train algorithms to recognize certain patterns in a set of data. These training sets can contain a large set of images, texts, 3d models, or videos. By looking at such datasets, and more specifically at the choices that have been made in terms of structure and hierarchy, steps in the construction of a certain 'truth' are revealed. For the exhibition "Encyclopedia of Media Objects" in V2 last June, i created a catalog, voice-over and booklet ('i will tell you everything (my truth is a constructed truth)'), which placed the objects from the exhibition within the framework of the SUN database, a resource of images for image-recognition purposes.

There are a few datasets in the academic world that seem to be the basic resources these training sets are built upon. In the field they are called 'knowledge bases'. They live on a more abstract level than the training sets do, as they try to create a 'knowledge system' that could function as a universal structure. Examples are WordNet (a lexical dataset), ConceptNet, and OpenCyc (an ontology dataset). In the last months i've been looking into WordNet, worked on a WordNet Tour (still ongoing), and made an alternative browser interface (with cgi) for WordNet. It is all a process that has not yet been transformed into an object/product, but it is documented [[User:Manetta/wordnet/|here]] and [[User:Manetta/semantic-systems/knowledge-bases/wordnet|here]] on the Piet Zwart wiki.


<ref name="data">Using twitter to predict heart disease, Lyle Ungar at TEDxPenn; https://www.youtube.com/watch?v=FjibavNwOUI (June 2015)</ref>
== references ==
<ref name="humans">Introducing the automatic Flickr tagging bot, on the Flickr Forums; https://www.flickr.com/help/forum/en-us/72157652019487118/ (May 2015)</ref>
<references />
===publishing experiments===
* [http://networkcultures.org/digitalpublishing/ digital publishing toolkit]
* [http://activearchives.org/aaa/ Active Archives VideoWiki - Constant's VideoWiki]
* [http://post-inter.net/ art post-internet (2014), a PDF + webpage catalogue]
* [https://mcluhan.consortium.io/ Hybrid Lecture Player (to be viewed in Chrome/Chromium)]
===current or former (related) magazines===
* [https://s3-us-west-2.amazonaws.com/visiblelanguage/pdf/V1N1_1967_E.pdf The Journal of Typographic Research (1967-1971)] (now: [http://visiblelanguagejournal.com/about Visible Language])
* [http://www.radicalsoftware.org/e/index.html Radical Software (1970-1974, NY)]
* [http://www.dot-dot-dot.us/ Dot Dot Dot (2000-2011, USA)]
* [http://www.servinglibrary.org/ the Serving Library (2011-ongoing, USA)]
===other publishing platforms===
* [http://monoskop.org/Monoskop Monoskop]
* mailing list interface: lurk.org
* mailing list interface: nettime &rarr; discussions in public
* [http://p-dpa.net/ archive of publications closely related to technology: P-DPA (Silvio Larusso)]
===datasets===
* [http://wordnet.princeton.edu/ WordNet (Princeton)]
* [http://conceptnet5.media.mit.edu/ ConceptNet 5 (MIT Media)]
* [http://www.opencyc.org/ OpenCyc]
===texts & books & talks===
* All Watched Over By Algorithms, lectures by, among others, Antoinette Rouvroy (Jan. 2015); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/events%2btalks/transmediale_all-watched-over-by-algorithms_2015.html &rarr; annotations]
* Software Studies. A Lexicon, edited by Matthew Fuller (2008)
** Language, Florian Cramer; [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/txt/florian-cramer_language.html &rarr; annotations]
** Algorithm, Andrew Goffey
* Code: The Hidden Language of Computer Hardware and Software, Charles Petzold (2000); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/txt/charles-petzold_code.html &rarr; annotations]
* Outing A.I.: Beyond the Turing Test, Benjamin Bratton (Feb. 2015); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/txt/benjamin-bratton-outing-a.i._beyond-the-turing-test.html &rarr; annotations]
* Towards a Philosophy of Photography, Vilem Flusser (1983); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/txt/vilem-flusser_towards-a-philosophy-of-photography.html &rarr; annotations]
* The Information: A History, a Theory, a Flood, James Gleick (2011); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/txt/gleick-the-information.html &rarr; annotations]
* Colossal Data and Black Futures, lecture by Ramon Amaro (Oct. 2015); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/events%2btalks/ramon-amaro_colossal-data-and-black-futures.html &rarr; annotations]
* The Physics of Data, lecture by Marissa Mayer (2009); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/events%2btalks/marissa-meyer_the-physics-of-data.html &rarr; annotations]
===people===
algorithmic culture:
* Antoinette Rouvroy
* Andrew Goffey
* Ramon Amaro

computational culture:
* Florian Cramer
* Benjamin Bratton
</div>
