Latest revision as of 16:09, 30 April 2016

outline - i could have written that

intro

text analytics < > systemization of language

This text originates from an interest in the systemization of language that is needed for computer software to be able to 'understand' and process written language.

vocabulary

buzzwords (machine learning, big data, data mining) (ref to Florian Cramer)
metaphor (too much ???)
one of the five KDD steps

problematic situation

...

Text mining seems to be a rather brutal way of processing natural language into useful information. To reflect on this brutality, tracing back a longer tradition of natural language processing could be useful. Hopefully this will be a way to create some distance from the hurricanes of data that are mainly known as 'big', 'raw' or 'mined' these days.

audience

This thesis is aimed at an audience interested in an alternative perspective on buzzwords like 'big data' and 'data-mining'. It will also (hopefully!) offer a view from a computer-vision side: how software is written to understand the non-computer world of written text.

hypothesis

The results of data-mining software are not mined; they are constructed.

chapter 1: on what basis? three settings to highlight differences in text analytical ideologies

setting 1: PhD candidate's thesis defence, Faculty of Economics, Erasmus University Rotterdam
setting 2: Lyle Ungar's TED Talk, World Well-Being Project, Faculty of Psychology, University of Pennsylvania
setting 3: Guy de Pauw's introduction on text mining software, CLiPS, Faculty of Arts & Philosophy, Computational Linguistics & Psycholinguistics department, University of Antwerp

chapter 2: deriving information from written text → the material form of language

statistical text analytics is not 'read-only', it's writing
- to extract? → to derive
written language as source material
- analogy to typography, dealing with the optical materiality of words/sentences/text
- text analytics dealing with the quantifiable and structural materiality of words/sentences/text
  - word-counts
  - word-order/structure
what do these material analyses represent?
- key-value format (?)
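
A minimal sketch of what 'word-counts' and 'word-order' could look like once they are stored in a key-value format. It is an illustration under stated assumptions, not the method of any particular text mining package: the function names `tokenize`, `word_counts` and `bigram_counts` are hypothetical, the tokenizer is deliberately crude, and the example sentence is adapted from the hypothesis above.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude simplification (an assumption for this sketch): lowercase the
    # text and keep only runs of letters and apostrophes as "words".
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens):
    # Quantifiable materiality: key = word, value = number of occurrences.
    return Counter(tokens)


def bigram_counts(tokens):
    # Structural materiality: key = pair of adjacent words, value = count.
    return Counter(zip(tokens, tokens[1:]))


sentence = "the results are not mined, the results are constructed"
tokens = tokenize(sentence)
counts = word_counts(tokens)
bigrams = bigram_counts(tokens)

print(counts["results"])            # how often the word 'results' occurs
print(bigrams[("the", "results")])  # how often 'the' is followed by 'results'
```

Both structures are already a form of writing: the choice of what counts as a word, and of which neighbours are recorded, is made before any 'result' appears.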

chapter 3: information extraction / text categorization. diving into the software!

unsupervised
supervised

material

bibliography (five key texts)

Joseph Weizenbaum - Computer Power and Human Reason: From Judgment to Calculation (1976);
Winograd + Flores - Understanding Computers & Cognition (1987);
Vilem Flusser - Towards a Philosophy of Photography (1983); → annotations
Antoinette Rouvroy - All Watched Over By Algorithms - Transmediale (Jan. 2015); → annotations
The Journal of Typographic Research - OCR-B: A Standardized Character for Optical Recognition (V1N2) (1967); → abstract

annotations

Alan Turing - Computing Machinery and Intelligence (1950)
The Journal of Typographic Research - OCR-B: A Standardized Character for Optical Recognition (V1N2) (1967); → abstract
Ted Nelson - Computer Lib & Dream Machines (1974);
Joseph Weizenbaum - Computer Power and Human Reason (1976); → annotations
Walter J. Ong - Orality and Literacy (1982);
Vilem Flusser - Towards a Philosophy of Photography (1983); → annotations
Christiane Fellbaum - WordNet, an Electronic Lexical Database (1998);
Charles Petzold - Code: The Hidden Language of Computer Hardware and Software (2000); → annotations
John Hopcroft, Rajeev Motwani, Jeffrey Ullman - Introduction to Automata Theory, Languages, and Computation (2001);
James Gleick - The Information: A History, a Theory, a Flood (2011); → annotations
Matthew Fuller - Software Studies. A lexicon (2008);
- Language, Florian Cramer; → annotations
- Algorithm, Andrew Goffey;
Marissa Mayer - the physics of data, lecture (2009); → annotations
Matthew Fuller & Andrew Goffey - Evil Media (2012); → annotations
Antoinette Rouvroy - All Watched Over By Algorithms - Transmediale (Jan. 2015); → annotations
Benjamin Bratton - Outing A.I., Beyond the Turing test (Feb. 2015) → annotations
Ramon Amaro - Colossal Data and Black Futures, lecture (Oct. 2015); → annotations
Benjamin Bratton - On A.I. and Cities : Platform Design, Algorithmic Perception, and Urban Geopolitics (Nov. 2015);

currently working on

* terminology: data 'mining'
* Knowledge Discovery in Data (KDD) in the wild, problem formulations
* KDD, applications
* KDD, workflow
* text-processing: simplification
* list of data mining parties

other

outline-thesis (2) → NLP

thesis in progress (overview)

@@ Line 1: / Line 1: @@
 <div style="width:100%;max-width:800px;">
-=outline=
+=outline - i could have written that=
-With 'i-could-have-written-that' i would like to look at technologies that process natural language (NLP). By regarding NLP software as cultural objects, i'll focus on the inner workings of their technologies: how do they systemize our natural language? For the occassion of graduating this year, i would like to look at data-mining, text-mining and machine learning, the technologies that are used to gain information from large amounts of data by recognizing patterns.
+==intro==
+===text analytics < > systemization of language===
+This text originates from an interest in the systemization of language that is needed for computer software to be able to 'understand' and process written language.
+===vocabulary===
+* buzzwords (machine learning, big data, data mining) (ref to Florian Cramer)
+* metaphor (too much ???)
+* one of the five KDD steps
+===problematic situation===
+...
+Text mining seems to be a rather brutal way of processing natural language into useful information. To reflect on this brutality, tracing back a longer tradition of natural language processing could be useful. Hopefully this will be a way to create some distance from the hurricanes of data that are mainly known as 'big', 'raw' or 'mined' these days.
-=== intro===
+===audience===
-* NLP, natural language processing
+This thesis will aim for an audience that is interested in an alternative perspective on buzzwords like 'big data' and 'data-mining'. Also, this thesis will (hopefully!) offer a view from a computer-vision side: how software is written to understand the non-computer world of written text.
-* current focus: data-mining field (a data-fashion)
 ==hypothesis==
 The results of data-mining software are not mined, results are constructed. <br>
-What elements do allow for algorithmic agreeability?
-==project==
+==chapter 1: on what basis? three settings to highlight differences in text analytical ideologies==
-<small>voice: accessible for a wider public </small>
+* setting 1: PhD candidate's thesis defence, Faculty of Economics, Erasmus University Rotterdam
+* setting 2: Lyle Ungar's TED Talk, World Well-Being Project, Faculty of Psychology, University of Pennsylvania
+* setting 3: Guy de Pauw's introduction on text mining software, CLiPS, Faculty of Arts & Philosophy, Computational Linguistics & Psycholinguistics department, University of Antwerp
-===problem formulations:===
+==chapter 2: deriving information from written text &rarr; the material form of language==
-* terminology ('mining', 'data')
+* statistical text analytics is not 'read-only', it's writing
-* text-processing
+** to extract? &rarr; to derive
-** from: able to check results with senses (OCR), to: intuition (data-mining) ''[what are the differences?]''
+* written language as source material
-** parsing, how text is treated: as n-grams, chunks, bag-of-words, characters
+** analogy to typography, dealing with the optical materiality of words/sentences/text
-* use of wordclouds
+** text analytics dealing with the quantifiable and structural materiality of words/sentences/text
-** data as autonomous entity; from: information, to: data science ''[what are the differences?]''
+*** word-counts
+*** word-order/structure
+* what do these material analyses represent?
+** key-value format (?)
-===algorithmic agreeability case study objects (from the wild)===
+==chapter 3: information extraction / text categorization. diving into the software!==
-* terminology & anthropomorphism: data 'mining' [[User:Manetta/i-could-have-written-that/from-mining-minerals-to-mining-data | (wiki-page)]]
+* unsupervised
-* terminology & anthropomorphism: 'machine learning'
+* supervised
-* terminology: 'data'
-* wordclouds
-==thesis==
-<small>voice: more technical? + theoretical</small>
-===theory===
-* solutionism & techno optimism
-===algorithmic agreeability case study objects (field-specific)===
+=material=
-* workflow mining-software (eg. Pattern, Wecka)
-** software workflow diagram
-** the use of mathematical graphs & dimensions
+==bibliography (five key texts)==
+* Joseph Weizenbaum - Computer Power and Human Reason: From Judgement to Calculation (1976);
+* Winograd + Flores - Understanding Computers & Cognition (1987);
+* Vilem Flusser - Towards a Philosophy of Photography (1983); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/txt/vilem-flusser_towards-a-philosophy-of-photography.html &rarr; annotations]
+* Antoinette Rouvroy - All Watched Over By Algorithms - Transmediale (Jan. 2015); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/events%2btalks/transmediale_all-watched-over-by-algorithms_2015.html &rarr; annotations]
+* The Journal of Typographic Research - OCR-B: A Standardized Character for Optical Recognition this article (V1N2) (1967); [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/automatic-reading-machines/automatic-reading-machines.html &rarr; abstract]
+{{#widget:YouTube|id=JFgsdzikVZU}}
-=research material=
+==annotations==
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/ &rarr; filesystem interface, collecting research related material] [[User:Manetta/i-could-have-written-that/filesystem-interface-related-material | (+ about the workflow)]]<br>
-[[User:Manetta/i-could-have-written-that | &rarr; wikipage for 'i-could-have-written-that' (list of prototypes & inquiries)]] <br>
-[[User:Manetta/i-could-have-written-that/little-glossary | &rarr; little glossary]]<br>
-===mining as ideology===
-[[User:Manetta/i-could-have-written-that/from-mining-minerals-to-mining-data | * from mining minerals to mining data]]<br>
-'''anthropomorphism'''
-[[User:Manetta/i-could-have-written-that/anthropomorphic-qualities | * anthropomorphic qualities of a computer (?)]]<br>
-[[User:Manetta/i-could-have-written-that/the-data-apparatus | * the photographic apparatus &rarr; the data apparatus (annotations)]] <br>
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/joseph-s_questions/joseph-s_questions.html * Joseph's (Weizenbaum) questions on Computer Power and Human Reason]<br>
-===text processing===
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/semantic-math-averaging/semantic-math-averaging.html * semantic math: averaging polarity rates in Pattern (text mining software package)]<br>
-[[User:Manetta/i-could-have-written-that/wordclouds | * notes on wordclouds]]<br>
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/automatic-reading-machines/automatic-reading-machines.html * automatic reading machines; from encoding-decoding to constructed-truths]<br>
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/wordnet-skeleton/wordnet-skeleton.html * index of WordNet 3.0 (2006)]<br>
-===data as autonomous entity===
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/knowlegde-driven-by-the-data/knowlegde-driven-by-the-data.html * knowledge driven by data - ''whenever i fire a linguist, the results improve'']<br>
-===other===
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/i-am-sorry-but-these-are-the-words-laughter/i-am-sorry-but-these-are-the-words-laughter.html * (laughter) - ''it's embarrassing but these are the words'']<br>
-[[User:Manetta/i-could-have-written-that/syntactic-view | * call for a syntactic view; Florian Cramer & Benjamin Bratton (text)]] <br>
-[[User:Manetta/i-could-have-written-that/sentiment-analysis-phd-presentation | * EUR PhD presentation 'Sentiment Analysis of Text Guided by Semantics and Structure' (13-11-2015) ]]<br>
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/roget-s_thesaurus-of-english-words-and-phrases/roget-s_thesaurus-of-english-words-and-phrases.html * index of Roget's thesaurus (1805)]<br>
-[http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/classification_what-happened_roget---wordnet/classification_what-happened_roget---wordnet.html * comparing the classification of the word 'information' Thesaurus (1911) vs. WordNet 3.0 (2006)]<br>
-=annotations=
 * Alan Turing - Computing Machinery and Intelligence (1950)
 * The Journal of Typographic Research - OCR-B: A Standardized Character for Optical Recognition this article (V1N2) (1967); [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/automatic-reading-machines/automatic-reading-machines.html &rarr; abstract]
@@ Line 98: / Line 77: @@
 * Benjamin Bratton - [https://vimeo.com/145288035 On A.I. and Cities : Platform Design, Algorithmic Perception, and Urban Geopolitics] (Nov. 2015);
+==currently working on==
+[[User:Manetta/i-could-have-written-that/from-mining-minerals-to-mining-data | * terminology: data 'mining']]<br>
+[[User:Manetta/i-could-have-written-that/data-mining-in-the-wild | * ''Knowledge Discovery in Data'' (KDD) in the wild, problem formulations]]<br>
+[[User:Manetta/i-could-have-written-that/kdd-applications | * ''KDD'', applications]]<br>
+[[User:Manetta/i-could-have-written-that/knowledge-discovery-workflow | * ''KDD'', workflow]]<br>
+[[User:Manetta/i-could-have-written-that/text-processing/simplification | * text-processing: simplification]]<br>
+[[User:Manetta/i-could-have-written-that/data-mining-parties | * list of data mining parties]]<br>
-=bibliography (five key texts)=
+==other==
-* Language, Florian Cramer (2008); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/txt/florian-cramer_language.html &rarr; annotations]
+[[User:Manetta/thesis/thesis-outline-nlp | outline-thesis (2) &rarr; NLP]]
-* Antoinette Rouvroy - All Watched Over By Algorithms - Transmediale (Jan. 2015); [http://pzwart1.wdka.hro.nl/~manetta/annotations/html/events%2btalks/transmediale_all-watched-over-by-algorithms_2015.html &rarr; annotations]
-* The Journal of Typographic Research - OCR-B: A Standardized Character for Optical Recognition this article (V1N2) (1967); [http://pzwart1.wdka.hro.nl/~manetta/i-could-have-written-that/elements/automatic-reading-machines/automatic-reading-machines.html &rarr; abstract]
+------------------------------
-*
+[[User:Manetta/thesis/thesis-in-progress | thesis in progress (overview)]]
+[[User:Manetta/thesis/chapter-intro | intro &+]]
+[[User:Manetta/thesis/chapter-1 | chapter 1]]
+[[User:Manetta/thesis/chapter-2 | chapter 2]]
+[[User:Manetta/thesis/chapter-3 | chapter 3]]

User:Manetta/thesis/thesis-outline: Difference between revisions