User:Manetta/thesis/thesis-outline: Difference between revisions
Latest revision as of 15:09, 30 April 2016
outline - i could have written that
intro
text analytics < > systemization of language
This text originates from an interest in the systemization of language that is needed for computer software to be able to 'understand' and process written language.
vocabulary
- buzzwords (machine learning, big data, data mining) (ref to Florian Cramer)
- metaphor (too much ???)
- one of the five KDD steps
problematic situation
...
Text mining seems to be a rather brutal way of processing natural language into useful information. To reflect on this brutality, it could be useful to trace back a longer tradition of natural language processing. Hopefully this will create some distance from the hurricanes of data that are mainly known as 'big', 'raw' or 'mined' these days.
audience
This thesis will aim for an audience that is interested in an alternative perspective on buzzwords like 'big data' and 'data-mining'. Also, this thesis will (hopefully!) offer a view from a computer-vision side: how software is written to understand the non-computer world of written text.
hypothesis
The results of data-mining software are not mined; they are constructed.
chapter 1: on what basis? three settings to highlight differences in text analytical ideologies
- setting 1: PhD candidate's thesis defence, Faculty of Economics, Erasmus University Rotterdam
- setting 2: Lyle Ungar's TED Talk, World Well-Being Project, Faculty of Psychology, University of Pennsylvania
- setting 3: Guy de Pauw's introduction on text mining software, CLiPS, Faculty of Arts & Philosophy, Computational Linguistics & Psycholinguistics department, University of Antwerp
chapter 2: deriving information from written text → the material form of language
- statistical text analytics is not 'read-only', it's writing
  - to extract? → to derive
- written language as source material
  - analogy to typography, dealing with the optical materiality of words/sentences/text
  - text analytics dealing with the quantifiable and structural materiality of words/sentences/text
    - word-counts
    - word-order/structure
- what do these material analyses represent?
  - key-value format (?)
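A minimal sketch of the two material analyses listed above, assuming plain Python and a made-up example sentence: word-counts and word-order (here reduced to pairs of neighbouring words), each stored in a key-value format. The function names `tokenize`, `word_counts` and `word_pairs` are hypothetical and are not taken from any existing text mining package.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters (a deliberately crude simplification)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Word-counts: each key is a word, each value is how often it occurs."""
    return Counter(tokenize(text))


def word_pairs(text):
    """Word-order: each key is a pair of neighbouring words, each value is its count."""
    words = tokenize(text)
    return Counter(zip(words, words[1:]))


# Hypothetical example sentence, only for illustration.
sentence = "the results are not mined, the results are constructed"
counts = word_counts(sentence)
pairs = word_pairs(sentence)

print(counts.most_common(3))
print(pairs[("results", "are")])
```

Both outputs are written by the script rather than found in the sentence: the decisions to lowercase, to drop punctuation and to count pairs determine what the key-value pairs are able to say.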
chapter 3: information extraction / text categorization. diving into the software!
- unsupervised
- supervised
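To make the difference between the two modes concrete, here is a toy sketch in plain Python, assuming invented example texts and labels. The supervised part counts words per label and picks the label with the most overlap; the unsupervised part is only a crude stand-in for clustering, grouping texts that share at least one word. All names (`train`, `categorize`, `group`) are hypothetical and do not describe how any particular text mining software works.

```python
from collections import Counter, defaultdict


def tokens(text):
    """Split a lowercased text on whitespace."""
    return text.lower().split()


def train(labelled_texts):
    """Supervised: count how often each word occurs under each given label."""
    counts = defaultdict(Counter)
    for text, label in labelled_texts:
        counts[label].update(tokens(text))
    return counts


def categorize(text, counts):
    """Pick the label whose training words overlap most with the text."""
    scores = {label: sum(c[w] for w in tokens(text)) for label, c in counts.items()}
    return max(scores, key=scores.get)


def group(texts):
    """Unsupervised: no labels; a text joins the first group it shares a word with."""
    groups = []
    for text in texts:
        words = set(tokens(text))
        for g in groups:
            if words & g["words"]:
                g["texts"].append(text)
                g["words"] |= words
                break
        else:
            # No shared word with any existing group: start a new one.
            groups.append({"words": words, "texts": [text]})
    return groups


# Invented training examples; the labels are chosen by a person, not by the software.
labelled = [("happy sunny day", "positive"), ("sad rainy day", "negative")]
model = train(labelled)
label = categorize("sunny morning", model)

groups = group(["happy sunny day", "sad rainy night", "sunny morning"])

print(label)
print([g["texts"] for g in groups])
```

In the supervised case the categories are written in advance by whoever labels the examples; in the unsupervised case they follow from the grouping rule that was chosen. Either way the outcome depends on decisions made outside the text.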
material
bibliography (five key texts)
- Joseph Weizenbaum - Computer Power and Human Reason: From Judgement to Calculation (1976);
- Winograd + Flores - Understanding Computers & Cognition (1987);
- Vilem Flusser - Towards a Philosophy of Photography (1983); → annotations
- Antoinette Rouvroy - All Watched Over By Algorithms - Transmediale (Jan. 2015); → annotations
- The Journal of Typographic Research - OCR-B: A Standardized Character for Optical Recognition (V1N2) (1967); → abstract
annotations
- Alan Turing - Computing Machinery and Intelligence (1950)
- The Journal of Typographic Research - OCR-B: A Standardized Character for Optical Recognition (V1N2) (1967); → abstract
- Ted Nelson - Computer Lib & Dream Machines (1974);
- Joseph Weizenbaum - Computer Power and Human Reason (1976); → annotations
- Walter J. Ong - Orality and Literacy (1982);
- Vilem Flusser - Towards a Philosophy of Photography (1983); → annotations
- Christiane Fellbaum - WordNet, an Electronic Lexical Database (1998);
- Charles Petzold - Code, the hidden languages and inner structures of computer hardware and software (2000); → annotations
- John Hopcroft, Rajeev Motwani, Jeffrey Ullman - Introduction to Automata Theory, Languages, and Computation (2001);
- James Gleick - The Information, a History, a Theory, a Flood (2011); → annotations
- Matthew Fuller - Software Studies. A lexicon (2008);
  - Language, Florian Cramer; → annotations
  - Algorithm, Andrew Goffey;
- Marissa Mayer - the physics of data, lecture (2009); → annotations
- Matthew Fuller & Andrew Goffey - Evil Media (2012); → annotations
- Antoinette Rouvroy - All Watched Over By Algorithms - Transmediale (Jan. 2015); → annotations
- Benjamin Bratton - Outing A.I., Beyond the Turing test (Feb. 2015) → annotations
- Ramon Amaro - Colossal Data and Black Futures, lecture (Oct. 2015); → annotations
- Benjamin Bratton - On A.I. and Cities : Platform Design, Algorithmic Perception, and Urban Geopolitics (Nov. 2015);
currently working on
- terminology: data 'mining'
- Knowledge Discovery in Data (KDD) in the wild, problem formulations
- KDD, applications
- KDD, workflow
- text-processing: simplification
- list of data mining parties