User:Manetta/thesis/chapter-intro: Difference between revisions

From XPUB & Lens-Based wiki
<div style="width:750px;">
<div style="width:750px;">
__TOC__
=i could have written that - intro=
=intro=


== text analytics < > systemization of language ==
==hypothesis==
The results of text mining software are not 'mined', results are constructed.


This text originates from an interest in the systemization of language that is needed for computer software to be able to 'understand' and process written language.
== text mining as writing technique (structure) ==


The aim of this text is to look at text analytics and statistical computing from another angle: to see, sense, feel and somehow understand which 'gestures' are applied to written words to make them more 'readable' for a computer program.
'''chapter 1 - raw language'''


much as typography does for the human eye, so to speak.
the non-man paradox
text as data
parsing exercise
- split (tokenize)
- count (bag-of-words)
- tag (part-of-speech, POS)
the non-text?
the non-text paradox, no context
levels of rawness
ideals of rawness


i try to formulate many of the questions that arose while working with the software, attending presentations, watching videos and reading about the technology in academic papers, books and online articles.
'''chapter 2 - various approaches - 3 case studies'''


with these questions i hope to give insight into the particular way these techniques look at language, the fields in which they are applied, and the ideologies with which they seem to be embraced.
manager (economy PhD candidate)
- using raw data to make decisions


questions that hopefully lift some of the layers covering these techniques, to take a sneak peek at their strength and persuasiveness.
magician (psychologist)
- using the rawness of data as a smoke screen, making use of common sense, clichés and assumptions


==hypothesis==
archaeologist (comp. linguist)
The results of text mining software are not 'mined', results are constructed.
- using the rawness of the words as material to work with, to carefully derive information from, by following different standards and procedures
 
'''chapter 3 - from 'mining' to KDD'''
 
examples of the use of the term 'mining' in popular articles!
KDD 1989 version, initial people that coined the term: elements of subjectivity + loops involved
(KDD 2013 version)
 
+ parts of the modality.py close reading could maybe illustrate some of the KDD steps in more detail
 
'''conclusion'''
 
the practice of mining is dirty, messy and contains many gray areas that are tweaked until the results match certain preset expectations.





Revision as of 15:17, 30 April 2016

intro

hypothesis

The results of text mining software are not 'mined', results are constructed.

text mining as writing technique (structure)

chapter 1 - raw language

the non-man paradox
text as data
parsing exercise
- split (tokenize)
- count (bag-of-words)
- tag (part-of-speech, POS)
the non-text?
the non-text paradox, no context
levels of rawness
ideals of rawness
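
The first two steps of the parsing exercise can be made concrete with a small sketch. This is a minimal illustration in Python, not the software discussed in this thesis: the function names `split_words` and `count_words`, and the choice to keep only runs of letters, are assumptions made for this example. Tagging is left out, since it needs a trained part-of-speech tagger.

```python
import re
from collections import Counter


def split_words(text):
    """Split (tokenize): lowercase the text and keep only runs of letters.

    Hypothetical helper for illustration; punctuation and quotation
    marks are silently discarded by the pattern.
    """
    return re.findall(r"[a-z]+", text.lower())


def count_words(tokens):
    """Count (bag-of-words): word order is thrown away, only frequencies remain."""
    return Counter(tokens)


sentence = "The results of text mining software are not 'mined', results are constructed."
tokens = split_words(sentence)
bag = count_words(tokens)

print(tokens)
print(bag.most_common(2))
```

Even in this small case the output depends on decisions made beforehand: lowercasing turns 'The' into 'the', and the pattern drops the quotation marks around 'mined'.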

chapter 2 - various approaches - 3 case studies

manager (economy PhD candidate) - using raw data to make decisions

magician (psychologist) - using the rawness of data as a smoke screen, making use of common sense, clichés and assumptions

archaeologist (comp. linguist) - using the rawness of the words as material to work with, to carefully derive information from, by following different standards and procedures

chapter 3 - from 'mining' to KDD

examples of the use of the term 'mining' in popular articles!
KDD 1989 version, initial people that coined the term: elements of subjectivity + loops involved
(KDD 2013 version)

+ parts of the modality.py close reading could maybe illustrate some of the KDD steps in more detail

conclusion

the practice of mining is dirty, messy and contains many gray areas that are tweaked until the results match certain preset expectations.


links

thesis in progress (overview)

intro &+

chapter 1

chapter 2