intro

hypothesis

The results of text mining software are not 'mined'; they are constructed.

text mining as writing technique (structure)

chapter 1 - raw language

the non-man paradox

text as data

parsing exercise
- split (tokenize)
- count (bag-of-words)
- tag (part-of-speech, POS)

the non-text? the non-text paradox, no context

levels of rawness

ideals of rawness
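The three parsing steps named above (split, count, tag) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the procedure of any particular text mining package. The function names, the example sentence and the tiny tag lexicon are all assumptions invented for this sketch; a real part-of-speech tagger relies on a trained model rather than a lookup table.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, invented for this sketch only.
# Real POS taggers use trained models, not a hand-written table.
TOY_LEXICON = {
    "the": "DT", "results": "NNS", "are": "VBP",
    "not": "RB", "mined": "VBN", "constructed": "VBN",
}

def split(text):
    """Tokenize: lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def count(tokens):
    """Bag-of-words: word order is discarded, only frequencies remain."""
    return Counter(tokens)

def tag(tokens):
    """Look up a part-of-speech tag per token; unknown words become 'UNK'."""
    return [(token, TOY_LEXICON.get(token, "UNK")) for token in tokens]

text = "The results are not mined, the results are constructed."
tokens = split(text)
bag = count(tokens)
tagged = tag(tokens)

print(tokens)
print(bag.most_common(3))
print(tagged)
```

Even in this small sketch each step involves a decision (what counts as a word, whether case matters, which tag set applies), so the output is shaped by choices rather than simply found in the text.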

chapter 2 - various approaches - 3 case studies

manager (economics PhD candidate)

using raw data to make decisions

magician (psychologist)

using the rawness of data as a smoke screen, making use of common sense, clichés and assumptions

archaeologist (comp. linguist)

using the rawness of the words as material to work with, to carefully derive information from, by following different standards and procedures

chapter 3 - from 'mining' to KDD

examples of the use of the term 'mining' in popular articles!

KDD 1989 version, from the people who initially coined the term: elements of subjectivity + loops involved (KDD 2013 version)

+ parts of Pattern's close reading could maybe illustrate some of the KDD steps in more detail

conclusion

the practice of mining is dirty, messy and contains many gray areas that are tweaked until the results match certain preset expectations.

links

thesis in progress (overview)

User:Manetta/thesis/chapter-intro
