User:Manetta/thesis/chapter-intro
Latest revision as of 15:10, 30 April 2016
intro
hypothesis
The results of text mining software are not 'mined'; they are constructed.
text mining as writing technique (structure)
chapter 1 - raw language
the non-man paradox
text as data
parsing exercise
- split (tokenize)
- count (bag-of-words)
- tag (part-of-speech, POS)
the non-text?
the non-text paradox, no context
levels of rawness
ideals of rawness
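a possible illustration of the three parsing steps (split, count, tag), as a minimal sketch using only the Python standard library. The function names, the example sentence and the tiny tag lexicon are assumptions made for this illustration; they are not taken from Pattern or any other text mining package, and a real tagger would rely on a trained model rather than a lookup table.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for this sketch, not a real tagger).
# Tags loosely follow Penn Treebank labels; unknown words get "UNK".
TOY_LEXICON = {
    "the": "DT",
    "results": "NNS",
    "are": "VBP",
    "not": "RB",
    "mined": "VBN",
    "constructed": "VBN",
}

def split(text):
    """Tokenize: lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def count(tokens):
    """Bag-of-words: word order is discarded, only frequencies remain."""
    return Counter(tokens)

def tag(tokens):
    """Part-of-speech tagging by dictionary lookup in the toy lexicon."""
    return [(token, TOY_LEXICON.get(token, "UNK")) for token in tokens]

text = "The results are not mined, the results are constructed."
tokens = split(text)
bag = count(tokens)
tagged = tag(tokens)

print(tokens)
print(bag.most_common(3))
print(tagged)
```

each step already involves a decision (what counts as a token, what is thrown away by counting, which tag set is applied), which is where the 'rawness' of the text starts to be constructed.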
chapter 2 - various approaches - 3 case studies
manager (economy PhD candidate)
- using raw data to make decisions
magician (psychologist)
- using the rawness of data as a smoke screen, making use of common sense, clichés and assumptions
archaeologist (comp. linguist)
- using the rawness of the words as material to work with, to carefully derive information from, by following different standards and procedures
chapter 3 - from 'mining' to KDD
examples of the use of the term 'mining' in popular articles!
KDD 1989 version, from the people who initially coined the term: elements of subjectivity + loops involved
(KDD 2013 version)
+ parts of Pattern's close reading could maybe illustrate some of the KDD steps in more detail
conclusion
the practice of mining is dirty, messy and contains many gray areas that are tweaked until the results match certain preset expectations.