User:Manetta/i-could-have-written-that/data-mining-in-the-wild
data mining in the wild
problem formulations
wider public: the user
- free labour; users are microworkers who provide free labour to the services they use. After the Snowden affair this received more attention from a wider public, and people are slowly becoming aware of the presence of data analytics.
- user data is sold to third parties (who are these parties? how is the data offered, and in what shape? is it pre-selected?)
- both hardware and software are increasingly 'black-boxed': checking how something works is difficult or made impossible. as most software today runs as a service, the user depends on the software company behind the service.
- user profiles, user predictions & user recommendations; data mining is a technology fashion. because of the amount of user data available, data mining is an attractive tool for customized advertisements, search results & sales recommendations.
academic, enterprises, technological, specific
- data is framed as 'raw' and regarded as a natural resource. the term data 'mining' suggests that data is a material that can easily be extracted from the web. but on closer inspection, data mining results seem rather to be constructed.
- text-processing
- text mining software aims to be a universal text processing system.
- written text can only be processed when it is strongly simplified (e.g. into n-grams, bag-of-words or vector space models).
- to search for meaningful information in meaningless data, reference datasets such as WordNet function as a norm for extracting semantic relations from data.
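A minimal sketch of what 'strongly simplified' can mean in practice, using only the Python standard library. It is an illustration, not the method of any particular text mining package: the two example sentences, the crude letters-only tokenizer and all function names are assumptions made for this note. The sketch turns each sentence into bigrams and into a bag-of-words count vector, and compares the vectors with cosine similarity.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and keep only runs of letters (hypothetical, crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return all sequences of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs; word order is thrown away."""
    return Counter(tokens)


def to_vector(bag, vocabulary):
    """Place the counts on fixed axes, one axis per vocabulary word."""
    return [bag[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Two made-up sentences with opposite meanings.
doc_a = "The dog bites the man."
doc_b = "The man bites the dog."

tokens_a = tokenize(doc_a)
tokens_b = tokenize(doc_b)

# Shared vocabulary defines the axes of the vector space.
vocabulary = sorted(set(tokens_a) | set(tokens_b))
vec_a = to_vector(bag_of_words(tokens_a), vocabulary)
vec_b = to_vector(bag_of_words(tokens_b), vocabulary)

print(ngrams(tokens_a, 2))  # bigrams keep some local word order
print(vocabulary)
print(vec_a, vec_b)
print(cosine_similarity(vec_a, vec_b))
```

In this sketch the two sentences end up with identical bag-of-words vectors although they say opposite things, while their bigram lists still differ. Which simplification is chosen therefore decides what the system can 'see' in a text, which fits the remark above that data mining results are constructed rather than found.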
notes
- overlap between computer scientists, statisticians & data scientists
- data mining companies take over the role of (academic?) statisticians, with their own agenda (driven by profit and efficiency? ..... speculation here).