User:Manetta/i-could-have-written-that/data-mining-in-the-wild

From XPUB & Lens-Based wiki

data mining in the wild

problem formulations

wider public: the user

  • free labour; users are microworkers who provide free labour to the services they use. After the Snowden affair this got more attention from a wider public, and people are slowly becoming aware of the presence of data analytics.
  • user-data is sold to third parties (who are these parties? and how is the data offered? in what shape? pre-selected?)
  • user-profiles & user-predictions & user-recommendations; data mining is a technology fashion. because of the amount of user-data, data mining is an interesting tool for customized advertisements, search results & sales recommendations. how such results and recommendations are constructed is unclear, which makes it very difficult to disagree with or criticize them. how can we speak back to the construction of these results?
  • both hardware and software are more and more 'black-boxed'; checking how something works is difficult or made impossible. as most software today runs as a service, the user relies on the information from the software company behind the service.

academic, enterprises, technological, specific

  • data is framed as 'raw', and regarded as a natural resource. because of the term data 'mining', data is seen as a material that can easily be extracted from the web. but on closer inspection, data mining results seem rather to be constructed.
  • text-processing
    • text mining software aims to be a universal text processing system.
    • written text can only be processed when it is strongly simplified (into e.g. ngrams, bag-of-words or vector-space-models).
    • to search for meaningful information in meaningless data, reference datasets such as WordNet function as a norm to extract semantic relations from data.
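
to make the 'strong simplification' above concrete, here is a minimal sketch in plain Python of what ngrams and a bag-of-words do to a sentence. the tokenizer, the function names and the two example sentences are assumptions made for illustration only; they are not taken from any particular text mining package.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs; word order is thrown away."""
    return Counter(tokens)


# two hypothetical sentences with opposite meanings
a = tokenize("The dog bites the man.")
b = tokenize("The man bites the dog.")

# bigrams keep a little local word order
print(ngrams(a, 2))

# as bags of words the two sentences cannot be told apart
print(bag_of_words(a) == bag_of_words(b))

# as bigrams they still differ
print(ngrams(a, 2) == ngrams(b, 2))
```

in this sketch the two sentences collapse into the same bag-of-words, while the bigrams still tell them apart. which simplification is chosen decides which differences in a text remain visible to the software at all.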

notes

overlap between computer scientists, statisticians & data scientists
data mining companies take over the role of (academic?) statisticians,
with their agenda (driven by profit and efficiency? ..... speculations here).