User:Manetta/i-could-have-written-that/data-mining-in-the-wild

From XPUB & Lens-Based wiki

data mining in the wild

problem formulations

wider public: the user

  • free labour; users are microworkers who provide free labour to the services they use. After the Snowden affair this reached a wider public, and people are slowly becoming aware of the presence of data analytics.
  • user data is sold to third parties (who are these parties? how is the data offered? in what shape? pre-selected?)
  • both hardware and software are more and more 'black-boxed'; checking how something works is difficult or made impossible. as most software today runs as a service, the user relies on the software company behind the service.
  • user-profiles & user-predictions & user-recommendations; data mining is a technology fashion. because of the amount of user data available, data mining is an interesting tool for customized advertisements, search results & sales recommendations.

academic, enterprises, technological, specific

  • data is framed as 'raw' and regarded as a natural resource. because of the term data 'mining', data is seen as a material that can easily be extracted from the web. but on closer inspection, data mining results seem rather to be constructed.
  • text-processing
    • text mining software aims to be a universal text processing system.
    • written text can only be processed when it is strongly simplified (into e.g. n-grams, bag-of-words or vector-space models).
    • to search for meaningful information in meaningless data, reference datasets such as WordNet function as a norm for extracting semantic relations from data.
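
A minimal sketch of the simplifications named above (bag-of-words, n-grams, and a vector-space comparison), using only the Python standard library. The function names and the two example sentences are assumptions made for illustration; they are not taken from any particular text mining package.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Lowercase the text and keep only runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    # Every run of n consecutive tokens, in order of appearance.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    # Word counts only; the order of the words is thrown away.
    return Counter(tokens)


def cosine(a, b):
    # Cosine similarity between two count vectors (a simple vector-space model).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two hypothetical sentences with opposite meanings.
s1 = "the dog bites the man"
s2 = "the man bites the dog"

bow1 = bag_of_words(tokenize(s1))
bow2 = bag_of_words(tokenize(s2))
bigrams1 = ngrams(tokenize(s1), 2)
bigrams2 = ngrams(tokenize(s2), 2)

print(bow1 == bow2)          # the two bags of words are identical
print(bigrams1 == bigrams2)  # the bigram sequences differ
print(cosine(bow1, bow2))    # similarity of the two count vectors
```

In this sketch the two sentences end up with the same bag-of-words, so a model built on word counts alone cannot tell them apart; only the bigrams keep a trace of who bites whom. This is one small way to see how much of a written text is dropped before it can be processed.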

notes

  • overlap between computer scientists, statisticians & data scientists
    • data mining companies take over the role of (academic?) statisticians, with their own agenda (driven by profit and efficiency? ..... speculations here).