User:Manetta/i-could-have-written-that/data-mining-in-the-wild

From XPUB & Lens-Based wiki

data mining in the wild

problem formulations

wider public: the user

  • free labour; users are microworkers: they provide free labour to the services they use. This practice has come to light since Edward Snowden's disclosures, and people are slowly becoming more aware of the presence of data analytics.
  • user-data is sold to third parties (who are these parties? and how is the data offered? in what shape? pre-selected?)
  • user-profiles & user-predictions & user-recommendations; data mining is a technological fashion. because of the amount of user-data, data mining is an interesting tool for customized advertisements, search results & sales recommendations. how such results and recommendations are constructed is unclear, which makes it very difficult to disagree with or criticize them. how can we speak back to the construction of these results?
  • both hardware and software are more and more 'black-boxed': checking how something works is difficult or made impossible. as most software today runs as a service, the user relies on the information provided by the software company behind the service.

academic, enterprise, technological, specific

  • terminology: because of the term data 'mining', data is seen as a material that can easily be extracted from the web. data is framed as 'raw' and regarded as a natural resource. but on closer inspection, data mining results seem rather to be constructed.
  • data mining results are easily accepted as objective truths that don't involve human-made decisions. but data mining algorithms aren’t just technical artifacts, they’re fundamentally human in their design and their use.*
  • text-processing
    • text mining software aims to be a universal text processing system.
    • written text can only be processed when it is strongly simplified (into e.g. n-grams, bag-of-words or vector-space models).
    • to search for meaningful information in meaningless data, reference datasets such as WordNet function as a norm for extracting semantic relations from data.
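
a minimal sketch of what such a simplification can look like, assuming a very naive tokenizer and two made-up example sentences. the function names (tokenize, ngrams, bag_of_words, cosine_similarity) are illustrative and do not come from any particular text mining package. it only shows how much of a sentence is thrown away before it can be counted.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs; word order is discarded."""
    return Counter(tokens)


def cosine_similarity(bag_a, bag_b):
    """Treat two bags as vectors over a shared vocabulary and compare their direction."""
    shared = bag_a.keys() & bag_b.keys()
    dot = sum(bag_a[word] * bag_b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# two made-up sentences with opposite meanings but identical words
sentence_a = "the dog bites the man"
sentence_b = "the man bites the dog"

tokens_a = tokenize(sentence_a)
tokens_b = tokenize(sentence_b)

# bigrams keep a little local word order
print(ngrams(tokens_a, 2))

# the bags are identical: who bites whom is no longer represented
print(bag_of_words(tokens_a) == bag_of_words(tokens_b))

# in the vector space the two sentences point in (almost exactly) the same direction
print(cosine_similarity(bag_of_words(tokens_a), bag_of_words(tokens_b)))
```

in this toy case the two bags are equal and the similarity is 1.0 up to floating-point rounding, although the sentences say opposite things. the bigrams still differ, which hints at why the choice of simplification is itself a human-made decision.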

* (quoted from: Critical Algorithm Studies: a Reading List)

notes

overlap between computer scientists, statisticians & data scientists
data mining companies take over the role of (academic?) statisticians,
with their own agenda (driven by profit and efficiency? ..... speculations here).