User:Manetta/i-could-have-written-that/data-mining-in-the-wild

From XPUB & Lens-Based wiki

'knowledge discovery in data' in the wild

problem formulations

hypothesis

The results of data-mining software are not mined; they are constructed.

wider public: the user

  • terminology: the term data 'mining' frames data as a material that can easily be extracted from the web. data is presented as 'raw' and regarded as a natural resource. but on closer inspection, data-mining results appear to be constructed rather than found.
  • profiling: corporations create user-profiles to improve search results (which keep the user coming back) and to target advertisements. but these profiles carry a risk of singularity (users will learn to simplify their search queries) & of profile circles (you can only find what your profile allows you to find)
  • in action; pattern recognition algorithms make mistakes, which makes them funny, and humour is one way people get in touch with a certain technique (derived from the Algopop blog by Matthew Plummer-Fernandez)

  • free labour; users are microworkers: they provide free labour to the services they use. since Edward Snowden's disclosures this has come into view, and people are slowly becoming more aware of the presence of data analytics.
  • privacy & free labour; user-data is sold to third parties (who are these parties? and how is the data offered? in what shape? pre-selected?)
  • black-boxing; user-profiles & user-predictions & user-recommendations; data mining is a technology fashion. because of the amount of user-data, data mining is an attractive tool for customized advertisements, search results & sales recommendations. how such results and recommendations are constructed is unclear, which makes it very difficult to disagree with or criticize them. how can we speak back to the construction of these results?
  • black-boxing; both hardware and software are increasingly 'black-boxed': checking how something works is difficult or made impossible. as most software today runs as a service, the user relies on the information provided by the software company behind the service.

popular culture

  • techno optimism: automatic detection systems are the future
  • solutionism
  • false expectations of technology
  • metaphors & anthropomorphism
  • denial of human presence in software (no humans involved)

academic, enterprises, technological, specific

  • scientific analytics meet computer analytics
    • from statistics to machine learning as a new form of 'truth' construction; it is stated that 'no a priori position is needed anymore'
    • results are easily accepted as objective truths that don't involve human-made decisions. but (data mining) algorithms aren't just technical artifacts, they're fundamentally human in their design and their use.*
    • language is a cultural product, and so is computer language**; it's 'connotative' (ambiguous) → complexes of symbols, providing space for connotation***
    • industrial science in the commercial field of corporations (science through execution)
  • Knowledge Discovery in Data process (text processing)
    • abstraction: semantic --> mathematical system (into e.g. ngrams, bag-of-words or vector-space models)
    • abstraction: deduction, causation, correlation
    • data is a representation (of actions / decisions), not a directly significant object → it is rather a cultural by-product
    • text mining software aims to be a universal text processing system.
    • written text can only be processed when it is strongly simplified (into e.g. ngrams, bag-of-words or vector-space models).
    • to search for meaningful information in meaningless data, reference datasets such as WordNet function as a norm to extract semantic relations from data.
  • Knowledge Discovery in Data process (workflow)
    • how is the point of departure present in mining results? → it is not about being 'true' or not; results are rather engineered to be so. (Steve)
    • in the 80s (when employed by IBM), Frederick Jelinek stated: 'Every time I fire a linguist, the performance of the speech recognizer goes up' (labeled a 'famous quote' on Wikipedia), which grants a certain autonomy to the data. but how does the researcher know when he is 'right'?
    • how easily are pre-trained algorithms re-applied in other contexts?
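The 'strong simplification' mentioned above can be sketched in a few lines of Python; the bag-of-words and ngram reductions below are illustrative toy functions, not taken from any particular text-mining package:

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a text to an unordered bag of lowercase tokens:
    word order, punctuation and capitalisation are thrown away."""
    tokens = [w.strip(".,;:!?'\"()") for w in text.lower().split()]
    return Counter(t for t in tokens if t)

def ngrams(tokens, n=2):
    """Slice a token sequence into overlapping n-token windows."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "Results are not mined, results are constructed."
print(bag_of_words(sentence)["results"])                      # → 2
print(ngrams(sentence.replace(",", "").lower().split())[:2])
# → [('results', 'are'), ('are', 'not')]
```

everything the sentence argued (its order, syntax, and claim) is gone after this step; only token counts and local windows remain, which is exactly what the mining algorithms then operate on.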
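The role of a reference dataset as a norm can be sketched with a toy hypernym table; the word pairs below are hand-picked stand-ins, not actual WordNet data:

```python
# toy 'reference dataset': each word points to a more general term (hypernym),
# a hand-picked stand-in for what WordNet provides at scale
HYPERNYMS = {
    "miner": "worker",
    "statistician": "scientist",
    "worker": "person",
    "scientist": "person",
}

def semantic_path(word):
    """Climb the hypernym chain, the way text-mining pipelines use a
    reference dataset as a norm to attach 'meaning' to a bare token."""
    path = [word]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path

print(semantic_path("miner"))  # → ['miner', 'worker', 'person']
```

whatever semantic relation the pipeline 'discovers' here was already encoded in the reference table: the dataset acts as the norm against which tokens are given meaning.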

* (quoted from: Critical Algorithm Studies: a Reading List)

** (from: Florian Cramer, Language, 2008)

*** (quoted from: Vilem Flusser, Towards a Philosophy of Photography, 1983)

notes

algorithmic agreeability
cross-overlap between computer scientists, statisticians & data scientists
data mining companies take over the role of (academic?) statisticians,
with their own agenda (driven by profit and efficiency? ..... speculations here).