User:Manetta/graduation-proposals/presentation-assessment-trimester-4
Latest revision as of 19:50, 9 December 2015
presentation assessment - graduation proposal
pre
- needed to take a step back, and look at my proposal in a wider context
- to be able to formulate my critical position
intro
https://youtu.be/FjibavNwOUI?t=2m11s
these are the results of a text mining process, created by researchers of the WWBP.
the researcher feels the need to excuse himself for the results he presents:
"he didn't make this up". he suggests that he was not involved in the process that led to these results.
why does he do this?
why does he excuse himself?
why does he make a joke about it, while presenting the results as absolute truth?
context / previous practice
1. Cqrrelations
- introduction to text-mining
2. V2 exhibition - Encyclopedia of Media Objects
- cataloging the objects from the show in the structure of a training set for image recognition
→ how these categories reveal the way in which the algorithm is trained to 'interpret' our world
3. relearn
- continuation of cqrrelations, looking at technical text mining process
- technical --> cultural focus
- culture around text mining --> foundations are laid --> influence on how you can look at the results
- culture of 'algorithmic agreeability'
algorithmic agreeability
1. no human responsibility
- results are the 'direct data'
- absolute truth --> in a cloud (a cloud changes all the time, but here the results are fixed)
- most common words in the middle, a bit like the heart of an entity
- made me think of 'death of the author', death of the text-miner
2. anthropomorphism
- anthropomorphic terms are used to describe the processes
- which might create the wrong expectations of these algorithms
- direct access to the data, without any form of mediation
- the data is true, because it is out there
3. assumptions are re-used
- from text mining discourse
- academic context, phd
- the assumptions from where these projects start (profiling people + sentiment = pos/neg)
4. no linguist, but data --> autonomous
- Frederick Jelinek, IBM, 80s --> speech recognition software
- because the linguistic information was richer IN THE DATA
- no need for a linguistic interpretation
- belief in the data
i think this then leads to
→ data autonomy
data
- autonomous entity
- we accept how it is
- because it is there
→ what do these results mean????
- what happens at 40/45???
- do these results confirm your 'common sense'?
- how can we have an opinion about this?
these four points
1. no human responsibility
2. anthropomorphism
3. assumptions are re-used
4. data = direct resource, no need for linguistic interpretation
- show how the culture around text mining frames how we interpret text-mining results
- anthropomorphism --> mystifies the process
- data = framed as autonomous entity, because it is out there
but
if we look at the technical process, 3 aspects
- data documents
- humans
- software
a text mining process consists of human decisions
which have a big influence on the outcomes
but algorithmic agreeability frames the process as if no human decisions are involved
so it seems as if we have direct access to the data, without any mediating layer
but already the human involvement is a form of mediation:
- what are the features to mine for
- selecting the data to work with
- simplifying written text to data
- comparing results with the expectations
- if not satisfied --> redo the whole process through trial and error
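
the steps above can be sketched in a few lines of python. this is only a minimal sketch, assuming a toy corpus and a plain word count as the 'mining' step; the names DOCUMENTS, STOPWORDS, simplify, mine and matches_expectation are hypothetical and do not come from any real text-mining project. it does not show how the WWBP works, it only tries to show where the human decisions sit in such a process.

```python
import re
from collections import Counter

# human decision: which documents are selected, and which label each one gets
# (hypothetical toy data, written for this sketch)
DOCUMENTS = [
    ("i love this sunny day", "pos"),
    ("what a wonderful happy morning", "pos"),
    ("i hate this rainy day", "neg"),
    ("what a sad and awful morning", "neg"),
]

# human decision: which words are thrown away as 'noise'
STOPWORDS = {"i", "a", "this", "what", "and"}


def simplify(text, stopwords):
    """human decision: reduce written text to a list of lowercase words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in stopwords]


def mine(documents, stopwords, top_n):
    """count the most frequent words for each human-assigned label."""
    counts = {}
    for text, label in documents:
        counts.setdefault(label, Counter()).update(simplify(text, stopwords))
    return {
        label: [word for word, _ in counter.most_common(top_n)]
        for label, counter in counts.items()
    }


def matches_expectation(result, expected):
    """human decision: compare the outcome with what was expected beforehand."""
    return all(word in result.get(label, []) for label, word in expected.items())


result = mine(DOCUMENTS, STOPWORDS, top_n=10)
print(result)
print(matches_expectation(result, {"pos": "love", "neg": "hate"}))
```

if the comparison fails, the usual move is to change the stopword list, the labels or the selection of documents and run it again. every one of these changes is a human decision, although the printed result looks as if it comes straight from 'the data'.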
the 4 cultural points of agreeability make it possible
to regard the text mining process as a kind of data-religion
it starts to show similarities, such as
- human decisions are presented as absolute truth
- patterns in a set of abstract meaningless data are explained
--> when you pray every day, things will go well
--> when you keep on 'mining', you will get the results you expect
to conclude
i feel skeptical about text-mining processes
- correlating → giving meaning to patterns in language use --> raises many questions for me
- and the human decisions are presented as absolute truth
- feels like a data-religion, because of the strong sense of algorithmic agreeability
grad. project
how to work from these worries?
--> i think it can be useful to reveal this cultural agreeability --> it shows how we relate to data & automated processes
- practical next steps:
- choosing a case study
for example the World Well Being Project
- analyse the cultural elements they use
- visual language they use to show results
- vocabulary use, definitions
- context → in which field are they active?
i can imagine a collection of these elements
i am thinking about making a publication, or a publication series
maybe not only about text-mining, it could expand later to other fields of natural language processing
prototypes
- list of departments that text-mining falls under
- ocr vs. text mining
- 'information' (thesaurus & wordnet)
- Joseph Weizenbaum questions