User:Manetta/thesis/chapter-2

From XPUB & Lens-Based wiki
Revision as of 19:10, 3 March 2016 by Manetta (talk | contribs)

i could have written that - chapter 2

deriving information from written text → the material form of language

text analytics is a writing technique rather than a reading technique

text analytics, text reading, reading? well, writing!

if text analytical software is regarded as a reading system, and the only object you can respond to is a set of results, it becomes difficult to formulate what the problem exactly is. 'reading' implies that there is nothing in between the text and the reader. it implies that no moments of transformation or adjustment happen between downloading large sets of data and presenting analytical results.

(consequences: 'objectiveness', claims that 'no humans are involved' in such automated processes because 'the data speaks' -- refs to Antoinette Rouvroy).

looking back at the TED presentation of Lyle Unger, people seem to agree with the results that were shown. could it be that they think: "what exactly is the problem?", or "this is the data that speaks, right?" is it possible to reply to that by saying: "hey! these results were not created by a reading process!", "these truths are constructed!"?

these are questions that need to be tackled to start a conversation about such results. but shouting back that the software is writing them would probably not be as impressive as the mysterious wordcloud is.

could the reaction of the TED audience be called a form of 'algorithmic agreeability'? the tendency to agree with text analytic results?

(more about algorithmic agreeability)

written language as source material

words, written language → a material attribute of language

text mining technologies are regarded as analytical 'reading' machines (where??? ref needed), extracting information from large sets of written text.

example of Angie Keefer? 
computational linguistics joke from the text 'No Brainer'.
example of Matthew Fuller's Notebook 10?

written text is a representational object. [POEF, ref needed here, Saussure??] it refers to an academic writer's thoughts and experiments, a fiction writer's ideas and fantasies, or a journalist's experiences of a certain event. written text then refers to an external world outside the text, something that the words in the text themselves are detached from. something that is far away, either in the sense of time or location, truthfulness or possibility.

for text analytics, written language is not processed as a representational source. the main source for text analytics is the materiality of written language.

analogy to typography, dealing with the optical materiality of words/sentences/text

two types of material but optical gestures to written language come together in OCR-B. the font was designed in 1967 and touches the optical materiality of written text for two audiences. first, the font is designed in such a way that every character is as distinguishable from the others as possible. this is done to make sure that an optical machine reading system would not mistake a '1' for an 'l'. around the same time, another font was designed for the same purpose: OCR-A. but the designer of OCR-B, the Swiss Adrian Frutiger, was given a special brief: to design a font that would be optimized for reading by both machine and human. the material form of the letters makes it friendly for the human eye to read, and gives the text another powerful function at that time: being processable material for optical reading software.

text analytics departs from the assumption that the materiality of written words contains information. not in an optical sense, as is the case with OCR-B, but in a quantitative and structural sense.

text analytics dealing with the quantifiable and structural materiality of words/sentences/text → word-counts and word-order.

where typography performs on the optical materiality of words, sentences and texts, text analytics approaches the material elements of language that are quantifiable and/or structural.

this information can be acquired by 'teaching' a system how to recognize it, by using the system to create a model.

which is basically a mold, where only very specific information fits in.

→ training source / testing source problem: how is the data you train a model on the only 'information' it returns? and how does the testing data basically confirm that the model is 'right' according to that particular test data only? if a model is tested on certain data, it doesn't mean that it functions just as well on other, new texts, right?
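this mold-like behaviour can be sketched in a few lines of python. the texts here are invented, and the 'model' is reduced to its simplest possible form: a word-weight profile built from the training text. any word the model never saw during training falls straight through the mold and contributes nothing to the score.

```python
from collections import Counter

# 'training': build a word-weight profile from one small, made-up corpus
train_words = "planets orbit stars and stars form galaxies".split()
profile = {w: c / len(train_words) for w, c in Counter(train_words).items()}

# 'testing': score a new text against that profile; words the model
# never saw during training simply do not fit the mold
test_text = "comets orbit stars"
score = sum(profile.get(w, 0.0) for w in test_text.split())
```

'comets' is absent from the training text, so it adds exactly zero: the model can only return information that was poured into it in the first place.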

the system 'learns' by looking at many combinations of word-counts and word-order.
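the two material elements named here, word-counts and word-order, can be sketched as follows. the sentence is an illustrative assumption; real systems work on far larger corpora, but the features are of the same kind:

```python
from collections import Counter

words = "the data speaks and the data decides".split()

# word-counts: how often each word appears in the text
word_counts = Counter(words)

# word-order: which word follows which (bigrams)
bigrams = Counter(zip(words, words[1:]))
```

the counts record that 'the' and 'data' each appear twice; the bigrams record that 'the' is followed by 'data' both times, which is exactly the kind of combination the system 'learns' from.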

could these be called typographic gestures? done to enable/optimize a reading process? this time not a reading process performed by a human eye but rather by a computer program?

model that fits a very specific understanding of the world .........

but in order to create a model, the author of the model needs to have a clear goal, an intention. he or she needs this goal in order to make the first decisions in the process.

what am i searching for? what source could offer me texts that can show me that? how do i retrieve a large number of these texts on my computer? and what treatments do i give the texts, in order to get the results that i am looking for?

after answering these questions, the process has only just begun.

typographical layers of text analytics

text analytics and statistical computing can be understood as a multilayered mixture of word-counts, syntactical rules, algorithms that detect patterns in large sets of numbers, a lot of comparisons, trial and error, and targeted, goal-oriented search queries.

a process that does not 'extract' information from text, but rather 'derives' information from a large set of texts? (refs needed about deriving?)

extracting could be regarded as a reading process. take for example a large text and search for all sentences that end with a '?', for instance by using regular expressions. this will present you a new version of the text that consists only of questions. it can give a subtle indication of what the text is about. in a sense, this new version of the text has just been 'written'. but not much has changed to the questions themselves: they are still the questions they were before.
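such an extraction can be sketched in a few lines of python. the example text is made up, and the pattern simply splits on sentence-ending punctuation and keeps whatever ends with a question mark:

```python
import re

text = ("what am i searching for? text analytics derives information. "
        "what source could offer me that? the process has only just begun.")

# split into sentences after '.', '?' or '!', then keep only the questions
sentences = re.split(r"(?<=[.?!])\s+", text)
questions = [s for s in sentences if s.endswith("?")]
```

the two questions that come out are untouched; only their selection and recombination into a new text has been 'written'.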

when a text is treated by word-counts and word-orderings, it seems that not much happens to the text itself. word-counts in themselves do not transform the text at all. but word-counts alone also do not say much at all. word-counts are used to sketch a 'profile' of a text. after a text is counted, each word is labelled with a certain 'weight'. a weight is basically nothing more than a number that shows the appearance of that word in the whole text, relative to the total amount of all other words in the same text: the word-count divided by the total word-count of the text. [.....more about bag-of-words? name it like that?....]
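the weight described here can be sketched in a few lines of python; the sentence is an invented example:

```python
from collections import Counter

words = "the words in the text refer to a world outside the text".split()

# weight = word-count divided by the total word-count of the text
total = len(words)
weights = {word: count / total for word, count in Counter(words).items()}
```

in this twelve-word sentence 'the' appears three times, so its weight is 3/12; together the weights of all distinct words sum to one, and the set of weights is the 'profile' of the text.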

what do these material analyses represent?

could these gestures still be called 'extractions'? there are some underlying assumptions still sleeping here, but almost ready to awaken. representing a text by its word-occurrences is one thing. but what exactly is the text expected to 'represent' in the form of word-occurrences? if the text is about extraterrestrial life, does the set of word-weights then suddenly represent that topic? or does it only represent that specific piece of text? generalisations could wake up slippery areas of assumptions.

this is one point where text-analytical processes as 'mining' processes flip from a reading machine to a writing machine.

writing in the sense that typography writes: optimizing the text for reading purposes, but also adding connotations and influencing subtle interpretations. decisions that originate from the designer.



links

thesis in progress (overview)

intro &+

chapter 1

chapter 2