little glossary

language (as a resource)

It is discussable if language itself could be regarded as a technology or not. For my project i will follow James Gleick's statement in his book 'The Information: a Theory, a History, a Flood'^[1], where he states: Language is not a technology, (...) it is not best seen as something separate from the mind; it is what the mind does. (...) but when the word is instantiated in paper or stone, it takes on a separate existence as artifice. It is a product of tools and it is a tool.

natural language?

For this project i would like to look at 'natural language' from a perspective grounded in computer science, computational linguistics and artificial intelligence (AI), where the term 'natural language' is mostly used in the context of 'natural language processing' (NLP), a field of studies that researches the transformation of natural (human) language into information(?), a format(?) that can be processed by a computer. (language-(...)-information)

systemization

I'm interested in that moment of systemization: the moment that language is transformed into a model. These models are materialized in lexical-datasets, text-parsers, or data-mining algorithms. They reflect an aim of understanding the world through language (similar how the invention of geometry made it possible to understand the shape of the earth).

automation

Such linguistic models are needed to write software that automates reading processes, which is a more specific example of natural language processing. It aims to automate the steps of information processing, for example to generate new information (in data-mining) or to perform processes on a bigger scale (which is the case for translation engines).

reading (automatic reading machines)

In 1967, The Journal for Typographic Research expressed already high expectations of such 'automatic reading machines', as they would widen the bounds of the field of data processing. The 'automatic reading machine' they refered to used an optical reading process, that would be optimized thanks to the design of a specific font (called OCR-B). It was created to optimize reading both for the human eye and the computer (using OCR software).

reading-writing (automatic reading-writing-machines)

An optical reading process starts by recognizing a written character by its form, and transforming it into its digital 'version'. This optical reading process could be compared to an encoding and decoding process (like Morse code), in the sense that the process could also be excecuted in reverse, without getting different information. The translation process is a direct process.

But technologies like data-mining are processing data less direct. Every result that has been 'read' by the algorithmic process, is one version, one interpretation, one information process 'route' of the written text. Therefore could data-mining not be called a 'read-only' process, but better be labeled as a 'reading-writing' process. Where does the data-mining-process write?

tools & technologies

Tools & examples to look at are: WordNet (a lexical dataset), Pattern (text-mining software), programming languages (high-level languages like Python or markup-languages like HTML), text-parsing (turning text into number), ngrams (common word combinations);

↑ James Gleick's personal webpage, The Information: a Theory, a History, a Flood - James Gleick (2011)

[gleick-1] James Gleick's personal webpage, The Information: a Theory, a History, a Flood - James Gleick (2011)

[1]

User:Manetta/i-could-have-written-that/little-glossary

Contents