User:Manetta/i-could-have-written-that/little-glossary
little glossary
language (as a resource)
It is discussable if language itself could be regarded as a technology or not. For my project i will follow James Gleick's statement in his book 'The Information: a Theory, a History, a Flood'[1], where he states: Language is not a technology, (...) it is not best seen as something separate from the mind; it is what the mind does. (...) but when the word is instantiated in paper or stone, it takes on a separate existence as artifice. It is a product of tools and it is a tool.
writing as technology
...
alphabet as a tool
A very primary writing technology is the latin alphabet. The main set of 26 characters is a toolbox that enables us to systemize language into characters, into words, into sentences. When considering these tools as technologies, it makes it possible to follow a line from natural language to a computer language that computers can take as input, via multiple forms of mediation.
natural language?
For this project i would like to look at 'natural language' from a perspective grounded in computer science, computational linguistics and artificial intelligence (AI), where the term 'natural language' is mostly used in the context of 'natural language processing' (NLP), a field of studies that researches the transformation of natural (human) language into information(?), a format(?) that can be processed by a computer. (language-(...)-information)
systemization
I'm interested in that moment of systemization: the moment that language is transformed into a model. These models are materialized in lexical-datasets, text-parsers, or data-mining algorithms. They reflect an aim of understanding the world through language (similar how the invention of geometry made it possible to understand the shape of the earth).
automation
Such linguistic models are needed to write software that automates reading processes, which is a more specific example of natural language processing. It aims to automate the steps of information processing, for example to generate new information (in data-mining) or to perform processes on a bigger scale (which is the case for translation engines).
heteromation
(...)
reading (automatic reading machines)
In 1967, The Journal for Typographic Research expressed already high expectations of such 'automatic reading machines', as they would widen the bounds of the field of data processing. The 'automatic reading machine' they refered to used an optical reading process, that would be optimized thanks to the design of a specific font (called OCR-B). It was created to optimize reading both for the human eye and the computer (using OCR software).
reading-writing (automatic reading-writing-machines)
An optical reading process starts by recognizing a written character by its form, and transforming it into its digital 'version'. This optical reading process could be compared to an encoding and decoding process (like Morse code), in the sense that the process could also be excecuted in reverse, without getting different information. The translation process is a direct process.
But technologies like data-mining are processing data less direct. Every result that has been 'read' by the algorithmic process, is one version, one interpretation, one information process 'route' of the written text. Therefore could data-mining not be called a 'read-only' process, but better be labeled as a 'reading-writing' process. Where does the data-mining-process write?
tools & technologies
(...)
Tools & examples to look at are: WordNet (a lexical dataset), Pattern (text-mining software), programming languages (high-level languages like Python or markup-languages like HTML), text-parsing (turning text into number), ngrams (common word combinations);
data
(...)
document
(...)
workflow
(...)
data driven
(...) An example of a data-driven system is d3 (Data Driven Documents), which documents only 'happen' in the browser. Therefore d3 is not ideal for publishing aims.
document driven
(...) For example: MongoDB is called a document-oriented database, which probably refers to its flexibility (as Mongo does not use relational tables like we know from spreadsheets for example), but is confusing to me as it doesn't 'take' document files on its elemental level.
template
(...)
authoring
(...)
serialization
→ (for programmers:) a written form of something
→ to serialize = to put in order
Serialization is an in between state, and therefore not a writing form. It contains the idea of semantic structured data, and standardization to use these structures in between documents. Examples of serializations are xml, rdf & microdata.
'mining'
references
- ↑ James Gleick's personal webpage, The Information: a Theory, a History, a Flood - James Gleick (2011)