OuNuPo
Special Issue 5: OuNuPo, Ouvroir de Numérisation Potentielle
Project Description
Main partner: WORM (WORM's Pirate Bay to be precise)
Special guests: Manetta Berends and Cristina Cochior (Algolit group)
The outcome of the special issue will be the following things:
- 2 book scanners (one to stay at XPUB, one to stay at WORM)
- one unique (as in unique copy) reader in the form of an artist's book.
The reader will be a collection of texts curated by students and staff on the topics of book scanners, text mobility, constrained writing, algorithmic literature and, I hope, the culture and politics of OCR, text analysis, and AI in the context of text processing and generation.
- a collection of different software back-ends for the book scanner, so as to reconfigure its functionality, i.e. you might scan a book and get a PDF whose content has been manipulated in a poetical or critical way, or you could get something else entirely: a sound file, a collection of images, etc.
- a gigantic collection of files produced by feeding the reader as input source into the scanner, making use of the plethora of different back-ends. IMPORTANT: the reader will never exist as a typical digital alter ego of the analog original, but only through a multitude of different digital interpretations.
- an evening launch at WORM with presentation of the reader, back-ends, results, etc.
Sessions
- 09.01.2018 Intro to the 'Feminist Reader' pad
- 10.01.2018 The Alphabet as Software - R&W
- 15.01.2018 Optical character recognition - pad
- 17.01.2018 Python and N+7 - pad
- 18.01.2018 Publishing Pipelines - OuLi/NuPo, natural language processing, materiality (pad)
- 30.01.2018 Publishing Pipelines - methodologies, considerations & examples + a short pdf field study @ pirate libraries (pad)
- 29.01.2018 Sonification - pad
- 30.01.2018 Publishing Pipelines - inter-annotator (dis)agreements, counting techniques & training a binary classifier (pad)
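The N+7 constraint named in the 17.01.2018 session replaces each noun in a text with the noun found seven entries further on in a dictionary. A minimal sketch, assuming a tiny hypothetical noun list (`NOUNS`) in place of a real dictionary and wrapping around at the end of the list; the function name `n_plus_7` is illustrative and not taken from the session material:

```python
import re

# Hypothetical, tiny alphabetised noun list; a real run would use a full dictionary.
NOUNS = sorted([
    "archive", "book", "camera", "copy", "file", "image",
    "library", "page", "pirate", "reader", "scanner", "text",
])


def n_plus_7(text, nouns=NOUNS, offset=7):
    """Replace every known noun with the noun `offset` entries later in the list."""
    index = {noun: i for i, noun in enumerate(nouns)}

    def swap(match):
        word = match.group(0)
        i = index.get(word.lower())
        if i is None:
            # Not a known noun: leave the word untouched.
            return word
        # Wrap around when the offset runs past the end of the list.
        return nouns[(i + offset) % len(nouns)]

    return re.sub(r"[A-Za-z]+", swap, text)


print(n_plus_7("The scanner reads the book"))
```

With a real dictionary the substitution depends entirely on which dictionary is chosen, which is part of what makes the constraint interesting to experiment with.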
Independent Research
week 2
Look into your assigned topic, try things out, test them, and write a recipe/report/tutorial on a wiki page based on your experiments and research. Andre Castro (talk) 20:01, 15 January 2018 (CET)
- Zalan: page segmentation; image detection
- Alex: Web OCR implementation: WebOCR
- Angeliki + Alice: training Tesseract to detect a new font
- Tash: retraining Tesseract to misinterpret text (hiding messages in plain sight: steganography)
- Joca: image manipulation to optimize OCR
week 5
Raw data sonification/visualization research: a recipe, a work (with documentation on how you got to the result), a survey of tools, a workflow, or whatever you feel curious about, concerning that topic.
week 8 (feb 20 2018)
Natasha
First scan, OCR the results and take the 50 most common words; these should then somehow affect what happens in the next scan.
- Where do you store things like a text file?
- How does it get exported / postprocessed?
- When does OCR occur?
- What hooks can we use?
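One possible way to approach the first two questions, sketched under assumptions: the OCR output is already available as a plain string, and the most common words are kept in a plain text file that the next scan can read back. The function names and the idea of a state file are hypothetical, not part of the scanner software.

```python
import re
from collections import Counter
from pathlib import Path


def top_words(ocr_text, n=50):
    """Return the n most frequent words in OCR output, most frequent first."""
    words = re.findall(r"[^\W\d_]+", ocr_text.lower())
    return [word for word, _ in Counter(words).most_common(n)]


def save_state(words, path):
    """Store one word per line so the next scan can pick the list up."""
    Path(path).write_text("\n".join(words), encoding="utf-8")


def load_state(path):
    """Read the word list left by the previous scan (empty if there is none)."""
    p = Path(path)
    return p.read_text(encoding="utf-8").split() if p.exists() else []
```

A back-end could call `load_state` before processing a new scan and `save_state` after OCR, which gives one concrete place for a hook.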
Alice
Desire to go back to print, maybe a poster. Select words based on their length (example of OuLiPo, Carl Andre)
- How to extract patterns from words
- How to produce new outputs (build a poster)
- What "leftovers" exist in the processing of the images?
- Word searches
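Selecting words by their length could start from something like the sketch below. It assumes the OCR text is available as a string; the function names are hypothetical, and the grouping is only one of many possible patterns to extract.

```python
import re
from collections import defaultdict


def words_by_length(text):
    """Group the distinct words of a text by their number of letters."""
    groups = defaultdict(list)
    for word in re.findall(r"[^\W\d_]+", text):
        # Keep first-appearance order and skip repeats.
        if word not in groups[len(word)]:
            groups[len(word)].append(word)
    return dict(groups)


def select_by_length(text, length):
    """Return the distinct words that have exactly `length` letters."""
    return words_by_length(text).get(length, [])


print(select_by_length("a book is a book of words", 2))
```

Each group could then become one line or block of a poster layout.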
Zalan
Needs to find a rule to use as a constraint. Reading in a non-linear way
> Borges ... L
OCR Materializing text in 3D ... Blender. 360 projection ... spatialize?
Alex
- Scanned data as chatbot?
- Database back to narrative ...
- Integrating with ReportLab/Platypus generated layouts
- Including code in results
Links/References/Reading List
The Archivist — DIY Book Scanner http://diybookscanner.org/archivist/indexee7f.html
Archivist Book Scanner (looks more like our parts)
- includes 2 Canon PowerShot cameras running the Canon Hack Development Kit (CHDK)
building notes
- Software: pad
- Uploaded the preinstalled .img to the pzi-server (temporarily): scannerpi.img.xz (decompress with xz, and partition maybe needs to be resized)
- Installed new image on PI: https://github.com/Tenrec-Builders/pi-scan
- Bookscanner makefile
- Hardware:
Scanning software
- post-processing:
- scan-tailor: interactive post-processing (GUI) tool for scanned pages: page splitting, deskewing, adding/removing borders, selecting content.
links
Klijn, Edwin. 2008. ‘The Current State-of-Art in Newspaper Digitization: A Market Perspective’. D-Lib Magazine 14 (1/2). https://doi.org/10.1045/january2008-klijn.
DIY Book Scanner forum: Hardware & building