OuNuPo: Difference between revisions
Andre Castro (talk | contribs)
Revision as of 11:34, 29 January 2018
Special Issue 5: OuNuPo, Ouvroir de Numérisation Potentielle
Project Description
Main partner: WORM (WORM's Pirate Bay to be precise)
Special guests: Manetta Berends and Cristina Cochior (Algolit group)
The outcome of the special issue will be the following things:
- 2 book scanners (one to stay at XPUB, one to stay at WORM)
- one unique (as in unique copy) reader in the form of an artist's book.
The reader will be a collection of texts curated by students and staff on the topics of book scanners, text mobility, constrained writing, algorithmic literature and, I also hope, the culture and politics of OCR, text analysis, and AI in the context of text processing and generation.
- a collection of different software back-ends for the book scanner, so as to reconfigure its functionality, i.e. you might scan a book and get a PDF, the content of which has been manipulated in a poetic or critical way, or you could also get something else entirely: a sound file, a collection of images, etc.
- a gigantic collection of files produced by feeding the reader into the scanner as input source and running it through the plethora of different back-ends. IMPORTANT: the reader will never exist as a typical digital alter-ego of the analog original but only through a multitude of different digital interpretations.
- an evening launch at WORM with presentation of the reader, back-ends, results, etc.
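To make the idea of swappable back-ends a little more concrete, here is a minimal Python sketch, not the project's actual code. It assumes a back-end is simply a function that takes OCR'd text and returns a transformed text. The example back-end applies N+7 (the Oulipo constraint listed among the sessions: replace each noun with the seventh noun that follows it in a dictionary). The names `NOUNS`, `n_plus_7`, `BACKENDS` and `run_backend` are all hypothetical, the noun list is a toy stand-in for a real dictionary, and wrapping around at the end of the list is an assumption made for illustration.

```python
import re

# Hypothetical, tiny noun list kept in alphabetical order.
# A real N+7 run would use a full dictionary and proper noun detection.
NOUNS = sorted([
    "alphabet", "archive", "book", "camera", "font", "letter", "machine",
    "page", "pirate", "reader", "scanner", "text", "worm",
])


def n_plus_7(text, nouns=NOUNS, shift=7):
    """Replace each word found in `nouns` with the noun `shift` places later.

    Assumption: the list wraps around when the end is reached.
    """
    index = {noun: i for i, noun in enumerate(nouns)}

    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key not in index:
            return word  # not a known noun: leave untouched
        new = nouns[(index[key] + shift) % len(nouns)]
        # Preserve an initial capital letter.
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", swap, text)


# Hypothetical registry: each back-end maps OCR text to an output text.
BACKENDS = {
    "plain": lambda text: text,  # the "typical" digital copy
    "n+7": n_plus_7,             # a poetic reinterpretation
}


def run_backend(name, ocr_text):
    """Run the OCR output through the back-end registered under `name`."""
    return BACKENDS[name](ocr_text)


if __name__ == "__main__":
    print(run_backend("n+7", "The scanner reads a book."))
```

In this sketch, adding another interpretation of the same scan only means registering one more function in `BACKENDS`; back-ends that produce sound or images would need a different return type than plain text.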
Sessions
- 09.01.2018 Intro to the 'Feminist Reader' pad
- 10.01.2018 The Alphabet as Software - R&W
- 15.01.2018 Optical character recognition - pad
- 17.01.2018 Python and N+7 - pad
- 18.01.2018 OuLi/NuPo, natural language processing - pad
- 29.01.2018 Sonification
Independent Research
week 2
Look into your assigned topic, try things out, test, and write a recipe/report/tutorial on a wiki page based on your experiments and research. Andre Castro (talk) 20:01, 15 January 2018 (CET)
- Zalan: page segmentation; image detection
- Alex: Web OCR implementation: WebOCR (http://pzwiki.wdka.nl/mediadesign/User:Alexander_Roidl/WebOCR)
- Angeliki + Alice: training Tesseract to detect new font
- Tash: retraining Tesseract to misinterpret text (hiding messages in plain sight: steganography) here
- Joca: image manipulation to optimize OCR
Links/References/Reading List
Archivist Quill Book Scanner (Base Kit)
gPhoto: a free, redistributable, ready-to-use set of digital camera software applications for Unix-like systems, written by a whole team of dedicated volunteers around the world. It supports more than 2300 cameras.
Klijn, Edwin. 2008. ‘The Current State-of-Art in Newspaper Digitization: A Market Perspective’. D-Lib Magazine 14 (1/2). https://doi.org/10.1045/january2008-klijn.
DIY Book Scanner forum: Hardware & building
Planning Overview
weeks 2, 3, 4, 5: January
- @Steve and Delphine: develop the reader, i.e. select texts in January, then edit and work on form in February. Documentation on the wiki: Steve writing and notation
- @Andre: OCR
- @Michael
- @Aymeric
- @Cristina & Manetta
weeks 6, 7, 8, 9: February
- week 6: delivery of the scanner parts
- weeks 7, 8: @Aymeric + Frederic / WORM: scanner assembly
weeks 10, 11, 12, 13: March
- weeks 11, 12: 2nd and 3rd weeks of March
- testing scanner in
- 9-17 March: Barcode DJ at WORM (http://kubriel.servus.at/index.php?s=barcodedjs), from a Hungarian artists' group that produces barcode DJ sets ... could they be invited to perform with the book scanner?
- 15-16 March: Algoliterary Encounters at VARIA, with a workshop on the 16th (could be a good moment to beta-test the platform / present it in public)
week 13: 4th week of March
- 28 March (wed) - Launch
- 29 March (thu) - Assessment