Revision as of 18:54, 22 February 2018

Special Issue 5: OuNuPo, Ouvroir de Numérisation Potentielle

Project Description

Main partner: WORM (WORM's Pirate Bay to be precise)

Special guests: Manetta Berends and Cristina Cochior (Algolit group)

The outcome of the special issue will be the following things:

2 book scanners (one to stay at XPUB, one to stay at WORM)
one unique (as in unique copy) reader in the form of an artist's book.

The reader will be a collection of texts curated by students and staff on the topics of book scanners, text mobility, constrained writing, algorithmic literature, and, I also hope, the culture and politics of OCR, text analysis, and AI in the context of text processing and generation.

a collection of different software back-ends for the book scanner, so as to reconfigure its functionality, i.e. you might scan a book and get a PDF whose content has been manipulated in a poetic or critical way, or you could get something else entirely: a sound file, a collection of images, etc.

a gigantic collection of files produced by feeding the reader into the scanner as input source and making use of the plethora of different back-ends. IMPORTANT: the reader will never exist as a typical digital alter ego of the analog original, but only through a multitude of different digital interpretations.

an evening launch at WORM with a presentation of the reader, back-ends, results, etc.
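
The idea of swappable back-ends could be sketched as a small registry in Python. This is only an illustration under stated assumptions: the names `BACKENDS`, `backend`, `process`, and the two example back-ends are hypothetical and not part of the actual scanner software.

```python
# Hypothetical registry: each back-end turns the same scanned text
# into a different output.
BACKENDS = {}

def backend(name):
    """Decorator that registers a function as a named back-end."""
    def register(func):
        BACKENDS[name] = func
        return func
    return register

@backend("reverse")
def reverse_lines(text):
    """Poetic manipulation: reverse the word order of every line."""
    return "\n".join(
        " ".join(reversed(line.split())) for line in text.splitlines()
    )

@backend("lengths")
def word_lengths(text):
    """Replace every word with its length, a crude step towards data output."""
    return " ".join(str(len(word)) for word in text.split())

def process(text, name):
    """Run the scanned text through the chosen back-end."""
    return BACKENDS[name](text)

print(process("scan a book", "reverse"))
print(process("scan a book", "lengths"))
```

Adding a new interpretation of a scan then only means registering one more function, which fits the goal of many different digital readings of the same source.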

Sessions

09.01.2018 Intro to the 'Feminist Reader' pad
10.01.2018 The Alphabet as Software - R&W
15.01.2018 Optical character recognition - pad
17.01.2018 Python and N+7 - pad
18.01.2018 Publishing Pipelines - OuLi/NuPo, natural language processing, materiality (pad)
30.01.2018 Publishing Pipelines - methodologies, considerations & examples + a short pdf field study @ pirate libraries (pad)
29.01.2018 Sonification - pad
30.01.2018 Publishing Pipelines - inter-annotator (dis)agreements, counting techniques & training a binary classifier (pad)
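
The N+7 constraint from the 17.01 session replaces each noun with the noun seven entries later in a dictionary. A minimal Python sketch, assuming a tiny hypothetical noun list (`NOUNS`) in place of a real dictionary and treating any word found in that list as a noun; the function name `n_plus_7` is illustrative only:

```python
import re

# Hypothetical, tiny stand-in for a real dictionary's noun list (kept sorted).
NOUNS = sorted([
    "alphabet", "archive", "book", "camera", "character", "font",
    "image", "library", "machine", "page", "pirate", "poster",
    "reader", "scanner", "software", "text", "word",
])

def n_plus_7(text, nouns=NOUNS, offset=7):
    """Replace every word found in `nouns` with the noun `offset` places later."""
    index = {noun: i for i, noun in enumerate(nouns)}

    def shift(match):
        word = match.group(0)
        key = word.lower()
        if key not in index:
            return word  # not a known noun: leave untouched
        # Wrap around at the end of the list.
        return nouns[(index[key] + offset) % len(nouns)]

    return re.sub(r"[A-Za-z]+", shift, text)

print(n_plus_7("the scanner reads the book"))
```

With a real dictionary the noun list would be far longer and the substitutions correspondingly stranger; a part-of-speech tagger would be needed to tell nouns from other words.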

Independent Research

week 2

Look into your assigned topic, try things out, test them, and write a recipe/report/tutorial on a wiki page based on your experiments and research. Andre Castro (talk) 20:01, 15 January 2018 (CET)

Zalan: page segmentation; image detection
Alex: Web ocr implementation: WebOCR
Angeliki + Alice: training Tesseract to detect a new font
- Install_Tesseract_4.0-Ubuntu
- Alice image training (incomplete)
Tash: retraining Tesseract to misinterpret text (hiding messages in plain sight: steganography) here
Joca: image manipulation to optimize OCR

week 5

Raw data sonification/visualization research: a recipe, a work (with documentation on how you got to the result), a survey of tools, a workflow, or whatever you feel curious about, concerning that topic.
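
One possible starting point for raw data sonification is to treat every byte of a file as one audio sample. The sketch below uses only Python's standard `wave` module; the function name `sonify` and the 8000 Hz sample rate are assumptions chosen for illustration.

```python
import io
import wave

def sonify(data, sample_rate=8000):
    """Wrap raw bytes in a mono, 8-bit WAV container, one byte per sample."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(1)            # 8-bit unsigned samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(data))
    return buffer.getvalue()

wav_bytes = sonify(b"OuNuPo, Ouvroir de Numerisation Potentielle" * 200)
# To listen, write the result to disk, for example:
# open("scan.wav", "wb").write(wav_bytes)
```

Any file can be passed in as `data` (a scanned image, a PDF, OCR output), and changing the sample rate changes the pitch and duration of the result.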

week 8 (feb 20 2018)

Natasha

Make a first scan, OCR the results, and take the 50 most common words; this should then somehow affect what happens in the next scan.

Where do you store things like a text file?
How does it get exported / post-processed?
When does OCR occur?
What hooks can we use?
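
The first half of this idea, counting the most common words in OCR output, can be sketched with Python's `collections.Counter`. How the result should affect the next scan is left open above, so `filter_next_scan` is a purely hypothetical hook that masks those words; both function names are assumptions.

```python
import re
from collections import Counter

def most_common_words(ocr_text, n=50):
    """Return the n most frequent words in OCR output, case-insensitively."""
    words = re.findall(r"[^\W\d_]+", ocr_text.lower())
    return [word for word, _ in Counter(words).most_common(n)]

def filter_next_scan(next_text, common_words):
    """Hypothetical hook: blank out words the previous scan used most."""
    common = set(common_words)

    def mask(match):
        word = match.group(0)
        return "_" * len(word) if word.lower() in common else word

    return re.sub(r"[^\W\d_]+", mask, next_text)

first_scan = "the book scanner scans the book and the page"
top = most_common_words(first_scan, n=2)
print(top)
print(filter_next_scan("The reader opens the book", top))
```

In a pipeline, the list returned by `most_common_words` could be stored as a plain text file between scans, which touches on the storage question raised above.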

Alice

Desire to go back to print, maybe a poster. Select words based on their length (example of OuLiPo, Carl Andre)

How to extract patterns from words
How to produce new outputs (build a poster)
What "leftovers" exist in the processing of the images?
Word searches
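
Selecting words by their length could be approached as below. This is a sketch only: `words_by_length` and `staircase` are hypothetical names, and the "staircase" layout (one word per length, shortest first) is just one possible pattern for a poster.

```python
import re
from collections import defaultdict

def words_by_length(text):
    """Group the distinct words of a text by their number of letters."""
    groups = defaultdict(list)
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        if word not in groups[len(word)]:
            groups[len(word)].append(word)
    return dict(groups)

def staircase(text):
    """Pick the first word of each length, shortest first, one per line."""
    groups = words_by_length(text)
    return "\n".join(groups[length][0] for length in sorted(groups))

print(staircase("a scanner is not an archive of pages but one reading"))
```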

Zalan

Needs to find a rule to use as a constraint. Reading in a non-linear way.

> Borges ... L

OCR Materializing text in 3D ... blender. 360 projection ... spatialize?

Alex

Scanned data as chatbot?
Database back to narrative ...
Integrating with ReportLab/Platypus generated layouts
Including code in results

Links/References/Reading List

The Archivist — DIY Book Scanner http://diybookscanner.org/archivist/indexee7f.html


Archivist Book Scanner (looks more like our parts)

Includes 2 Canon PowerShot cameras running the Canon Hack Development Kit (CHDK)

building notes

Software: pad
- Uploaded the preinstalled .img to the pzi-server (temporarily): scannerpi.img.xz (decompress with xz, and partition maybe needs to be resized)
- Installed new image on PI: https://github.com/Tenrec-Builders/pi-scan
- Bookscanner makefile
Hardware:

Scanning software

post-processing:
- scan-tailor (https://github.com/scantailor/scantailor/wiki): interactive post-processing (GUI) tool for scanned pages: page splitting, deskewing, adding/removing borders, selecting content.

links

Klijn, Edwin. 2008. ‘The Current State-of-Art in Newspaper Digitization: A Market Perspective’. D-Lib Magazine 14 (1/2). https://doi.org/10.1045/january2008-klijn.

DIY Book Scanner forum: Hardware & building (https://forum.diybookscanner.org/viewforum.php?f=12)

@@ Line 111: / Line 111: @@
 * Hardware:
+=Scanning software=
+* post-processing:
+** [https://github.com/scantailor/scantailor/wiki scan-tailor]:  interactive post-processing (GUI) tool for scanned pages:page splitting, deskewing, adding/removing borders, selecting content.
+**
+=links=
 Klijn, Edwin. 2008. ‘The Current State-of-Art in Newspaper Digitization: A Market Perspective’. D-Lib Magazine 14
@@ Line 120: / Line 123: @@
 DIY Book Scanner forum: [https://forum.diybookscanner.org/viewforum.php?f=12 Hardware & building]
-=Planning Overview=
-==weeks 2,3,4, 5: January==
-* @Steve and Delphine: develop the reader  = select texts in Jan and edit and work on form in Feb. documentation on the wiki steve writing and notation
-* @Andre: OCR
-* @Michael
-* @ Aymeric
-* @ Cristina&amp;Manetta
-==weeks 6,7,8,9: February==
-* week 6: delivery of the scanner parts
-* weeks 7, 8: @Aymeric + Frederic/ Worm: scanner assemblage
-==Weeks 10,11,12,13: March==
-* week 11,12:  2nd and 3rd week of March
-** testing  scanner in
-*  9-17 March barcode Dj at Worm http://kubriel.servus.at/index.php?s=barcodedjsfrom Hungarian artists group that produce bar code DJ sets ... could they be invited to perform with the bookscanner.
-*  15-16 March Algoliterary Encounters at VARIA with WS on the 16th (could be a good moment to beta the platform / test present in public)
-==week 13: 4th week of March==
-* 28 March (wed) - Launch
-* 29 March (thu) - Assessment
 [[Category: XPUB]]
 [[Category: Special Issue]]

OuNuPo: Difference between revisions