Latest revision as of 21:48, 29 October 2019

Special Issue 5: OuNuPo, Ouvroir de Numérisation Potentielle

Project Description

Main partner: WORM (WORM's Pirate Bay to be precise)

Special guests: Manetta Berends and Cristina Cochior (Algolit group)

Launch date: 28 March (wed)

Assessment date: 29 March (thu)

The outcome of the special issue will be the following:

2 book scanners (one to stay at XPUB, one to stay at WORM)
one unique (as in single-copy) reader in the form of an artist's book.

The reader will be a collection of texts curated by students and staff on the topics of book scanning, text mobility, constrained writing, algorithmic literature and, I also hope, the culture and politics of OCR, text analysis, and AI in the context of text processing and generation.

a collection of different software back-ends for the book scanner, so as to reconfigure its functionality, i.e. you might scan a book and get a PDF whose content has been manipulated in a poetic or critical way, or you might get something else entirely: a sound file, a collection of images, etc.

a gigantic collection of files produced by feeding the reader into the scanner as input source, making use of the plethora of different back-ends. IMPORTANT: the reader will never exist as a typical digital alter-ego of the analog original but only through a multitude of different digital interpretations.

an evening launch at WORM with presentation of the reader, back-ends, results, etc.
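The idea of swappable back-ends can be pictured as a registry of functions that all receive the same OCR text from a scan and each return a different output. The Python sketch below is only an illustration under assumptions: the OCR text is already available as a string, and `BACKENDS`, `backend`, `reverse_words`, `word_count_report` and `run_all` are hypothetical names, not the project's actual code.

```python
from typing import Callable, Dict

# Hypothetical registry: back-end name -> function turning OCR text into an output.
BACKENDS: Dict[str, Callable[[str], str]] = {}


def backend(name: str):
    """Decorator that registers a function as a named back-end."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        BACKENDS[name] = func
        return func
    return register


@backend("reverse")
def reverse_words(text: str) -> str:
    # A "poetic" manipulation: read the page backwards, word by word.
    return " ".join(reversed(text.split()))


@backend("count")
def word_count_report(text: str) -> str:
    # A "critical" reading: report how many words the OCR produced.
    return f"{len(text.split())} words"


def run_all(text: str) -> Dict[str, str]:
    """Feed the same scan through every registered back-end."""
    return {name: func(text) for name, func in BACKENDS.items()}


if __name__ == "__main__":
    for name, result in run_all("the reader never exists as one file").items():
        print(name, "->", result)
```

Adding another interpretation (an N+7 rewrite, a sound file, a PDF layout) would mean registering one more function; `run_all` then yields one output per back-end, which is one way to picture the "gigantic collection of files".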

Sessions

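09.01.2018 Intro to the 'Feminist Reader' (pad: https://pad.pzimediadesign.nl/p/special-issue-090118)
10.01.2018 The Alphabet as Software - R&W
15.01.2018 Optical character recognition (pad: https://pad.pzimediadesign.nl/p/ounupo_15-01-2018)
17.01.2018 Python and N+7 (pad: https://pad.pzimediadesign.nl/p/prototyping-17-01-18)
18.01.2018 Publishing Pipelines - OuLi/NuPo, natural language processing, materiality (pad: https://pad.pzimediadesign.nl/p/OuNuPo-1)
30.01.2018 Publishing Pipelines - methodologies, considerations & examples + a short PDF field study @ pirate libraries (pad: https://pad.pzimediadesign.nl/p/OuNuPo-2)
29.01.2018 Sonification (pad: https://pad.pzimediadesign.nl/p/ounupo-29012018)
30.01.2018 Publishing Pipelines - inter-annotator (dis)agreements, counting techniques & training a binary classifier (pad: https://pad.pzimediadesign.nl/p/OuNuPo-3)
05.03.2018 OuNuPo Publishing Experiments
15.03.2018 - VARIA - Presentation OuNuPo beta at Varia's Algolit
23.03.2018 - Presentation Pad (https://pad.pzimediadesign.nl/p/OuNuPo-presentation) / Makefile
28.03.2018 - WORM - final presentation
29.03.2018 - assessments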
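N+7 (also known as S+7) is the OuLiPo constraint that replaces every noun in a text with the seventh noun following it in a dictionary. The sketch below assumes a hypothetical, very short alphabetical noun list (`NOUNS`) in place of a real dictionary and part-of-speech tagger, so any word found in that list is treated as a noun; `n_plus_7` is an illustrative name.

```python
import re
from typing import List

# Hypothetical, tiny alphabetical noun list; a real run would load a full dictionary.
NOUNS: List[str] = sorted([
    "apple", "book", "cat", "door", "egg", "field", "garden", "house",
    "island", "jacket", "key", "lamp", "moon", "night", "ocean", "page",
])


def n_plus_7(text: str, nouns: List[str] = NOUNS, n: int = 7) -> str:
    """Replace every word found in `nouns` with the noun n places later (wrapping around)."""
    index = {word: i for i, word in enumerate(nouns)}

    def swap(match: "re.Match") -> str:
        word = match.group(0)
        i = index.get(word.lower())
        if i is None:
            return word  # not a known noun: leave it untouched
        replacement = nouns[(i + n) % len(nouns)]
        # Keep an initial capital if the original word had one.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", swap, text)


if __name__ == "__main__":
    print(n_plus_7("The cat sat on the book by the door."))
```

With this toy list the example prints "The jacket sat on the island by the key."; the wrap-around at the end of the list is a design choice of the sketch, since a real dictionary rarely runs out.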

Independent Research

week 2

Look into your assigned topic, try and test things, and write a recipe/report/tutorial on a wiki page based on your experiments and research. Andre Castro, 20:01, 15 January 2018 (CET)

Zalan: page segmentation; image detection
Alex: Web OCR implementation: WebOCR
Angeliki + Alice: training Tesseract to detect new font
- Install_Tesseract_4.0-Ubuntu
- Alice image training (incomplete)
Tash: retraining Tesseract to misinterpret text (hiding messages in plain sight: steganography), see User:Tash/Prototyping_02, section "Independent Research: Retraining Tesseract"
Joca: image manipulation to optimize OCR
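One common way to prepare a scan for OCR is binarization: turning grayscale pixels into pure ink or pure paper. The sketch below uses Otsu's method to pick the threshold automatically; it is a standard technique swapped in for illustration, not necessarily what was used in Joca's research, and it assumes the page is already available as a flat list of 0-255 grayscale values (for example from an image library). `otsu_threshold` and `binarize` are hypothetical names.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the threshold t (0-255) that maximises between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(histogram))
    sum_background = 0.0
    weight_background = 0
    best_threshold, best_variance = 0, -1.0

    for t in range(256):
        weight_background += histogram[t]  # pixels with value <= t
        if weight_background == 0:
            continue
        weight_foreground = total - weight_background
        if weight_foreground == 0:
            break
        sum_background += t * histogram[t]
        mean_background = sum_background / weight_background
        mean_foreground = (sum_all - sum_background) / weight_foreground
        variance = (weight_background * weight_foreground
                    * (mean_background - mean_foreground) ** 2)
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(pixels: Sequence[int]) -> List[int]:
    """Map each pixel to 0 (ink) or 255 (paper) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if value > t else 0 for value in pixels]
```

A global threshold like this works best on evenly lit pages; scans with shadows near the spine may need a local (per-region) threshold instead.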

week 5

Raw data sonification/visualization research: a recipe, a work (with documentation on how you got to the result), a survey of tools, a workflow, or whatever you feel curious about, concerning that topic.
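One simple approach to raw data sonification is to treat the bytes of any file as audio samples. The sketch below, a minimal example under that assumption, uses Python's standard `wave` module to wrap arbitrary bytes as mono 8-bit PCM; the function name `bytes_to_wav` and the default sample rate of 8000 Hz are illustrative choices.

```python
import io
import wave


def bytes_to_wav(data: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap arbitrary bytes as mono, 8-bit PCM audio and return the WAV file's bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(1)          # 1 byte per sample (WAV stores 8-bit audio as unsigned)
        wav.setframerate(sample_rate)
        wav.writeframes(data)        # every input byte becomes one sample
    return buffer.getvalue()
```

For example, writing `bytes_to_wav(ocr_text.encode("utf-8"))` to a file such as `page.wav` would turn a page of OCR output into a short, noisy sound; a lower sample rate stretches the same bytes over a longer duration.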

week 8 (feb 20 2018)

Natasha

First scan, OCR the result and take the 50 most common words; these should then somehow affect what happens in the next scan.

Where do you store things like a text file?
How does it get exported/post-processed?
When does OCR occur?
What hooks can we use?
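The first step of Natasha's loop, counting the 50 most common words in the OCR output and keeping them for the next scan, might look like the sketch below. Storing the list in a plain text file is only one possible answer to the question of where such data lives; `most_common_words`, `save_words`, `load_words` and any file path are hypothetical, and how the list then affects the next scan is left open, as in the notes above.

```python
import re
from collections import Counter
from pathlib import Path
from typing import List


def most_common_words(ocr_text: str, n: int = 50) -> List[str]:
    """Return the n most frequent words in the OCR output (lower-cased, letters only)."""
    words = re.findall(r"[^\W\d_]+", ocr_text.lower())
    return [word for word, _count in Counter(words).most_common(n)]


def save_words(words: List[str], path: Path) -> None:
    """Persist the word list, one word per line, so a later scan can read it."""
    path.write_text("\n".join(words), encoding="utf-8")


def load_words(path: Path) -> List[str]:
    """Read the list saved by a previous scan; an empty list if there is none yet."""
    return path.read_text(encoding="utf-8").split() if path.exists() else []
```

Words with equal counts keep the order in which they first appear, so the same page always yields the same list.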

Alice

Desire to go back to print, maybe a poster. Select words based on their length (examples: OuLiPo, Carl Andre).

How to extract patterns from words
How to produce new outputs (build a poster)
What "leftovers" exist in the processing of the images?
Word searches
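Selecting words by length, in the spirit of the OuLiPo and Carl Andre examples, could start from grouping the distinct words of the OCR text by their number of letters. In the sketch below, `words_by_length` and `poster_lines` are hypothetical helpers; laying the resulting lines out as an actual poster is not covered.

```python
import re
from collections import defaultdict
from typing import Dict, List


def words_by_length(text: str) -> Dict[int, List[str]]:
    """Group the distinct words of a text by their number of letters, in first-seen order."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        if word not in groups[len(word)]:
            groups[len(word)].append(word)
    return dict(groups)


def poster_lines(text: str, lengths: List[int]) -> List[str]:
    """One line per requested length, listing every word of exactly that length."""
    groups = words_by_length(text)
    return [" ".join(groups.get(n, [])) for n in lengths]
```

For instance, `poster_lines("Who is the librarian? Who has access?", [2, 3, 6])` gives the lines "is", "who the has" and "access".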

Zalan

Needs to find a constraining rule. Reading in a non-linear way.

> Borges ... L

OCR. Materializing text in 3D ... Blender. 360 projection ... spatialize?

Alex

Scanned data as chatbot?
Database back to narrative ...
Integrating with ReportLab/Platypus generated layouts
Including code in results

The Feminist Reader

The Feminist Reader is a compilation of 6 books on scanning cultures, edited, designed and produced by the XPUB students. Starting from a set of questions and using critical and feminist research methodologies, each publication constitutes a chapter of the Reader: What culture do we reproduce when we scan books? How do we share and reproduce knowledge online? Who has access, and who is excluded from the narrative and the Canon? Who is the librarian? Are data sets biased? Does software have a gender? Does it have a voice? How are weaving and programming historically connected? Building on a collective online knowledge database, each book is a compilation of 5 to 10 annotated texts addressing a specific question. The 6 books are wrapped in an Ouvroir, a scarf embroidered automatically, and folded according to the Japanese art of Furoshiki. The Feminist Reader project was initiated by Delphine Bedel for XPUB Special Issue 5. For the second part of the project, after building two DIY book scanners with WORM in Rotterdam, the students and tutors further expanded the Feminist Reader and their research into a series of Python scripts and algorithms for scanning books. The Reader will never exist as a typical digital alter-ego of the analog original but only through a multitude of different digital interpretations.

Scanner Building notes

Software: pad (https://pad.pzimediadesign.nl/p/bookscanner-software)
- Uploaded the preinstalled .img to the PZI server (temporarily): scannerpi.img.xz, http://pzwart1.wdka.hro.nl/~aroidl/scannerpi.img.xz (decompress with xz; the partition may need to be resized)
- Installed new image on the Pi: https://github.com/Tenrec-Builders/pi-scan
- Bookscanner makefile
Hardware:

Software

Camera firmware:
- CHDK (http://chdk.wikia.com/wiki/CHDK)

Image acquisition:
- Pi Scan (https://github.com/Tenrec-Builders/pi-scan): a read-only Raspberry Pi image. It tries to take the best possible shots and save them to an external SD card, restarts the camera if it crashes, and resumes work done over several sessions.
- ~~gphoto2~~ (http://gphoto.org/): to control cameras
- CHDKPTP (https://app.assembla.com/spaces/chdkptp/wiki): to control cameras with CHDK firmware via USB

Post-processing:
- scan-tailor (https://github.com/scantailor/scantailor/wiki): interactive post-processing (GUI) tool for scanned pages: page splitting, deskewing, adding/removing borders, selecting content.
- Unpaper (https://github.com/Flameeyes/unpaper/): deskews scanned pages and can optionally combine single pages into spreads. It takes .ppm, .pbm and .pnm files as input and outputs the same formats.
- Textcleaner script (http://www.fmwconcepts.com/imagemagick/textcleaner/index.php): an ImageMagick script for cropping, grayscaling and denoising images. I don't need the full script, but maybe I can reconstruct it and make a script that only does the grayscaling and denoising.

All-in-one
- ~~Spreads~~: Spreads is a scanner workflow tool for book scanning. It's written in Python and has a command-line and web-based interface.
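A reduced version of the Textcleaner idea, doing only the grayscaling and denoising, could look like the pure-Python sketch below. It does not reproduce what the ImageMagick script actually does internally; it assumes the image is given as rows of (r, g, b) tuples, converts them with the ITU-R BT.601 luma weights, and removes isolated specks with a 3x3 median filter. `to_grayscale` and `median_denoise` are hypothetical names.

```python
from statistics import median_low
from typing import List, Tuple

RGB = Tuple[int, int, int]


def to_grayscale(image: List[List[RGB]]) -> List[List[int]]:
    """Convert rows of (r, g, b) pixels to 0-255 luma using ITU-R BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in image]


def median_denoise(gray: List[List[int]]) -> List[List[int]]:
    """3x3 median filter; at the borders only neighbours inside the image are used."""
    height, width = len(gray), len(gray[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                gray[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low always returns an existing pixel value, so output stays integer.
            row.append(median_low(window))
        out.append(row)
    return out
```

A median filter suits scans because it removes single-pixel dust while blurring letter edges less than an averaging filter would.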

Makefile

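Links

Pad from ERG Bookscan session: details on updating CHDK firmware (https://annuel.framapad.org/p/ergbookscan)
Notes on CHDK (https://pzwiki.wdka.nl/mediadesign/CHDK)
CHDK firmware (http://mighty-hoernsche.de/)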

Links/References/Reading List

The Archivist — DIY Book Scanner http://diybookscanner.org/archivist/indexee7f.html

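DIY Book Scanner forum: Hardware & building (https://forum.diybookscanner.org/viewforum.php?f=12)

https://store.diybookscanner.org/collections/frontpage/products/archivist-quill-book-scanner-base-kit

Archivist Book Scanner (looks more like our parts): http://tenrec.builders/archivist-guide.html

Klijn, Edwin. 2008. ‘The Current State-of-Art in Newspaper Digitization: A Market Perspective’. D-Lib Magazine 14 (1/2). https://doi.org/10.1045/january2008-klijn.
Pi Scan installation (video): https://vimeo.com/150385938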

Media

Varia presentation photos: https://vvvvvvaria.org/archive/2018-03-16.17-Algologs/
Varia presentation audio: http://pzwart1.wdka.hro.nl/~amansoux/SP5%20Beta%20Launch%2048.ogg
WORM presentation audio: http://pzwart1.wdka.hro.nl/~acastro/SI5-recording-worm.mp3

Categories: XPUB, Special Issue
