OuNuPo: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 19:10, 22 February 2018

Special Issue 5: OuNuPo, Ouvroir de Numérisation Potentielle

Centmillemilliardsdepoemes.jpg

Project Description

Main partner: WORM (WORM's Pirate Bay to be precise)

Special guests: Manetta Berends and Cristina Cochior (Algolit group)


The outcome of the special issue will be the following things:

  • 2 book scanners (one to stay at XPUB, one to stay at WORM)
  • one unique (as in unique copy) reader in the form of an artist's book.

The reader will be a collection of texts curated by students and staff on the topics of book scanning, text mobility, constrained writing, and algorithmic literature, and, I also hope, the culture and politics of OCR, text analysis, and AI in the context of text processing and generation.

  • a collection of different software back-ends for the book scanner, so as to reconfigure its functionality, i.e. you might scan a book and get a PDF whose content has been manipulated in a poetic or critical way, or you could get something else entirely: a sound file, a collection of images, etc.
  • a gigantic collection of files produced by feeding the reader as input source into the scanner, making use of the plethora of different back-ends. IMPORTANT: the reader will never exist as a typical digital alter ego of the analog original, but only through a multitude of different digital interpretations.
  • an evening launch at WORM with a presentation of the reader, back-ends, results, etc.
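One way to picture the swappable back-ends is a small registry that maps a name to a function taking the OCR'd text and returning some output. This is only a sketch: the names `BACKENDS`, `backend` and `run_backend`, and the two toy back-ends, are made up for illustration and are not part of any existing scanner software.

```python
from typing import Callable, Dict

# Hypothetical registry: back-end name -> function turning page text into bytes.
BACKENDS: Dict[str, Callable[[str], bytes]] = {}


def backend(name: str):
    """Decorator that registers a function as a scanner back-end."""
    def register(func: Callable[[str], bytes]) -> Callable[[str], bytes]:
        BACKENDS[name] = func
        return func
    return register


@backend("plain")
def plain_text(text: str) -> bytes:
    # The "neutral" back-end: the text as it was read.
    return text.encode("utf-8")


@backend("reversed")
def reversed_lines(text: str) -> bytes:
    # A toy manipulation: the same lines, last one first.
    return "\n".join(reversed(text.splitlines())).encode("utf-8")


def run_backend(name: str, text: str) -> bytes:
    """Run the back-end registered under `name` on the scanned text."""
    return BACKENDS[name](text)
```

Reconfiguring the scanner then amounts to choosing a different name, e.g. `run_backend("reversed", ocr_text)`, and a new back-end is one more decorated function.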

Sessions

Independent Research

week 2

Look into your assigned topic, try, test, and on a wiki page write a recipe/report/tutorial based on your experiments and research. Andre Castro (talk) 20:01, 15 January 2018 (CET)

week 5

Raw data sonification/visualization research: a recipe, a work (with documentation on how you got to the result), a survey of tools, a workflow, or whatever you feel curious about, concerning that topic.
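As a possible starting recipe for raw data sonification: wrap the bytes of any file in a WAV header, so that each byte is played as one unsigned 8-bit mono sample. This is a minimal sketch using only Python's standard `wave` module; the function name `sonify` and the 8000 Hz sample rate are arbitrary choices, not a prescribed method.

```python
import io
import wave


def sonify(data: bytes, sample_rate: int = 8000) -> bytes:
    """Return a WAV file (as bytes) in which every input byte is one sample."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(1)            # 1 byte per sample: unsigned 8-bit PCM
        wav.setframerate(sample_rate)  # samples per second
        wav.writeframes(data)          # the raw bytes become the audio frames
    return buffer.getvalue()
```

For example, with a hypothetical scan `page.jpg`, writing `sonify(open("page.jpg", "rb").read())` to `page.wav` gives a file that any audio player can open.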

week 8 (feb 20 2018)

Natasha

First scan: OCR the results and take the 50 most common words; these should then somehow affect what happens in the next scan.

  • Where do you store things like a text file?
  • How does it get exported / postprocessed?
  • When does OCR occur?
  • What hooks can we use?
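A rough sketch of the word-count step, assuming the OCR output is already available as a plain string. `most_common_words` and `remove_words` are hypothetical helper names, and dropping the frequent words from the next scan is just one possible way the first scan could "affect" the second.

```python
import re
from collections import Counter


def most_common_words(text: str, n: int = 50):
    """Return the n most frequent words as (word, count) pairs."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common(n)


def remove_words(text: str, banned) -> str:
    """One possible effect on the next scan: drop the banned words from it."""
    banned = set(banned)
    kept = [
        word for word in text.split()
        if word.lower().strip(".,;:!?\"'()") not in banned
    ]
    return " ".join(kept)
```

Usage could look like `top = [w for w, _ in most_common_words(first_scan)]` followed by `remove_words(second_scan, top)`, where `first_scan` and `second_scan` stand for the OCR text of each scan.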

Alice

Desire to go back to print, maybe a poster. Select words based on their length (example of OuLiPo, Carl Andre)

  • How to extract patterns from words
  • How to produce new outputs (build a poster)
  • What "leftovers" exist in the processing of the images?
  • Word searches
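Selecting words by length could start from something like the sketch below, which groups the distinct words of a text by their number of letters and then picks one word per length, shortest first. The function names `words_by_length` and `staircase` are invented here, and the "staircase" is only one example of a pattern that could feed a poster layout.

```python
import re
from collections import defaultdict


def words_by_length(text: str) -> dict:
    """Group distinct words by length, keeping the order of first appearance."""
    groups = defaultdict(list)
    for word in re.findall(r"[^\W\d_]+", text):
        if word not in groups[len(word)]:
            groups[len(word)].append(word)
    return dict(groups)


def staircase(groups: dict) -> list:
    """Pick the first word of each length, from shortest to longest."""
    return [groups[length][0] for length in sorted(groups)]
```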

Zalan

Needs to find a rule for constraining. Reading in a non-linear way.

> Borges ... L

OCR Materializing text in 3D ... blender. 360 projection ... spatialize?

Alex

  • Scanned data as chatbot?
  • Database back to narrative ...
  • Integrating with ReportLab/Platypus generated layouts
  • Including code in results


Scanner Building notes

Software

  • Camera firmware:
  • Image acquisition:
    • Pi Scan: a read-only Raspberry Pi image. It tries to take the best possible shots and save them to an external SD card, restarts the camera if it crashes, and resumes work when scanning is done over several sessions.
    • gphoto2: to control cameras
    • CHDKPTP: to control cameras with CHDK firmware via USB
  • post-processing:
    • scan-tailor: interactive post-processing (GUI) tool for scanned pages: page splitting, deskewing, adding/removing borders, selecting content.
    • Unpaper: able to deskew scanned pages and optionally combine single pages into spreads. It takes .ppm, .pbm and .pnm files as input and outputs the same formats.
    • Textcleaner script: an ImageMagick script for cropping, grayscaling and de-noising images. I don't need the full script, but maybe I can reconstruct it and make a script that just does the grayscaling and denoising.
  • All in one
    • Spreads: a scanner workflow tool for book scanning. It's written in Python and has a command-line and web-based interface.
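A reduced "grayscale and denoise only" script might look like the sketch below. It does not reconstruct the Textcleaner script; it swaps in two plain ImageMagick options, `-colorspace Gray` and `-despeckle`, and assumes the ImageMagick 6 `convert` command is on the PATH (on ImageMagick 7 the command is `magick`). The function names and the file names in the usage note are hypothetical.

```python
import subprocess


def clean_command(src: str, dst: str, despeckle_passes: int = 1) -> list:
    """Build the ImageMagick command: grayscale, then despeckle N times."""
    cmd = ["convert", src, "-colorspace", "Gray"]
    cmd += ["-despeckle"] * despeckle_passes  # each pass removes more speckle noise
    cmd.append(dst)
    return cmd


def clean_page(src: str, dst: str, despeckle_passes: int = 1) -> None:
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(clean_command(src, dst, despeckle_passes), check=True)
```

Calling `clean_page("scan_001.jpg", "scan_001_clean.png", despeckle_passes=2)` would then write a grayscale, despeckled copy of the scan. Building the command separately from running it makes it easy to print and check before touching any files.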

links

Links/References/Reading List

The Archivist — DIY Book Scanner http://diybookscanner.org/archivist/indexee7f.html

DIY Book Scanner forum: Hardware & building

[1]

Archivist Book Scanner (looks more like our parts)


Klijn, Edwin. 2008. ‘The Current State-of-Art in Newspaper Digitization: A Market Perspective’. D-Lib Magazine 14 (1/2). https://doi.org/10.1045/january2008-klijn.