User:Pedro Sá Couto/Prototyping 5th/Text Launderette Scripts: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 12:49, 28 January 2020

Scripts

From the git

https://git.xpub.nl/pedrosaclout/Text_Launderette_Scripts

DIY Book Scanner Workflow

    1. Getting started

This set of scripts was written for the Text Launderette workshop. The workshop takes place in the Publication Station, WDkA building.
Rotterdam, 03-02-2020
It is a workflow to turn the pictures from the DIY Book Scanner into a final OCRed PDF.


    1. Dependencies
      1. Brew (MAC) or apt-get (LINUX)

You’ll need the command-line tools for Xcode installed.

```bash
xcode-select --install
```

Afterwards, install Homebrew.

```bash
# The Ruby-based installer has been deprecated; Homebrew now ships a Bash installer.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Run the following command once you’re done to ensure Homebrew is installed and working properly:

```bash
brew doctor
```

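On Linux:

```bash
sudo apt-get install python3 python3-pip imagemagick poppler-utils tesseract-ocr
```

On macOS:

```bash
brew install python3 imagemagick poppler tesseract
```

pdfunite ships with poppler, and pip3 comes with Homebrew's python3, so neither needs a package of its own. pytesseract needs the Tesseract binary to be installed.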

      1. PIP3

sudo pip3 install pdf2image Pillow opencv-python pytesseract

The scripts also use time and logging, which ship with Python and need no installation.


    1. How to use

Add your pictures from the book scanner to the folder "scans".

Make all the files executable.

```bash
chmod +x merge_scans.sh workshop_stream.sh merge_files.sh
```

To skip any of the scripts, comment out its line in the shell script workshop_stream.sh.

Run ./workshop_stream.sh


Wait :)
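For readers who prefer Python to shell, the same run-everything-in-order idea can be sketched as below. This is not the content of workshop_stream.sh. The step order is taken from the list later on this page, and the names `STEPS`, `plan` and `run_stream` are made up for illustration. Skipping a step is done by name rather than by commenting out a line.

```python
import subprocess

# Step names are hypothetical labels; the commands are the ones named on this page.
STEPS = [
    ("mkdir", ["mkdir", "-p", "split", "rotated", "ocred", "bounding_box", "cropped"]),
    ("merge_scans", ["./merge_scans.sh"]),
    ("burst", ["python3", "burstpdf.py"]),
    ("rotate", ["python3", "rotation.py"]),
    ("bounding_box", ["python3", "bounding_box.py"]),
    ("mirror_crop", ["python3", "mirror_crop.py"]),
    ("ocr", ["python3", "tesseract_ocr.py"]),
    ("merge_files", ["./merge_files.sh"]),
]


def plan(skip=()):
    """Return the (name, command) pairs that would run, in order."""
    return [(name, cmd) for name, cmd in STEPS if name not in skip]


def run_stream(skip=()):
    """Run each step in turn and stop at the first one that fails."""
    for name, cmd in plan(skip):
        print("running", name)
        subprocess.run(cmd, check=True)
```

Calling `run_stream(skip={"rotate"})` would run every step except the rotation.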



    1. Additional information

The workflow runs these scripts in the following order:

      1. Create 5 directories

```bash
mkdir split
mkdir rotated
mkdir ocred
mkdir bounding_box
mkdir cropped
```
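The same step can be sketched in Python, which has the small advantage of not failing when a directory already exists from an earlier run. The function name `create_work_dirs` is an assumption for illustration; only the directory names come from this page.

```python
from pathlib import Path

# Directory names taken from the mkdir step on this page.
WORK_DIRS = ["split", "rotated", "ocred", "bounding_box", "cropped"]


def create_work_dirs(base="."):
    """Create the working directories under base, leaving existing ones alone."""
    created = []
    for name in WORK_DIRS:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```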

      1. Merge the files in the directory scans

All the scans will be appended to one pdf called out.pdf

```bash
./merge_scans.sh
```
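How merge_scans.sh does this is not shown here. As a rough equivalent, assuming the scans are JPEG or PNG files, Pillow can append several images to one PDF. The helper names `natural_key`, `list_scans` and `merge_scans` are hypothetical, and the natural sort is an added precaution so that a file named 10.jpg lands after 9.jpg.

```python
import re
from pathlib import Path


def natural_key(name):
    """Sort key that puts 'page10.jpg' after 'page9.jpg'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def list_scans(folder="scans", extensions=(".jpg", ".jpeg", ".png")):
    """Return the image paths in folder, in page order."""
    files = [p for p in Path(folder).iterdir() if p.suffix.lower() in extensions]
    return sorted(files, key=lambda p: natural_key(p.name))


def merge_scans(folder="scans", output="out.pdf"):
    """Append every scan to a single PDF using Pillow."""
    from PIL import Image  # imported here so the helpers above work without Pillow

    paths = list_scans(folder)
    if not paths:
        raise FileNotFoundError(f"no scans found in {folder}")
    pages = [Image.open(p).convert("RGB") for p in paths]
    # save_all plus append_images writes one PDF page per image
    pages[0].save(output, save_all=True, append_images=pages[1:])
    return output
```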

      1. Burst the pdf in scans

Burst this PDF into single pages, renaming the files so they can be iterated over later.

```bash
python3 burstpdf.py
```
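A minimal sketch of what bursting could look like with pdf2image, which is in the dependency list. This is not the code of burstpdf.py. The zero-padded naming scheme, the 300 dpi setting and the function names are assumptions, chosen so that a plain alphabetical sort gives page order.

```python
from pathlib import Path


def page_filename(index, total, ext="jpg"):
    """Zero-padded name so files sort in page order, e.g. 007.jpg."""
    width = max(3, len(str(total)))
    return f"{index:0{width}d}.{ext}"


def burst_pdf(pdf_path="out.pdf", out_dir="split", dpi=300):
    """Write each page of pdf_path as a numbered JPEG in out_dir."""
    from pdf2image import convert_from_path  # needs poppler installed

    pages = convert_from_path(pdf_path, dpi=dpi)
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for i, page in enumerate(pages, start=1):
        target = out / page_filename(i, len(pages))
        page.save(target, "JPEG")
        written.append(target)
    return written
```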

      1. Rotate the pdfs

The book scanner takes pictures of the pages sideways. This script iterates through the odd and even pages, rotating them back to their original orientation.

```bash
python3 rotation.py
```
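The odd/even logic might look like the sketch below. The actual angles depend on how the cameras are mounted, so the defaults of 90 and -90 degrees are assumptions, as are the function names and the choice to number pages by sorted file order.

```python
from pathlib import Path


def rotation_for_page(page_number, odd_angle=90, even_angle=-90):
    """Pick the angle for a page; Pillow treats positive angles as counter-clockwise."""
    return odd_angle if page_number % 2 == 1 else even_angle


def rotate_pages(in_dir="split", out_dir="rotated"):
    """Rotate every JPEG in in_dir and save it under the same name in out_dir."""
    from PIL import Image  # imported here so rotation_for_page works without Pillow

    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for number, path in enumerate(sorted(Path(in_dir).glob("*.jpg")), start=1):
        with Image.open(path) as img:
            # expand=True enlarges the canvas so the rotated page is not clipped
            rotated = img.rotate(rotation_for_page(number), expand=True)
            rotated.save(out / path.name)
```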

      1. Cropping the bounding boxes

The pages are now in their original position, but they have a bounding box. This script iterates through them and crops the highest contrast area found.

```bash
python3 bounding_box.py
```
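One common way to find a page against its background with OpenCV is to threshold the image and crop to the largest contour. The sketch below uses that approach as a stand-in for "the highest contrast area". It is an assumption about the technique, not the code of bounding_box.py, and the function names are hypothetical.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if there are none."""
    return max(boxes, key=lambda b: b[2] * b[3], default=None)


def crop_to_page(in_path, out_path):
    """Crop in_path to its largest bright region and write the result to out_path."""
    import cv2  # imported here so largest_box works without OpenCV

    image = cv2.imread(str(in_path))
    if image is None:
        raise FileNotFoundError(in_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks the threshold that best separates page from background
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] selects the contour list in both OpenCV 3 and OpenCV 4
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    box = largest_box([cv2.boundingRect(c) for c in contours])
    if box is None:
        return False
    x, y, w, h = box
    cv2.imwrite(str(out_path), image[y:y + h, x:x + w])
    return True
```

This assumes the page is brighter than its surroundings; a dark page on a light background would need the mask inverted.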

      1. Cropping the mirror

The pages are now cropped, but the mirror is still visible in the middle.

```bash
python3 mirror_crop.py
```
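A simple way to remove the mirror is to cut a fixed strip from the inner edge of each page. The sketch below assumes the strip sits on the left of odd pages and on the right of even pages and takes up a tenth of the width. Both are guesses to adjust for a given scanner, and the function names are hypothetical.

```python
def mirror_crop_box(width, height, page_number, strip_fraction=0.1):
    """Return a (left, upper, right, lower) box that drops the strip showing the mirror."""
    strip = int(width * strip_fraction)
    if page_number % 2 == 1:
        # assumed: odd pages show the mirror along their left edge
        return (strip, 0, width, height)
    # assumed: even pages show the mirror along their right edge
    return (0, 0, width - strip, height)


def crop_mirror(in_path, out_path, page_number, strip_fraction=0.1):
    """Crop one page image with Pillow and save the result."""
    from PIL import Image  # imported here so mirror_crop_box works without Pillow

    with Image.open(in_path) as img:
        box = mirror_crop_box(img.width, img.height, page_number, strip_fraction)
        img.crop(box).save(out_path)
```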

      1. OCR

In this step we OCR the JPG images, turning them into PDFs.

```bash
python3 tesseract_ocr.py
```
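pytesseract can return a searchable PDF directly through `image_to_pdf_or_hocr`. The sketch below shows one page going through that call. The directory name ocred comes from this page, while the function names and the default language `eng` are assumptions.

```python
from pathlib import Path


def ocr_output_path(image_path, out_dir="ocred"):
    """Map cropped/007.jpg to ocred/007.pdf."""
    return Path(out_dir) / (Path(image_path).stem + ".pdf")


def ocr_page(image_path, out_dir="ocred", lang="eng"):
    """OCR one image and write a PDF with a text layer next to the picture."""
    import pytesseract  # needs the Tesseract binary installed
    from PIL import Image

    target = ocr_output_path(image_path, out_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(image_path) as img:
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(img, extension="pdf", lang=lang)
    target.write_bytes(pdf_bytes)
    return target
```

For texts in other languages, pass the matching Tesseract language code, provided that language pack is installed.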

      1. Merge all the files and create the pdf

The OCRed pages are now joined into the final PDF. Your book is ready :)

```bash
./merge_files.sh
```
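pdfunite takes the input PDFs in order, followed by the output file. Assuming merge_files.sh wraps that tool (it is in the dependency list, but the script itself is not shown here), the call could be built from Python as follows. The output name `book.pdf` and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def pdfunite_command(pdf_paths, output="book.pdf"):
    """Build the pdfunite call: sorted input pages first, output file last."""
    ordered = sorted(str(p) for p in pdf_paths)
    if not ordered:
        raise ValueError("no PDF pages to merge")
    return ["pdfunite", *ordered, output]


def merge_pages(in_dir="ocred", output="book.pdf"):
    """Join every PDF in in_dir into one file using pdfunite."""
    cmd = pdfunite_command(Path(in_dir).glob("*.pdf"), output)
    subprocess.run(cmd, check=True)
    return output
```

The alphabetical sort only gives page order if the page files have zero-padded names.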

    1. License

The package is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).

About Text Launderette

TITLE

XPUB workshops – Text Launderette

STATION

Publication Station

LOCATION

BL.00.4

TUTORS

Simon Browne & Pedro Sá Couto

DESCRIPTION

We will use a home-made, DIY book scanner, and open-source software to scan, process, and add digital features to printed texts brought by the participants to the workshop. Ultimately, we will include them in the “bootleg library”, a shadow library accessible over a local network.
Shadow libraries operate outside of legal copyright frameworks, in response to decreased open access to knowledge. This workshop aims to extend our research on libraries, their sociability, and methods by which we can add provenance to texts included in public or private, legal or extra-legal collections.
Participants should bring: a printed text, which they’d like to digitize and share.

PRACTICAL INFORMATION

Under the name of .py.rate.chnic sessions, the second-year students from the Experimental Publishing Master program invite you to participate in a series of hands-on workshops, related to the topics of their graduation projects. Each workshop offers the participants an opportunity to engage with the students’ research by partaking in their processes, experiments, and discussions.

MINIMAL ENROLMENT

5

MAXIMUM ENROLMENT

15

NR OF SESSIONS

1