User:Artemis gryllaki/PrototypingIII

From XPUB & Lens-Based wiki
==Publishing an “image gallery”==
'''Imagemagick'''’s suite of tools includes montage, which is quite flexible and useful for making a quick overview page of images.


* mogrify
* identify
* convert

'''Sizing down a bunch of images'''

Warning: MOGRIFY MODIFIES THE IMAGES – ERASING THE ORIGINAL – make a copy of the images before you do this!!!<br><br>
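Since mogrify works in place, one way to stay on the safe side is to copy the originals into a backup folder first. A minimal sketch of that step in Python (my own, with hypothetical folder names):

```python
import shutil
from pathlib import Path

def backup_images(src="images", dest="images_backup", pattern="*.JPG"):
    """Copy all matching images to a backup folder before running mogrify."""
    Path(dest).mkdir(exist_ok=True)
    copied = []
    for image in sorted(Path(src).glob(pattern)):
        target = Path(dest) / image.name
        shutil.copy2(image, target)  # copy2 also keeps the file timestamps
        copied.append(target)
    return copied
```

With the originals safe in the backup folder, mogrify can then be run on the working copies without fear.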


mogrify -resize 1024x *.JPG <br>
Fixing the orientation of images<br>
mogrify -auto-orient *.JPG<br>

==Montages==
[[File:montage-try.png | 500px | thumbnail | left | Workshop Images|link=]]
[[File:montage-try3.png | 500px | thumbnail | center | Workshop Images|link=]]
<br clear=all>

Using Montage<br>
montage -label "%f" *.JPG \<br>
    -shadow \<br>
    -geometry 1000x1000+100+100 \<br>
    montage.caption.jpg<br>

Using pdftk to put things together<br><br>

'''poster.py!'''
<source lang="python">
#!/usr/bin/env python3

import os, datetime, sys
from argparse import ArgumentParser
from glob import glob


# placeholder for shelling out to an ImageMagick conversion step:
# os.system('imagemagick-converting-command filein fileout')
from PIL import Image
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A0


# p = ArgumentParser("")
# p.add_argument("--output", default="poster.pdf")
# p.add_argument("--interpolation", default="cubic", help="nearest,cubic")
# p.add_argument("--labels", default="labels_public.txt")
# args = p.parse_args()

pagewidth, pageheight = A0

c = canvas.Canvas("reportlab_image_poster2.pdf", pagesize=A0)
x, y = 0, 0
imagewidth = 200
imageheight = 300
aw = pagewidth - imagewidth
ah = pageheight - imageheight
images = glob("images/*.JPG")
# note: the diagonal steps below assume at least two images (len(images) > 1)
dx = aw/(len(images)-1)
dy = ah/(len(images)-1)


for image in images:
    print ("Adding an image to the PDF")
    print (image)
    im = Image.open(image)
    pxwidth, pxheight = im.size
    print ("Got the image, its size is:", im.size)
    imageheight = imagewidth * (pxheight / pxwidth)
    c.drawInlineImage(image, x, y, imagewidth, imageheight)
    print ("placing image {0} at {1}".format(image, (x,y)))
    x += dx
    y += dy


c.showPage()
c.save()
sys.exit(0)


#################
# GRID
# imsize = 96
# cols = int(A0[0] // imsize)
# rows = int(A0[1] // imsize)
# # calculate margins to center the grid on the page
# mx = (A0[0] - (cols*imsize)) / 2
# my = (A0[1] - (rows*imsize)) / 2
# print ("Grid size {0}x{1} (cols x rows)".format(cols, rows))
# print ("  (total size:", cols*imsize, rows*imsize, "margins:", mx, my, ")")
#################

# for l in range(7):
#     print (LABELS[l])
#     col = 0
#     row = 0
#     with open(args.labels) as f:
#         f.readline()
#         for line in f:
#             path, label = line.split(",")
#             label = int(label)
#             if label == l:
#                 image = Image.open(path)
#                 print (image.size)

#                 x = mx + (col*imsize)
#                 y = my + imsize + (7-l)*(4*imsize) - ((row+1)*imsize)

#                 c.drawInlineImage(image, x, y, width=imsize, height=imsize)
#                 col += 1
#                 if col >= cols:
#                     col = 0
#                     row += 1
#                 if row >= 3:
#                     break

# c.showPage()
# c.save()
</source>

== OCR | Optical character recognition with Tesseract ==
In command line: tesseract nameofpicture.png outputbase<br><br>

[[File:scan_source.png | 500px | thumbnail | left | Scanning a book page|link=]]
[[File:ocr_output.png | 500px | thumbnail | center | Output: character recognition with tesseract-ocr / styled with javascript |link=]]<br clear=all>
[[File:Screenshot2.png | 500px | thumbnail | left |link=]]
[[File:Screenshot3.png | 500px | thumbnail | center |link=]]<br clear=all>

https://github.com/tesseract-ocr/tesseract

See pad: https://pad.xpub.nl/p/IFL_2018-05-14

== Prototyping ==
=== Image classifier for annotations ===
At the time of this special issue, a point of interest for everyone was annotations. We were reading and annotating texts together and debating the possibilities of sharing these notes. One particular discussion was about what could or should be considered an annotation: folding the corners of pages, linking to other contents, highlighting, scribbling, drawing. I was curious whether we could train a computer to see all of these traces, so I started prototyping some examples.

Aim: make the computer recognize "clean" pages of books or "annotated" pages of books.

Using the script from [[.py.rate.chnic_sessions#29.10.2018:_Zalan_.26_Alex|.py.rate.chnic session 2]], [https://pad.xpub.nl/p/pyrate2 pad notes here], and [https://git.xpub.nl/aaaa/learning_algorithms/src/branch/master/ImageClassificationPython Alex's git here]. My data set is [https://git.xpub.nl/rita/image_classifier_annotation here].

[[File:annotated_eg.png | 600px | thumbnail | left | "Annotated" example from data set > test set]]
<br clear=all>
[[File:clean_eg.png | 600px | thumbnail | left | "Clean" example from data set > test set]]
<br clear=all>
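The actual classification used the scripts linked above. Purely to illustrate the shape of the task, here is a deliberately naive toy heuristic (entirely my own sketch, not the session's code): it counts dark pixels in a grayscale page, on the assumption that annotations add extra ink.

```python
def ink_fraction(pixels, threshold=128):
    """Fraction of pixels darker than `threshold` in a grayscale image,
    given as a list of rows of 0-255 values."""
    values = [v for row in pixels for v in row]
    dark = sum(1 for v in values if v < threshold)
    return dark / len(values)

def classify_page(pixels, cutoff=0.15):
    """Naive rule: pages with a lot of dark pixels are 'annotated'."""
    return "annotated" if ink_fraction(pixels) > cutoff else "clean"
```

A trained classifier is of course doing something far more interesting than this, which is exactly why it is worth probing what it actually looks at.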


Each set (test and training) had 50 examples of "clean" and "annotated" pages; it would make sense to add more in the future.<br>
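A reproducible split of labeled examples into training and test sets can be sketched with the standard library (the filenames here are hypothetical, not the actual data set's):

```python
import random

def split_dataset(paths, test_ratio=0.5, seed=0):
    """Shuffle labeled example paths and split them into train and test sets."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)  # fixed seed keeps the split reproducible
    cut = int(len(paths) * (1 - test_ratio))
    return paths[:cut], paths[cut:]

# e.g. train, test = split_dataset(["page_%03d.jpg" % i for i in range(100)])
```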
The results were not very accurate. Pages with handwritten text gave better results, while highlighting and computer-typed notes were often misinterpreted.
It’s useful to try to see what the computer is looking for, to understand whether the script is breaking the image into parts, and to try other scripts.

Some results:
<gallery widths=200px heights=200px>
test10.jpg.predicted.png|Right prediction
test5.jpg.predicted.png|Right prediction
test2.jpg.predicted.png|Wrong prediction
test6.jpg.predicted.png|Wrong prediction
</gallery>

=== Computer categorization for text files ===
The actions of categorizing and cataloging happen in the most mundane activities, but they are not innocent. They translate values and certain visions of the world.<br>
In the Rietveld Academy Library, we saw how the librarians are challenging the Library of Congress classification. With Dušan we browsed the Monoskop Index, an interesting combination of a “book index, library catalog, and tag cloud”.<br>
With this script, I was experimenting with automated classification of text files. The script searches for the three most common words in the text and tries to match these words to a category. For example, if one of the most common words is “books”, the category of the text is considered “Library Studies”. The same would happen with the words “archives”, “author”, “bibliographic”, “bibliotheca”, “book”, “bookcase”, etc. The script only has one category right now, but it would be easy to add more. By doing so, I would be making associations that are very personal, sometimes inaccurate, and I would be creating a bias in the catalog.

[[File:Common words.png|600px |thumbnail|left| Testing it with Balázs Bodó's text, Own Nothing ]]
<br clear=all>
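The matching logic described above can be sketched roughly like this (the word list is abbreviated and the function names are mine, not the original script's):

```python
import re
from collections import Counter

# one category for now, as in the script; more could be added to this dict
CATEGORIES = {
    "Library Studies": {"books", "archives", "author", "bibliographic",
                        "bibliotheca", "book", "bookcase"},
}

def most_common_words(text, n=3):
    """Return the n most frequent words in the text (lowercased)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word, count in Counter(words).most_common(n)]

def categorize(text):
    """Match the most common words against each category's word list."""
    for word in most_common_words(text):
        for category, vocabulary in CATEGORIES.items():
            if word in vocabulary:
                return category
    return "Uncategorized"
```

The bias mentioned above lives in the `CATEGORIES` dictionary: whoever fills it in decides which words count as belonging where.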
== Blurry Boundaries Workshop ==
[[File:blurry1.png | 400px | thumbnail | left |Scanning a book page|link=]]
<br clear=all>
[[File:blurry2.png | 400px | thumbnail | left |OCR-ing the book page|link=]]
[[File:blurry3.png | 800px | thumbnail | center |My hidden labour|link=]]

Latest revision as of 22:36, 8 July 2019
