Tesseract: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 14:05, 11 June 2019

Free software for OCR (optical character recognition). See https://github.com/tesseract-ocr .

   sudo apt install tesseract-ocr 
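
A quick way to confirm the install worked is to ask the binary for its version. This is a minimal sketch, assuming the package put `tesseract` on your PATH; the `status` variable is only there for illustration.

```shell
# Check whether the tesseract binary is reachable, then print its version line.
if command -v tesseract >/dev/null 2>&1; then
    status="installed"
    # Older releases print the version to stderr, so merge both streams.
    tesseract --version 2>&1 | head -n 1
else
    status="missing"
    echo "tesseract not found on PATH"
fi
```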

NB: you need to install the language-specific package for the language of the text you are trying to convert. Tesseract languages are identified by three-letter codes such as "eng" (English, the default).

For instance, to convert Dutch-language documents:

   sudo apt install tesseract-ocr-nld
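
Once the language package is installed, a run might look like the sketch below. The file names `scan.png` and `scan` are hypothetical placeholders, not from this page; `--list-langs` and `-l` are standard tesseract options. The script skips the OCR step when tesseract or the image is missing.

```shell
# Hypothetical file names; replace with your own scan.
input="scan.png"
outbase="scan"   # tesseract appends .txt to this base name
lang="nld"       # three-letter code for Dutch

if command -v tesseract >/dev/null 2>&1; then
    # Show which language data is installed; "nld" should be listed.
    tesseract --list-langs 2>&1
    if [ -f "$input" ]; then
        # -l selects the language; the recognised text is written to scan.txt
        tesseract "$input" "$outbase" -l "$lang"
    else
        echo "$input not found; skipping OCR"
    fi
else
    echo "tesseract not found on PATH"
fi
```

If the page mixes languages, codes can be combined with a plus sign, for example `-l nld+eng`, provided both packages are installed.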

[https://guides.library.illinois.edu/c.php?g=347520&p=4121426 Good Tutorial]