Tesseract

From XPUB & Lens-Based wiki

Latest revision as of 19:12, 7 January 2020

Free software for OCR (optical character recognition). See https://github.com/tesseract-ocr .

   sudo apt install tesseract-ocr 

NB: you need to install the language-specific package for each language you want to recognize. Tesseract identifies languages by three-letter codes such as "eng" (English, the default).

   sudo apt install tesseract-ocr-eng
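
To check which language packs are already installed, something like the following sketch may help. It relies on the `--list-langs` option of the `tesseract` command; the variable name `wanted` is only illustrative, and the suggested package name assumes the Debian/Ubuntu `tesseract-ocr-<code>` naming.

```shell
# Three-letter code to look for (illustrative; change as needed).
wanted="eng"

if command -v tesseract >/dev/null 2>&1; then
    # Older versions print the list on stderr, so merge both streams.
    if tesseract --list-langs 2>&1 | grep -qx "$wanted"; then
        echo "$wanted is installed"
    else
        echo "$wanted is missing; try: sudo apt install tesseract-ocr-$wanted"
    fi
else
    echo "tesseract is not installed"
fi
```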

For instance, to convert Dutch-language documents:

   sudo apt install tesseract-ocr-nld
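
Once the Dutch pack is installed, a conversion might look like the sketch below. The file name `scan.png` and the output base `scan` are hypothetical placeholders; `-l` selects the language, and Tesseract appends `.txt` to the output base itself.

```shell
# Hypothetical names; replace with your own scanned image.
input="scan.png"
output_base="scan"
lang="nld"

if command -v tesseract >/dev/null 2>&1 && [ -f "$input" ]; then
    # Writes the recognized text to "$output_base.txt".
    tesseract "$input" "$output_base" -l "$lang"
else
    echo "tesseract or $input not found; nothing converted"
fi
```

If recognition looks poor, it is worth confirming that the language code passed to `-l` matches the language of the document.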

Good Tutorial