Install Tesseract 4.0-Ubuntu

From XPUB & Lens-Based wiki

Latest revision as of 21:14, 28 January 2018

1. Installing Tesseract 4.0 with training.

Create a folder called Tesseract4, where you will install everything, and change into it. Follow the instructions at https://github.com/tesseract-ocr/tesseract/wiki/Compiling#linux until you reach the Leptonica step.

For Leptonica:

In the Tesseract4 folder, clone the repository:

git clone https://github.com/danbloomberg/leptonica

Then open the Leptonica documentation for your OS and follow step 2, "Using autoconf". It tells you to run ./configure in the leptonica directory to build the Makefiles there and in src. In a fresh clone, run ./autobuild first to generate the configure script, then run ./configure:

./autobuild
./configure
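The Leptonica steps above can be sketched as a small shell function. This is only a sketch: the function name fetch_and_configure_leptonica is made up for illustration, and it assumes that the current directory is the Tesseract4 folder and that git and the autotools (autoconf, automake, libtool) are already installed. It stops after ./configure, because the build and install steps come from the Compiling page.

```shell
# Sketch: fetch Leptonica and generate its Makefiles.
# Assumes the current directory is the Tesseract4 folder.
# fetch_and_configure_leptonica is a hypothetical helper name.
fetch_and_configure_leptonica() {
    # Clone only if there is no checkout yet.
    if [ ! -d leptonica ]; then
        git clone https://github.com/danbloomberg/leptonica || return 1
    fi
    (
        # Work in a subshell so the caller's directory is unchanged.
        cd leptonica || exit 1
        ./autobuild || exit 1   # generates the configure script
        ./configure             # builds Makefiles here and in src
    )
}
```

Calling fetch_and_configure_leptonica from inside Tesseract4 leaves you in Tesseract4 with a configured leptonica checkout next to the tesseract sources.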


After this step, continue with the rest of the instructions on the Compiling page, using sudo and making sure you are in the Tesseract4 folder. The tesseract executable should then be located in /usr/local/bin.


2. Building the training tools:

Follow https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#building-the-training-tools (you only need make training, sudo make...).

The tools are installed in /usr/local/bin/; check that the files are there.
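One way to do that check is a short loop over the tool names. The sketch below is an illustration, not part of the official instructions: check_training_tools is a hypothetical helper, and the list of tools (text2image, unicharset_extractor, combine_tessdata, lstmtraining) is an assumed subset of what the training build installs, so adjust it to the tools you actually need.

```shell
# Sketch: report which training tools are present in a bin directory.
# check_training_tools is a hypothetical helper; the tool list is an
# assumed subset of the training tools, not a complete inventory.
check_training_tools() {
    local bin_dir="${1:-/usr/local/bin}"
    local missing=0
    local tool
    for tool in text2image unicharset_extractor combine_tessdata lstmtraining; do
        if [ -x "$bin_dir/$tool" ]; then
            echo "found:   $bin_dir/$tool"
        else
            echo "missing: $bin_dir/$tool"
            missing=1
        fi
    done
    # Non-zero exit status if any tool was not found.
    return "$missing"
}
```

Run check_training_tools with no argument to inspect /usr/local/bin, or pass another directory if you installed under a different prefix.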

3. Training Tesseract 4.00

Create a folder for the training, as described at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#before-you-start

Download the language files you will use from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files into the tessdata folder.

To build the basis for your training data (i.e. the files you need to have) using tesstrain.sh, run the following command (you can change the font):

training/tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Ubuntu Mono" --lang eng --linedata_only  \
--noextract_font_properties --langdata_dir ../langdata   --tessdata_dir ./tessdata --output_dir ./testoutput/
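Since the command relies on several relative paths, it can help to confirm they exist before running it. The following is a minimal sketch under stated assumptions: preflight_tesstrain is a hypothetical helper, it must be run from the same directory as the tesstrain.sh command above, and it assumes the language file is named after the language code (for example eng.traineddata) inside ./tessdata.

```shell
# Sketch: confirm the paths used by the tesstrain.sh command exist.
# preflight_tesstrain is a hypothetical helper. Run it from the
# directory where you would run training/tesstrain.sh.
preflight_tesstrain() {
    local lang="${1:-eng}"
    local fonts_dir="${2:-/usr/share/fonts}"
    local failed=0
    local path
    for path in training/tesstrain.sh ../langdata \
                "./tessdata/$lang.traineddata" "$fonts_dir"; do
        if [ -e "$path" ]; then
            echo "ok:      $path"
        else
            echo "missing: $path"
            failed=1
        fi
    done
    # Non-zero exit status if any path was missing.
    return "$failed"
}
```

For the command above, preflight_tesstrain eng /usr/share/fonts should list every path as ok before you start.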