Install Tesseract 4.0-Ubuntu

From XPUB & Lens-Based wiki
Latest revision as of 21:14, 28 January 2018

1. Installing tesseract 4.0 with training.

Create a folder called Tesseract4 where you will install everything, and change into it. Follow the instructions on https://github.com/tesseract-ocr/tesseract/wiki/Compiling#linux until you reach the Leptonica step.
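As a minimal sketch of this first step, assuming you want the folder in your current directory (the page does not say where to create it):

```shell
# Create the working folder (no error if it already exists) and enter it.
mkdir -p Tesseract4
cd Tesseract4

# Print the current directory to confirm where the following steps will run.
pwd
```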

For leptonica:

In the Tesseract4 folder, clone the Leptonica repository:

git clone https://github.com/danbloomberg/leptonica

Then go to the Leptonica documentation (https://tpgit.github.io/UnOfficialLeptDocs/leptonica/index.html) and find the build instructions for your OS. Follow step 2, "Using autoconf": inside the leptonica folder, run ./autobuild and then ./configure, which builds the Makefiles there and in src.
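The Leptonica step could be sketched as below. The build_leptonica function name is hypothetical, and the make and sudo make install commands are an assumption based on the usual autoconf workflow; this page only mentions ./autobuild and ./configure. The function is defined but not called, so nothing runs until you invoke it yourself.

```shell
# Hypothetical helper wrapping the Leptonica build; run it from the Tesseract4 folder.
build_leptonica() {
    # The subshell keeps the caller's working directory unchanged.
    (
        cd leptonica &&
        ./autobuild &&        # generates the configure script (from this page)
        ./configure &&        # builds the Makefiles here and in src (from this page)
        make &&               # assumption: standard autoconf build step
        sudo make install     # assumption: standard autoconf install step
    )
}
```

Once the clone has finished, calling build_leptonica from the Tesseract4 folder runs the steps in order and stops at the first one that fails.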


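After this step, follow the Git installation instructions (https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation), using sudo where the commands need it and making sure you are in the Tesseract4 folder. The tesseract executable should then be located in /usr/local/bin.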
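One way to confirm the result is to look the executable up on your PATH. This is only a sketch: the tesseract_status variable is a hypothetical name introduced for illustration, and it assumes /usr/local/bin is on your PATH.

```shell
# Look up tesseract on PATH and report where it was found.
if command -v tesseract >/dev/null 2>&1; then
    tesseract_status="found"
    echo "tesseract found at: $(command -v tesseract)"
    # Print the installed version and the libraries it was built against.
    tesseract --version
else
    tesseract_status="missing"
    echo "tesseract not found on PATH; check /usr/local/bin"
fi
```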


2. Building the training tools:

Follow https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#building-the-training-tools (only the make training and sudo make... steps).

The training tools are installed in /usr/local/bin/; check that the files are there.
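A quick way to do that check is sketched below. The check_training_tools function name is hypothetical, and the list covers only a few of the training tools (combine_tessdata, lstmtraining, text2image, unicharset_extractor); extend it to whichever tools you need.

```shell
# Hypothetical helper: report which training tools can be found on PATH.
check_training_tools() {
    for tool in combine_tessdata lstmtraining text2image unicharset_extractor; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
        fi
    done
}

check_training_tools
```

Any tool reported as missing suggests that the training install step did not complete.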

3. Training Tesseract 4.00

Create a folder for the training and read through https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#before-you-start

Download the language files you will use from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files into the tessdata folder.

To build the basis for your training data (i.e. the files you need to have) using tesstrain.sh, run the following command (you can change the font):

training/tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Ubuntu Mono" --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata --tessdata_dir ./tessdata --output_dir ./testoutput/
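Because the command uses relative paths, it may help to check them from the directory where you plan to run it. The sketch below uses a hypothetical preflight function and assumes the English model is stored as eng.traineddata in ./tessdata; the other paths are taken from the command above.

```shell
# Hypothetical pre-flight check for the relative paths used by the tesstrain.sh command.
preflight() {
    status=0
    for path in training/tesstrain.sh ../langdata ./tessdata/eng.traineddata; do
        if [ -e "$path" ]; then
            echo "ok: $path"
        else
            echo "missing: $path"
            status=1
        fi
    done
    # Returns non-zero if at least one path was not found.
    return "$status"
}

preflight || echo "Fix the missing paths before running tesstrain.sh"
```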