Install Tesseract 4.0-Ubuntu
Latest revision as of 21:14, 28 January 2018
1. Installing Tesseract 4.0 with training.
Create a folder called Tesseract4 to hold everything you install, and change into it. Follow the instructions at https://github.com/tesseract-ocr/tesseract/wiki/Compiling#linux until you reach the Leptonica step.
For leptonica:
In the Tesseract4 folder, clone the Leptonica repository:
git clone https://github.com/danbloomberg/leptonica
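Then open the Leptonica documentation (https://tpgit.github.io/UnOfficialLeptDocs/leptonica/index.html) and find the build instructions for your OS. Follow step 2, "Using autoconf", which says to run ./configure in the leptonica directory to build the Makefiles there and in src. In the cloned repository, run ./autobuild first and then ./configure:
./autobuild
./configure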
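The Leptonica steps above can be sketched as a single script. This is a minimal sketch, not the official build procedure: the `run` and `build_leptonica` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and the final `make` and `sudo make install` steps are assumed from the usual autoconf workflow rather than taken from this page. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# execute it when DRY_RUN is 0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Clone and build Leptonica inside the current folder (expected: Tesseract4).
# "make" and "sudo make install" are assumptions based on a standard
# autoconf build; check the Leptonica documentation for your OS.
build_leptonica() {
  run git clone https://github.com/danbloomberg/leptonica &&
  run cd leptonica &&
  run ./autobuild &&
  run ./configure &&
  run make &&
  run sudo make install
}

build_leptonica
```

Run it with `DRY_RUN=0` from the Tesseract4 folder once the printed commands look right.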
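After this step, follow the Git installation instructions (https://github.com/tesseract-ocr/tesseract/wiki/Compiling-%E2%80%93-GitInstallation), running them with sudo and making sure you are in the Tesseract4 folder. The tesseract executable should then be located in /usr/local/bin.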
2. Building the training tools:
Follow https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract#building-the-training-tools (only the make training and sudo make... steps are needed).
The training tools are installed in /usr/local/bin/; check that the files are there.
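The check on /usr/local/bin/ can be scripted. The sketch below assumes a hypothetical helper, `check_tools`, and an illustrative list of tool names (text2image, unicharset_extractor, combine_tessdata, lstmtraining) that a Tesseract 4 training build normally installs; adjust the list to whatever your build actually produced.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool exists and is
# executable in the given directory. Returns non-zero if any is missing.
check_tools() {
  dir=$1
  shift
  missing=0
  for tool in "$@"; do
    if [ -x "$dir/$tool" ]; then
      echo "ok: $dir/$tool"
    else
      echo "missing: $dir/$tool"
      missing=1
    fi
  done
  return "$missing"
}

# The tool names below are an assumed, partial list of training binaries.
if check_tools /usr/local/bin text2image unicharset_extractor combine_tessdata lstmtraining; then
  echo "All listed training tools were found."
else
  echo "Some training tools are missing; repeat the training build steps."
fi
```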
3. Training Tesseract 4.00
Create a folder for the training, then read https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#before-you-start
Download the language files you will use from https://github.com/tesseract-ocr/tesseract/wiki/Data-Files into the tessdata folder.
To build the basis for your training data (i.e. the files you need to have) using tesstrain.sh, run the following command (you can change the font):
training/tesstrain.sh --fonts_dir /usr/share/fonts --fontlist "Ubuntu Mono" --lang eng --linedata_only \
--noextract_font_properties --langdata_dir ../langdata --tessdata_dir ./tessdata --output_dir ./testoutput/
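Since the font can be changed, it may help to generate the command from parameters. The sketch below uses a hypothetical helper, `tesstrain_cmd`, which only prints the command line for a given font and language so it can be reviewed before running; all flags and paths are copied from the command above, and the second font name in the usage note is just an example.

```shell
#!/bin/sh
# Hypothetical helper: print the tesstrain.sh command for a font ($1) and
# a language code ($2, default eng). Nothing is executed.
tesstrain_cmd() {
  font=$1
  lang=${2:-eng}
  printf '%s \\\n' \
    "training/tesstrain.sh --fonts_dir /usr/share/fonts --fontlist \"$font\" --lang $lang --linedata_only"
  printf '%s\n' \
    "  --noextract_font_properties --langdata_dir ../langdata --tessdata_dir ./tessdata --output_dir ./testoutput/"
}

# Reproduces the command shown above.
tesstrain_cmd "Ubuntu Mono" eng
```

For another font, call for example `tesstrain_cmd "DejaVu Sans Mono" eng`, provided that font is installed under /usr/share/fonts.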