User:Pedro Sá Couto/TW/REPUBLISHING FLOW
Revision as of 04:01, 6 June 2020
STEPS
Republishing is separated into 6 steps:
1. Moving the book from the webserver to a work place
- 1.1 Replacing all spaces with underscores
2. Creating the watermark from the gathered form in Tactical Watermarks
- 2.1 Create the watermark in pdf with reportlab
- 2.2 Convert to a png
3. Append the watermark to the pdf
- 3.1 Burst the pdf into pages
- 3.2 Rotate the watermark with PIL
- 3.3 Overlay the watermark with PIL
- 3.4 Merge all images into a PDF
4. OCR the pdf if not OCRed already
5. Save the file in a directory open to Library Genesis Staff
6. Delete all the unwanted traces
FLOW
RUN.SH
To run the whole flow I use ./run.sh:
sudo chmod 777 *
./movebookfolder.sh
./watermarkformtxt.sh
./appendwatermarktopdf.sh
./republish.sh
./deletetraces.sh
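As listed, run.sh has no error handling, so a failing step would not stop the later ones. The sketch below shows one way to run the same sequence from Python and halt at the first failure. It assumes the five scripts sit in the current directory; the `run_flow` helper and the `STEPS` list are hypothetical and not part of the original setup.

```python
import subprocess
import sys

# Script names taken from run.sh, in the same order.
STEPS = [
    "./movebookfolder.sh",
    "./watermarkformtxt.sh",
    "./appendwatermarktopdf.sh",
    "./republish.sh",
    "./deletetraces.sh",
]


def run_flow(commands):
    """Run each command in order and stop at the first failure.

    Each command is a script path or an argument list.
    Returns the list of commands that completed successfully.
    """
    done = []
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            print(f"step failed: {command}", file=sys.stderr)
            break
        done.append(command)
    return done
```

Calling `run_flow(STEPS)` would then replace the plain sequence of calls, and comparing the length of the returned list with `len(STEPS)` tells whether every step ran.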
1. Moving the book from the webserver to a work place
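Step 1 and its sub-step 1.1 (replacing all spaces with underscores) could be sketched in Python as follows. This is a minimal illustration, not the content of movebookfolder.sh; the `move_book` function and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def move_book(source: Path, workdir: Path) -> Path:
    """Move `source` into `workdir`, replacing spaces in the file name with underscores."""
    workdir.mkdir(parents=True, exist_ok=True)
    target = workdir / source.name.replace(" ", "_")
    shutil.move(str(source), str(target))
    return target
```

Removing the spaces at this point means the later steps can pass the file name to shell commands without extra quoting.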
2. Creating the watermark from the gathered form in Tactical Watermarks
3. Append the watermark to the pdf
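Steps 3.2 to 3.4 (rotate, overlay, merge) can be sketched with PIL alone. The example assumes the pages have already been burst into images (step 3.1) and that the watermark is a PNG with transparency. The function names, the 90 degree angle, and the top-left position are assumptions for illustration, not values from appendwatermarktopdf.sh.

```python
from PIL import Image


def overlay_watermark(page, watermark, angle=90, position=(0, 0)):
    """Return an RGB copy of `page` with `watermark` rotated and pasted at `position`."""
    # expand=True grows the canvas so the rotated watermark is not cropped.
    rotated = watermark.convert("RGBA").rotate(angle, expand=True)
    result = page.convert("RGBA")
    # Passing the watermark as its own mask keeps its transparent areas transparent.
    result.paste(rotated, position, rotated)
    return result.convert("RGB")


def merge_to_pdf(pages, out_path):
    """Save a list of PIL images as one multi-page PDF."""
    # PDF output does not accept an alpha channel, so convert to RGB first.
    rgb_pages = [p.convert("RGB") for p in pages]
    first, rest = rgb_pages[0], rgb_pages[1:]
    first.save(out_path, "PDF", save_all=True, append_images=rest)
    return out_path
```

Applying `overlay_watermark` to every burst page and passing the results to `merge_to_pdf` gives one watermarked PDF.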
4. OCR the pdf if not OCRed already
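The page does not say which OCR tool is used. As one possible approach, the sketch below assumes the OCRmyPDF command-line tool, whose `--skip-text` option leaves pages that already have a text layer untouched. The helper names are hypothetical.

```python
import shutil
import subprocess


def ocr_command(src, dst):
    """Build an OCRmyPDF call that skips pages with an existing text layer."""
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_if_needed(src, dst):
    """Run OCR via the `ocrmypdf` CLI; raises if the tool is missing or fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed")
    subprocess.run(ocr_command(src, dst), check=True)
    return dst
```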
5. Save the file in a directory open to Library Genesis Staff
6. Delete all the unwanted traces
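What counts as a trace is not spelled out here. Assuming it means the intermediate files left in the working directory (burst pages, watermark images, and so on), the clean-up could be sketched as below. The `delete_traces` function is hypothetical and is not the content of deletetraces.sh.

```python
import shutil
from pathlib import Path


def delete_traces(workdir: Path) -> int:
    """Remove everything inside `workdir` but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    # Take a snapshot of the listing first so deleting does not disturb iteration.
    for entry in list(workdir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed
```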
RESULTS IN EACH STEP
0. Starting with a Paper from JSTOR
File:42938075.pdf
1. Bursting the PDF into PNGs
The PDF is separated into pages
2. Overlaying the cover
The cover is overlaid and dewatermarked
3. Overlaying the pages
The pages are overlaid and dewatermarked
4. OCR again