User:Pedro Sá Couto/TW/JSTOR De-watermarking

From XPUB & Lens-Based wiki

Revision as of 02:55, 6 June 2020

The de-watermarking process has four steps:
1. Bursting the PDF into PNG images, one per page
2. Overlaying the cover
3. Overlaying the pages
4. Running OCR again, since the rasterized pages no longer carry a text layer
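
The Python scripts that carry out steps 1 to 3 (burstpdf.py, overlaylogo_cover.py and overlaylogo_page.py) are not reproduced on this page. As a rough idea of what those steps amount to, here is a minimal shell sketch. It assumes poppler's pdftoppm and ImageMagick's convert are installed. The function names, the 300 dpi resolution and the patch images cover_patch.png and page_patch.png are hypothetical, and the patches are assumed to be transparent PNGs with the same pixel size as the pages.

```shell
# Sketch only: an approximation, not the author's Python scripts.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: burst a PDF into one PNG per page, written as split/page-<n>.png.
burst_pdf() {
    mkdir -p split
    run pdftoppm -png -r 300 "$1" split/page
}

# Steps 2 and 3: composite a patch image over every page in split/.
# $1 = patch for the cover (first page), $2 = patch for all other pages.
overlay_pages() {
    local page patch first=1
    for page in split/page-*.png; do
        [ -e "$page" ] || continue          # nothing to do if split/ is empty
        if [ "$first" -eq 1 ]; then
            patch=$1
            first=0
        else
            patch=$2
        fi
        # Draw the patch on top of the page and overwrite the page image.
        run convert "$page" "$patch" -composite "$page"
    done
}
```

With those assumptions, the first three steps would be run as: burst_pdf target.pdf && overlay_pages cover_patch.png page_patch.png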


JSTOR.SH

To run the whole process I use ./jstor.sh:

#!/usr/bin/env bash
# jstor.sh: de-watermark the newest PDF in jstorpaper/, then OCR it.
base=/Users/PSC/Desktop/JSTOR

# Copy the most recently modified paper into the working directory.
paper=$(ls -td -- "$base"/jstorpaper/*.pdf | head -n 1)
cp "$paper" "$base/overlay"
cd "$base/overlay" || exit 1

# Replace spaces in file names with underscores, then rename the copy.
for name in *\ *; do [ -e "$name" ] && mv "$name" "${name// /_}"; done
mv ./*.pdf target.pdf

# Steps 1-3: burst into page images, then overlay the cover and the pages.
mkdir -p split
python3 burstpdf.py
python3 overlaylogo_cover.py
python3 overlaylogo_page.py
rm target.pdf

# Reassemble the pages into one PDF and overwrite the original download.
convert "split/*.{png,jpeg,pdf}" -quality 100 name.pdf
mv name.pdf "$paper"
rm -r split

# Step 4: move the result to ready/, OCR it in place, file it under ready/ocred/.
mv "$paper" "$base/ready"
cd "$base/ready" || exit 1
mkdir -p ocred
pdf=$(basename "$paper")
ocrmypdf "$pdf" "$pdf"
mv "$pdf" ocred/
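
The script finds the most recently modified file by piping ls -td into head -n 1. A possible alternative that avoids parsing ls output is a small helper that compares modification times with the shell's -nt test. This is a sketch under assumptions, not part of jstor.sh: the name newest is hypothetical, and -nt is an extension supported by bash and most other shells rather than strict POSIX.

```shell
# Hypothetical helper, not part of jstor.sh.
# Print the most recently modified path among the arguments.
# Returns non-zero if none of the arguments exists.
newest() {
    local f latest=
    for f in "$@"; do
        [ -e "$f" ] || continue             # skip unmatched glob patterns
        if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
            latest=$f
        fi
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

It could then be called as paper=$(newest /Users/PSC/Desktop/JSTOR/jstorpaper/*.pdf), which also works for file names that contain spaces.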