User:Pedro Sá Couto/TW/JSTOR De-watermarking

From XPUB & Lens-Based wiki
Revision as of 03:54, 6 June 2020 by Pedro Sá Couto

The de-watermarking process has four steps:
1. Bursting the PDF into PNG images
2. Overlaying the cover
3. Overlaying the pages
4. Running OCR again


The process is activated through ./jstor.sh


# Copy the most recently modified paper into the working folder.
latest=$(ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1)
cp "$latest" /Users/PSC/Desktop/JSTOR/overlay
cd /Users/PSC/Desktop/JSTOR/overlay || exit 1
# Replace spaces in file names with underscores, then give the PDF a fixed name.
for name in *; do mv "$name" "${name// /_}"; done
mv /Users/PSC/Desktop/JSTOR/overlay/*.pdf target.pdf
mkdir -p split
# Steps 1 to 3: burst the PDF into images, then overlay the cover and the pages.
python3 burstpdf.py
python3 overlaylogo_cover.py
python3 overlaylogo_page.py
rm target.pdf
# Reassemble the processed images into one PDF and put it in place of the original.
convert "split/*.{png,jpeg,pdf}" -quality 100 name.pdf
var1=$(ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/*.pdf | head -n 1)
mv name.pdf "$var1"
rm -r split
# Step 4: move the result to the ready folder, OCR it in place, then file it under ocred.
mv "$(ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1)" /Users/PSC/Desktop/JSTOR/ready
cd /Users/PSC/Desktop/JSTOR/ready || exit 1
ready_pdf=$(ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1)
ocrmypdf "$ready_pdf" "$ready_pdf"
mv "$ready_pdf" /Users/PSC/Desktop/JSTOR/ready/ocred
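
The script relies on two idioms: picking the most recently modified entry in a folder with ls -td ... | head -n 1, and replacing spaces in file names with underscores. The sketch below isolates both so they can be tried in a throwaway directory. The function names newest_in and underscore_names and the demo file names are hypothetical and are not part of jstor.sh. The rename uses tr in place of the script's ${name// /_} expansion, on the assumption that this makes the sketch usable under plain sh as well as bash.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the original jstor.sh.

# Print the most recently modified entry in directory $1.
# Assumes the directory is not empty.
newest_in() {
  ls -td -- "$1"/* | head -n 1
}

# Rename every entry in directory $1 so that spaces become underscores.
underscore_names() {
  for path in "$1"/*; do
    base=${path##*/}
    new=$(printf '%s' "$base" | tr ' ' '_')
    # Skip names without spaces so mv is never asked to move a file onto itself.
    [ "$base" = "$new" ] || mv -- "$path" "$1/$new"
  done
}

# Demo in a throwaway directory with two files of different ages.
demo=$(mktemp -d)
touch -t 202001010000 "$demo/old paper.pdf"
touch -t 202006060354 "$demo/new paper.pdf"
underscore_names "$demo"
newest_in "$demo"
```

Renaming a file does not change its modification time, so after the rename the newest entry is still the file that was touched last, and the final line prints the path of new_paper.pdf inside the temporary directory.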