User:Pedro Sá Couto/TW/JSTOR De-watermarking: Difference between revisions
< User:Pedro Sá Couto | TW
No edit summary |
No edit summary |
||
Line 7: | Line 7: | ||
=JSTOR.SH= | =JSTOR.SH= | ||
==To activate the stream I use ./jstor.sh== | ====To activate the stream I use ./jstor.sh==== | ||
<pre> | <pre> |
Revision as of 02:55, 6 June 2020
The process to dewatermark is separated into 4 steps: 1. Bursting the PDF into png 2. Overlaying the cover 3. Overlaying the pages 4. OCR again
JSTOR.SH
To activate the stream I use ./jstor.sh
cp `ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1` /Users/PSC/Desktop/JSTOR/overlay cd /Users/PSC/Desktop/JSTOR/overlay for name in *; do mv "$name" "${name// /_}"; done mv /Users/PSC/Desktop/JSTOR/overlay/*.pdf target.pdf mkdir -p split python3 burstpdf.py python3 overlaylogo_cover.py python3 overlaylogo_page.py rm target.pdf convert "split/*.{png,jpeg,pdf}" -quality 100 name.pdf var1=`ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/*.pdf | head -n 1` mv name.pdf $var1 rm -r split mv `ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1` /Users/PSC/Desktop/JSTOR/ready cd /Users/PSC/Desktop/JSTOR/ready ocrmypdf `ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1` `ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1` mv `ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1` /Users/PSC/Desktop/JSTOR/ready/ocred