User:Pedro Sá Couto/TW/JSTOR De-watermarking
Revision as of 02:54, 6 June 2020
The de-watermarking process is separated into 4 steps:
1. Bursting the PDF into PNG images
2. Overlaying the cover
3. Overlaying the pages
4. Running OCR again
The process is activated through ./jstor.sh
<pre>
# Copy the most recently modified paper into the working folder
cp "`ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1`" /Users/PSC/Desktop/JSTOR/overlay
cd /Users/PSC/Desktop/JSTOR/overlay
# Replace spaces in file names with underscores
for name in *; do mv "$name" "${name// /_}"; done
mv /Users/PSC/Desktop/JSTOR/overlay/*.pdf target.pdf
mkdir -p split
# Steps 1 to 3: burst the PDF, overlay the cover, overlay the pages
python3 burstpdf.py
python3 overlaylogo_cover.py
python3 overlaylogo_page.py
rm target.pdf
# Join the processed pages back into a single PDF
convert "split/*.{png,jpeg,pdf}" -quality 100 name.pdf
# Overwrite the original paper with the processed version
var1="`ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/*.pdf | head -n 1`"
mv name.pdf "$var1"
rm -r split
mv "`ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1`" /Users/PSC/Desktop/JSTOR/ready
cd /Users/PSC/Desktop/JSTOR/ready
# Step 4: OCR the newest file in place, then file it under ocred
ocrmypdf "`ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1`" "`ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1`"
mv "`ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1`" /Users/PSC/Desktop/JSTOR/ready/ocred
</pre>
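The script repeats the "newest entry in a folder" lookup (ls -td piped into head -n 1) several times, and it renames files to remove spaces. As a minimal sketch, both could be wrapped in small helper functions. The names newest and underscore_names are hypothetical and not part of the original jstor.sh; the renaming here uses tr instead of the bash-only ${name// /_} expansion, purely as an assumption about wanting it to run under plain sh as well.

```shell
# Hypothetical helper (not in the original jstor.sh):
# print the most recently modified entry in the directory given as $1.
# Prints nothing if the directory is empty.
newest() {
    ls -td -- "$1"/* 2>/dev/null | head -n 1
}

# Hypothetical helper (not in the original jstor.sh):
# replace spaces with underscores in every name inside the directory $1.
# The function body runs in a subshell so its variables stay local.
underscore_names() (
    for path in "$1"/*; do
        [ -e "$path" ] || continue            # skip the unexpanded glob of an empty directory
        base=${path##*/}                      # file name without the directory part
        new=$(printf '%s' "$base" | tr ' ' '_')
        [ "$base" = "$new" ] || mv -- "$path" "$1/$new"
    done
)
```

With these in place, a call such as newest /Users/PSC/Desktop/JSTOR/ready (quoted inside "$( )") could stand in for each repeated ls pipeline, which would also keep file names containing spaces intact.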