User:Pedro Sá Couto/TW/JSTOR De-watermarking
Revision as of 02:55, 6 June 2020
STREAM
The de-watermarking process is split into four steps:
1. Bursting the PDF into PNG images
2. Overlaying the cover page
3. Overlaying the remaining pages
4. Running OCR again (rasterizing the pages drops the original text layer)
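Steps 1 to 3 turn every page into an image, and jstor.sh later rebuilds the PDF from a glob over split/. A glob lists files in lexicographic order, so page-10.png would sort before page-2.png. The page does not say how burstpdf.py names its output, so this sketch assumes names of the form page-<number>.png. The helper name pad_page_names is hypothetical. It zero-pads the numbers so that glob order matches page order:

```shell
# Hypothetical helper: rename DIR/page-N.png to DIR/page-00N.png so that a
# glob such as split/*.png lists pages in numeric order.
# Assumption: burstpdf.py writes one "page-<number>.png" file per page.
pad_page_names() {
  dir=$1
  for f in "$dir"/page-*.png; do
    [ -e "$f" ] || continue              # no match: the glob stays literal
    num=${f##*/page-}                    # "12.png"
    num=${num%.png}                      # "12"
    case $num in
      ''|*[!0-9]*) continue ;;           # not page-<digits>.png, leave it alone
    esac
    # Drop leading zeros so printf never sees an octal-looking "08"
    while [ "${num#0}" != "$num" ] && [ -n "${num#0}" ]; do
      num=${num#0}
    done
    padded=$(printf 'page-%03d.png' "$num")   # 3 digits: up to 999 pages
    [ "$f" = "$dir/$padded" ] || mv -- "$f" "$dir/$padded"
  done
}
```

Called as `pad_page_names split` after the overlay scripts and before the `convert` line, it would turn page-1.png, page-2.png and page-10.png into page-001.png, page-002.png and page-010.png. Names that are already padded are left untouched.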
JSTOR.SH
To run the whole stream, I use ./jstor.sh:
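#!/bin/bash
# Copy the most recent download from jstorpaper/ into the overlay folder
cp "$(ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1)" /Users/PSC/Desktop/JSTOR/overlay
cd /Users/PSC/Desktop/JSTOR/overlay
# Replace spaces in file names with underscores (skip names without spaces)
for name in *; do [ "$name" = "${name// /_}" ] || mv -- "$name" "${name// /_}"; done
# Rename the PDF (assumed to be the only one in overlay/) to target.pdf
mv /Users/PSC/Desktop/JSTOR/overlay/*.pdf target.pdf
mkdir -p split
# Steps 1-3: burst into images, overlay the cover, overlay the pages
python3 burstpdf.py
python3 overlaylogo_cover.py
python3 overlaylogo_page.py
rm target.pdf
# Rebuild a single PDF from the processed images
convert "split/*.{png,jpeg,pdf}" -quality 100 name.pdf
# Replace the newest PDF in jstorpaper/ with the rebuilt file
var1="$(ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/*.pdf | head -n 1)"
mv name.pdf "$var1"
rm -r split
# Move it to ready/
mv "$(ls -td -- /Users/PSC/Desktop/JSTOR/jstorpaper/* | head -n 1)" /Users/PSC/Desktop/JSTOR/ready
cd /Users/PSC/Desktop/JSTOR/ready
# Step 4: OCR the newest file in place, then move it to ready/ocred/
newest="$(ls -td -- /Users/PSC/Desktop/JSTOR/ready/* | head -n 1)"
ocrmypdf "$newest" "$newest"
mv "$newest" /Users/PSC/Desktop/JSTOR/ready/ocred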
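jstor.sh finds "the newest file" by parsing `ls -td` output. That works only while the command substitution is quoted and no name contains a newline. A minimal POSIX-shell alternative compares modification times with the test operator `-nt` instead of reading ls output. It assumes you only need the single newest entry. `newest_in` is a hypothetical helper name and is not part of the original script:

```shell
# Hypothetical helper: print the most recently modified entry of DIR, like
#   ls -td -- DIR/* | head -n 1
# but without parsing ls output, so names with spaces come back intact.
newest_in() {
  newest=
  for f in "$1"/*; do
    [ -e "$f" ] || continue              # empty directory: glob stays literal
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f                          # f has a later mtime than the best so far
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Usage: `cp "$(newest_in /Users/PSC/Desktop/JSTOR/jstorpaper)" /Users/PSC/Desktop/JSTOR/overlay`. Like the `ls` call it replaces, it skips hidden files and still counts directories. In ready/, the ocred/ folder therefore remains a candidate.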