User:Francg/expub/thesis/hackpackt: Difference between revisions

Revision as of 17:56, 22 October 2017

Diffengine prototypes
- Installing Diffengine
- Selecting RSS news feeds (online journalism)
- Configuring diffengine to check the feeds every 5 minutes
- Monitoring RSS feeds and saving content changes (if any) as JPG and HTML files
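A minimal sketch of the change-detection idea behind monitoring a feed entry. It uses Python's standard `difflib`, not diffengine's own code: two saved versions of an article's text are compared, and a new snapshot is only worth keeping when the diff is non-empty. The function names and sample text are hypothetical.

```python
import difflib


def diff_versions(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines between two saved versions of an article.

    Plain difflib illustration of the idea, not diffengine's implementation.
    """
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="previous",
            tofile="current",
            lineterm="",  # lines from splitlines() carry no newline
        )
    )


def has_changed(old_text: str, new_text: str) -> bool:
    # An empty diff means the article was not edited, so no new jpg/html is needed.
    return bool(diff_versions(old_text, new_text))


if __name__ == "__main__":
    before = "Budget vote delayed\nThe vote was postponed to Friday."
    after = "Budget vote delayed\nThe vote was postponed to Monday."
    for line in diff_versions(before, after):
        print(line)
```

Lines starting with `-` are from the previous version and lines starting with `+` from the current one.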

Forthcoming prototypes:
- Install Twarc to extract updated content from specific social media users or groups
- Try installing an Atom/RSS feed finder?
- Use Python to convert the HTML files to TXT files automatically as diffengine creates them
- Use Python to count specific words in each TXT file and extract a list of the most frequent ones
- Compare the word rankings of each TXT file (one per RSS feed article)
- Automatically upload files (JPG, HTML or TXT) to a database as diffengine creates them
- Set up a Raspberry Pi
- Create a database (maybe also a wiki page?)
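The HTML-to-TXT conversion, word counting and ranking comparison steps could be prototyped roughly as below. This is a sketch assuming the diffengine HTML snapshots are UTF-8 files; it uses the standard library `html.parser` rather than a third-party parser, and the names `html_to_text`, `convert_file`, `top_words` and `compare_rankings` are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # how deep we are inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_file(html_path: Path) -> Path:
    """Write <name>.txt next to <name>.html and return the new path."""
    txt_path = html_path.with_suffix(".txt")
    text = html_to_text(html_path.read_text(encoding="utf-8"))
    txt_path.write_text(text, encoding="utf-8")
    return txt_path


def top_words(text: str, n: int = 10, stopwords=frozenset()) -> list:
    """Most frequent words as (word, count) pairs; ties keep first-seen order."""
    # \w+ also matches accented letters (useful for Spanish or Catalan feeds).
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in stopwords and not w.isdigit())
    return counts.most_common(n)


def compare_rankings(old_text: str, new_text: str, n: int = 10) -> dict:
    """Map each word whose top-n rank changed to (old_rank, new_rank).

    A rank of None means the word is not in that version's top n.
    """
    old_rank = {w: i for i, (w, _) in enumerate(top_words(old_text, n), 1)}
    new_rank = {w: i for i, (w, _) in enumerate(top_words(new_text, n), 1)}
    return {
        w: (old_rank.get(w), new_rank.get(w))
        for w in old_rank.keys() | new_rank.keys()
        if old_rank.get(w) != new_rank.get(w)
    }
```

For example, `compare_rankings("a a a b b c", "b b b a a c")` reports that "a" dropped from rank 1 to 2 and "b" rose from 2 to 1, while "c" is left out because its rank did not change. A real run would likely need a stopword list per language.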

RSS feed: Huffington Post

rssfeed-elperiodico

What RSS/Atom feed would you like to monitor?
Would you like to set up tweeting edits? [Y/n]

rssfeed-process-diff

rundiff.sh > config.yaml > folders > files
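One possible reading of the rundiff.sh > config.yaml > folders chain is one diffengine home folder per feed, each with its own config.yaml, plus a wrapper that runs diffengine in every folder. The Python sketch below assumes that layout, a hypothetical root folder `~/diffengine-feeds`, and that `diffengine` accepts the home folder as its first argument (check this against the installed version). To run it every 5 minutes, a crontab entry such as `*/5 * * * * python3 /path/to/rundiff.py` (hypothetical path) would do; cron mails a job's output to the local user by default, which is a likely source of the "You have mail" message in Terminal.

```python
import subprocess
from pathlib import Path

# Hypothetical root folder holding one subfolder per feed (e.g. rssfeed-bbcnews/),
# each with its own diffengine config.yaml.
ROOT = Path.home() / "diffengine-feeds"


def feed_homes(root: Path) -> list:
    """Feed folders under root that contain a config.yaml, sorted by name."""
    return sorted(p for p in root.iterdir() if (p / "config.yaml").is_file())


def build_commands(root: Path) -> list:
    # Assumption: `diffengine <folder>` runs diffengine with that folder as its home.
    return [["diffengine", str(home)] for home in feed_homes(root)]


def run_all(root: Path = ROOT) -> None:
    for cmd in build_commands(root):
        # check=False so one failing feed does not stop the remaining ones.
        result = subprocess.run(cmd, check=False)
        if result.returncode != 0:
            print(f"{cmd[1]}: diffengine exited with code {result.returncode}")
```

Keeping `build_commands` separate from `run_all` makes it easy to print the planned commands first, before letting cron execute them.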

News feeds stored and tracked

RSS news feed folders where the JPG and HTML files are created
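For the planned database upload, the snapshot folders could first be indexed into SQLite with Python's built-in `sqlite3`. This sketch assumes each top-level folder under the root is one feed (like the rssfeed-* folders listed here) and records file paths and metadata rather than the file contents; the table name `snapshots` and its columns are made up for illustration. Running it repeatedly only returns files that appeared since the previous run.

```python
import sqlite3
from pathlib import Path

SUFFIXES = {".html", ".jpg"}  # the file types diffengine writes per snapshot


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " path TEXT PRIMARY KEY,"
        " feed TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " mtime REAL NOT NULL)"
    )
    return conn


def index_new_files(conn: sqlite3.Connection, root: Path) -> list:
    """Record jpg/html files not seen before and return the newly added paths."""
    added = []
    for f in sorted(root.rglob("*")):
        if not f.is_file() or f.suffix.lower() not in SUFFIXES:
            continue
        feed = f.relative_to(root).parts[0]  # top-level feed folder name
        cur = conn.execute(
            "INSERT OR IGNORE INTO snapshots VALUES (?, ?, ?, ?)",
            (str(f), feed, f.suffix.lower().lstrip("."), f.stat().st_mtime),
        )
        if cur.rowcount == 1:  # 0 means the path was already in the table
            added.append(f)
    conn.commit()
    return added
```

Passing a file path such as `"snapshots.db"` to `open_db` keeps the index between runs; the default in-memory database is only useful for testing.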

"You have mail" in Terminal

bbcnews

huffingtonpost

rssfeed-20minutos

rssfeed-aljazeera

rssfeed-ara

rssfeed-bbcnews

rssfeed-dailymail

rssfeed-einnews-multifeeds

rssfeed-elconfidencial

rssfeed-eldiario

rssfeed-eldiario2-thumb

rssfeed-elperiodico

rssfeed-kaosenlared

rssfeed-larazon

rssfeed-lavanguardia

rssfeed-reuters-world

rssfeed-segre

rssfeed-skynews

rssfeed-ara

Beautiful Soup prototypes: extracting structured content and writing it to a TXT file
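A minimal Beautiful Soup sketch of this step: parse a saved HTML snapshot, drop script and style elements, keep headings and paragraphs in document order, and write them to a TXT file. It assumes the `beautifulsoup4` package is installed and uses Python's built-in `"html.parser"` backend; the choice of tags (`h1` to `h3`, `p`) and the `[tag] text` line format are assumptions, not necessarily what the original prototype did.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def extract_structured(html: str) -> list:
    """Return (tag, text) pairs for headings and paragraphs, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove non-content elements before reading any text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return [
        (el.name, el.get_text(" ", strip=True))
        for el in soup.find_all(["h1", "h2", "h3", "p"])
        if el.get_text(strip=True)  # skip empty paragraphs
    ]


def write_txt(pairs: list, out_path: Path) -> None:
    # One line per element, prefixed with its tag so headings stay recognisable.
    lines = [f"[{name}] {text}" for name, text in pairs]
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Keeping the tag name on each line preserves some of the article structure in the plain-text output, which helps when comparing versions later.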

@@ Line 1: / Line 1: @@
 <div style="background-color:#ffffff; float: left; padding:10px;">
+Diffengine prototypes
+<br>- Installing Diffengine
+<br>- Selecting RSS news feeds (online journalism)
+<br>- Configuring diffengine to check the feeds every 5 minutes
+<br>- Monitoring RSS feeds and saving content changes (if any) as JPG and HTML files
+<br>
+Forthcoming prototypes:
+<br>- Install Twarc to extract updated content from specific social media users or groups
+<br>- Try installing an Atom/RSS feed finder?
+<br>- Use Python to convert the HTML files to TXT files automatically as diffengine creates them
+<br>- Use Python to count specific words in each TXT file and extract a list of the most frequent ones
+<br>- Compare the word rankings of each TXT file (one per RSS feed article)
+<br>- Automatically upload files (JPG, HTML or TXT) to a database as diffengine creates them
+<br>- Set up a Raspberry Pi
+<br>- Create a database (maybe also a wiki page?)
 https://pzwiki.wdka.nl/mw-mediadesign/images/7/78/Rss-huffington.png
@@ Line 78: / Line 96: @@
 </div>
+<br>
+<br>
+'''Beautiful Soup prototypes: extracting structured content and writing it to a TXT file'''
+https://pzwiki.wdka.nl/mw-mediadesign/images/6/6a/Bsf4-testcontent.png