User:Francg/expub/thesis/hackpackt: Difference between revisions

Latest revision as of 20:42, 24 October 2017

Diffengine prototypes

- Installing Diffengine
- Selecting RSS news feeds (online journalism)
- Configuring diffengine to check the feeds every 5 minutes
- Monitoring the RSS feeds and downloading content changes (if any) as JPG and HTML files
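The core check behind these monitoring steps can be sketched as follows. This is a minimal sketch assuming a plain RSS 2.0 feed. It fingerprints each item's title and description and compares two snapshots taken one polling interval apart. diffengine itself works on the full article pages and produces the HTML and JPG diffs described here, so the function names and the hashing approach are illustrative assumptions, not diffengine's code. The 5-minute interval would typically come from a cron entry such as `*/5 * * * * /path/to/rundiff.sh`, where the path is a placeholder.

```python
from __future__ import annotations

import hashlib
import xml.etree.ElementTree as ET


def parse_items(rss_xml: str) -> dict[str, str]:
    """Map each <item>'s <link> to a fingerprint of its title and description."""
    root = ET.fromstring(rss_xml)
    items: dict[str, str] = {}
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        text = (item.findtext("title") or "") + "\n" + (item.findtext("description") or "")
        items[link] = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return items


def detect_changes(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots of the same feed taken one polling interval apart."""
    return {
        "new": sorted(link for link in current if link not in previous),
        "changed": sorted(
            link for link in current if link in previous and previous[link] != current[link]
        ),
    }


# Two hypothetical snapshots of a feed, five minutes apart.
OLD = """<rss><channel>
<item><title>A</title><link>https://example.org/a</link><description>first</description></item>
</channel></rss>"""
NEW = """<rss><channel>
<item><title>A (updated)</title><link>https://example.org/a</link><description>first</description></item>
<item><title>B</title><link>https://example.org/b</link><description>second</description></item>
</channel></rss>"""

result = detect_changes(parse_items(OLD), parse_items(NEW))
print(result)  # {'new': ['https://example.org/b'], 'changed': ['https://example.org/a']}
```

Atom feeds use `<entry>` rather than `<item>` and XML namespaces, so they would need a different lookup.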

Forthcoming prototypes:

- Install "Twarc" to extract updated content from Twitter
- Install "Feedme" to create RSS feeds for websites that don't provide them (it can monitor Facebook groups but not individual users)
- Try monitoring these new Feedme-generated RSS feeds with diffengine, so we can see how their content changes.
- Automatically convert the HTML files to TXT files with Python as diffengine creates them
- Use Python + Beautiful Soup to count specific words in each TXT file and extract a list of the most frequent ones.
- Use Python + Beautiful Soup to count the number of changes.
- Compare the word rankings and numbers of changes across the TXT files (one per RSS feed article). Could this table be updated automatically?
- Set up a Raspberry Pi
- Create a database (MongoDB or an SQL database), small forms, JavaScript snippets, etc. Example: the "Wikileaks" structure
- The database offers access to the raw materials/content and to additional features such as contributions from invited figures (essays, articles, comments, interviews, a piece of code, or anything really).
- Find these contributors (e.g. "Arxivem el moment", "docnow", 1-O computer scientists, social organizations, NGOs, independent journalists, politicians, students, newspaper directors in relation to specific news, ...)
- Automatically upload files to this database as diffengine creates them
- Design the database!
- Use an onion browser such as Tor Browser to protect the IP addresses of users and of the database from online surveillance, and to support freedom of speech against state network censorship.
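The counting steps in the list above (most frequent words, number of changes) can be sketched with the standard library alone. `collections.Counter` ranks the words, and `difflib.ndiff` compares two versions of an article line by line. This assumes the article text has already been extracted to plain text. The function names, the optional stop-word set, and counting each added or removed line as one "change" are illustrative choices, not a fixed method.

```python
from __future__ import annotations

import difflib
import re
from collections import Counter


def top_words(text: str, n: int = 10, stopwords: frozenset[str] = frozenset()) -> list[tuple[str, int]]:
    """Return the n most frequent words, case-insensitively, skipping stopwords."""
    words = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so accented words stay whole
    return Counter(w for w in words if w not in stopwords).most_common(n)


def count_changes(old_text: str, new_text: str) -> int:
    """Count the lines removed from or added to an article between two versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes: "- " removed, "+ " added, "  " unchanged, "? " intraline hints.
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


# Hypothetical article text and two versions of one of its lines.
sample = "The vote was held. The police closed the polling stations during the vote."
ranking = top_words(sample, n=2, stopwords=frozenset({"the", "was", "during"}))
print(ranking)  # [('vote', 2), ('held', 1)]

old_version = "Police close polling stations\nTurnout was low"
new_version = "Police close polling stations\nTurnout was high"
changes = count_changes(old_version, new_version)
print(changes)  # 2
```

Running both functions over every TXT file and collecting the results in one table is one possible way to automate the comparison.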

RSS feed: Huffington Post

rssfeed-elperiodico

What RSS/Atom feed would you like to monitor?
Would you like to set up tweeting edits? [Y/n]

rssfeed-process-diff

rundiff.sh (paths to each news feed on the device) > config.yaml (RSS feed and PhantomJS path) > folders (where input data is stored) > files (JPG & HTML)

rundiff.sh: monitored sources and their paths on the device

RSS newsfeed folders where the JPG and HTML files are created
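The plan to upload files to a database as diffengine creates them could start from a folder layout like this one. The following is a sketch under several assumptions. It assumes one sub-folder per feed (e.g. `rssfeed-ara`) holding the `.html` and `.jpg` output. It uses SQLite from Python's standard library as a stand-in for the MongoDB/SQL database that is still to be designed. The table layout and file names are hypothetical. Because `INSERT OR IGNORE` skips paths that are already recorded, the script could run after every monitoring pass and would only add new files.

```python
from __future__ import annotations

import sqlite3
import tempfile
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path  TEXT PRIMARY KEY,  -- full path of the generated file
    feed  TEXT NOT NULL,     -- name of the feed folder, e.g. rssfeed-ara
    kind  TEXT NOT NULL,     -- 'html' or 'jpg'
    mtime REAL NOT NULL      -- modification time, seconds since the epoch
)
"""


def index_outputs(root: Path, db: sqlite3.Connection) -> int:
    """Record every .html/.jpg file found in root/<feed folder>/.

    Returns how many files were new, so re-running it is harmless."""
    db.execute(SCHEMA)
    added = 0
    for path in sorted(root.glob("*/*")):
        kind = path.suffix.lower().lstrip(".")
        if not path.is_file() or kind not in {"html", "jpg"}:
            continue
        cur = db.execute(
            "INSERT OR IGNORE INTO files (path, feed, kind, mtime) VALUES (?, ?, ?, ?)",
            (str(path), path.parent.name, kind, path.stat().st_mtime),
        )
        added += cur.rowcount  # 1 if inserted, 0 if the path was already known
    db.commit()
    return added


# Usage with a throwaway folder tree (file names are hypothetical).
db = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    feed_dir = root / "rssfeed-ara"
    feed_dir.mkdir()
    (feed_dir / "diff-1.html").write_text("<p>diff</p>", encoding="utf-8")
    (feed_dir / "diff-1.jpg").write_bytes(b"")
    (feed_dir / "notes.txt").write_text("not indexed", encoding="utf-8")
    first_run = index_outputs(root, db)   # 2
    second_run = index_outputs(root, db)  # 0
rows = db.execute("SELECT feed, kind FROM files ORDER BY kind").fetchall()
print(first_run, second_run, rows)
```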

"You have mail" in Terminal

bbcnews

huffingtonpost

rssfeed-20minutos

rssfeed-aljazeera

rssfeed-ara

rssfeed-bbcnews

rssfeed-dailymail

rssfeed-einnews-multifeeds

rssfeed-elconfidencial

rssfeed-eldiario

rssfeed-eldiario2-thumb

rssfeed-elperiodico

rssfeed-kaosenlared

rssfeed-larazon

rssfeed-lavanguardia

rssfeed-reuters-world

rssfeed-segre

rssfeed-skynews

rssfeed-ara

Beautiful Soup prototypes: extracting structured content and writing it to a TXT file
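The extraction step can be sketched with the standard library only. This sketch keeps the text inside `<div>` elements that carry a chosen class and writes it to a TXT file. It swaps in Python's built-in `html.parser` for Beautiful Soup so that it runs without extra packages. With Beautiful Soup, the core would be roughly `BeautifulSoup(html, "html.parser").find_all("div", class_="content")`, followed by `.get_text("\n", strip=True)` on each result. The class name `content` and the sample page are assumptions, because each news site uses its own markup.

```python
from __future__ import annotations

import tempfile
from html.parser import HTMLParser
from pathlib import Path


class DivTextExtractor(HTMLParser):
    """Collect the text inside <div> elements that carry a given class.

    Nested divs inside a matching div are included; everything else is skipped."""

    def __init__(self, target_class: str) -> None:
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while inside a matching div; counts nested divs
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def html_to_txt(html: str, target_class: str, out_path: Path) -> str:
    """Extract the matching div text, write it to out_path (UTF-8) and return it."""
    parser = DivTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.chunks)
    out_path.write_text(text + "\n", encoding="utf-8")
    return text


# Hypothetical article page; real sites use their own class names.
SAMPLE = """<html><body>
<div class="menu">Home | World</div>
<div class="content"><h1>Headline</h1><p>First paragraph.</p>
<div class="quote">A nested quote.</div></div>
<p>Footer</p>
</body></html>"""

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "article.txt"
    text = html_to_txt(SAMPLE, "content", out)
    saved = out.read_text(encoding="utf-8")
print(text)
```

Script and style elements inside the chosen div would also be captured. A real page may therefore need a stricter filter.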
