User:Francg/expub/thesis/hackpackt: Difference between revisions

Revision as of 20:36, 24 October 2017

Diffengine prototypes

- Installing Diffengine
- Selecting RSS news feeds (online journalism)
- Configuring diffengine to check the feeds every 5 minutes
- Monitoring RSS feeds and downloading content changes (if there are any) as jpg & html

Forthcoming prototypes:

- Install "Twarc" to extract updated content from Twitter
- Install "Feedme" to create an RSS feed for any website that doesn't provide one (it can monitor Facebook groups but not single users)
- Try to monitor the new RSS feeds created with Feedme using diffengine, so we can see their content changing.
- Automatically convert html files to txt files using Python as soon as diffengine creates them
- Use Python + Beautiful Soup to count specific words in each txt file and extract a list of the most used ones.
- Use Python + Beautiful Soup to count the number of changes.
- Compare word rankings & changes for each txt file (RSS feed article). Maybe this table can be updated automatically?
- Set up a Raspberry Pi
- Create a database (MongoDB or an SQL-like database), small forms, JavaScript snippets, etc. Example: "Wikileaks" structure
- The database offers access to raw materials/content, and to additional features such as collaborations from invited figures (essays, articles, comments, interviews, etc.).
- Automatically upload files to this database as soon as diffengine creates them
- Design the database!
- Use an onion-routing browser such as Tor Browser to keep the users' and the database's IP addresses protected from online surveillance, and as a way to reinforce and support freedom of speech against network censorship by state power.
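
The word-counting step in the list above could start from something like the following. This is only a minimal sketch: it assumes the article text has already been saved as plain text, it uses the standard library alone, and the names `rank_words`, `count_specific` and `STOP_WORDS` are hypothetical helpers made up for this note, not part of diffengine or Beautiful Soup.

```python
import re
from collections import Counter

# Hypothetical stop-word list (assumption); extend it for the feed languages.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "el", "la", "de", "que", "y"}


def tokenize(text):
    """Lower-case the text and return its words (letters only, accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def rank_words(text, top_n=10, stop_words=STOP_WORDS):
    """Return the top_n most frequent non-stop-words as (word, count) pairs."""
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    return counts.most_common(top_n)


def count_specific(text, targets):
    """Return how often each target word occurs in the text."""
    counts = Counter(tokenize(text))
    return {t: counts[t.lower()] for t in targets}


# Made-up sample sentence, not taken from any monitored feed.
sample = "The government said the vote was illegal. The vote went ahead."
print(rank_words(sample, top_n=1))
print(count_specific(sample, ["vote", "government"]))
```

For a real file, the text could be read with `open(path, encoding="utf-8").read()` and passed to the same functions; running them on every txt file would give the rows of the comparison table mentioned above.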

RSS feed: Huffington Post

rssfeed-elperiodico

What RSS/Atom feed would you like to monitor?
Would you like to set up tweeting edits? [Y/n]

rssfeed-process-diff

rundiff.sh (paths to each news feed on the device) > config.yaml (RSS feed to phantomjs path) > folders (where input data is stored) > files (jpg & html)

rundiff.sh: the monitored sources and their paths on the device
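
The contents of rundiff.sh are not reproduced here, so the following is only a rough Python sketch of the same idea: one diffengine run per news feed folder. It assumes that each feed has its own folder containing a config.yaml, and that the `diffengine` command takes that folder as its single argument (as in the cron example of the diffengine README). The function names and the temporary folder layout are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def build_commands(base_dir, executable="diffengine"):
    """Return one command per sub-folder of base_dir that holds a config.yaml."""
    commands = []
    for folder in sorted(p for p in Path(base_dir).iterdir() if p.is_dir()):
        if (folder / "config.yaml").exists():
            commands.append([executable, str(folder)])
    return commands


def run_all(base_dir):
    """Run diffengine once for every configured feed folder."""
    for cmd in build_commands(base_dir):
        # check=False: a failing feed should not stop the remaining ones.
        subprocess.run(cmd, check=False)


# Demo on a throwaway layout; nothing is executed, the commands are only listed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("rssfeed-bbcnews", "rssfeed-ara"):
        (Path(tmp) / name).mkdir()
        (Path(tmp) / name / "config.yaml").write_text("# placeholder\n")
    (Path(tmp) / "notes").mkdir()  # no config.yaml, so it is skipped
    demo = [[cmd[0], Path(cmd[1]).name] for cmd in build_commands(tmp)]
print(demo)
```

Calling `run_all` from a scheduler such as cron every 5 minutes would match the checking interval used in the prototype.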

RSS news feed folders where the jpg & html files are created

"You have mail" in Terminal

bbcnews

huffingtonpost

rssfeed-20minutos

rssfeed-aljazeera

rssfeed-ara

rssfeed-bbcnews

rssfeed-dailymail

rssfeed-einnews-multifeeds

rssfeed-elconfidencial

rssfeed-eldiario

rssfeed-eldiario2-thumb

rssfeed-elperiodico

rssfeed-kaosenlared

rssfeed-larazon

rssfeed-lavanguardia

rssfeed-reuters-world

rssfeed-segre

rssfeed-skynews

rssfeed-ara

Beautiful Soup prototypes: Extracting structured content and output to txt file
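
A minimal sketch of this kind of extraction, assuming Beautiful Soup 4 is installed (`pip install beautifulsoup4`). The helper names and sample strings are made up for illustration. `count_changes` further assumes that the diff html marks changes with `<ins>` and `<del>` elements; that should be checked against the actual files diffengine produces.

```python
from bs4 import BeautifulSoup


def extract_article(html):
    """Pull the page title and the paragraph texts out of an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    # Collapse runs of whitespace inside each paragraph; skip empty ones.
    paragraphs = [" ".join(p.get_text().split()) for p in soup.find_all("p")]
    return {"title": title, "paragraphs": [p for p in paragraphs if p]}


def to_text(article):
    """Format the extracted parts as plain text: title, blank line, paragraphs."""
    return "\n".join([article["title"], ""] + article["paragraphs"]) + "\n"


def count_changes(diff_html):
    """Count inserted and deleted spans, assuming <ins>/<del> markup."""
    soup = BeautifulSoup(diff_html, "html.parser")
    return {"inserted": len(soup.find_all("ins")),
            "deleted": len(soup.find_all("del"))}


# Made-up sample inputs, not real diffengine output.
page = ("<html><head><title>Example</title></head><body>"
        "<h1>Headline</h1><p>The road is <b>closed</b>.</p></body></html>")
diff = "<p>Police <del>say</del><ins>confirm</ins> the road is closed.</p>"

print(to_text(extract_article(page)))
print(count_changes(diff))
```

To produce the txt file itself, the string returned by `to_text` can be written out with `pathlib.Path("article.txt").write_text(..., encoding="utf-8")`, where `article.txt` is a placeholder name.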

@@ Line 14: / Line 14: @@
 - Install "Twarc" to extract content updated content from Twitter
-<br>- Install "Feedme" to create RSS for any website that doesn't provide it (it can monitor Facebook groups)
+<br>- Install "Feedme" to create RSS for any website that doesn't provide it (it can monitor Facebook groups but not single users)
-<br>- Try to monitor with diffengine these new RSS created with feedme, so we can see their content changes.
+<br>- Try to monitor with diffengine these new RSS feed created with feedme, so we can see their content changing.
 <br>- Convert autom html files to txt files using python, as they happen to be created by diffengine
 <br>- Use Python + Beautiful Soup to count specific words from txt file and extract a list of most used.
-<br>- " " " same thing here, but only counting number of changes.
+<br>- Use Python + Beautiful Soup to count number of changes.
 <br>- Compare ranking words & changes on each txt file (rss feed article). Maybe this table can be updated automatically?
 <br>- Install raspberry pi
-<br>- Create a database
+<br>- Create a database (MongoDB or SQL like databases), small forms, javascript snippets, etc.  Example: "Wikileaks" structure
+<br>- Database offers access to raw materials/content, and to additional features such as collaborations from invited figures (essays, articles, comments, interviews, etc.).
 <br>- Upload files to this database autom as they happen to be created by diffengine
+<br>- Design database!
+<br>- Use onion browser like Tor to keep the users and the database IP protected from online surveillance, and as a way to reinforce and support freedom of speech against network censorship from state power.
 == ==
 <br>