User:Francg/expub/thesis/hackpackt

From XPUB & Lens-Based wiki

Revision as of 07:36, 23 October 2017


Diffengine prototypes

- Installing Diffengine
- Selecting RSS news feeds (online journalism)
- Configuring diffengine to check the feeds every 5 min
- Monitoring RSS feeds and downloading content changes (if any) as jpg & html files
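The 5-minute check can be scheduled with cron. What follows is a minimal sketch that only builds the crontab entry. It assumes diffengine is installed at /usr/local/bin/diffengine and takes its home folder (the one holding config.yaml) as its only argument, as its README describes. The function name and the paths are hypothetical placeholders.

```python
import shlex


def cron_line(minutes: int, diffengine_bin: str, home_dir: str) -> str:
    """Build a crontab entry that runs diffengine on home_dir every `minutes` minutes."""
    if not 1 <= minutes <= 59:
        raise ValueError("minutes must be between 1 and 59")
    # shlex.quote protects paths that contain spaces or shell metacharacters.
    command = f"{shlex.quote(diffengine_bin)} {shlex.quote(home_dir)}"
    # Fields: minute, hour, day of month, month, day of week.
    # "*/5" in the minute field means "every 5th minute".
    return f"*/{minutes} * * * * {command}"


if __name__ == "__main__":
    # Hypothetical paths: replace with the real diffengine binary and feed folder.
    print(cron_line(5, "/usr/local/bin/diffengine", "/home/pi/rssfeed-bbcnews"))
```

Add the printed line with `crontab -e`. Each feed folder gets its own line, or one line can call a wrapper script that loops over all folders.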


Forthcoming prototypes:

- Install "Twarc" to extract updated content from Twitter
- Install "Feedme" to create RSS feeds for any website that doesn't provide one (it can monitor Facebook groups)
- Try monitoring these new Feedme-generated RSS feeds with diffengine, so we can see their content changes.
- Automatically convert the html files to txt files with Python as diffengine creates them.
- Use Python + Beautiful Soup to count specific words in each txt file and extract a list of the most used ones.
- Do the same, but count only the number of changes.
- Compare the word rankings and the number of changes across the txt files (one per RSS feed article). Could this table be updated automatically?
- Set up a Raspberry Pi
- Create a database
- Automatically upload files to this database as diffengine creates them
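The word-ranking and change-counting steps can be sketched with the standard library alone, using Counter and difflib. Beautiful Soup is only needed on the HTML side. This sketch assumes the txt files already exist. It also interprets "number of changes" as the number of words inserted, deleted or replaced between two versions of the same article, which is an assumption. The function names are hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    # Lower-case words; \w also matches accented letters (é, ñ, ç, à),
    # which matters for the Spanish and Catalan feeds.
    return re.findall(r"\w+", text.lower())


def top_words(text, n=10, stopwords=frozenset()):
    """Return the n most frequent words as (word, count) pairs."""
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    return counts.most_common(n)


def count_changes(old_text, new_text):
    """Count words inserted, deleted or replaced between two versions of a text."""
    a, b = tokenize(old_text), tokenize(new_text)
    changed = 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Replacing 1 word with 3 counts as 3 changed words.
            changed += max(i2 - i1, j2 - j1)
    return changed


if __name__ == "__main__":
    old = "The minister resigned today."
    new = "The minister did not resign today."
    print(top_words(new, n=3))
    print(count_changes(old, new))  # 3
```

One row per article, such as (feed, top words, number of changes), could then feed the comparison table.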


Rss-huffington.png
RSS feed Huffington Post


rssfeed-elperiodico

What RSS/Atom feed would you like to monitor?
Would you like to set up tweeting edits? [Y/n]
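diffengine keeps its settings in config.yaml inside its home folder. The sketch below writes a comparable file by hand for one feed with tweeting switched off, which matches answering "n" to the second prompt. It assumes the key layout shown in diffengine's README of the time: a `feeds` list of `name`/`url` entries plus a `phantomjs` path. The function name and example values are hypothetical.

```python
import json
from pathlib import Path


def write_config(home_dir: str, feed_name: str, feed_url: str,
                 phantomjs_path: str = "phantomjs") -> Path:
    """Write a one-feed, no-tweeting config.yaml into home_dir and return its path."""
    home = Path(home_dir)
    home.mkdir(parents=True, exist_ok=True)
    # json.dumps yields double-quoted strings, which YAML also accepts; this keeps
    # URLs with ":" and names with quotes or accents valid without needing PyYAML.
    q = json.dumps
    lines = [
        "feeds:",
        f"- name: {q(feed_name)}",
        f"  url: {q(feed_url)}",
        # Assumed key; "phantomjs" alone means the binary is on the PATH.
        f"phantomjs: {q(phantomjs_path)}",
    ]
    config_path = home / "config.yaml"
    config_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return config_path


if __name__ == "__main__":
    # Hypothetical folder and feed URL.
    path = write_config("rssfeed-example", "Example feed", "https://example.com/rss.xml")
    print(path.read_text(encoding="utf-8"))
```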


rssfeed-process-diff

rundiff.sh (paths to each news feed on the device) > config.yaml (RSS feed URL and phantomjs path) > folders (where input data is stored) > files (jpg & html)
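rundiff.sh itself is only shown as a screenshot on this page, so the following is just a Python approximation of the chain above. It runs diffengine once per feed folder, then lists the jpg and html files found under each folder. Several things here are assumptions: the function name, the `dry_run` switch, and the idea that diffengine takes the folder path as its argument. The folder names follow the rssfeed-* convention used here.

```python
import subprocess
from pathlib import Path


def run_feeds(feed_dirs, diffengine_bin="diffengine", dry_run=True):
    """Run diffengine once per feed folder, then list the jpg/html files under each."""
    report = {}
    for feed_dir in map(Path, feed_dirs):
        cmd = [diffengine_bin, str(feed_dir)]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=False: a failing feed is reported but does not stop the others.
            result = subprocess.run(cmd, check=False)
            if result.returncode != 0:
                print(f"{feed_dir.name}: diffengine exited with {result.returncode}")
        # rglob also searches sub-folders, since the exact output layout may vary.
        report[feed_dir.name] = {
            ext: sorted(p.name for p in feed_dir.rglob(f"*.{ext}"))
            for ext in ("jpg", "html")
        }
    return report


if __name__ == "__main__":
    # Hypothetical folders named after the rssfeed-* convention used on this page.
    for name, files in run_feeds(["rssfeed-bbcnews", "rssfeed-elperiodico"]).items():
        print(name, len(files["jpg"]), "jpg,", len(files["html"]), "html")
```

Keeping `dry_run=True` as the default makes it safe to check the folder list before diffengine is actually called.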



Stored-newsfeed_bash.png
rundiff.sh: monitored sources and their paths on the device


Rssfeed-sources.png
RSS news feed folders where jpg & html files are created


Mail-terminal.png
"You have mail" in Terminal


Diff-bbcnews.png
bbcnews


Dff-huffington.png
huffingtonpost


rssfeed-20minutos

rssfeed-aljazeera

rssfeed-ara

rssfeed-bbcnews

rssfeed-dailymail

rssfeed-einnews-multifeeds

rssfeed-elconfidencial

rssfeed-eldiario

rssfeed-eldiario2-thumb

rssfeed-elperiodico

rssfeed-kaosenlared

rssfeed-larazon

rssfeed-lavanguardia

rssfeed-reuters-world

rssfeed-segre

rssfeed-skynews



Beautiful Soup prototypes: extracting structured content and writing it to a txt file


Bsf4-testcontent.png


Bs4-content_div_class.png


Bs4-output_txtfile.png
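Here is a dependency-free sketch of the same idea. It uses the standard library's html.parser in place of Beautiful Soup. It collects the text of every div carrying a given class and writes one paragraph per div to a txt file. The class name `article-body` and the file names are hypothetical. With Beautiful Soup, the lookup would be `soup.find_all("div", class_="article-body")` followed by `get_text(" ", strip=True)` on each result.

```python
from html.parser import HTMLParser
from pathlib import Path


class DivTextExtractor(HTMLParser):
    """Collect the text inside every <div> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0       # > 0 while inside a matching div (counts nested divs)
        self.blocks = []     # one whitespace-normalised string per matching div
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a match: keep collecting
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self._current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Text chunks are joined with spaces so paragraphs don't run together.
                self.blocks.append(" ".join(" ".join(self._current).split()))

    def handle_data(self, data):
        if self.depth:
            self._current.append(data)


def extract_to_txt(html, css_class, out_path):
    """Write the text of each matching div to out_path and return the list of texts."""
    parser = DivTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    Path(out_path).write_text("\n\n".join(parser.blocks) + "\n", encoding="utf-8")
    return parser.blocks


if __name__ == "__main__":
    sample = ('<div class="nav">Menu</div>'
              '<div class="article-body"><p>First paragraph.</p><p>Second one.</p></div>')
    print(extract_to_txt(sample, "article-body", "article.txt"))
```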