User:Francg/expub/thesis/hackpackt: Difference between revisions
No edit summary |
No edit summary |
||
Line 1: | Line 1: | ||
<div style="background-color:#ffffff; float: left; padding: | <div style="background-color:#ffffff; float: left; padding:5px;> | ||
<br> | <br> | ||
'''Diffengine prototypes''' | '''Diffengine prototypes''' | ||
Line 28: | Line 28: | ||
<br>[[File:Difftest1.png|1,696 × 1113 px|thumb|left|rssfeed-elperiodico]] | <br>[[File:Difftest1.png|1,696 × 1113 px|thumb|left|rssfeed-elperiodico]] | ||
<br>What RSS/Atom feed you would you like to monitor? | <br><br><br><br><br><br><br><br><br><br><br><br> | ||
What RSS/Atom feed you would you like to monitor? | |||
<br>Would you like to set up tweeting edits? [Y/n] | <br>Would you like to set up tweeting edits? [Y/n] | ||
<br>[[File:Process-diff.png|1,696 × 1113 px|thumb|left|rssfeed-process-diff]] | <br>[[File:Process-diff.png|1,696 × 1113 px|thumb|left|rssfeed-process-diff]] | ||
rundiff.sh > config.yaml > folders > files | <br><br><br><br><br><br><br><br><br><br> | ||
rundiff.sh (paths to each news feed in device)> config.yaml (RSS feed to phantomjs path) > folders (where input data is stored)> files (jpg & html) | |||
<br> | <br> | ||
Line 101: | Line 101: | ||
</div> | </div> | ||
<br> | <br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> | ||
<br> | |||
== == | == == | ||
<div style="background-color:#ffffff; float: left; padding:10px;> | <div style="background-color:#ffffff; float: left; padding:10px;> | ||
<br>'''Beautiful Soup prototypes: Extracting structured content and output to txt file''' | <br>'''Beautiful Soup prototypes: Extracting structured content and output to txt file''' |
Revision as of 19:25, 22 October 2017
Diffengine prototypes
- Installing Diffengine
- Selecting an RSS news feed (online journalism)
- Configuring diffengine to check feeds every 5 minutes
- Monitoring RSS feeds and downloading content changes (if any) as jpg & html
Forthcoming prototypes:
- Install Twarc to extract updated content from specific social media users or groups
- Try installing an Atom/RSS feed finder?
- Automatically convert html files to txt files using Python as diffengine creates them
- Use Python + Beautiful Soup to count specific words in each txt file and extract a list of the most used ones
- Compare word rankings across txt files (one per RSS feed article)
- Automatically upload files (jpg, html or txt) to a database as diffengine creates them
- Set up a Raspberry Pi
- Create a database (maybe also a wiki page?)
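The html-to-txt, word-count and ranking-comparison steps above could be sketched roughly as follows. This is only a sketch under assumptions: it uses `html.parser` from the Python standard library instead of Beautiful Soup so that it runs without extra installs (with Beautiful Soup, the text extraction step could be done with `BeautifulSoup(html, "html.parser").get_text()`). The function names `html_to_text`, `top_words` and `shared_top_words`, and the `min_length` cut-off, are hypothetical and not part of diffengine.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an html string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def top_words(text, n=10, min_length=4):
    """Return the n most used words as (word, count) pairs.

    min_length is an assumed, crude way to drop short filler words.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return counts.most_common(n)


def shared_top_words(text_a, text_b, n=10):
    """Return the words that appear in the top-n ranking of both texts."""
    a = {word for word, _ in top_words(text_a, n)}
    b = {word for word, _ in top_words(text_b, n)}
    return sorted(a & b)
```

Writing the result of `html_to_text` to a `.txt` file next to each html file would give the txt files that the later counting and comparison steps work on.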
RSS feed: Huffington Post
What RSS/Atom feed would you like to monitor?
Would you like to set up tweeting edits? [Y/n]
rundiff.sh (paths to each news feed on the device) > config.yaml (RSS feed to phantomjs path) > folders (where input data is stored) > files (jpg & html)
News feeds stored and being tracked
RSS news feed folders where jpg & html files are created
"You have mail" in Terminal
bbcnews
huffingtonpost
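To check what has accumulated in feed folders such as bbcnews and huffingtonpost, a small inventory script might help. The sketch below assumes one folder per feed under a common root directory, which is an assumption about the local setup rather than something diffengine guarantees; the function name `snapshot_inventory` is hypothetical.

```python
from pathlib import Path


def snapshot_inventory(root):
    """Count the jpg and html files in each immediate subfolder of root."""
    inventory = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # skip loose files such as config.yaml
        inventory[folder.name] = {
            "jpg": sum(1 for p in folder.rglob("*.jpg") if p.is_file()),
            "html": sum(1 for p in folder.rglob("*.html") if p.is_file()),
        }
    return inventory
```

Running it before and after a diffengine pass shows which feeds produced new files.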
Beautiful Soup prototypes: Extracting structured content and output to txt file