User:Francg/expub/thesis/hackpackt: Difference between revisions
Revision as of 06:36, 23 October 2017
Diffengine prototypes
- Installing Diffengine
- Selecting RSS news feed (online journalism)
- Configuring diffengine to check feeds every 5 min
- Monitor RSS feeds and download content changes (if there are any) as jpg & html
Forthcoming prototypes:
- Install "Twarc" to extract updated content from Twitter
- Install "Feedme" to create RSS for any website that doesn't provide it (it can monitor Facebook groups)
- Try to monitor these new feedme-created RSS feeds with diffengine, so we can see their content changes.
- Convert html files to txt files automatically using Python, as they are created by diffengine
- Use Python + Beautiful Soup to count specific words from txt file and extract a list of the most used.
- Same as above, but counting only the number of changes.
- Compare word rankings & changes across the txt files (one per RSS feed article). Maybe this table can be updated automatically?
- Install raspberry pi
- Create a database
- Upload files to this database automatically as they are created by diffengine
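The word-counting and change-counting items in the list above could be sketched roughly as follows. This is only a minimal sketch using the Python standard library, assuming each article version is already available as plain text; the function names `top_words` and `count_changes` are hypothetical and not part of diffengine.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase and keep only runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


def count_changes(old_text, new_text):
    """Count the changed blocks of lines between two versions of an article."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # Every opcode that is not "equal" is an insert, delete, or replace.
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


if __name__ == "__main__":
    old = "Storm hits the coast\nTwo people injured\nMore soon"
    new = "Storm hits the coast\nThree people injured\nMore soon"
    print(top_words(old, 3))
    print(count_changes(old, new))
```

Comparing the rankings across articles would then be a matter of calling `top_words` on each txt file and putting the results side by side.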
RSS Feed Huffington Post
What RSS/Atom feed would you like to monitor?
Would you like to set up tweeting edits? [Y/n]
rundiff.sh (paths to each news feed in device) > config.yaml (RSS feed to phantomjs path) > folders (where input data is stored) > files (jpg & html)
rundiff.sh: sources monitoring, and path to device
rss newsfeed folders where jpg & html files are created
"You have mail" in Terminal
bbcnews
huffingtonpost
Beautiful Soup prototypes: Extracting structured content and output to txt file
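A minimal sketch of this extraction step, assuming the html files saved by diffengine are ordinary HTML pages and that the `beautifulsoup4` package is installed. The helper names `html_to_text` and `convert_file` are hypothetical, and the file paths passed to them would depend on the local folder layout.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def html_to_text(html):
    """Strip markup, scripts and styles from an HTML string; return plain text."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose content is not readable article text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    lines = (line.strip() for line in soup.get_text(separator="\n").splitlines())
    # Drop empty lines left behind by the removed markup.
    return "\n".join(line for line in lines if line)


def convert_file(html_path, txt_path):
    """Read one html file and write its text content to a txt file."""
    html = Path(html_path).read_text(encoding="utf-8")
    Path(txt_path).write_text(html_to_text(html), encoding="utf-8")
```

The built-in `html.parser` backend is used here so that no extra parser needs to be installed; a different parser may handle malformed pages better.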