User:Francg/expub/thesis/thesis-outline2

Thesis Outline

Screen-scraping technology for data exposure

(for activism purposes?)

This project began with the need to find resourceful workflows for more efficient research, data collection and data exposure, applied to an ongoing socio-political event that offers an opportunity for data scraping. Current socio-political issues of international significance, such as the territorial conflict between Catalonia and Spain, lead news media to produce a huge amount of data that is constantly updated, easily spread and continually morphing. This information is filtered through opposing viewpoints and is therefore subjective, never neutral and sometimes highly speculative.

In order to get as much data as possible out of sources that continuously update their material, I want to employ so-called “generative techniques”. To do this, I will work with “Beautiful Soup”, a Python library for screen-scraping data from web pages, which will allow me to dissect a document and extract what is important from it. This means my research involves a significant technological challenge, one that will lead to new tools and working environments in which programming languages play a central part. Ideally, I will run a script that fetches all the needed web pages, screen-scrapes the updated HTML to get the results in the form of article content, and finally uploads this content to a website (which will function as an online archive or database).

Simultaneously, I will also be working with “diffengine”, another tool, which tracks RSS feeds (a machine-readable format) and will allow me to see when content changes. When new content is found, a snapshot can be saved to the website (feeds archive) that I will use to store and track news live. Experiencing information this way can help draw attention to data transformation and to how news is constantly being morphed without readers being aware of it, which can be quite useful for research.
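The extraction step of such a script can be sketched roughly as follows. This is only a minimal illustration, not the final tool: the sample page, the name `extract_article` and the assumption that the story sits inside an `<article>` element with one `<h1>` and several `<p>` tags are all hypothetical, and real news sites will need their own selectors. It uses Beautiful Soup with Python's built-in `html.parser`, and it works on an inline HTML string so that no network access is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; a real run would first download the HTML.
SAMPLE_HTML = """
<html><body>
  <nav>Home | Politics</nav>
  <article>
    <h1>Talks resume over referendum</h1>
    <p>First paragraph of the story.</p>
    <p>Second paragraph, <a href="/more">with a link</a>.</p>
  </article>
  <footer>Cookie notice</footer>
</body></html>
"""


def extract_article(html):
    """Return headline and paragraph texts of the first <article>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")  # assumption: the story is wrapped in <article>
    if article is None:
        return None
    headline = article.find("h1")
    return {
        "headline": headline.get_text().strip() if headline else "",
        # Keep only the text of each paragraph; navigation and footer are ignored.
        "paragraphs": [p.get_text().strip() for p in article.find_all("p")],
    }


if __name__ == "__main__":
    print(extract_article(SAMPLE_HTML))
```

The returned dictionary is the kind of cleaned content that could then be written to the archive website.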
In a way, the archive can work as a kind of conscious live stream, updated with every change to the targeted news. This data could also be formatted and updated as PDF documents. This would easily allow interested users, whether designers, non-designers, activists, politicians, writers or people with completely different profiles and levels of specialization, to select, download or print just what they want. A book (or a series of diff books arranged chronologically or by web source) could be printed by converting all this continuously updated data into a PDF, EPUB or other file format.
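The diff books mentioned above rest on comparing two snapshots of the same article. A minimal sketch of that comparison, using only Python's standard `difflib` module, might look like this. It is not how diffengine itself works; the function name `diff_snapshots` and the two sample texts are invented for illustration.

```python
import difflib

# Hypothetical snapshots of one article, captured at two different times.
OLD = "Talks resume over referendum\nThe government said talks would begin on Monday."
NEW = "Talks stall over referendum\nThe government said talks would begin on Monday."


def diff_snapshots(old_text, new_text, old_label="earlier", new_label="later"):
    """Return unified-diff lines between two snapshots (empty list if unchanged)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # the input lines carry no newline, so add none to headers
        )
    )


if __name__ == "__main__":
    for line in diff_snapshots(OLD, NEW):
        print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added between the two snapshots. An empty result means nothing changed, so no new snapshot needs to be archived.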

- - -
Thesis: 7000 – 8000 words.
What is it? (Description)
What is the aim of it?
Can it be transmitted through different media or publishing formats?
Which articles and references are used to write it?
Refer back to the project. How does it relate to your actual research?
Conclusion?

- - -
Reading sources:
I Read Where I Am - Exploring New Information Cultures
Networks Without a Cause - A Critique of Social Media
Cyburbia - The Dangerous Idea That's Changing How We Live and Who We Are
Pandora's Hope - Essays on the Reality of Science Studies
- - -
Websites:
https://twitter.com/guardian_diff
http://www.b-list.org/weblog/2010/nov/02/news-done-broke/
http://la3.org/~kilburn/blog/catalan-government-bypass-ipfs/
- - -

Session 2 thesis outline + prototype