User:Francg/expub/thesis/thesis-outline

Thesis Outline

Screen-scraping technology for data exposure.

This project began with the need to find resourceful workflows for more efficient research, data collection and data exposure, applied to an ongoing socio-political event that could serve as an opportunity for data scraping.

With current socio-political issues of great international significance, such as the territorial conflict between Catalonia and Spain, news media produce a huge amount of data that is constantly updated, easily spread and continually morphing. This data reaches an online user, who instantly becomes an important network actor by sharing the content with other users, while it appears across countless websites and news headlines. The information is filtered through different points of view and is therefore subjective, not neutral, and sometimes highly speculative.

In order to get as much data as possible out of sources that continuously update their material, I want to employ so-called “generative techniques”. To do this, I will work with “Beautiful Soup”, a Python library for screen-scraping data from web pages, which will allow me to dissect a document and extract what is important from it. That is to say, my research will involve an important technological challenge, one that will lead me to new tools and working environments in which programming languages play a central role.
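As a first, minimal sketch of what this dissecting could look like, the snippet below parses a small piece of HTML with Beautiful Soup and pulls out the headlines and their links. The sample markup and the function name `extract_headlines` are my own assumptions for illustration; a real news site will be structured differently and will need its own selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; a real news page will look different.
SAMPLE_HTML = """
<html><body>
  <h1>Front page</h1>
  <article><h2><a href="/news/1">First headline</a></h2><p>Body text one.</p></article>
  <article><h2><a href="/news/2">Second headline</a></h2><p>Body text two.</p></article>
</body></html>
"""


def extract_headlines(html):
    """Return (title, link) pairs for every <h2> that contains a link."""
    soup = BeautifulSoup(html, "html.parser")
    headlines = []
    for h2 in soup.find_all("h2"):
        link = h2.find("a")
        if link is not None:
            headlines.append((link.get_text(strip=True), link.get("href")))
    return headlines


headlines = extract_headlines(SAMPLE_HTML)
for title, href in headlines:
    print(title, href)
```

The same function could later be pointed at the HTML of a fetched page instead of the sample string.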

Ideally, I will run a script that fetches all the needed web pages, uploads them to a server (pzwart) and screen-scrapes the updated HTML files to get the results as Unicode-encoded text. This could be, for example, a whole article or just the news headlines. The next step will be formatting all this information in a layout that can be printed in the form of a book. Furthermore, the scraped data could even be selected and split into two opposing groups automatically, which would require some sort of complex syntax recognition. This means I could end up with two opposing bodies of data focused on one and the same issue.
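The automatic split into two groups could be prototyped very roughly as follows. This sketch is a naive keyword count, not the syntax recognition mentioned above, and the term lists, group names and sample headlines are all placeholders I made up for testing; it only shows the shape of the sorting step.

```python
import re

# Hypothetical term lists; the real ones would come out of the research.
GROUP_A_TERMS = {"independence", "referendum"}
GROUP_B_TERMS = {"unity", "constitution"}


def classify(headline):
    """Assign a headline to group 'a', group 'b' or 'unsorted' by counting terms."""
    words = set(re.findall(r"\w+", headline.lower()))
    score_a = len(words & GROUP_A_TERMS)
    score_b = len(words & GROUP_B_TERMS)
    if score_a > score_b:
        return "a"
    if score_b > score_a:
        return "b"
    return "unsorted"  # a tie, or no known term at all


def split_headlines(headlines):
    """Sort a list of headlines into the three buckets."""
    groups = {"a": [], "b": [], "unsorted": []}
    for headline in headlines:
        groups[classify(headline)].append(headline)
    return groups


# Made-up headlines, only used to try out the functions.
sample = [
    "Referendum date announced",
    "Court cites constitution in ruling",
    "Weather warning for the weekend",
]
groups = split_headlines(sample)
print(groups)
```

Anything that matches neither list, or both equally, stays in the unsorted bucket, so the two bodies of data only contain what the script could actually place.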

This could also work as a conscious form of live streaming, publishing every new data modification on a website (ideally hosted on the pzwart server), where users could track data, read information and go to the original source if needed. Perhaps there could also be a data visualization in which all updated data is illustrated graphically and counted, creating an interesting infographic pattern in real time. Even better, each new piece of data could be added as a new single page of this ongoing book. This would allow interested users, whether designers, non-designers, activists, politicians, writers or people with completely different profiles and levels of specialization, to easily select, download or print just what they want.
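Tracking every data modification could start from comparing two scraped snapshots of the same page. The sketch below uses `difflib` from the Python standard library to list which lines were removed and which were added between an older and a newer version. The snapshot texts and the function name `diff_snapshots` are assumptions for illustration only.

```python
import difflib


def diff_snapshots(old_text, new_text):
    """Return the lines removed from and added to a page between two snapshots."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged lines are not interesting here
        removed.extend(old_lines[i1:i2])
        added.extend(new_lines[j1:j2])
    return {"removed": removed, "added": added}


# Made-up snapshots of the same list of headlines at two moments in time.
old_snapshot = "Headline one\nTalks continue\nWeather"
new_snapshot = "Headline one\nTalks suspended\nWeather"

changes = diff_snapshots(old_snapshot, new_snapshot)
print(changes)
```

Counting the entries in `removed` and `added` over time would also give numbers that the data visualization could draw on.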

What is exciting about this is the reformatting of web-based data into a live stream and into printed matter, which completely transforms the way we experience information: from being aware only of what you might have seen or heard (or not) to seeing what is really being published online. May this form of data exposure open up a dialogue between man and machine, highlighting the potential of using code without losing the quality and craft of handmade work.

- - -

https://twitter.com/guardian_diff
https://www.crummy.com/software/BeautifulSoup/
https://en.wikipedia.org/wiki/Generative_art
http://la3.org/~kilburn/blog/catalan-government-bypass-ipfs/

- - -

Session 2 thesis outline + prototype