User:Francg/expub/thesis/thesis-outline: Difference between revisions

Latest revision as of 17:27, 15 October 2017

Thesis Outline 5.10.17

Screen-scraping technology for data exposure

This project began with the need to find resourceful workflows for more efficient research, data collection and data exposure, in relation to an existing socio-political event that could be seen as an opportunity for data scraping.

With socio-political issues of great international significance under way, such as the territorial conflict between Catalonia and Spain, news media produce a huge amount of data that is constantly updated, easily spread and continually changing. This data reaches an online user, who instantly becomes an important network actor by sharing it with other users, and it spreads across countless websites and news headlines. The information is filtered through different viewpoints and is therefore subjective, not neutral and sometimes highly speculative.

In order to get as much data as possible from sources that continuously update their material, I want to employ so-called “generative techniques”. To do this, I will work with “Beautiful Soup”, a Python library for screen-scraping data from web pages, which will allow me to dissect a document and extract what is important from it. That is to say, there will be an important technological challenge in my research that will lead to new tools and working environments in which programming languages play a part.
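As a rough illustration of what dissecting a document with Beautiful Soup could look like, here is a minimal sketch. It assumes the `bs4` package is installed, and the sample HTML, the `headline` class name and the `extract_headlines` function are all invented for this example; a real news site will use its own markup.

```python
from bs4 import BeautifulSoup

# Invented sample markup; real news pages will be structured differently.
SAMPLE_HTML = """
<html><body>
  <article>
    <h2 class="headline"><a href="/news/1">First sample headline</a></h2>
    <p>Body text of the first article.</p>
  </article>
  <article>
    <h2 class="headline"><a href="/news/2">Second sample headline</a></h2>
    <p>Body text of the second article.</p>
  </article>
</body></html>
"""


def extract_headlines(html):
    """Return a list of {'title', 'url'} dicts, one per headline found."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # Assumption: headlines are <h2> elements carrying the class "headline".
    for heading in soup.find_all("h2", class_="headline"):
        link = heading.find("a")
        results.append({
            "title": heading.get_text(strip=True),
            "url": link["href"] if link is not None else None,
        })
    return results


if __name__ == "__main__":
    for item in extract_headlines(SAMPLE_HTML):
        print(item["title"], item["url"])
```

The same function could be pointed at whole articles instead of headlines by changing which tags are searched for.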

Ideally, I will run a script that fetches all the needed web pages, uploads them to a server (pzwart) and screen-scrapes the updated HTML files to get the results (as Unicode-encoded text). This could be, e.g., a whole article or just news headlines. The next step will be formatting all this information in a layout that can be printed in the form of a book. Furthermore, this data could even be scraped, selected and split into two opposing groups automatically, which would require some sort of complex syntax recognition. This means I could end up with two opposing bodies of data focused on the same issue.
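The fetch-and-update step could be sketched roughly as follows, using only the Python standard library. The function names `fetch` and `has_changed` and the in-memory `seen` dictionary are assumptions made for this sketch; an actual script would keep the hashes on the server between runs.

```python
import hashlib
import urllib.request


def fetch(url, timeout=10):
    """Download a page and return its text, decoded as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def has_changed(url, html, seen):
    """Store the page's hash in `seen`; return True if the page is new or changed."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    changed = seen.get(url) != digest
    seen[url] = digest
    return changed
```

A script would call `fetch` on each source URL at regular intervals and pass the result to `has_changed`, scraping and storing the page only when it returns `True`.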

This could also work as a kind of conscious live stream, publishing every new data modification to a website (ideally hosted on the pzwart server), where users could track data, read information and go to the original source if needed. Perhaps there could be data visualization, where all updated data is illustrated graphically and also counted, which could create an interesting infographic pattern in real time. Even better, new data could be added as a new single page of this ongoing book. This would easily allow interested users, whether designers, non-designers, activists, politicians, writers or people with completely different profiles and levels of specialization, to select, download or print just what they want.

What is exciting about this is the reformatting of web-based data into a live stream and printed matter, which completely transforms the way we experience information, moving from an awareness of what you might have seen or heard (or not) to what is really being published online. This form of data exposure may bring out a dialogue between man and machine, highlighting the potential of using code without losing the quality and craft of handmade work.

- - -
Reading sources:
I Read Where I Am - Exploring New Information Cultures
Networks Without a Cause - A Critique of Social Media
Cyburbia - The Dangerous Idea That's Changing How We Live and Who We Are
Pandora's Hope - Essays on the Reality of Science Studies
- - -
Websites:
https://twitter.com/guardian_diff
http://www.b-list.org/weblog/2010/nov/02/news-done-broke/
http://la3.org/~kilburn/blog/catalan-government-bypass-ipfs/

*Thesis Outline after group review 5.10.17*
Screen-scraping technology for data change exposure

This project began with the need to find resourceful workflows for more efficient research, data collection and data exposure, in relation to an existing socio-political event that could be seen as an opportunity for data scraping. With socio-political issues of great international significance under way, such as the territorial conflict between Catalonia and Spain, news media produce a huge amount of data that is constantly updated, easily spread and continually changing. The information is filtered through different viewpoints and is therefore subjective, not neutral and sometimes highly speculative.

In order to get as much data as possible from sources that continuously update their material, I want to employ so-called “generative techniques”. To do this, I will work with “Beautiful Soup”, a Python library for screen-scraping data from web pages, which will allow me to dissect a document and extract what is important from it. That is to say, there will be an important technological challenge in my research that will lead to new tools and working environments in which programming languages play a part.

Ideally, I will run a script that fetches all the needed web pages, screen-scrapes the updated HTML files to get the results in the form of articles, and finally uploads this content to a website (which will function as an online archive or database). Simultaneously, I will also be working with “diffengine”, a tool that monitors RSS feeds programmatically, which will allow me to see when content changes. When new content is found, a snapshot can be saved to the website (feeds archive) that I will use to store and track news as it happens. This way of experiencing information can help draw attention to data transformation and to how news is constantly being altered without readers being aware of it, which can be quite useful for research.

In a way, this can work as a sort of conscious live stream, updating with every targeted news change. This data could also be updated and formatted as PDF documents. This would easily allow interested users, whether designers, non-designers, activists, politicians, writers or people with completely different profiles and levels of specialization, to select, download or print just what they want. A book (or a series of diff books arranged chronologically or by web source) could be printed by converting all this continually updated data into a PDF, EPUB or other file format.
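The idea of exposing how a news text changes between two snapshots can be illustrated with Python's standard `difflib` module. This is not diffengine's own interface, only a small stand-in for the underlying comparison; the function name `describe_change` and the labels `previous` and `current` are assumptions made for this sketch.

```python
import difflib


def describe_change(old_text, new_text):
    """Return unified-diff lines describing how old_text became new_text."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="previous",
        tofile="current",
        lineterm="",  # no trailing newlines, since the inputs were split
    )
    return list(diff)


if __name__ == "__main__":
    before = "Headline, first version\nBody of the article."
    after = "Headline, revised version\nBody of the article."
    for line in describe_change(before, after):
        print(line)
```

Lines starting with `-` were removed from the earlier snapshot and lines starting with `+` were added in the later one; an empty result means the text did not change. Output of this kind could be stored next to each snapshot and later laid out as a page of a diff book.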

Other notes: Thesis: 7000–8000 words. What is it? Description. What is the aim of it? Can it be transmitted through different mediums or publishing formats? Which articles and references are used to write it? Refer back to the project. How does it relate to your actual research? Conclusion?

Qian: Will you choose one way to show the project, or multiple? Do you want to transfer the online info to a more subjective perspective?

Catalina's comments: 1. How do you want to present the final result? Would that be a website, a book, an installation? 2. Do you want to demonstrate or analyze how the new media is used or how it is manipulated in this particular case? 3. What moves you to work on this political issue, and why is it interesting for you and the audience?

@@ Line 3: / Line 3: @@
 <center>
 <br>
-'''Thesis Outline'''
+'''Thesis Outline 5.10.17'''
-<br><br>
+Screen-scrapping technology for data exposure
+<br>
+<br>
 </center>
-<br>'''What you want the thesis to be about?'''
+<br><br>This project began with the need to find resourceful workflows for more efficient research, data collection and data exposure, in relation to an existing socio-political event of some sort that could be seen as an opportunity for data-scrapping.
-I want the thesis to be a bridge between the digital and the handmade, showing a research focused on the study of the growth of techno-dependency in the evolution of medias, how our daily online environment triggers our senses, how far information can potentially be spread, how deeply embedded is our digital persona on us, or which future scenarios can be speculated from this ongoing issues; e.g.: "Who would be able to design a book in a post-apocalyptic digital era where Adobe no longer exists (neither other similar software replacements)? maybe only coders." This could be an interesting discussion that could be further explored and so interpreted by suggesting new possible directions, which can bring attention to the close connections between technology, politics, economy and design.
+With current socio-political issues of great significance internationally, such as the territorial conflict between Catalonia and Spain, information medias create a huge amount of data that is constantly updated and potentially spreadable and morphing. This data reaches an online user, who instantaneously becomes an important network-actor by sharing this content to another user, while hitting uncountable websites and news headers. The information here is subjected among different views and therefore is subjective, not neutral and sometimes highly speculative.
-For instance, the book "Conversations" shows how a book can be designed using markdown languages and to still keep a beautiful layout with code-based imagery. It offers a good example of a workflow (based on existing platforms + tools; namely etherpad (web based text editor; I personally like its numeric aesthetic and usability), latex (specific mdown reader) or bash (shell scripting), which in this case involves "sociality" with a group of participants. That is to say, the social aspect should be an important factor in the development of my thesis, deepening into a more concrete study case.
+In order to get as much data as possible out of sources continuously updating material, I want to employ the so-called “generative techniques”. To do this, I will work with “Beautiful Soup”, which is a tool that allows to screen-scrap data from the Internet through generated code in Python, which will allow me to dissect and extract what’s important from a document. That is to say, there will be an important technological challenge in my research that will lead to new tools and working environments, in which programming languages will take place.
-Building a book through markup languages would be an inspiring challenge for people with complete different profiles and levels of specialization, such as writers, artists, activists, etc., but also a way to encourage myself personally to be a more self-sufficient designer, not to become a developer, but a more multidisciplinary designer by integrating code. Most importantly, by using these tools we will be questioning the process of creation, our active roles with technology, and their social significance as well:
+Ideally, I will be running a script that will fetch all the needed web pages, update them onto a server (pzwart) and screen-scrap the updated HTMLs to get the results (Unicode standard text encoding). This could be e.g. a whole article or just news headers.
+The next step will be formatting all this info in a layout that can be printed in form of book. Furthermore, this data could even be screen-scrap, selected and split within two opposite groups automatically, which would employ some sort of of complex syntax recognition. This means I could interestingly end up with two opposite data bodies focused on one identical issue.
-<br>Will there be data visualization?
+This could also work as a conscious live streaming by updating every new data modification into a website (ideally hosted by pzwart server), where users could track data, read information and go to the original source if needed. Perhaps there could be data visualization, where all updated data can be illustrated graphically and also counted, which could lively create an interesting infographic pattern. Even better, new data could be updated as a new single page of this ongoing book. This would easily allow interested users; whether designers, non-designers, activists, politicians, writers or people with complete different profiles and levels of specialization, to select, download or print just what they want.
-<br>Which are the digital mediums employed to produce such work?
-<br>What information sources (study case) are going to be researched?
-<br>Is there any particular digital culture type behind it?
-<br>What experimental publishing forms or collaborative spaces can this body incorporate?
-<br>Would this material be aimed for designers, non-designers, youth, politicians (what is the audience?)?
-<br>What is the pedagogical value of using such tools? Is it technological freedom?
-<br>Can documentation bring out a dialogue between man and machine, highlighting the potential of using code without loosing the quality and craft of a handmade work?
-Maybe this is a way to not become obsolete for me, having the need to find other ways to find more interesting and resourceful workflows that make us feel more useful.
+What is exciting about this is the formatting from web based data, to live-stream and printed matter, which transforms completely the way we experience information, being aware from what you might have seen or heard (or not), to what is really out there being published online. May this form of data exposure bring out a dialogue between man and machine, highlighting the potential of using code without loosing the quality and craft of a handmade work.
-What is exciting about this is that the transition and separation of the digital and analogue, or the digital and physical realms, can also highlight the results of this collide, whether digitally or as some sort of printed matter.
-This could be an initial connection to some previews essays on cybernetics and technology as human extensions.
+- - -
+<br>Reading sources:
+<br>Read Where I am - Exploring New Information Cultures
+<br>Networks without a cause - A critique of Social Media
+<br>Cyburbia - The Dangerous Idea that's changing how we live and Who we are
+<br>Pandora's Hope - Essays on the reality of Science Studies
+<br>- - -
+<br>Websites:
+<br>https://twitter.com/guardian_diff
+<br>http://www.b-list.org/weblog/2010/nov/02/news-done-broke/
+<br>http://la3.org/~kilburn/blog/catalan-government-bypass-ipfs/
-<br>'''Bibliography'''
-sarah garcin: the PJ machine (Publishing Jockey) -> https://www.youtube.com/watch?v=mvL6N168Dg4
+<br><br><br>
-<br>Ricardo Lafuente -> https://pzwiki.wdka.nl/mediadesign/Lettersoup
+<center>'''*Thesis Outline after group review 5.10.17'''*
-<br>http://conversations.tools
+<br>Screen-scrapping technology for data change exposure</center>
-<br>https://www.forkable.eu/generators/dit/o/free/A3/dit-A3-001.pdf
-<br>https://archive.org/details/designforbrain00ashb
-<br>Some tools:
+<br>This project began with the need to find resourceful workflows for more efficient research, data collection and data exposure, in relation to an existing socio-political event of some sort that could be seen as an opportunity for data-scrapping. With current socio-political issues of great significance internationally, such as the territorial conflict between Catalonia and Spain, information medias create a huge amount of data that is constantly updated and potentially spreadable and morphing. The information here is subjected amongst different views and therefore is subjective, not neutral and sometimes highly speculative. In order to get as much data as possible out of sources continuously updating material, I want to employ the so-called “generative techniques”. To do this, I will work with “Beautiful Soup”, which is a tool that allows to screen-scrap data from the Internet through generated code in Python, which will allow me to dissect and extract what’s important from a document. That is to say, there will be an important technological challenge in my research that will lead to new tools and working environments, in which programming languages will take place. Ideally, I will be running a script that will fetch all the needed web pages, screen-scrap the updated HTMLs to get the results, in form of content articles, and finally update this content to a website (which will function as an online archive or database). Simultaneously, I will also be working with “diffengine”, another tool that tracks RSS web feeds in a computer readable way, which will allow me to to see when content changes. When new content is found a snapshot can be saved to the website (feeds archive) that I will be using to lively store & track news. This way of experiencing information can help on drawing attention on data transformation and how news are constantly being morphed, without being aware of it, which can be quite useful for researching. 
In a way, this can work as a sort of conscious live streaming, updating every targeted news change. This data could also be updated and formatted as PDF documents. This would easily allow interested users; whether designers, non-designers, activists, politicians, writers or people with complete different profiles and levels of specialization, to select, download or print just what they want. A book (or series of diff books arranged chronologically or by web sources) could be printed by converting all this ongoing updated data into an pdf, epub or other format file.
-<br>https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
-<br>http://www.latex-project.org/
-<br>http://pandoc.org/
-<br>https://en.wikipedia.org/wiki/epub
-<br>https://www.scribus.net/
-<br>Tech blogs:
+<br>
-<br>[https://www.lifewire.com/big-ways-to-track-viral-trends-online-3486303 Five ways to track viral things Online]
+other notes: Thesis: 7000 – 8000 words. What is it? Description What is the aim of it? Can be transmitted through different mediums or publishing formats? Which articles, references are used to write it? Refer back to the project. How it relates to your actual research? Conclusion?
-https://www.newswhip.com/
-https://techcrunch.com/
-https://thenextweb.com/
-- - -
+Qian: will u choose one way to show the project or multi? You want to transfer the online info to a more subjective perspective?
+Catalina's comments: 1. How do you want to present the final result? That would be a website, a book, an installation? 2. Do you want to demonstrate or analyze how the new media is used or how it is manipulated in this particular case? 3. What move you to work on this political issue, why this is interesting for you and the audience?
-[https://pzwiki.wdka.nl/mw-mediadesign/index.php?title=Upload_thesis_outlines_2017-18_here&action=edit&redlink=1| Session 2 thesis outline + prototype]
 </div>