User:Francg/expub/thesis/project-draft-09.11.17
Introduction
We live in an era in which news media generates, changes and updates information almost instantaneously. Information expands quickly, and facts spread worldwide across networks. These facts are shared and communicated through individual interactions, creating huge quantities of circulating data that are increasingly replicated and transformed. In this sense, news data is constantly rewritten and revised. However, it is also vulnerable to state authorities, centralized platforms and institutionalized environments. That is to say, information is not free and open; most importantly, it is censored, manipulated and revised, yet this often remains invisible.
Context
This project will be contextualized around the events of the current sociopolitical conflict between Catalonia and Spain. It will focus especially on the behavior and consequences of the news media, tracking, collecting and analyzing data. At the same time, it will reflect critically on network surveillance and media censorship, experimenting with algorithmic tools in order to process quickly changing information and to dive into restricted zones where data becomes exclusive. This will make it possible to observe every available version of the generated data, making information more visible while empowering users to become aware of its nature. The mass media gave very diverse views of, and attention to, the institutionalized punishment that took place before, during and after the 1st of October. State authorities censored hundreds of websites, occupied telecommunications buildings, raided printing presses and deployed riot gear, all in order to stop people from expressing themselves through voting. The topic quickly expanded and took up enough space on social networks to be commented on daily, in waves of extremely diverse expression, creating a massive flood of information, much of it manipulated. In the end, the mass media could no longer ignore the topic, and it also spread internationally. Meanwhile, a vast quantity of news continues to be constantly updated.
What is it?
The project will experiment with a series of tools that will actively track and stream news media. The generated data-set will function as an open platform for data collection, attempting to make information visible and accessible without restrictions, and becoming an archive. It aims to empower the user (avoiding the creation of a centralized space), allowing them to query individual concerns or needs in a functional way. It encourages users to participate by providing documentation: written essays, abstracts, thoughts, sources, or any other useful material. At the same time, it will seek to maintain an ethical environment, meaning that careful criteria concerning accessibility will need to be applied. To this end, access keys or invitation systems will be considered.
This platform will provide the necessary infrastructure, creating opportunities for decentralized forms of organization. It will use RSS readers as the main engines for content compilation, without appropriating that content. Data will therefore be open and treated equally, in an attempt to provide non-discriminatory access to its use and give participants equal chances to contribute material or take part in dialogue.
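As a rough illustration of this compilation step, the following Python sketch polls a list of RSS feeds with the feedparser library, keeps only entries mentioning a set of topic keywords (anticipating the filtering described further below), and appends the results to a plain JSON-lines archive. The feed URLs, keywords and file name are hypothetical placeholders, not the project's final choices.

import json
import feedparser  # third-party library: pip install feedparser

# Hypothetical feed URLs and topic keywords, for illustration only.
FEEDS = [
    "https://www.example-news.cat/rss",
    "https://www.example-paper.es/feed",
]
KEYWORDS = {"catalonia", "catalunya", "referendum"}

def poll_feeds(feeds):
    """Collect title, link and timestamp for every entry in every feed."""
    for url in feeds:
        parsed = feedparser.parse(url)
        for entry in parsed.entries:
            yield {
                "feed": url,
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "published": entry.get("published", ""),
            }

def relevant(item):
    """Keep only entries whose title mentions one of the topic keywords."""
    text = item["title"].lower()
    return any(keyword in text for keyword in KEYWORDS)

if __name__ == "__main__":
    # Append each polling round to a simple JSON-lines archive.
    with open("archive.jsonl", "a", encoding="utf-8") as archive:
        for item in filter(relevant, poll_feeds(FEEDS)):
            archive.write(json.dumps(item) + "\n")

Any RSS reader or scheduler could replace this sketch; the point is that compilation can stay minimal and store nothing beyond what the feeds already publish openly.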
In this way, this social algorithmic experiment will continuously output data collected from a wide variety of news sources, embracing more democratic results. Data observation will be encouraged for practical purposes, for instance to strengthen research, verify news, create knowledge, inform, denounce, or combat misinformation. Emphasizing continuous, monitored observation of news media may draw attention both to the audience watching the news and to the news media changing information on the fly.
How do you plan to make it?
To realize this idea, I will create a news station using programming tools to track and scrape data from news sites. Concretely, diffengine will allow me to extract the primary content of pages, outputting HTML files to my device. Using cron, a time-based job scheduler, diffengine will be set to run at timed intervals, which will allow me to document more material, for instance quickly changing articles. Using algorithmic processes, only articles concerning the mentioned topic will be filtered and collected. The system will be set up on open-source Linux software, possibly on a Raspberry Pi, and additional hardware will be integrated into this news station. Although there is no legitimate authority that can decide how relevant a piece of information is, the design, setup and maintenance of incoming data will nevertheless be the most important part of this platform.

The interface will enable a functional experience of both using and visualizing data. It could, for instance, allow the user to query something and find it, or to navigate a timeline where data flows are recorded and organized: a visualization that illustrates the data stream of newly generated article drafts and can be used to compare data production between sites. A similar approach is taken by NewsDiffs, which archives news while bringing to discussion the difficulties of revision in the digital age. Each diff could individually be linked to comments, essays, or any other sort of intervention. Written communication can also play an important role: analyzing whether or not changes have any significance, the total number of edited characters (deleted and added), the number of times specific words appear, and so on. In any case, I will be testing the platform with users and adapting it to people's needs as they use it.
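As a minimal sketch of such statistics, the snippet below uses Python's standard difflib module to count deleted and added characters between two versions of an article, and to tally chosen keywords in the newer version. The two version strings and the keyword list are invented for illustration; in practice they would come from snapshots stored by the tracking system, which diffengine can produce when run periodically from cron.

import difflib
from collections import Counter

def diff_stats(old, new, keywords):
    """Character-level additions/deletions between two versions,
    plus how often each keyword appears in the new version."""
    added = deleted = 0
    matcher = difflib.SequenceMatcher(None, old, new)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            deleted += i2 - i1
        if op in ("replace", "insert"):
            added += j2 - j1
    words = Counter(new.lower().split())
    return {
        "added": added,
        "deleted": deleted,
        "keywords": {k: words[k.lower()] for k in keywords},
    }

# Hypothetical article versions; in practice these would be two
# snapshots of the same article stored by the tracking system.
v1 = "Police block polling stations in Barcelona."
v2 = "Police charge voters at polling stations in Barcelona."

print(diff_stats(v1, v2, ["police", "voters"]))

Statistics like these could feed the timeline visualization directly, flagging which revisions are cosmetic and which substantially rewrite an article.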
Additionally, tracking large amounts of public information, as well as restricted information locked behind paywalls, can open up a discussion on data appropriation. Such tracking could also shed light on why some news media tend to report quickly rather than accurately, aiming to reach a wider audience.
Timetable
September – November: this is the first stage of development, in which I acquire methods and experiment with tools in order to learn and get substantial results. This means improving my bash/shell scripting and learning web scraping tools, Python libraries and other command-line tools for streaming and archiving data.
December – March: the second stage will consist of setting this generated material aside and starting to design a functional interface. This process has already begun. Alongside it, it will be necessary to go back and forth in order to apply the criteria that the design might demand for an efficient use of this data.
April – June: the third and last stage will be to implement the design, tools and targets, creating the project's main body.
Why do you want to make it?
I want to support an ethical collection of media content during the ongoing conflict between Catalonia and Spain. I want to provide more transparency about the unfolding events, preserving every revision of the published material, and to reinforce the importance of accurate information. At the same time, I want to share my insights regarding this political and social matter through pragmatic results, and to reach an audience that can actively engage with the platform.
Who can help and how
Former students who are actively engaged politically could contribute by providing resources. Henry Warwick: considering an offline space that could be accessed by anyone without online dependency, transforming an active streaming data-set into a static but regularly maintained offline archive. Dušan Barok (linking externally to Monoskop sources), Marcel Mars (perhaps a piece of code), Femke Snelting (essays, comments). DocNow (a community developed around supporting the ethical collection, use and preservation of social media content), NewsDiffs, the Catalan archivists' association (Associació d'Arxivers-Gestors de Documents de Catalunya), creators of the #ArxivemelMoment (Archiving the now) phenomenon, and Witness. I will contact Ricardo Gutiérrez, general secretary of the European Federation of Journalists, and the Catalan News Agency (ACN); they could provide interesting insight, as they were banned by the state. Also Human Rights Watch (which forcefully denounced the extreme violence of police authorities during the 1st of October), Maldito Bulo, a group of journalists specialized in tracking fake news, and international newspapers such as The Independent (which spoke of a military coup), amongst others. Finally, independent journalist Arkaitz Zubiaga, specialized in the study of social media data in the context of journalism.
Relation to previous practice
Over the past two years I have become increasingly engaged in shell scripting and programming languages, learning the fundamentals of new languages and building front-end prototypes that have allowed me to get more substantial results in these digital environments. Very concretely, my previous work on the “Autonomous Archive” gave me some insight into the problematics of online information: Who has access to it? What is the political position of the project? What information is included, collected and stored? Who hosts it, and under what conditions? And so on.
Relation to a larger context
The world of news media is increasingly being datafied. Information is constantly morphing, and control over its archive is necessary in order to extract knowledge from it. In this sense, we have a responsibility to bring the ethics of online information into discussion. As Michael Seemann writes in his book “Digital Tailspin: Ten Rules of the Internet After Snowden”:
“We are the source, the hub, the database, and the interface of the other. Your existence, your data, and your offers of communication co-determine the other's freedom. Declining to use a certain platform limits the other's use of it. Not providing an encrypted channel prevents the other from communicating with you securely. Deleting online content restricts the other's querying privileges. The best strategy for end-to-end communication is to adhere to the ethics of the other.”
Relation with the thesis
* The thesis will work hand-in-hand with the project.
* It can be a useful tool for situating the audience in the project's sociopolitical context.
* Furthermore, the thesis will provide documentation of the project's experimentation.
* Presenting them together to the audience will reinforce and conclude their core arguments.
References
Books
Assange, J. (2012) Cypherpunks: Freedom and the Future of the Internet. OR Books.
Beaude, B. (2016) The Ends of the Internet. Institute of Network Cultures.
Eagle, N. and Greene, K. (2014) Reality Mining: Using Big Data to Engineer a Better World. MIT Press.
Heller, C. (2011) Post Privacy. C.H. Beck.
Hyde, A. (2013) The CryptoParty Handbook. Unglue.it.
Levinas, E. (1998) Entre Nous: On Thinking-of-the-Other. Columbia University Press.
Russell, M.A. (2013) Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More. O'Reilly Media.
Moon, D., Ruffini, P. and Segal, D. (2012) Hacking Politics. OR Books.
Padilla, M. (2013) El Kit de la Lucha en Internet. Traficantes de Sueños.
Schneier, B. (2015) Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World. W. W. Norton & Company.
Seemann, M. (2015) Digital Tailspin: Ten Rules of the Internet After Snowden. Institute of Network Cultures.
Stewart, K. (2015) The Tao of Open Source Cyber Intelligence. IT Governance Publishing.
Tech Tools for Activism (2012).
Toffler, A. (1970) Future Shock. Random House.
Projects
NewsDiffs <http://newsdiffs.org/>
guardian_diff <https://twitter.com/guardian_diff>, an example of diffengine running on Twitter <https://github.com/docnow/diffengine#examples>
Tools
Diffengine <https://github.com/docnow/diffengine>
Python-readability <https://github.com/buriy/python-readability>
Beautiful Soup <https://pypi.python.org/pypi/beautifulsoup4/>
Scrapy <https://scrapy.org/>
Twarc <https://github.com/DocNow/twarc>
Feeds <https://github.com/nblock/feeds>
MongoDB <https://www.mongodb.com/>