Sniff, Scrape, Crawl (Thematic Project)

'Sniff, Scrape, Crawl…
Trimester 2, Jan.-March 2011
Thematic Project Tutors: Aymeric Mansoux, Michael Murtaugh, Renee Turner

Our society is one not of spectacle, but of surveillance…
Michel Foucault, Discipline and Punish: The Birth of the Prison, trans. Alan Sheridan (New York, 1979), p. 217

We are living in an age of unprecedented surveillance. But unlike the ominous specter of Orwell’s Big Brother, where power is clearly defined and always palpable, today’s methods of information gathering are much more subtle and woven into the fabric of our everyday life. Through the use of seemingly innocuous algorithms Amazon tells us which books we might like, our trusted browser tracks our searches and Last.fm connects us with people who have similar tastes in music. Immersed in social media, we commit to legally binding contracts by agreeing to ‘terms of use’. Having made the pact, we Twitter our subjective realities in less than 140 characters, wish dear friends happy birthday on facebook and mobile-upload our geotagged videos on youtube.

Where once surveillance technologies belonged to governmental agencies, the web has added another less optically-driven means of both monitoring and monetizing our lived experiences. As the line between public and private has become more blurred and the desire for convenience ever greater, our personal data has become a prized commodity upon which industries thrive. Perversely, we have become consumers who simultaneously produce the product through our own consumption.

Sniff, Scrape, Crawl… is a thematic project examining how surveillance and data-mining technologies shape and influence our lives, and what consequences they have on our civil liberties. We will look at the complexities of sharing information in exchange for waiving privacy rights. Next to this, we will look at how our fundamental understanding of private life has changed as public display has become more pervasive through social networks. Bringing together practical exercises, theoretical readings and a series of guest lectures, Sniff, Scrape, Crawl… will attempt to map the data trails we leave behind and look critically at the buoyant industries that track and commodify our personal information.

References: Michel Foucault, Discipline and Punish: The Birth of the Prison, trans. Alan Sheridan (New York, 1979) Wendy Hui Kyong Chun, Control and Freedom: power and paranoia in the age of fiber optics, (London, 2006)

Note: This Thematic Project will be organized and taught by Aymeric Mansoux, Michael Murtaugh, Renee Turner and will involve a series of related guest lectures and presentations.

Workshop 1: 20-21 Jan, 2011

Day 1

Morning: SNIFF, Web 0.0

The joy of basic client/server Netcat Chat
Internet geography http://hulu.com
Connect to pzwart3 "geoip" script with netcat like this TODO: put script on server, make a recipe and link it here
Repeat the same with a browser and witness "Software sorting"
Create generic user account on non discriminated machine, following these instructions
tunnel traffic through this machine with this trick
Wrap network traffic of almost any application with a SOCKS proxy
Sniff requested URL on the proxy machine with urlsnarf
A word on traceRoute & DNS, Internet topology (http://laramies.blogspot.com/2006/05/2d-and-3d-traceroute-with-scapy.html, http://www.visualcomplexity.com/vc/project.cfm?id=332)
A word on VPN, proxies and other delicatesses
Back to "geoblocking" script and explain the code and Python CGI
Unix permissions (basics, but also evt. link to API "permissions" in the Web 2.0 world)

Afternoon: SCRAPE, Web 1.0 / Simple Spider

Start with ipython. Getting documentation from ipython Using urllib, open a connection to a page (connect) Scan for images in a page. Loop

Do everything by hand....
Make Script, put in bin

(how to )

Open connections, interrogate the response, (content_type / length / file), HTML5lib parsing...

HTTP, URL, CSS, get vs. post, query strings, urlencoding

Cookbook: Simple Web Spider in Python

Check: Django install

Day 2

Morning: BRAZIL Web 2.0 / API

Movement from the "surface scrape" to the database structure of a web 2.0 service.
Goal: Apply the same spider script -- now using connections available only via the API.
XML, JSON, RSS

links

Afternoon: DIY CRAWLing with Django

slowly but steadily, a new structure emerges.

Mapping JSON/API structure to a database. Code to map source to db. Using Django admin to visualize data.

Creating views...

>> assignment in teams: for instance:

get a list of tags from a userid

Building a web crawler with Django

quick note about scrapy

Assignment

Assignment is to move from a Survey to a Strategy, and to build a "tactical crawler" related to your research.

Case Study: Naked on Pluto

Assignment is:

selection of a particular "service"
Visualisation / Map of a "unit" of data from said service
Plan for a tactical database to make use of crawled data from the service.

(could be what if scenarios based on the structure of the data, what kind of connections might you make

exploring/fantasizin about possibilities *grounded in data*

Your assignment/plan will be assessed this coming Tuesday, 25 January Your crawler should collect data over the course of the trimester.