Disoriented thoughts: détournement

Start

The initial idea at the start of this trimester was to create an 'empty' webpage that would, each time it is refreshed, generate its content and layout from elements scraped from other sites on the web. This would result in endless unexpected combinations of text, imagery and layout, through which new (possibly interesting or completely uninteresting) contexts and interrelations are established. A self-generating internet found-footage collage, so to speak.

The limitless character of this idea called for a more confined starting point that could then, if the results were satisfactory and it proved technically possible, be expanded. Somehow, the search for and disclosure of 'hidden' images on the web emerged as the kickoff.

I quickly found out that scraping for images based on their HTML markup and CSS properties was more complicated than first anticipated. I think these were the necessary steps to get it done (a rough sketch of how this could look in code follows the list):

  1. Go to a (randomly picked) webpage (or a link within a previously visited page)
  2. Look for embedded, inline or external CSS properties
  3. Look for classes or IDs with a display: none property
  4. If not found: go back to step 1
  5. If found: remember the CSS class or ID
  6. Go through the HTML page and look for elements with that class/ID
  7. If the element is not an image and doesn't contain any: go back to step 1
  8. If the element is an image or contains any: save the image
  9. Go back to step 1
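
Looking back, a minimal and untested sketch of steps 2 to 8 could look something like this (it is not the script I actually wrote at the time). It is heavily simplified: it only checks class selectors declared in embedded <style> blocks, so no inline or external CSS, it skips the link-crawling of steps 1, 4 and 9, and example.com is just a placeholder URL.

import re
import urllib2, urlparse, html5lib
from lxml.cssselect import CSSSelector

# untested sketch: find class selectors with display:none in embedded
# <style> blocks and save any <img> found inside matching elements
parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("lxml"), namespaceHTMLElements=False)

f = urllib2.urlopen('http://www.example.com')    # step 1: a (randomly picked) page
page = parser.parse(f)

# steps 2-3: collect class names that are set to display:none in <style> blocks
hidden = []
for style in CSSSelector('style')(page):
    css = style.text or ''
    hidden += re.findall(r'\.([\w-]+)\s*\{[^}]*display\s*:\s*none', css)

# steps 5-8: for every hidden class, save the images inside matching elements
i = 0
for cls in hidden:
    for el in CSSSelector('.' + cls)(page):
        for img in CSSSelector('img[src]')(el):
            src = urlparse.urljoin(f.geturl(), img.attrib['src'])
            localfile = open('hidden_' + str(i) + '.jpg', 'wb')
            localfile.write(urllib2.urlopen(src).read())
            localfile.close()
            i += 1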

Although this might look simple (and I reckon it's probably not thát hard), I somehow lacked the scraping skills to get it done. I simply didn't know how and where to start. I searched for some possibilities online – Firebug's main JavaScript library seemed to contain some parts that were useful for HTML and CSS processing – but I got stuck pretty fast. I needed some more scraping support...

Changing routes: API

During one of Michael's prototyping sessions I signed up for the "raw" scraping group (top left corner), since that sounded like my way to go. This CGI script that scrapes (visible) images from webpages was the result of an afternoon of tweaking, tuning and following Mr. Murtaugh's tips:

#!/usr/bin/env python
#-*- coding:utf-8 -*-

print('Content-type: text/html; charset=utf-8')
# need to divide the content type with a blank line from the head
print
print('<div style="text-align:center">')

import cgitb; cgitb.enable()
import random
import urllib2, urlparse, html5lib, lxml
from lxml.cssselect import CSSSelector
 
parser = html5lib.HTMLParser(tree=html5lib.treebuilders.getTreeBuilder("lxml"), namespaceHTMLElements=False)

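# image URLs collected from the two newspaper front pages (nieuwsblad.be and degentenaar.be)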
nieuwsblad = []
standaard = []

request = urllib2.Request('http://www.nieuwsblad.be')
request.add_header("User-Agent", "Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5.5")
f=urllib2.urlopen(request)
page = parser.parse(f)

for links in CSSSelector('img[src]')(page):
    img = urlparse.urljoin(f.geturl(), links.attrib['src'])
    nieuwsblad.append(img)
    
request = urllib2.Request('http://www.degentenaar.be')
request.add_header("User-Agent", "Mozilla/5.0 (X11; U; Linux x86_64; fr; rv:1.9.1.5) Gecko/20091109 Ubuntu/9.10 (karmic) Firefox/3.5.5")
f=urllib2.urlopen(request)
page = parser.parse(f)

for links in CSSSelector('img[src]')(page):
    img = urlparse.urljoin(f.geturl(), links.attrib['src'])
    standaard.append(img)
    
for (n,s) in (zip(nieuwsblad,standaard)):
	print('<img style="border:solid 16px blue" src="'+n+'" />')
	print('<img style="border:solid 16px red" src="'+s+'" /><br/>')
print('</div>')

I had new hopes of getting the hidden variant up and running, but again, I seemed to hit a wall on the technical part. As I noticed from the workshops (and from Michael's advice), APIs offered very powerful ways to scrape content that required a lot less technical knowledge. I wanted to give it a try to get a better understanding of how the scraping works and to see if I could use this experience to actually accomplish my initial idea. Staying in the realm of images, I chose to look into the Flickr API.

Time and an awful lot of junk

At the same time, I was working on the (modest) development of an alternative time system for a project at Werkplaats Typografie: Speeltijd. It's basically a clock that runs 12/7 times faster than normal time, resulting in 12-day weeks. Since I knew I'd probably need JavaScript for my scraper (Michael mentioned jQuery as one of the tools I'd probably have to look at), I thought it might be a nice introduction to JavaScript and jQuery. All in all, this went pretty well, and I liked the JavaScript way of coding! Also, the concept of time seemed like an interesting thing to keep in mind for my thematic project.
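
The actual Speeltijd clock was written in JavaScript/jQuery, but the arithmetic behind it fits in a few lines of Python (the epoch below is an arbitrary placeholder, not the one the project uses):

import time, datetime

# Speeltijd runs 12/7 times faster than normal time: 7 real days = 12 Speeltijd days
EPOCH = time.mktime((2011, 1, 1, 0, 0, 0, 0, 0, -1))    # arbitrary starting point

def speeltijd():
    elapsed = time.time() - EPOCH                        # real seconds since the epoch
    return datetime.datetime.fromtimestamp(EPOCH + elapsed * 12.0 / 7.0)

print(speeltijd())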

So, I started experimenting with the Flickr API and managed to get a scraper running:

from PIL import Image
import flickrapi, random, urllib2, time, os, sys
 
useragent = "Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101"
 
api_key = '109fc258301afdfa9ad746118b0c1f0f'
flickr = flickrapi.FlickrAPI(api_key)
start='2010-01-01'
stop='2010-01-02'
n = 0
i = 1
pg = 1
width = 1
height = 1
ext = '.jpg'

folder = 'sky'


# what are you searching for?
tagSearch = 'sky scraper'


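# the first search is only used to read the total number of result pages,
# the second one fetches the current page (pg) of results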
total = flickr.photos_search(content_type='1', min_taken_date=start, max_taken_date=stop,tags=tagSearch,sort="date-taken-asc",tag_mode="all")
perPage = flickr.photos_search(content_type='1',min_taken_date=start,max_taken_date=stop,tags=tagSearch,sort="date-taken-asc",tag_mode="all",page=pg)

# amount of pages
pgs = total[0].attrib['pages']
print '=== total pages: '+str(pgs)+' ==='

while True:
	try:
		if pg <= int(pgs):
			for photo in perPage[0]:
				if n < 100: 
					p = flickr.photos_getInfo(photo_id=photo.attrib['id'])
	
					# http://farm{farm-id}.static.flickr.com/{server-id}/{id}_{secret}.jpg
					href = 'http://farm'+p[0].attrib['farm']+'.static.flickr.com/'+p[0].attrib['server']+'/'+p[0].attrib['id']+'_'+p[0].attrib['secret']+'.jpg'
					request = urllib2.Request(href, None, {'User-Agent': useragent})
					remotefile = urllib2.urlopen(request)
					print 'Downloading: photo '+str(i)+' - pg '+str(pg)+'\n('+href+')'
	
					# create the download folder if it doesn't exist yet
					if not os.path.exists(folder):
						os.makedirs(folder)
					localfile = open(folder+'/'+str(i)+ext, "wb")
					localfile.write(remotefile.read())
					localfile.close()
	
					i += 1
					n += 1
			n = 0
			print '--- page '+str(pg)+' processed ---\n--- starting page '+str(pg+1)+'---'
			pg += 1
			perPage = flickr.photos_search(content_type='1',min_taken_date=start,max_taken_date=stop,tags=tagSearch,sort="date-taken-asc",tag_mode="all",page=pg)
		else:
			# all pages processed: stop, otherwise the loop would spin forever
			break
	except Exception, theError:
		print '>>>>> Error >>>>>\n'+str(theError)+'\n<<<<< Error <<<<<'
		sys.exc_clear()

Due to a certain fixation on time (the Speeltijd clock was haunting me), I was looking at ways to create an "image" clock, based on the taken_date and geo information of the photos I scraped. The results gave me a nice overview of the amount of junk that gets uploaded every minute and of the inaccuracy of the taken_date format. It quickly became clear to me that this route was a dead end.

Astounded by the vast amount of pictures that get added to Flickr each day, I got interested in somehow reducing that mass to a certain 'essence'. Working with the Python Imaging Library (PIL), I added a piece to my code that reduces the images I scraped (based on tagging this time) to a single 1x1 pixel. This pixel holds roughly the average color value of the whole image.

from PIL import Image
import flickrapi, random, urllib2, time, os, sys
 
useragent = "Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101"
 
api_key = '109fc258301afdfa9ad746118b0c1f0f'
flickr = flickrapi.FlickrAPI(api_key)
start='2010-01-01'
stop='2010-01-02'
n = 0
i = 1
pg = 1
width = 1
height = 1
ext = '.jpg'

folder = 'sky'
folderEdit= 'sky_edit'


# what are you searching for?
tagSearch = 'sky scraper'


total = flickr.photos_search(content_type='1', min_taken_date=start, max_taken_date=stop,tags=tagSearch,sort="date-taken-asc",tag_mode="all")
perPage = flickr.photos_search(content_type='1',min_taken_date=start,max_taken_date=stop,tags=tagSearch,sort="date-taken-asc",tag_mode="all",page=pg)

# amount of pages
pgs = total[0].attrib['pages']
print '=== total pages: '+str(pgs)+' ==='

while True:
	try:
		if pg <= int(pgs):
			for photo in perPage[0]:
				if n < 100: 
					p = flickr.photos_getInfo(photo_id=photo.attrib['id'])
	
					# http://farm{farm-id}.static.flickr.com/{server-id}/{id}_{secret}.jpg
					href = 'http://farm'+p[0].attrib['farm']+'.static.flickr.com/'+p[0].attrib['server']+'/'+p[0].attrib['id']+'_'+p[0].attrib['secret']+'.jpg'
					request = urllib2.Request(href, None, {'User-Agent': useragent})
					remotefile = urllib2.urlopen(request)
					print 'Downloading: photo '+str(i)+' - pg '+str(pg)+'\n('+href+')'
	
					# create the folders for the originals and the 1x1 versions if they don't exist yet
					if not os.path.exists(folder):
						os.makedirs(folder)
					if not os.path.exists(folderEdit):
						os.makedirs(folderEdit)
					localfile = open(folder+'/'+str(i)+ext, "wb")
					localfile.write(remotefile.read())
					localfile.close()
	
					# open image and resize to 1x1px
					im = Image.open(folder+'/'+str(i)+ext)
					im1 = im.resize((width,height), Image.ANTIALIAS)
					im1.save(folderEdit+'/'+str(i)+ext)
	
					i += 1
					n += 1
			n = 0
			print '--- page '+str(pg)+' processed ---\n--- starting page '+str(pg+1)+'---'
			pg += 1
			perPage = flickr.photos_search(content_type='1',min_taken_date=start,max_taken_date=stop,tags=tagSearch,sort="date-taken-asc",tag_mode="all",page=pg)
		else:
			# all pages processed: stop, otherwise the loop would spin forever
			break
	except Exception, theError:
		print '>>>>> Error >>>>>\n'+str(theError)+'\n<<<<< Error <<<<<'
		sys.exc_clear()

<image of pixel folder>

This time, the results indicated people's love for tags: lots of them, the more the merrier, whether they apply to the image or not. The idea was to use these pixels to create an image that resembled the essence of a huge collection, and maybe to distinguish patterns depending on time, place or tag. Although this was definitely possible, the concept of a mere (and probably not very successful) data visualization struck me as rather insufficient and uninteresting. Also, I had clearly drifted away from my initial idea. It felt like I was pretty lost at sea (accompanied by Inge, who was apparently looking for sea monsters).

Crawling back

In an attempt to get back (or at least closer) to my initial idea, I started looking for possible routes related to collage techniques and found footage. Somehow, the words crawling and scraping struck me as something really physical, as if you would break something, get on the floor, look for the scattered bits and pieces and scrape them back together. From this perspective, I thought it might be nice to look at things that get cut up and spread (over the Internet or through other means), and at the ways in which these pieces get 'glued' back together, sampled and remixed into new forms. With the concept of time still in the back of my head, I stumbled upon Christian Marclay's The Clock, an artwork that combines clips from thousands of films into a 24-hour video collage that also functions as a working clock:

{{#ev:youtube|Y8svkK7d7sY|500}}


The reuse of existing text and imagery in new artworks brought me to the User Guide to Détournement by Guy Debord and Gil J Wolman. While reading this text, the idea of the self-generating webpage that creates new combinations out of found and decontextualized elements kept coming back to me.

Any elements, no matter where they are taken from, can be used to make new combinations. The discoveries of modern poetry regarding the analogical structure of images demonstrate that when two objects are brought together, no matter how far apart their original contexts may be, a relationship is always formed. (Debord & Wolman, 1956)

Since this text was written in a pre-digital era, I wondered what the implications would be when you apply it to a world in which digital images and text are omnipresent. Particularly when you look at the description of what Debord and Wolman call minor détournement:

Minor détournement is the détournement of an element which has no importance in itself and which thus draws all its meaning from the new context in which it has been placed. (Debord & Wolman, 1956)

This made me think of my 1px Flickr images. Ultimately, pixels are the most neutral elements to which a digital image can be reduced; they draw all their meaning from the context in which they have been placed (their relation to and combination with other pixels, and their position in a grid).

Decision time

Currently I'm still figuring out which direction to take this. The web-détournement page still seems like a valuable option, but it has proven to be, for me at least, a hard nut to crack. I have also been experimenting (although it's still very green) with the Python Imaging Library to extract pixel data from images, with the option of using this data to recompose other images. More specifically, I was thinking of ways to use the pixels of frames from different movies to recompose frames of another movie. A kind of digital meta-détournement (awful term, I know).
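
As a first, very rough sketch of that direction (untested, and with placeholder paths: a folder of extracted frames in frames/ and a single frame target.jpg from the movie to be recomposed), the average colors of one set of frames could be used to rebuild a frame of another movie like this:

from PIL import Image
import glob

# untested sketch: reduce a pool of movie frames to their average color
# (like the 1x1 Flickr pixels) and rebuild a downscaled target frame by
# replacing each of its pixels with the closest average color in the pool
pool = []
for path in glob.glob('frames/*.jpg'):            # placeholder folder of source frames
    avg = Image.open(path).convert('RGB').resize((1, 1), Image.ANTIALIAS).getpixel((0, 0))
    pool.append(avg)

target = Image.open('target.jpg').convert('RGB').resize((80, 60), Image.ANTIALIAS)
out = Image.new('RGB', target.size)

for x in range(target.size[0]):
    for y in range(target.size[1]):
        r, g, b = target.getpixel((x, y))
        # nearest pool color by squared RGB distance
        nearest = min(pool, key=lambda c: (c[0]-r)**2 + (c[1]-g)**2 + (c[2]-b)**2)
        out.putpixel((x, y), nearest)

out.save('recomposed.jpg')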