User:Eleanorg/1.2/Forbidden Pixels/scraping pixel data

Revision as of 17:38, 18 March 2012 by Eleanorg

Very simple scraping script. It fetches the specified webpage and uses a regular expression to pull a pixel data string out of the page text. The captured parts (x position, y position, and color) are then assigned to variables.

Questions:

  • How to specify what is printed when the regex doesn't find a match? Currently just getting a 'foo is undefined' error message.
  • How to access the whole matched string of a regex, even when parts of it are within capture parentheses?
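
One possible answer to both questions, as a minimal sketch rather than a definitive fix: re.search returns a match object, or None when nothing matches, so the "no match" case can be tested directly. On a match object, group(0) is the whole matched string and groups() holds only the captured parts. The function name parse_pixel and the sample string are made up for illustration, and the exact format of the hosted string is an assumption.

```python
import re

# Same idea as the pattern in the script; [^)]* keeps the color from running past its closing bracket.
PATTERN = re.compile(r"Pixel position:(\d\d\d).(\d\d\d); Color:(rgba\([^)]*\))")

def parse_pixel(text):
    """Return (whole_match, x, y, color), or None if the text holds no pixel string."""
    m = PATTERN.search(text)
    if m is None:
        return None                      # nothing matched, so the caller can print a fallback
    return (m.group(0),) + m.groups()    # group(0) = whole match, groups() = captured parts

# Hypothetical sample standing in for the scraped page
sample = "<p>Pixel position:012,345; Color:rgba(255,0,0,1)</p>"

result = parse_pixel(sample)
if result:
    whole, xPos, yPos, color = result
    print("whole match is: " + whole)
    print("x position is: " + xPos)
else:
    print("this pixel is not currently hosted")
```

With re.findall the whole string is not available once capture parentheses are used, which is why this sketch uses re.search (re.finditer gives the same match objects if several pixels are expected on one page).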


#!/usr/bin/python
#-*- coding:utf-8 -*-

import re, urllib2

# scrapes pixel data string from a web page and assigns the result to appropriate variables.

#------------- scrape webpage----------------------------#

url = "http://ox4.org/~nor/trials/hostedString.html"
text = urllib2.urlopen(url).read()
 
 
#-------------- extract the string with regex------------#


# re.search returns a match object, or None when nothing matches.
# It takes the first match on the page; [^)]* stops the color at its closing bracket.
match = re.search(r"Pixel position:(\d\d\d).(\d\d\d); Color:(rgba\([^)]*\))", text)

#-------------- print match -----------------------------#

if match:
    whole = match.group(0)              # group(0) is the whole matched string, capture parentheses or not
    xPos, yPos, color = match.groups()  # the three captured parts, already strings
    print "x position is: " + xPos
    print "y position is: " + yPos
    print "color is: " + color
else:                                   # no match: nothing is left undefined, so this branch now runs
    print "this pixel is not currently hosted"