User:Eleanorg/1.2/Forbidden Pixels/scraping pixel data
A very simple scraping script. It fetches the specified web page and uses a regular expression to pull out a pixel data string. The captured parts (x position, y position, and color) are then assigned to variables.
Questions:
- How to specify what is printed when regex doesn't find a match? Currently just getting a 'foo is undefined' error message.
- How to access the whole matched string of a regex, even when parts of it are within capture parentheses?
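Both questions can probably be handled with `re.search`, which returns a match object, or `None` when nothing matches. What follows is a minimal sketch, not a definitive answer. The sample string, its comma separator, and the `describe` helper are made up for illustration; only the pattern comes from the script below.

```python
import re

# Same pattern as in the script below.
PATTERN = r"Pixel position:(\d\d\d).(\d\d\d)\;\ Color:(rgba\(.*\))"

def describe(text):
    """Hypothetical helper: report the pixel found in text, or a fallback message."""
    m = re.search(PATTERN, text)
    if m is None:
        # re.search returns None when nothing matches, so test for that first
        return "this pixel is not currently hosted"
    whole = m.group(0)              # the entire matched string, captures included
    xPos, yPos, color = m.groups()  # only the three captured parts
    return "%s -> x=%s y=%s color=%s" % (whole, xPos, yPos, color)

# made-up sample input, not taken from the real page
sample = "<p>Pixel position:012,345; Color:rgba(255,0,0,1)</p>"
print(describe(sample))
print(describe("nothing to see here"))
```

`group(0)` gives the whole match regardless of capture parentheses, while `groups()` gives just the captured pieces. Checking for `None` before touching either one avoids the undefined-variable error.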
#!/usr/bin/python
#-*- coding:utf-8 -*-
import re, urllib2
# scrapes pixel data string from a web page and assigns the result to appropriate variables.
#------------- scrape webpage----------------------------#
url = "http://ox4.org/~nor/trials/hostedString.html"
text = urllib2.urlopen(url).read()
#-------------- extract the string with regex------------#
m = re.search(r"Pixel position:(\d\d\d).(\d\d\d)\;\ Color:(rgba\(.*\))", text)
#-------------- print match -----------------------------#
if m:
    match = m.group(0) # the whole matched string, including the parts inside capture parentheses
    xPos, yPos, color = m.groups() # the three captured parts, already strings
    print "x position is: " + xPos
    print "y position is: " + yPos
    print "color is: " + color
else: # re.search returns None when the regex fails to match, so nothing is left undefined
    print "this pixel is not currently hosted"
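One thing to watch in the color part of the pattern: `.*` is greedy, so `rgba\(.*\)` runs to the last `)` on the line rather than the first. The sketch below compares it with a tighter character class. The input line is invented, and whether the real page ever has a second `)` on the same line is an assumption.

```python
import re

# made-up line with a second ")" after the color value
line = "Pixel position:012,345; Color:rgba(255,0,0,1) (hosted)"

# greedy: .* swallows everything up to the last ")" on the line
greedy = re.search(r"Color:(rgba\(.*\))", line).group(1)

# tighter: [^)]* stops at the first ")"
tight = re.search(r"Color:(rgba\([^)]*\))", line).group(1)

print(greedy)
print(tight)
```

If the page only ever has one `)` per line, the two patterns behave the same, so this matters only if the hosted string changes.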