User:Eleanorg/1.2/Forbidden Pixels/Receiving URL where data is hosted: Difference between revisions
Revision as of 14:46, 26 March 2012
The site will need to ask participants for the URL of the pixel data they're hosting. This is my first time doing form processing in Python: the HTML form asks for the URL and passes it to a simple script. The next step is for that script to add the URL to the list of URLs, and to check whether the page at that URL contains a string matching the regex.
URL input form
<!DOCTYPE html>
<html>
<head>
<title>A form talking to a python script</title>
<style type="text/css">
</style>
</head>
<body>
<form action="form.cgi" name="inputForm"> <!-- pushes form input to the form.cgi script -->
Paste url:
<input name="url">
<input type="submit">
</form>
</body>
</html>
Basic Python script grabbing URL submitted
Saved as .cgi within cgi-bin on server.
#!/usr/bin/python
import cgi                    # parses the data submitted by the form
import cgitb; cgitb.enable()  # shows Python errors in the browser instead of a bare server error page
htmlHeader = """<!DOCTYPE html>
<html>
<head>
<title>A form talking to a python script</title>
<style type="text/css">
</style>
</head>
<body>"""
print "Content-Type: text/html" # important - won't print in browser without these 2 lines
print
print htmlHeader
form = cgi.FieldStorage() # Grabs whatever input comes from form
url = form['url'].value # assigns form's url field input to var 'url'
print url
print """
</body>
</html>"""
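The form above has no method attribute, so the browser submits it with GET: it requests form.cgi?url=... and the server passes everything after the ? to the script as a query string. cgi.FieldStorage() is what decodes that string. Below is a rough sketch of the decoding step on its own, written for Python 3 with only the standard library (the script above is Python 2). The helper name get_url_field and the example address are made up for illustration.

```python
from urllib.parse import parse_qs

def get_url_field(query_string):
    """Return the decoded 'url' field from a query string, or None if it is absent."""
    fields = parse_qs(query_string)   # maps each field name to a list of decoded values
    values = fields.get("url")
    return values[0] if values else None

# Hypothetical query string, as the browser would send it after "form.cgi?"
print(get_url_field("url=http%3A%2F%2Fexample.org%2Fpixel.html"))
```

Returning None for a missing field avoids the KeyError that form['url'] raises when the form is submitted empty.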
Python script grabbing inputted URL then scraping it
#!/usr/bin/python
#-*- coding:utf-8 -*-
import cgi, re, urllib2
import cgitb; cgitb.enable()
# scrapes pixel data string from a URL submitted by user in an html form; assigns the result to appropriate variables.
#------------- get URL from input form -------------------#
form = cgi.FieldStorage() # Grabs whatever input comes from form
url = form['url'].value # assigns form's url field input to var 'url'
#------------- scrape webpage----------------------------#
text = urllib2.urlopen(url).read() # reads page at the specified URL
#-------------- extract the string with regex------------#
# string is in format:
# Pixel position:500.001; Color:rgba(222,221,217,1)
xPos = yPos = color = None # stay None if the regex matches nothing, so the check below never hits an undefined name
for m in re.finditer(r"Pixel position:(\d\d\d)\.(\d\d\d); Color:(rgba\([^)]*\))", text):
    match = m.group(0) # group(0) is the whole matched string, even when bits within it are captured
    xPos = m.group(1)
    yPos = m.group(2)
    color = m.group(3)
#-------------- print match -----------------------------#
htmlHeader = """<!DOCTYPE html>
<html>
<head>
<title>A form talking to a python script</title>
<style type="text/css">
</style>
</head>
<body>"""
print "Content-Type: text/html"
print
print htmlHeader
if xPos is not None:
    print
    print "x position is: " + xPos + "<br />"
    print "y position is: " + yPos + "<br />"
    print "color is: " + color
else: # regex matched nothing on the page
    print "this pixel is not currently hosted"
print """
</body>
</html>"""
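A match object gives access to the whole matched string as well as the captured parts, and re.search returns None when nothing matches, which is a safe way to detect the "not hosted" case. Here is a minimal sketch of the extraction step as a standalone function, written for Python 3. The name parse_pixel is hypothetical, and the tightened pattern (escaped dot, [^)]* instead of .*) is my assumption about the intended string format.

```python
import re

# Expected format: Pixel position:500.001; Color:rgba(222,221,217,1)
PIXEL_RE = re.compile(r"Pixel position:(\d{3})\.(\d{3}); Color:(rgba\([^)]*\))")

def parse_pixel(text):
    """Return (whole_match, x, y, color) for the first pixel string in text, or None."""
    m = PIXEL_RE.search(text)
    if m is None:            # nothing on the page matched
        return None
    # group(0) is the entire match; groups 1-3 are the captured parts
    return (m.group(0), m.group(1), m.group(2), m.group(3))

sample = "<p>Pixel position:500.001; Color:rgba(222,221,217,1)</p>"
print(parse_pixel(sample))
print(parse_pixel("nothing here"))
```

The caller can then test the result against None before printing the positions, rather than relying on variables that only exist when the regex matched.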