User:Eleanorg/1.2/Forbidden Pixels/Receiving URL where data is hosted
Revision as of 23:00, 28 March 2012 by Eleanorg
The site will need to ask participants for the URL of the pixel data they're hosting. This is my first time doing form processing in Python: the HTML form asks for the URL and passes it to a simple script. The next step is for that script to add the URL to the list of URLs, and to check whether the page at that URL contains a string matching the regex.
Go here to test the input form: http://pzwart3.wdka.hro.nl/~egreenhalgh/inputForm.html
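The next step described above, checking each saved URL for a pixel string, could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the names `fetch` and `check_urls`, the 10-second timeout and the UTF-8 decoding are all hypothetical choices. The regex follows the `Pixel position:500.001; Color:rgba(222,221,217,1)` format used by the scraping script further down. Passing a different `get_page` function lets the logic be tried without network access.

```python
import re

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

# Pixel string format: Pixel position:500.001; Color:rgba(222,221,217,1)
PIXEL_RE = re.compile(r"Pixel position:(\d{3})\.(\d{3}); Color:(rgba\([\d.,\s]+\))")


def fetch(url):
    """Download a page and return its text (needs network access)."""
    return urlopen(url, timeout=10).read().decode("utf-8", "replace")


def check_urls(urls, get_page=fetch):
    """Map each URL to (x, y, color), or to None if the page has no
    pixel string or could not be loaded."""
    results = {}
    for url in urls:
        try:
            text = get_page(url)
        except Exception:  # unreachable host, malformed URL, HTTP error, ...
            results[url] = None
            continue
        match = PIXEL_RE.search(text)  # first pixel string on the page, or None
        results[url] = match.groups() if match else None
    return results


# Offline example: a dict stands in for the web.
pages = {"http://example.com/a": "Pixel position:500.001; Color:rgba(222,221,217,1)"}
print(check_urls(["http://example.com/a"], get_page=pages.__getitem__))
```

With real URLs, calling `check_urls` with only the list of URLs uses `fetch` and goes out to the network.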
URL input form
<!DOCTYPE html>
<html>
<head>
<title>A form talking to a python script</title>
<style type="text/css">
</style>
</head>
<body>
<form action="form.cgi" name="inputForm"> <!-- no method given, so the input is sent to the form.cgi script as a GET query string -->
Paste url:
<input name="url">
<input type="submit">
</form>
</body>
</html>
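Because the form has no `method` attribute, the browser submits it with GET: the pasted URL is percent-encoded and appended to the `action` address as a query string, which `cgi.FieldStorage` decodes on the server. The sketch below imitates both halves with the standard library. `form_request` and `read_url_field` are hypothetical helpers for illustration, and `http://example.com/pixel.html` is a placeholder.

```python
try:
    from urllib.parse import urlencode, parse_qs  # Python 3
except ImportError:
    from urllib import urlencode                  # Python 2
    from urlparse import parse_qs


def form_request(action, pasted_url):
    """Return the address a browser requests when this GET form is submitted."""
    return action + "?" + urlencode({"url": pasted_url})


def read_url_field(query_string):
    """Pull the 'url' field back out of a query string, roughly as cgi.FieldStorage does."""
    return parse_qs(query_string).get("url", [""])[0]


request = form_request("form.cgi", "http://example.com/pixel.html")
print(request)  # form.cgi?url=http%3A%2F%2Fexample.com%2Fpixel.html
print(read_url_field(request.split("?", 1)[1]))  # http://example.com/pixel.html
```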
Basic Python script grabbing URL submitted
Saved with a .cgi extension in the cgi-bin directory on the server.
#!/usr/bin/python
import cgi # parses the data sent by the form
import cgitb; cgitb.enable() # shows a detailed error report in the browser if the script crashes
htmlHeader = """<!DOCTYPE html>
<html>
<head>
<title>A form talking to a python script</title>
<style type="text/css">
</style>
</head>
<body>"""
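print "Content-Type: text/html" # important - the browser won't display the page without this header line and the blank line after it
print
print htmlHeader
form = cgi.FieldStorage() # grabs whatever input comes from the form
url = form.getvalue("url", "") # the form's 'url' field, or "" if the field was left empty
print cgi.escape(url) # escapes <, > and & so pasted text can't inject HTML into the page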
print """
</body>
</html>"""
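A CGI script's standard output is a block of HTTP headers, an empty line, then the page itself; that is why the script prints `Content-Type: text/html` followed by a bare `print`. The sketch below builds the same output as one string so the structure is visible, and HTML-escapes the submitted URL before echoing it. `cgi_response`, `PAGE_TOP` and `PAGE_BOTTOM` are illustrative names, not part of the original script.

```python
try:
    from html import escape  # Python 3
except ImportError:
    from cgi import escape   # Python 2 (cgi.escape was removed in Python 3.8)

PAGE_TOP = ("<!DOCTYPE html>\n<html>\n<head>\n"
            "<title>A form talking to a python script</title>\n"
            "</head>\n<body>")
PAGE_BOTTOM = "</body>\n</html>"


def cgi_response(url):
    """Build everything the CGI script writes to stdout for a submitted URL."""
    # Header line, then an empty line: the blank line marks where the
    # headers stop and the page starts.
    headers = "Content-Type: text/html\n\n"
    body = PAGE_TOP + "\n" + escape(url, True) + "\n" + PAGE_BOTTOM
    return headers + body


print(cgi_response("http://example.com/?a=1&b=<2>"))
```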
Python script grabbing inputted URL then scraping it
#!/usr/bin/python
#-*- coding:utf-8 -*-
import cgi, re, urllib2
import cgitb; cgitb.enable()
# scrapes pixel data string from a URL submitted by user in an html form; assigns the result to appropriate variables.
#------------- get URL from input form -------------------#
form = cgi.FieldStorage() # Grabs whatever input comes from form
url = form['url'].value # assigns form's url field input to var 'url'
#------------- scrape webpage----------------------------#
text = urllib2.urlopen(url).read() # reads page at the specified URL
#-------------- extract the string with regex------------#
# string is in format:
# Pixel position:500.001; Color:rgba(222,221,217,1)
# \. matches only a literal dot; the colour part allows only digits, dots, commas and spaces
xPos = yPos = color = None # defaults, so these variables exist even if nothing matches
match = re.search(r"Pixel position:(\d{3})\.(\d{3}); Color:(rgba\([\d.,\s]+\))", text) # first match on the page, or None
if match:
    wholeString = match.group(0) # group(0) is the whole matched string; groups 1-3 are the captured parts
    xPos = match.group(1)
    yPos = match.group(2)
    color = match.group(3)
#-------------- print match -----------------------------#
htmlHeader = """<!DOCTYPE html>
<html>
<head>
<title>A form talking to a python script</title>
<style type="text/css">
</style>
</head>
<body>"""
print "Content-Type: text/html"
print
print htmlHeader
if xPos:
    print "x position is: " + xPos + "<br />"
    print "y position is: " + yPos + "<br />"
    print "color is: " + color
else:
    print "this pixel is not currently hosted"
print """
</body>
</html>"""
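When a pattern contains capture groups, `re.findall` returns only the captured parts, never the whole match. The sketch below contrasts it with `re.finditer`, whose match objects give the whole string through `group(0)`, and shows why the dot between the coordinates needs escaping. The sample page text is made up for illustration.

```python
import re

# Made-up page text with two pixel strings in the format above.
page = ("<p>Pixel position:500.001; Color:rgba(222,221,217,1)</p>\n"
        "<p>Pixel position:012.345; Color:rgba(0,0,0,0.5)</p>")

PIXEL_RE = re.compile(r"Pixel position:(\d{3})\.(\d{3}); Color:(rgba\([\d.,\s]+\))")

# With capture groups in the pattern, findall returns one tuple of groups per match.
print(PIXEL_RE.findall(page))

# finditer yields match objects: group(0) is the whole matched text,
# groups() the captured parts.
for m in PIXEL_RE.finditer(page):
    print(m.group(0))
    print(m.groups())

# search returns None when nothing matches, so "no pixel" is a simple check.
print(PIXEL_RE.search("no pixel here") is None)  # True

# An unescaped '.' matches any character, so this looser pattern also accepts "500x001".
print(re.search(r"(\d\d\d).(\d\d\d)", "500x001") is not None)  # True
```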
Python script writing submitted URL to a text file, for later use
#!/usr/bin/python
#-*- coding:utf-8 -*-
import cgi
import cgitb; cgitb.enable()
# saves a URL submitted by the user in an html form to a text file, so the pixel data there can be scraped later.
#------------- get URL from input form -------------------#
form = cgi.FieldStorage() # Grabs whatever input comes from form
# assigns the form's input to var 'url'; the second argument is a default value, used for testing when nothing is received from the form
url = form.getvalue("url", "http://ox4.org/~nor/trials/hostedString.html")
#------------- save url to a text file --------------------#
with open("data/urls.txt", 'a') as f: # opens the text file in append mode ('a') and closes it when done; the script needs permission to write to it
    f.write(url + '\n')
#------------- print acknowledgement ---------------------#
htmlHeader = """<!DOCTYPE html>
<html>
<head>
<title>A form talking to a python script</title>
<style type="text/css">
</style>
</head>
<body>"""
print "Content-Type: text/html"
print
print htmlHeader
print "Thanks, url submitted. The pixel hosted there will be added to the image soon."
print """
</body>
</html>"""
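Appending unconditionally means the same URL lands in `urls.txt` once per submission. Below is a sketch of one way to keep the list tidy, assuming one URL per line as written by the script above. `add_url` and `load_urls` are hypothetical helper names; the duplicate check rereads the whole file, which is fine for a short list but does nothing to guard against two submissions arriving at the same moment.

```python
import os
import tempfile


def load_urls(path):
    """Return the saved URLs, one per line, ignoring blank lines."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def add_url(path, url):
    """Append url to the list file unless it is empty or already saved.
    Returns True if it was written."""
    url = url.strip()  # drop stray whitespace/newlines pasted into the form
    if not url or url in load_urls(path):
        return False
    with open(path, "a") as f:  # 'a' = append mode, as in the script above
        f.write(url + "\n")
    return True


# Try it on a throwaway file instead of data/urls.txt.
path = os.path.join(tempfile.mkdtemp(), "urls.txt")
print(add_url(path, "http://example.com/pixel.html"))    # True
print(add_url(path, "http://example.com/pixel.html\n"))  # False, already saved
print(load_urls(path))  # ['http://example.com/pixel.html']
```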