User:Eleanorg/1.2/Forbidden Pixels/Receiving URL where data is hosted

The site will need to ask participants for the URL of the pixel data they're hosting. This is my first time doing form processing in Python: the HTML form asks for the URL and passes it to a simple script. The next step is for that script to add the URL to the list of URLs, and to check whether the page at that URL contains a string matching the regex.
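Here is a minimal Python 3 sketch of the "add it to the list of URLs" step. It assumes the list is a plain text file with one URL per line. The file name and the helpers `add_url` and `is_http_url` are hypothetical, not part of the scripts below. It does no file locking, so two submissions arriving at the same moment could interleave.

```python
import os
from urllib.parse import urlparse


def is_http_url(url):
    """True if url looks like an absolute http(s) address."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def add_url(list_path, url):
    """Append url to the list file (one URL per line) unless it is already listed.

    Returns True if the URL was added, False if it was a duplicate.
    Raises ValueError for anything that is not an http(s) URL.
    """
    url = url.strip()
    if not is_http_url(url):
        raise ValueError("not an http(s) URL: %r" % url)
    known = set()
    if os.path.exists(list_path):
        with open(list_path, encoding="utf-8") as f:
            known = {line.strip() for line in f if line.strip()}
    if url in known:
        return False
    # No locking here: concurrent CGI requests could interleave their writes.
    with open(list_path, "a", encoding="utf-8") as f:
        f.write(url + "\n")
    return True
```

In a CGI script this would be called as something like `add_url("urls.txt", url)` right after reading the form field, with `urls.txt` being a hypothetical file the web server user can write to.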

Go here to test the input form: http://pzwart3.wdka.hro.nl/~egreenhalgh/inputForm.html

URL input form

<!DOCTYPE html>
<html>
  <head>
    <title>A form talking to a python script</title>
    <style type="text/css">
    </style>
  </head>
  
<body>

  <form action="form.cgi" name="inputForm">  <!--sends form input to the form.cgi script (GET, the default method) -->
    Paste url: 
    <input name="url">
    <input type="submit">

  </form>
</body>

</html>
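Because the form has no `method` attribute, the browser submits it with GET: the field is percent-encoded into the query string after `form.cgi?`. This Python 3 sketch uses the standard `urllib.parse` functions to show roughly what that string looks like and how it decodes back; the pasted URL is made up.

```python
from urllib.parse import urlencode, parse_qs

# A made-up URL a participant might paste into the "url" field.
submitted = "http://example.com/pixel.html?id=5"

# Roughly what the browser builds on submit: characters such as
# ':' '/' '?' '=' inside the value are percent-encoded.
query = urlencode({"url": submitted})
print("form.cgi?" + query)
# form.cgi?url=http%3A%2F%2Fexample.com%2Fpixel.html%3Fid%3D5

# The web server passes this string to the CGI script in the
# QUERY_STRING environment variable; decoding restores the original value.
fields = parse_qs(query)
print(fields["url"][0])
# http://example.com/pixel.html?id=5
```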

Basic Python script grabbing the submitted URL

Saved with a .cgi extension in the server's cgi-bin directory.

#!/usr/bin/python

import cgi
import cgitb; cgitb.enable() # cgitb shows a detailed traceback in the browser when the script crashes



htmlHeader = """<!DOCTYPE html>
<html>
  <head>
    <title>A form talking to a python script</title>
    <style type="text/css">
    </style>
  </head>
  <body>"""

print "Content-Type: text/html"  # important: without this header and the blank line below, nothing shows in the browser
print                            # the blank line ends the headers
print htmlHeader

form = cgi.FieldStorage()        # grabs whatever input comes from the form
url = form.getfirst('url', '')   # value of the form's 'url' field ('' if it is missing)
print cgi.escape(url)            # escape <, > and & before echoing user input into the page
 
print """
  </body>
</html>"""
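The script above is Python 2 and relies on the `cgi` module, which Python 3 deprecated and removed in 3.13. A minimal Python 3 sketch of the same echo script reads the `QUERY_STRING` environment variable directly. `build_response` is a hypothetical helper, split out so the output can be checked without a web server, and the page is trimmed to the essentials.

```python
#!/usr/bin/env python3
# Python 3 sketch of the echo script, without the deprecated cgi module.
import html
import os
from urllib.parse import parse_qs

PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>A form talking to a python script</title>
  </head>
  <body>
{body}
  </body>
</html>"""


def build_response(query_string):
    """Return the whole CGI response: header, blank line, then the HTML page."""
    fields = parse_qs(query_string)
    # A missing "url" field gives "" instead of raising KeyError.
    url = fields.get("url", [""])[0]
    if url:
        # Escape <, >, & and quotes before echoing user input into HTML.
        body = "    " + html.escape(url)
    else:
        body = "    No URL was submitted."
    return "Content-Type: text/html\n\n" + PAGE.format(body=body)


if __name__ == "__main__":
    # For a GET request the web server puts the form data in QUERY_STRING.
    print(build_response(os.environ.get("QUERY_STRING", "")))
```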

Python script grabbing the submitted URL, then scraping it

#!/usr/bin/python
#-*- coding:utf-8 -*-

import cgi, re, urllib2
import cgitb; cgitb.enable()

 
# scrapes pixel data string from a URL submitted by user in an html form; assigns the result to appropriate variables.
 

#------------- get URL from input form -------------------#

form = cgi.FieldStorage()		# Grabs whatever input comes from form
url = form['url'].value			# assigns form's url field input to var 'url'

#------------- scrape webpage----------------------------#

text = urllib2.urlopen(url).read()	# reads page at the specified URL

 
#-------------- extract the string with regex------------#

# string is in format: 
# Pixel position:500.001; Color:rgba(222,221,217,1)
 
xPos = None   # stays None if the regex finds nothing, so the 'if xPos' check below is safe
for m in re.finditer(r"Pixel position:(\d{3})\.(\d{3}); Color:(rgba\([^)]*\))", text):
    match = m.group(0)  # group(0) is the whole matched string, even though parts of it are captured
    xPos = m.group(1)   # groups 1-3 are the captured pieces
    yPos = m.group(2)
    color = m.group(3)  # if the page holds several pixel strings, only the last one is kept
 
#-------------- print match -----------------------------#

htmlHeader = """<!DOCTYPE html>
<html>
  <head>
    <title>A form talking to a python script</title>
    <style type="text/css">
    </style>
  </head>
  <body>"""

print "Content-Type: text/html"
print 
print htmlHeader
 
if xPos:
  print "x position is: " + xPos + "<br />"
  print "y position is: " + yPos + "<br />"
  print "color is: " + cgi.escape(color)   # escape page content before echoing it
else:   # no pixel string was found on the page
  print "this pixel is not currently hosted"
 


print """
  </body>
</html>"""
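A Python 3 sketch of the scraping part. It shows how to get the whole matched string (`m.group(0)`) alongside the captured pieces, and how collecting matches into a list turns "no match" into an empty list rather than an undefined variable. `PIXEL_RE`, `find_pixels` and `fetch` are hypothetical names, and decoding the page as UTF-8 is an assumption about the hosted pages.

```python
import re
import urllib.request

# Matches strings like: Pixel position:500.001; Color:rgba(222,221,217,1)
# "\." matches only a literal dot (a bare "." matches any character), and
# "[^)]*" stops the colour at the first ")" instead of the last one on the page.
PIXEL_RE = re.compile(r"Pixel position:(\d{3})\.(\d{3}); Color:(rgba\([^)]*\))")


def find_pixels(text):
    """Return one dict per pixel string in text; an empty list means no match."""
    pixels = []
    for m in PIXEL_RE.finditer(text):
        pixels.append({
            "whole": m.group(0),  # the entire matched string
            "x": m.group(1),      # captured groups hold the pieces
            "y": m.group(2),
            "color": m.group(3),
        })
    return pixels


def fetch(url, timeout=10):
    """Download url and decode it as UTF-8 (assumed encoding of the hosted page)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

In the CGI script, `pixels = find_pixels(fetch(url))` followed by `if pixels:` and an `else` branch printing "this pixel is not currently hosted" would take the place of the `if xPos:` check. The timeout keeps a slow or dead host from hanging the request indefinitely.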