PythonEscaping

From XPUB & Lens-Based wiki

Python provides a number of useful utilities to help "escape" data when placing it in HTML (attributes / forms), URLs, or MySQL queries, so you don't have to write your own. These implementations are also likely to be better tested and more secure than the usual quick-and-dirty solution ;).

Creating & working with URLs

The urllib library provides many useful functions for creating URLs (useful for generating a CGI link, for instance), or for splitting an existing link into its parts.
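As a rough illustration of splitting a link into its parts, here is a minimal sketch. It assumes Python 3, where this lives in urllib.parse (in Python 2 it was the separate urlparse module). The URL is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# A made-up URL, only for illustration
url = "http://example.org/cgi-bin/thumb.cgi?f=photo.jpg&t=5#top"

parts = urlsplit(url)
print(parts.scheme)    # http
print(parts.netloc)    # example.org
print(parts.path)      # /cgi-bin/thumb.cgi
print(parts.query)     # f=photo.jpg&t=5
print(parts.fragment)  # top

# The parts can be joined back into a URL string
rebuilt = urlunsplit(parts)
print(rebuilt == url)  # True
```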

Mapping a python dictionary to a URL query

urllib.urlencode(query[, doseq])

import urllib

# urlencode quotes each key and value and joins them with "&"
imgsrc = "thumb.cgi?"+urllib.urlencode({'f': filepath, 't': 5})
print """<img src="%s" />""" % imgsrc
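To see what the query string actually looks like, here is a small sketch. It assumes Python 3, where the function is urllib.parse.urlencode. The file path and the 'tag' key are hypothetical values chosen to show the quoting.

```python
from urllib.parse import urlencode

# Hypothetical path containing characters that need escaping
filepath = "photos/summer 2008/beach & sea.jpg"

query = urlencode({'f': filepath, 't': 5})
imgsrc = "thumb.cgi?" + query
print(imgsrc)
# thumb.cgi?f=photos%2Fsummer+2008%2Fbeach+%26+sea.jpg&t=5

# With doseq=True, a list value becomes one key=value pair per item
tags = urlencode({'tag': ['a', 'b']}, doseq=True)
print(tags)  # tag=a&tag=b
```

Note that non-string values such as the integer 5 are converted to strings for you.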


quote(string[, safe])

Replace special characters in string using the "%xx" escape. Letters, digits, and the characters "_.-" are never quoted. The optional safe parameter specifies additional characters that should not be quoted -- its default value is '/'.

Example: quote('/~connolly/') yields '/%7Econnolly/'.

quote_plus(string[, safe])

Like quote(), but also replaces spaces by plus signs, as required for quoting HTML form values. Plus signs in the original string are escaped unless they are included in safe. Unlike quote(), its safe parameter does not default to '/'.

unquote(string)

Replace "%xx" escapes by their single-character equivalent.

Example: unquote('/%7Econnolly/') yields '/~connolly/'.

unquote_plus(string)

Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values.
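The four functions above can be compared side by side. This is a sketch assuming Python 3, where they live in urllib.parse. The sample string is made up, and it avoids "~" because recent Python 3 versions no longer quote that character.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "a b/c+d"  # made-up sample with a space, a slash and a plus sign

q = quote(text)        # space -> %20, slash kept (safe defaults to '/')
qp = quote_plus(text)  # space -> +, slash quoted (safe defaults to '')
print(q)   # a%20b/c%2Bd
print(qp)  # a+b%2Fc%2Bd

# Each unquote function reverses its matching quote function
print(unquote(q))        # a b/c+d
print(unquote_plus(qp))  # a b/c+d
```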

keep in mind

  • To make sure slashes get quoted, you need to override the default "safe" value, as in:
nexturl = "goto.cgi?path=%s" % urllib.quote(mypath, '')

source: http://docs.python.org/lib/module-urllib.html
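The note about slashes can be checked with a short sketch, again assuming Python 3's urllib.parse.quote. The path and the goto.cgi link are illustrative values only.

```python
from urllib.parse import quote

mypath = "docs/reports/2008.txt"  # hypothetical path

default_quoted = quote(mypath)         # '/' is safe by default, so it is kept
fully_quoted = quote(mypath, safe='')  # empty safe list: '/' is quoted too

print(default_quoted)  # docs/reports/2008.txt
print(fully_quoted)    # docs%2Freports%2F2008.txt

nexturl = "goto.cgi?path=%s" % fully_quoted
print(nexturl)  # goto.cgi?path=docs%2Freports%2F2008.txt
```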

Displaying text on a web page, escaping special HTML characters

cgi.escape

When you want to display text on a web page that might contain special characters, like "<" or ">", and want these to appear as is rather than being misinterpreted as parts of an HTML tag, use cgi.escape:

cgi.escape(text)
escape(s[, quote])

Convert the characters "&", "<" and ">" in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character (") is also translated; this helps for inclusion in an HTML attribute value, as in <A HREF="...">. If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead.

source: http://docs.python.org/lib/node562.html
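A note for newer interpreters: cgi.escape was removed in Python 3.8, and html.escape is its replacement. The sketch below assumes Python 3 and uses html.escape. Unlike cgi.escape, its quote flag defaults to True. The sample text is made up.

```python
import html

text = '<a href="x">Tom & Jerry</a>'  # made-up text containing markup

# By default quotation marks are escaped as well
print(html.escape(text))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# quote=False leaves quotation marks alone (the old cgi.escape default)
print(html.escape(text, quote=False))
# &lt;a href="x"&gt;Tom &amp; Jerry&lt;/a&gt;
```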

xml.sax.saxutils

quoteattr(data[, entities])
Similar to escape(), but also prepares data to be used as an attribute value. The return value is a quoted version of data with any additional required replacements. quoteattr() will select a quote character based on the content of data, attempting to avoid encoding any quote characters in the string. If both single- and double-quote characters are already in data, the double-quote characters will be encoded and data will be wrapped in double-quotes. The resulting string can be used directly as an attribute value.
This function is useful when generating attribute values for HTML or any SGML using the reference concrete syntax. New in version 2.2.

keep in mind

  • Do not put your %s in quotes; quoteattr adds them for you.
  • quoteattr expects data to be a string, and raises an error if you give it, for instance, an integer. If you have data that might be numeric, wrap it in str() before passing it to quoteattr, as in quoteattr(str(foo)).
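Both points can be seen in a small sketch. The file name, title and width are hypothetical values. The %s placeholders carry no quotes of their own, and the integer is wrapped in str() first.

```python
from xml.sax.saxutils import quoteattr

title = 'The "big" picture'  # contains double quotes
width = 320                  # numeric, so it needs str()

tag = "<img src=%s title=%s width=%s />" % (
    quoteattr("thumb.jpg"),
    quoteattr(title),        # wrapped in single quotes to avoid encoding
    quoteattr(str(width)),
)
print(tag)
# <img src="thumb.jpg" title='The "big" picture' width="320" />

# With both kinds of quotes present, the double quotes are encoded
both = quoteattr('it\'s "both"')
print(both)  # "it's &quot;both&quot;"
```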


escape(data[, entities])
Escape "&", "<", and ">" in a string of data.
You can escape other strings of data by passing a dictionary as the optional entities parameter. The keys and values must all be strings; each key will be replaced with its corresponding value.
unescape(data[, entities])
Unescape "&amp;", "&lt;", and "&gt;" in a string of data.
You can unescape other strings of data by passing a dictionary as the optional entities parameter. The keys and values must all be strings; each key will be replaced with its corresponding value.
New in version 2.3.

source: http://docs.python.org/lib/module-xml.sax.saxutils.html
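A short sketch of escape() and unescape(), including the optional entities dictionary. The sample strings are made up. The extra mapping for the double quote is just one possible choice of entity.

```python
from xml.sax.saxutils import escape, unescape

raw = "AT&T <rocks>"
escaped = escape(raw)
print(escaped)            # AT&amp;T &lt;rocks&gt;
print(unescape(escaped))  # AT&T <rocks>

# Extra replacements are passed as a dictionary of string -> string
quoted = escape('say "hi"', {'"': '&quot;'})
print(quoted)                                # say &quot;hi&quot;
print(unescape(quoted, {'&quot;': '"'}))     # say "hi"
```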

MySQLdb

Python's MySQLdb library includes the ability to escape values in queries. You should definitely make use of it, as it also protects against injection attacks when building queries from, say, form- or URL-supplied data.

a "real world" example:

q = "UPDATE "+MEDIA_TABLE+" SET mediatype='video', filesize=%s, filelastmod=%s, width=%s, height=%s, duration=%s, fps=%s, videoinfo=%s, audioinfo=%s WHERE filename=%s"

cursor.execute(q, (filesize, filelastmod, ffmpeg.get('width'), ffmpeg.get('height'), ffmpeg.get('duration'), ffmpeg.get('fps'), ffmpeg.get('video'), ffmpeg.get('audio'), path))

keep in mind

  • You use %s for all parameters regardless of their type (so also for numbers). The library handles the proper type conversion.
  • You do NOT use actual quotes (' or ") in the query. The library adds these as necessary.
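To try the idea without a MySQL server, the sketch below swaps in Python's built-in sqlite3 module as a stand-in. This is not MySQLdb: sqlite3 uses ? as its placeholder where MySQLdb uses %s, but the principle of passing values separately from the query text is the same. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE media (filename TEXT, filesize INTEGER)")
cur.execute("INSERT INTO media VALUES (?, ?)", ("clip.avi", 0))

# A hostile "filename" that would match every row if pasted into the SQL text
evil = "clip.avi' OR '1'='1"
cur.execute("UPDATE media SET filesize = ? WHERE filename = ?", (1234, evil))
changed_by_evil = cur.rowcount
print(changed_by_evil)  # 0, the value is treated as data, not as SQL

# The same query with an honest value updates exactly one row
cur.execute("UPDATE media SET filesize = ? WHERE filename = ?", (1234, "clip.avi"))
changed = cur.rowcount
print(changed)  # 1

cur.execute("SELECT filesize FROM media WHERE filename = ?", ("clip.avi",))
row = cur.fetchone()
print(row)  # (1234,)
conn.close()
```

No quotes appear around the placeholders, and the number and the strings use the same placeholder.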