Prototyping 21 May 2013
Tree
(Images: Yggdrasil, the world tree of Norse mythology; a diagram of the universe according to Norse mythology; the Norse nine worlds)
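A tree, in the computer science sense, is a hierarchical way of representing and accessing information: a single root node branches into child nodes, which can have children of their own, and every node except the root has exactly one parent. Trees are used in: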
- File systems (folders containing files and other folders)
- Decision trees, used to classify or sort data
- 3D graphics (for drawing surfaces efficiently and realistically)
- Documents such as web pages (in Python, e.g. via ElementTree)
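To make the idea concrete, here is a minimal sketch of a tree built from a hypothetical `Node` class (not part of any library), modelling a tiny made-up file system: every node has a name and a list of children, and a folder is simply a node that has children. Counting leaves and measuring depth are both naturally recursive, because every subtree is itself a tree.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A hypothetical tree node: a name plus a list of child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)


def count_leaves(node: Node) -> int:
    """Count nodes with no children (the 'files' in a file system)."""
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


def depth(node: Node) -> int:
    """Number of levels from this node down to its deepest descendant."""
    return 1 + max((depth(child) for child in node.children), default=0)


# A small made-up file system: a root folder holding a file and a subfolder.
root = Node("home", [
    Node("notes.txt"),
    Node("photos", [Node("tree.jpg"), Node("roots.jpg")]),
])

print(count_leaves(root))  # 3
print(depth(root))         # 3
```

Note that with this definition an empty folder would also count as a leaf, since it has no children.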
ElementTree
From the documentation:
Each element has a number of properties associated with it:
- a tag which is a string identifying what kind of data this element represents (the element type, in other words).
- a number of attributes, stored in a Python dictionary.
- a text string.
- an optional tail string.
- a number of child elements, stored in a Python sequence.
To create an element instance, use the Element constructor or the SubElement() factory function.
http://docs.python.org/2/library/xml.etree.elementtree.html
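A minimal sketch of both ways of creating elements, using Python 3's `xml.etree.ElementTree` (the tags and the style value are made up for illustration):

```python
import xml.etree.ElementTree as ET

# The root element, created with the Element constructor.
html = ET.Element("html")

# SubElement creates a child and appends it to its parent in one step;
# extra keyword arguments become attributes.
body = ET.SubElement(html, "body")
p = ET.SubElement(body, "p", style="color: blue")
p.text = "Hello, tree"

print(ET.tostring(html, encoding="unicode"))
# <html><body><p style="color: blue">Hello, tree</p></body></html>
```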
ElementTree
In an ElementTree, the fundamental unit is the "Element".
An Element has:
- .tag (a string naming the element type, like "p" or "script")
- .attrib (a dictionary mapping the tag's attribute names to their values, like id="foo" or style="color: blue")
- .text (the text inside the element before its first child element, or None)
- .tail (the text that follows the element's closing tag, before the next sibling, or None)
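The difference between `.text` and `.tail` is easiest to see on a small example. This sketch parses a made-up, well-formed snippet with `xml.etree.ElementTree.fromstring` (a real web page would normally need an HTML parser such as html5lib first):

```python
import xml.etree.ElementTree as ET

p = ET.fromstring('<p id="foo">Hello <b>brave</b> new world</p>')
b = p[0]  # the first (and only) child element

print(p.tag, p.attrib)  # p {'id': 'foo'}
print(repr(p.text))     # 'Hello '       (text before the first child)
print(repr(b.text))     # 'brave'        (text inside <b>)
print(repr(b.tail))     # ' new world'   (text after </b>, still inside <p>)
print(repr(p.tail))     # None           (nothing follows </p>)
```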
In addition (and this is why it's a tree), each element behaves like a list of its direct child elements:
- iterating over an element gives its "child" elements, each of which can be iterated in turn to reach its own children
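Because an element acts like a list of its direct children, the usual list operations work on it. A sketch with a made-up snippet:

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring("<ul><li>one</li><li>two <em>2</em></li><li>three</li></ul>")

print(len(ul))                 # 3: only the direct children are counted
print([li.text for li in ul])  # ['one', 'two ', 'three']
print(ul[1][0].tag)            # em: a grandchild, reached through its parent
print(ul[2].text)              # three
```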
Tree Traversal
http://en.wikipedia.org/wiki/Tree_traversal
(Images: preorder (depth-first) and breadth-first traversal of a sorted binary tree)
Breadth-first traversal
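from collections import deque
# root is an Element, e.g. from ElementTree.parse(filename).getroot()
queue = deque([root])
while queue:
    el = queue.popleft()  # pop next element from the front
    queue.extend(el)      # append its children to the back
    print(el.tag)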
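Wrapped into a self-contained function (a sketch; `breadth_first_tags` is a hypothetical name and the test document is made up), the breadth-first loop returns the tags level by level: first the root, then all of its children, then all grandchildren.

```python
import xml.etree.ElementTree as ET
from collections import deque


def breadth_first_tags(root):
    """Return element tags in breadth-first (level-by-level) order."""
    tags = []
    queue = deque([root])
    while queue:
        el = queue.popleft()  # the oldest element in the queue comes out first
        tags.append(el.tag)
        queue.extend(el)      # its children join the back of the queue
    return tags


doc = ET.fromstring(
    "<html><head><title>t</title></head>"
    "<body><h1>h</h1><p>text <a>link</a></p></body></html>"
)
print(breadth_first_tags(doc))
# ['html', 'head', 'body', 'title', 'h1', 'p', 'a']
```

Using a `deque` rather than a plain list keeps `popleft()` cheap; `list.pop(0)` would have to shift every remaining item.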
Depth-first traversal
Walking the tree
def walk(node):
    # visit the node first, then each child's subtree (preorder)
    print(node.tag)
    for child in node:
        walk(child)
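For comparison, a sketch (hypothetical function names, made-up document) that collects tags with the recursive walk and with an explicit stack instead of recursion. Both visit elements in depth-first preorder, which for ElementTree is the same order that the built-in `Element.iter()` method produces.

```python
import xml.etree.ElementTree as ET


def walk_tags(node, tags=None):
    """Recursive depth-first (preorder) walk: the node, then each subtree."""
    if tags is None:
        tags = []
    tags.append(node.tag)
    for child in node:
        walk_tags(child, tags)
    return tags


def walk_tags_stack(root):
    """The same walk without recursion, using a list as a stack."""
    tags = []
    stack = [root]
    while stack:
        el = stack.pop()                   # the newest element comes out first
        tags.append(el.tag)
        stack.extend(reversed(list(el)))   # reversed, so the first child is popped next
    return tags


doc = ET.fromstring(
    "<html><head><title>t</title></head>"
    "<body><h1>h</h1><p>text <a>link</a></p></body></html>"
)
print(walk_tags(doc))
# ['html', 'head', 'title', 'body', 'h1', 'p', 'a']
print(walk_tags(doc) == walk_tags_stack(doc) == [e.tag for e in doc.iter()])
# True
```

The only difference from the breadth-first loop is the container: a stack (last in, first out) instead of a queue (first in, first out).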
A Style Browser
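#!/usr/bin/env python
#-*- coding:utf-8 -*-
# CGI script (Python 2): fetch a web page and draw its element tree as
# nested boxes, showing each element's tag, inline style and text.
import cgi, urllib2, html5lib
import cgitb; cgitb.enable()

print "Content-type: text/html;charset=utf-8"
print
print """<style>
div {
    border: 1px solid black;
    padding: 20px;
}
</style>
"""

q = cgi.FieldStorage()
url = q.getvalue("url", "http://pajiba.com")
f = urllib2.urlopen(url)
ct = f.info().get("content-type", "")  # the header may be missing

def walk(e):
    """Print element e, and recursively its children, as nested <div>s."""
    print "<div>"
    # html5lib's etree builder gives comments a function, not a string, as tag
    tag = e.tag if isinstance(e.tag, basestring) else "(comment)"
    print cgi.escape(tag).encode("utf-8")
    style = e.get("style")
    if style:
        # escape quotes too, so the value cannot break out of the attribute
        s = cgi.escape(style, True).encode("utf-8")
        print '<span style="%s">%s</span>' % (s, s)
    if e.text:
        print cgi.escape(e.text).encode("utf-8")
    for child in e:
        walk(child)
    print "</div>"
    # the tail is the text after this element's closing tag,
    # so it is printed outside this element's box
    if e.tail:
        print cgi.escape(e.tail).encode("utf-8")

if ct.startswith("text/html"):
    html = html5lib.parse(f, treebuilder="etree", namespaceHTMLElements=False)
    walk(html)
else:
    print "Not an HTML page:", cgi.escape(ct)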
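The same rendering idea in Python 3, as a sketch that returns a string instead of printing from a CGI script. It uses the standard library's `html.escape` in place of `cgi.escape` (which no longer exists in current Python 3), and is fed a small well-formed snippet parsed with `xml.etree.ElementTree` rather than a page fetched and parsed by html5lib; `render` is a hypothetical name.

```python
import html
import xml.etree.ElementTree as ET


def render(e):
    """Return element e and its descendants as nested <div> boxes."""
    parts = ["<div>", html.escape(str(e.tag))]
    style = e.get("style")
    if style:
        s = html.escape(style)  # also escapes quotes, so it is safe in an attribute
        parts.append('<span style="%s">%s</span>' % (s, s))
    if e.text:
        parts.append(html.escape(e.text))
    for child in e:
        parts.append(render(child))
    parts.append("</div>")
    if e.tail:  # text after the closing tag belongs to the parent's box
        parts.append(html.escape(e.tail))
    return "\n".join(parts)


page = ET.fromstring('<body><p style="color: blue">Hi &amp; bye</p>!</body>')
print(render(page))
```

Building a string rather than printing makes the function easy to test, and the caller decides where the HTML goes.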