Prototyping 21 May 2013

Tree

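In computer science, a tree is a hierarchical data structure: every item (node) except the topmost "root" has exactly one parent, and each node can have any number of children. This gives a way both to represent information and to access it. Trees are used in: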

File systems (folders and files)
"Decision trees" used to classify / sort
3D graphics (for efficiently drawing surfaces in a realistic way)
Documents, such as a web page (via ElementTree)
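
As a rough illustration of the first use above, a folder hierarchy can be modelled as nested dictionaries. This is only a sketch: the folder names and the list_paths helper are made up for the example, and a real program would read the disk instead of a hand-written dictionary.

```python
# A hypothetical folder hierarchy: a dict is a folder, None marks a file.
filesystem = {
    "home": {
        "notes.txt": None,
        "pictures": {"tree.jpg": None, "worlds.png": None},
    },
    "tmp": {},
}

def list_paths(tree, prefix=""):
    """Return every path in the hierarchy, parents before their children."""
    paths = []
    for name in sorted(tree):
        path = prefix + "/" + name
        paths.append(path)
        if isinstance(tree[name], dict):
            # A folder: descend into it and collect its contents too
            paths.extend(list_paths(tree[name], path))
    return paths

paths = list_paths(filesystem)
print("\n".join(paths))
```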

ElementTree

From the documentation:

Each element has a number of properties associated with it:

a tag which is a string identifying what kind of data this element represents (the element type, in other words).
a number of attributes, stored in a Python dictionary.
a text string.
an optional tail string.
a number of child elements, stored in a Python sequence

To create an element instance, use the Element constructor or the SubElement() factory function.
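
A minimal sketch of what the quoted passage describes, using the standard library's xml.etree.ElementTree. The tag names and text are invented for the example.

```python
import xml.etree.ElementTree as ET

# Build <html><body><p id="intro">Hello <b>world</b>!</p></body></html> by hand.
html = ET.Element("html")                      # Element constructor: the root
body = ET.SubElement(html, "body")             # SubElement: create a child and attach it
p = ET.SubElement(body, "p", {"id": "intro"})  # attributes are passed as a dictionary
p.text = "Hello "                              # text before the first child
b = ET.SubElement(p, "b")
b.text = "world"
b.tail = "!"                                   # text after </b>, still inside <p>

# tostring() gives bytes on Python 3, so decode for printing
serialized = ET.tostring(html).decode("ascii")
print(serialized)
```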

http://docs.python.org/2/library/xml.etree.elementtree.html

ElementTree

In an ElementTree the fundamental unit is the "Element".

An Element has:

.tag (a string with the name of the tag, like "p" or "script")
.attrib (a "dictionary" with name=value pairs of the tag's attributes, like id="foo" or style="color: blue")
.text (the text inside the element up to its first child tag, or None if there isn't any)
.tail (the text that follows the element's closing tag, up to the next tag, or None if there isn't any)
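
To see the four properties side by side, here is a small sketch that parses a made-up fragment with the standard library's xml.etree.ElementTree and reads them back. Note where the text " three" ends up: it is the tail of the b element, not part of p.text.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring('<p id="foo" style="color: blue">one <b>two</b> three</p>')
b = p[0]  # the first (and only) child of <p>

print(p.tag)           # the tag name
print(p.attrib)        # the attributes as a dictionary
print(repr(p.text))    # text before the first child
print(repr(b.text))    # text inside <b>
print(repr(b.tail))    # text after </b>, up to the next tag
print(repr(p.tail))    # nothing follows </p>, so this is None
```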

In addition (and this is why it's a tree), each element can be iterated over and indexed like a list of its direct child elements. Each child is itself an Element with children of its own.

Iteration to access contained "child" elements
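
A short sketch of list-style access, assuming the standard library's xml.etree.ElementTree and an invented list fragment. Looping over an element visits only its direct children; the iter() method visits the element itself and everything below it.

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring("<ul><li>a</li><li>b <em>c</em></li><li>d</li></ul>")

children = [child.tag for child in ul]       # direct children only
descendants = [el.tag for el in ul.iter()]   # ul itself plus every descendant

print(len(ul))       # number of direct children
print(children)
print(ul[1].text)    # index like a list: text of the second <li>
print(descendants)
```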

Tree Traversal

http://en.wikipedia.org/wiki/Tree_traversal

[Figures: preorder (depth-first) traversal and breadth-first traversal of a sorted binary tree]

http://localhost/doc/python2.7/html/library/xml.etree.elementtree.html?highlight=element#xml.etree.ElementTree

Breadth-first

http://lxml.de/api.html

from collections import deque

# root is the top Element of an already-parsed tree
queue = deque([root])
while queue:
    el = queue.popleft()  # pop next element
    queue.extend(el)      # append its children
    print(el.tag)
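
The same queue-based loop as a self-contained sketch that can be run as-is. The breadth_first function name and the small test tree are assumptions made for the example; the tree is parsed with the standard library's xml.etree.ElementTree.

```python
from collections import deque
import xml.etree.ElementTree as ET

def breadth_first(root):
    """Return the tags level by level: the root, then its children, then theirs."""
    order = []
    queue = deque([root])
    while queue:
        el = queue.popleft()   # take the oldest waiting element
        order.append(el.tag)
        queue.extend(el)       # its children wait behind everything already queued
    return order

#        a
#      /   \
#     b     c
#    / \    |
#   d   e   f
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
bfs_order = breadth_first(root)
print(bfs_order)
```

Because the queue is first-in, first-out, both children of a are visited before any grandchild.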

Depth-first traversal

Walking the tree

def walk(node):
    print(node.tag)       # visit the element itself first
    for child in node:    # then each of its children, in order
        walk(child)
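
A runnable variation on the recursive walk, sketched with the standard library's xml.etree.ElementTree. The depth and out parameters are additions for the example: they indent each tag by its level and collect the lines instead of printing them directly.

```python
import xml.etree.ElementTree as ET

def walk(node, depth=0, out=None):
    """Depth-first walk that records each tag, indented by its depth."""
    if out is None:
        out = []
    out.append("  " * depth + node.tag)
    for child in node:
        walk(child, depth + 1, out)   # finish this whole branch before the next sibling
    return out

root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
lines = walk(root)
print("\n".join(lines))

# Element.iter() visits elements in the same depth-first (document) order
dfs_order = [el.tag for el in root.iter()]
print(dfs_order)
```

Compared with the breadth-first order, d and e come before c here, because the walk goes all the way down under b before moving on.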

A Style Browser

#!/usr/bin/env python
#-*- coding:utf-8 -*-
import cgi, urllib2, html5lib, urlparse
import cgitb; cgitb.enable()
 
print "Content-type: text/html;charset=utf-8"
print
print """<style>
div {
    border: 1px solid black;
    padding: 20px;
}
</style>
"""
 
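q = cgi.FieldStorage()
url = q.getvalue("url", "http://pajiba.com")
f = urllib2.urlopen(url)
# The header may be missing, so fall back to an empty string
ct = f.info().get("content-type") or ""
if ct.startswith("text/html"):
    html = html5lib.parse(f, treebuilder="etree", namespaceHTMLElements=False)

    def walk(e):
        # One bordered box per element; nested boxes mirror the tree
        print "<div>"
        print e.tag
        style = e.get("style")
        if style:
            # Escape the value (True also escapes ") so it can't break out of the attribute
            safe = cgi.escape(style, True).encode("utf-8")
            print '<span style="' + safe + '">', safe, "</span>"
        if e.text:
            print cgi.escape(e.text).encode("utf-8")
        for child in e:
            walk(child)
        print "</div>"
        # The tail comes after the element's closing tag, so print it outside the box
        if e.tail:
            print cgi.escape(e.tail).encode("utf-8")

    walk(html)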
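
To try the walk without a web server or network access, here is a rough offline sketch of the same idea. It is not the script above: it assumes a well-formed fragment parsed with the standard library's xml.etree.ElementTree instead of html5lib, it returns a string instead of printing, and the render name and sample markup are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def render(e):
    """Return nested <div> boxes that mirror the element tree below e."""
    parts = ["<div>", "<code>%s</code>" % escape(e.tag)]
    style = e.get("style")
    if style:
        # quoteattr() adds the quotes and escapes the value for use as an attribute
        parts.append("<span style=%s>%s</span>" % (quoteattr(style), escape(style)))
    if e.text and e.text.strip():
        parts.append(escape(e.text))
    for child in e:
        parts.append(render(child))
        # The child's tail sits after its closing tag, so it goes after the child's box
        if child.tail and child.tail.strip():
            parts.append(escape(child.tail))
    parts.append("</div>")
    return "".join(parts)

doc = ET.fromstring('<p style="color: blue">Hi <b>there</b> &amp; bye</p>')
page = render(doc)
print(page)
```

Building the page as a string makes the function easy to check in an interpreter before wiring it into a CGI script.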