Prototyping 21 May 2013

From Media Design: Networked & Lens-Based wiki




In computer science, a tree is a hierarchical way of representing and accessing information: every item has one parent (except the root) and any number of children. Trees are used for:

  • File systems (folders and files)
  • "Decision trees" used to classify / sort
  • 3D graphics (for efficiently drawing surfaces in a realistic way)
  • Documents, such as a web page (via ElementTree)


From the documentation:

Each element has a number of properties associated with it:

  • a tag which is a string identifying what kind of data this element represents (the element type, in other words).
  • a number of attributes, stored in a Python dictionary.
  • a text string.
  • an optional tail string.
  • a number of child elements, stored in a Python sequence.

To create an element instance, use the Element constructor or the SubElement() factory function.
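
As a minimal sketch of the two ways of creating elements named above, the following builds a tiny fragment by hand. It assumes Python 3 and the standard library's xml.etree.ElementTree module; the tag names and the "intro" id are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Build <p id="intro">Hello <b>world</b>!</p> by hand.
p = ET.Element("p", {"id": "intro"})   # a tag plus an attribute dictionary
p.text = "Hello "                      # text before the first child
b = ET.SubElement(p, "b")              # creates <b> and appends it to p
b.text = "world"
b.tail = "!"                           # text after </b>, still inside <p>

markup = ET.tostring(p, encoding="unicode")
print(markup)
```

SubElement() is a shortcut for creating an Element and appending it to a parent in one step, which keeps tree-building code short.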


In an ElementTree, the fundamental unit is the "Element".

An Element has:

  • .tag (a string with the name of the tag, like "p" or "script")
  • .attrib (a dictionary of the tag's attributes as name/value pairs, like id="foo" or style="color: blue")
  • .text (the text inside the element, up to its first child tag)
  • .tail (any text that follows the element's closing tag, up to the next tag)
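
The difference between .text and .tail is easiest to see on a parsed fragment. This is a small sketch, assuming Python 3 and the standard library's xml.etree.ElementTree; the sample markup is invented for the example.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring('<p id="foo" style="color: blue">one <b>two</b> three</p>')
b = p[0]          # the first (and only) child of <p>

print(p.tag)      # the tag name of the outer element
print(p.attrib)   # its attributes as a dictionary
print(repr(p.text))   # text inside <p> before <b> starts
print(repr(b.text))   # text inside <b>
print(repr(b.tail))   # text after </b>: it is stored on <b>, not on <p>
print(repr(p.tail))   # nothing follows </p> in this fragment
```

Note that " three" is not part of p.text: text that comes after a child tag is stored as that child's tail.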

In addition (and this is why it's a tree), each element can be iterated over and treated like a list of its direct sub-elements.

  • Iteration gives access to the contained "child" elements; their own children are reached by iterating each child in turn
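
A short sketch of this list-like behaviour, assuming Python 3 and the standard library's xml.etree.ElementTree (the list markup is invented for the example):

```python
import xml.etree.ElementTree as ET

ul = ET.fromstring("<ul><li>a</li><li>b<em>!</em></li><li>c</li></ul>")

child_tags = [child.tag for child in ul]   # iterating yields direct children only
print(len(ul))         # number of direct children
print(child_tags)
print(ul[1].text)      # indexing works like a list
print(len(ul[1]))      # the <em> belongs to the second <li>, not to <ul>
```

Because iteration stops at the direct children, visiting a whole document needs a traversal that repeats the same step on every child.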

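Tree Traversal

Breadth-first traversal

from collections import deque

# root is the top Element of an already parsed tree
queue = deque([root])
while queue:
    el = queue.popleft()  # pop next element
    queue.extend(el)      # append its children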
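
To see the order in which the queue-based loop above visits elements, here is a self-contained sketch that records each tag as it is popped. It assumes Python 3 and the standard library's xml.etree.ElementTree; the function name breadth_first and the sample document are hypothetical.

```python
from collections import deque
import xml.etree.ElementTree as ET

def breadth_first(root):
    """Return the tags of all elements, level by level."""
    order = []
    queue = deque([root])
    while queue:
        el = queue.popleft()   # take the element that has waited longest
        order.append(el.tag)
        queue.extend(el)       # its children wait behind everything already queued
    return order

doc = ET.fromstring(
    "<html><head><title/></head><body><p><b/></p><div/></body></html>")
bfs_tags = breadth_first(doc)
print(bfs_tags)
```

Because children are added to the back of the queue, every element at one depth is visited before any element at the next depth.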

Depth-first traversal

Walking the tree

def walk(node):
    print(node.tag)
    for child in node:
        walk(child)   # recurse into each child before moving to the next sibling
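
A variation on the recursive walk that also tracks how deep it is, which makes the depth-first order visible as an indented outline. This is a sketch assuming Python 3 and the standard library's xml.etree.ElementTree; the depth and lines parameters and the sample document are additions for illustration.

```python
import xml.etree.ElementTree as ET

def walk(node, depth=0, lines=None):
    """Collect one indented line per element, parents before children."""
    if lines is None:
        lines = []
    lines.append("  " * depth + node.tag)
    for child in node:
        walk(child, depth + 1, lines)   # finish this branch before the next sibling
    return lines

doc = ET.fromstring(
    "<html><head><title/></head><body><p><b/></p><div/></body></html>")
outline = walk(doc)
print("\n".join(outline))

# Element.iter() visits elements in the same depth-first document order.
iter_tags = [el.tag for el in doc.iter()]
```

Compared with the breadth-first loop, the only real difference is that recursion uses the call stack where the other version uses a queue.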

A Style Browser

#!/usr/bin/env python
#-*- coding:utf-8 -*-
import cgi, urllib2, html5lib, urlparse
import cgitb; cgitb.enable()

# HTTP header, then the empty line that ends the header section
print "Content-type: text/html;charset=utf-8"
print
print """<style>
div {
    border: 1px solid black;
    padding: 20px;
}
</style>"""

# Fetch the page named by the "url" query parameter
q = cgi.FieldStorage()
url = q.getvalue("url","")
f = urllib2.urlopen(url)
ct = f.info().get("content-type")
if ct and ct.startswith("text/html"):
    html = html5lib.parse(f,treebuilder="etree",namespaceHTMLElements=False)
    def walk(e):
        # One nested <div> per element, so the boxes mirror the tree
        print "<div>"
        print e.tag
        style = e.get("style")
        if style:
            print "<span style='"+style+"'>",style,"</span>"
        if e.text:
            print cgi.escape(e.text).encode("utf-8")
        if e.tail:
            print cgi.escape(e.tail).encode("utf-8")
        for child in e:
            walk(child)
        print "</div>"
    walk(html)
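
The tree walk at the heart of the script can be tried without a web server. The sketch below is not the script itself but a simplified stand-in under several assumptions: it targets Python 3, uses the standard library's xml.etree.ElementTree instead of html5lib (so it only accepts well-formed markup), returns a string instead of printing, escapes the style value, and emits each tail after the child it follows. The function name render and the sample fragment are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

def render(e):
    """Return nested <div> markup for element e and everything below it."""
    parts = ["<div>", "<code>%s</code>" % html.escape(e.tag)]
    style = e.get("style")
    if style:
        safe = html.escape(style, quote=True)
        # Show the style text, styled with itself
        parts.append('<span style="%s">%s</span>' % (safe, safe))
    if e.text and e.text.strip():
        parts.append(html.escape(e.text))
    for child in e:
        parts.append(render(child))
        if child.tail and child.tail.strip():
            parts.append(html.escape(child.tail))   # tail text follows the child
    parts.append("</div>")
    return "".join(parts)

sample = ET.fromstring('<p style="color: blue">Hi <b>there</b> &amp; bye</p>')
rendered = render(sample)
print(rendered)
```

Returning a string rather than printing makes the walk easy to check in isolation; a CGI wrapper would then only need to print the header and the result.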