Prototyping 21 May 2013: Difference between revisions

Latest revision as of 17:40, 21 May 2013

Tree

File:Yggdrasil.jpg File:Esquema del universo segun la mitologia nordica.png File:Norse Nine Worlds.jpg

In computer science, a tree is a hierarchical way of representing and accessing information. Trees are used for:

File systems (folders and files)
"Decision trees" used to classify / sort
3D graphics (for efficiently drawing surfaces in a realistic way)
Documents, such as a web page (via ElementTree)

ElementTree

From the documentation:

Each element has a number of properties associated with it:

a tag which is a string identifying what kind of data this element represents (the element type, in other words).
a number of attributes, stored in a Python dictionary.
a text string.
an optional tail string.
a number of child elements, stored in a Python sequence

To create an element instance, use the Element constructor or the SubElement() factory function.

http://docs.python.org/2/library/xml.etree.elementtree.html
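
As a rough illustration of the Element constructor and the SubElement() factory function mentioned above, here is a minimal sketch that builds a small tree by hand and serializes it. The tag names, the id value and the text are made up for the example.

```python
import xml.etree.ElementTree as ET

# Build <html><body><p id="intro">Hello <b>world</b>!</p></body></html> by hand.
html = ET.Element("html")                       # root, made with the constructor
body = ET.SubElement(html, "body")              # SubElement creates and attaches a child
p = ET.SubElement(body, "p", {"id": "intro"})   # attributes passed as a dictionary
p.text = "Hello "                               # text before the first child of <p>
b = ET.SubElement(p, "b")
b.text = "world"
b.tail = "!"                                    # text that follows </b> inside <p>

serialized = ET.tostring(html).decode("ascii")
print(serialized)
```

Note that the "!" is stored on the b element as its tail, not on p.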

ElementTree

In an ElementTree, the fundamental unit is the "Element".

An Element has:

.tag (a string representing the name of the tag, like "p" or "script")
.attrib (a "dictionary" with name=value pairs of the tag attributes, like id="foo", or style="color: blue")
.text (a string with the text inside the element, up to its first child tag)
.tail (a string with the text that follows the element's closing tag, up to the next tag)
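
To make the four properties concrete, the following sketch parses a small fragment with the standard library's xml.etree.ElementTree and reads each property. The fragment itself is an invented example.

```python
import xml.etree.ElementTree as ET

# A made-up fragment: a <p> with two attributes, some text, and one child <b>.
p = ET.fromstring('<p id="foo" style="color: blue">Hello <b>world</b>, again</p>')
b = p[0]  # the first (and only) child element

print(p.tag)            # name of the tag
print(p.attrib["id"])   # attributes behave like a dictionary
print(p.get("style"))   # .get() returns None when the attribute is missing
print(repr(p.text))     # text inside <p> before the first child
print(repr(b.text))     # text inside <b>
print(repr(b.tail))     # text after </b>, still inside <p>
print(repr(p.tail))     # nothing follows </p>, so this is None
```

The text ", again" sits inside the p element in the markup, but it is stored as the tail of b, which is why both .text and .tail are needed to recover all of an element's text.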

In addition (and this is why it's a tree), each element can be iterated over and treated like a list of its direct child elements.

Iteration to access contained "child" elements
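
A short sketch of what list-like access looks like in practice, assuming an invented body fragment. Looping over an element yields only its direct children, while the element's iter() method visits the element itself and every descendant.

```python
import xml.etree.ElementTree as ET

body = ET.fromstring("<body><div><p>one</p><p>two</p></div><script/></body>")

# Looping over an element gives its direct children only.
children = [child.tag for child in body]

# len() and indexing work as they do on a list.
child_count = len(body)
first_child = body[0]

# iter() walks the element itself plus all descendants, in document order.
descendants = [el.tag for el in body.iter()]

print(children)
print(child_count)
print(first_child.tag)
print(descendants)
```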

Tree Traversal

http://en.wikipedia.org/wiki/Tree_traversal

File:Sorted binary tree preorder.svg File:Sorted binary tree breadth-first traversal.svg

http://docs.python.org/2/library/xml.etree.elementtree.html

Breadth-first

http://lxml.de/api.html

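from collections import deque

# root is an Element, e.g. the result of parsing a document
queue = deque([root])
while queue:
    el = queue.popleft()  # take the oldest element off the front of the queue
    queue.extend(el)      # append its children to the back
    print(el.tag)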
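
The loop above can be wrapped in a generator and tried on a small tree. This is only a sketch: the function name breadth_first and the single-letter tags are assumptions made for the example.

```python
from collections import deque
import xml.etree.ElementTree as ET

def breadth_first(root):
    """Yield elements level by level, starting at root (hypothetical helper)."""
    queue = deque([root])
    while queue:
        el = queue.popleft()   # oldest element first
        yield el
        queue.extend(el)       # its children wait behind everything already queued

# a has children b and c; b has d and e; c has f.
root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
order = [el.tag for el in breadth_first(root)]
print(order)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Because a queue is first-in, first-out, every element of one level is visited before any element of the next.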

Depth-first traversal

Walking the tree

def walk(node):
    print(node.tag)       # visit the node first (pre-order) ...
    for child in node:    # ... then recurse into each child in turn
        walk(child)
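
Two possible variations on the walk, sketched under the assumption of the same kind of small test tree. The first passes the depth along so the output is indented like an outline; the second replaces recursion with an explicit stack. The names walk_indented and walk_iterative are invented for this example.

```python
import xml.etree.ElementTree as ET

def walk_indented(node, depth=0, lines=None):
    """Depth-first (pre-order) walk collecting one indented line per element."""
    if lines is None:
        lines = []
    lines.append("  " * depth + node.tag)
    for child in node:
        walk_indented(child, depth + 1, lines)
    return lines

def walk_iterative(root):
    """Same visiting order, using an explicit stack instead of recursion."""
    stack = [root]
    while stack:
        node = stack.pop()                   # newest element first
        yield node
        stack.extend(reversed(list(node)))   # reversed, so the first child is popped next

root = ET.fromstring("<a><b><d/><e/></b><c><f/></c></a>")
outline = walk_indented(root)
print("\n".join(outline))
stack_order = [el.tag for el in walk_iterative(root)]
print(stack_order)  # ['a', 'b', 'd', 'e', 'c', 'f']
```

The only difference from the breadth-first loop is the container: a stack (last in, first out) gives depth-first order, a queue gives breadth-first order.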

A Style Browser

#!/usr/bin/env python
#-*- coding:utf-8 -*-
import cgi, urllib2, html5lib
import cgitb; cgitb.enable()
 
print "Content-type: text/html;charset=utf-8"
print
print """<style>
div {
    border: 1px solid black;
    padding: 20px;
}
</style>
"""
 
q = cgi.FieldStorage()
url = q.getvalue("url","http://pajiba.com")
f = urllib2.urlopen(url)
ct = f.info().get("content-type", "")  # default to "" so a missing header cannot break startswith()
if ct.startswith("text/html"):
    html = html5lib.parse(f,treebuilder="etree",namespaceHTMLElements=False)
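    def walk(e):
        # Draw each element as a bordered box containing its tag, style and text
        print "<div>"
        print e.tag
        style = e.get("style")
        if style:
            # Escape quotes too, since the value is reused inside an attribute
            safe = cgi.escape(style, True).encode("utf-8")
            print '<span style="' + safe + '">', safe, "</span>"
        if e.text:
            print cgi.escape(e.text).encode("utf-8")
        if e.tail:
            print cgi.escape(e.tail).encode("utf-8")
        for child in e:
            walk(child)
        print "</div>"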
    walk(html)
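
The same walk can also gather values instead of printing them. The sketch below collects every style attribute in a tree. It uses the standard library parser on a hand-written, well-formed page as a stand-in for the html5lib result, and the function name collect_styles is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

def collect_styles(node, found=None):
    """Walk depth-first and gather (tag, style) pairs (hypothetical helper)."""
    if found is None:
        found = []
    style = node.get("style")
    if style:
        found.append((node.tag, style))
    for child in node:
        collect_styles(child, found)
    return found

# A made-up page; a real one would come from html5lib.parse() as in the script above.
page = ET.fromstring(
    '<html><body style="margin: 0">'
    '<p style="color: blue">Hi <b>there</b></p>'
    '<p>No style</p>'
    '</body></html>'
)
styles = collect_styles(page)
for tag, style in styles:
    print(tag + ": " + style)
```

Passing the found list down through the recursion keeps all results in one place without a global variable.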

@@ Line 1: / Line 1: @@
-* Tree
+== Tree ==
-* Walking a tree
+[[File:Yggdrasil.jpg|300px|link=http://en.wikipedia.org/wiki/Yggdrasil]][[File:Esquema_del_universo_segun_la_mitologia_nordica.png|300px]][[File:Norse_Nine_Worlds.jpg|300px]]
-* Recursive traversal
-* Depth-first traversal
+== Tree ==
-* Breadth-first traversal
+A tree, in a computer science sense, is a hierarchical representation and means of accessing information. It's used in relation to:
+* File systems (folders and files)
+* "Decision trees" used to classify / sort
+* 3D graphics (for efficiently drawing surfaces in a realistic way)
+* Documents, such as a web page (via ElementTree)
 == ElementTree ==
-http://docs.python.org/2/library/xml.etree.elementtree.html
+From the [http://docs.python.org/2/library/xml.etree.elementtree.html documentation]:
-== Tree Traversal ==
+Each element has a number of properties associated with it:
-http://en.wikipedia.org/wiki/Tree_traversal
+* a tag which is a string identifying what kind of data this element represents (the element type, in other words).
+* a number of attributes, stored in a Python dictionary.
+* a text string.
+* an optional tail string.
+* a number of child elements, stored in a Python sequence
-[[File:Sorted_binary_tree_preorder.svg|500px]][[Sorted_binary_tree_breadth-first_traversal.svg|500px]]
+To create an element instance, use the Element constructor or the SubElement() factory function.
-http://localhost/doc/python2.7/html/library/xml.etree.elementtree.html?highlight=element#xml.etree.ElementTree
+http://docs.python.org/2/library/xml.etree.elementtree.html
-== Walking the tree ==
+== ElementTree ==
-ElementTree the fundamental unit is the Element
+In an ElementTree the fundamental unit is the "Element"
-And Element has:
+An Element has:
 * .tag (a string representing the name of the tag, like "p" or "script")
 * .attrib (a "dictionary" with name=value pairs of the tag attributes, like id="foo", or style="color: blue")
@@ Line 30: / Line 39: @@
 * Iteration to access contained "child" elements
-== Walk ==
+== Tree Traversal ==
+http://en.wikipedia.org/wiki/Tree_traversal
+[[File:Sorted_binary_tree_preorder.svg|500px]][[File:Sorted_binary_tree_breadth-first_traversal.svg|500px]]
+http://localhost/doc/python2.7/html/library/xml.etree.elementtree.html?highlight=element#xml.etree.ElementTree
+== Breadth-first ==
+http://lxml.de/api.html
+<source lang="python">
+queue = deque([root])
+while queue:
+    el = queue.popleft()  # pop next element
+    queue.extend(el)      # append its children
+    print(el.tag)
+</source>
+== Depth-first traversal ==
+Walking the tree
 <source lang="python">
@@ Line 39: / Line 70: @@
 </source>
-== Extracting the text of a node ==
+== A Style Browser ==
-(function exists?)
-== How to show the structure in a more tree way? ==
+<source lang="python">
-center... breadth first traversal!
+#!/usr/bin/env python
+#-*- coding:utf-8 -*-
+import cgi, urllib2, html5lib, urlparse
+import cgitb; cgitb.enable()
+print "Content-type: text/html;charset=utf-8"
+print
+print """<style>
+div {
+    border: 1px solid black;
+    padding: 20px;
+}
+</style>
+"""
+q = cgi.FieldStorage()
+url = q.getvalue("url","http://pajiba.com")
+f = urllib2.urlopen(url)
+ct = f.info().get("content-type")
+if ct.startswith("text/html"):
+    html = html5lib.parse(f,treebuilder="etree",namespaceHTMLElements=False)
+    def walk(e):
+        print "<div>"
+        print e.tag
+        style = e.get("style")
+        if style:
+            print "<span style='"+style+"'>",style,"</span>"
+        if e.text:
+           print cgi.escape(e.text).encode("utf-8")
+        if e.tail:
+           print cgi.escape(e.tail).encode("utf-8")
+        for child in e:
+            walk(child)
+        print "</div>"
+    walk(html)
-== Cherry picking ==
+</source>
-(collecting things while walking the tree)