Prototyping 21 May 2013
Latest revision as of 17:40, 21 May 2013
Tree
Figures: Yggdrasil, the world tree of Norse mythology; a diagram of the universe according to Norse mythology; the Norse Nine Worlds.
In computer science, a tree is a hierarchical data structure: a way of both representing information and accessing it. Trees are used for:
- File systems (folders and files)
- "Decision trees" used to classify / sort
- 3D graphics (for efficiently drawing surfaces in a realistic way)
- Documents, such as a web page (via ElementTree)
ElementTree
From the documentation:
Each element has a number of properties associated with it:
- a tag which is a string identifying what kind of data this element represents (the element type, in other words).
- a number of attributes, stored in a Python dictionary.
- a text string.
- an optional tail string.
- a number of child elements, stored in a Python sequence
To create an element instance, use the Element constructor or the SubElement() factory function.
http://docs.python.org/2/library/xml.etree.elementtree.html
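As a minimal sketch of the two construction routes mentioned in the quote, the following builds a tiny document by hand. It is written in Python 3 syntax, and the tag names, the text and the id attribute are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Element() creates a standalone element; attributes are passed as a dict.
body = ET.Element("body", {"id": "main"})

# SubElement() creates a new element and appends it to a parent in one step.
p = ET.SubElement(body, "p")
p.text = "Hello "            # text inside <p>, before its first child

b = ET.SubElement(p, "b")
b.text = "tree"              # text inside <b>
b.tail = "!"                 # text after </b>, still inside <p>

# Serialize the whole tree back to a string to see the result.
markup = ET.tostring(body, encoding="unicode")
print(markup)
```

Printing the serialized markup is a quick way to check that text and tail ended up where you expected.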
ElementTree
In an ElementTree the fundamental unit is the "Element".
An Element has:
- .tag (a string with the name of the tag, like "p" or "script")
- .attrib (a dictionary of the tag's attributes as name/value pairs, like id="foo" or style="color: blue")
- .text (the text inside the element before its first child, or None)
- .tail (the text that follows the element's closing tag, up to the next tag, or None)
In addition (and this is why it's a tree), each element can be iterated over and indexed like a list of its direct child elements.
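To make these properties concrete, here is a small sketch that parses a made-up fragment and inspects one element (Python 3 syntax; the markup itself is only an illustration and has to be well-formed XML for ElementTree to accept it).

```python
import xml.etree.ElementTree as ET

# A made-up fragment: a paragraph with two attributes and one child.
p = ET.fromstring('<p id="foo" style="color: blue">Hello <b>bold</b> world</p>')

print(p.tag)       # the tag name
print(p.attrib)    # the attributes as a dictionary
print(p.text)      # the text before the first child

b = p[0]           # elements can be indexed like a list
print(b.tag, b.text, b.tail)

# Iterating over an element yields its direct children only.
child_tags = [child.tag for child in p]
print(child_tags)
```

Note that " world" is stored as the tail of the b element, not as text of the p element.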
Tree Traversal
http://en.wikipedia.org/wiki/Tree_traversal
Figures: pre-order (depth-first) traversal and breadth-first traversal of a sorted binary tree.
Breadth-first
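from collections import deque

# root is an Element, for example the root of a parsed document
queue = deque([root])
while queue:
    el = queue.popleft()  # take the next element from the front of the queue
    queue.extend(el)      # append its children to the back
    print(el.tag)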
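The loop above can be tried out on a small document. The sketch below wraps it in a function that collects the tags instead of printing them; the function name breadth_first_tags and the sample markup are assumptions made for the example (Python 3 syntax).

```python
from collections import deque
import xml.etree.ElementTree as ET

def breadth_first_tags(root):
    """Return the tags of all elements, level by level."""
    order = []
    queue = deque([root])
    while queue:
        el = queue.popleft()   # oldest element in the queue comes out first
        order.append(el.tag)
        queue.extend(el)       # its children wait behind everything queued so far
    return order

doc = ET.fromstring(
    "<html><head><title/></head><body><p><b/></p><div/></body></html>"
)
bfs_order = breadth_first_tags(doc)
print(bfs_order)
# ['html', 'head', 'body', 'title', 'p', 'div', 'b']
```

Because the queue is first-in, first-out, both children of html are visited before any grandchild.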
Depth-first traversal
Walking the tree recursively visits each element before its children:
def walk(node):
    print(node.tag)
    for child in node:
        walk(child)
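A variation on the recursive walk, sketched here in Python 3 syntax, passes the current depth along so the output can be indented like an outline. The depth and out parameters and the sample markup are additions made for this example.

```python
import xml.etree.ElementTree as ET

def walk(node, depth=0, out=None):
    """Pre-order walk that records (depth, tag) for every element."""
    if out is None:
        out = []
    out.append((depth, node.tag))       # visit the node first...
    for child in node:
        walk(child, depth + 1, out)     # ...then each child, one level deeper
    return out

doc = ET.fromstring(
    "<html><head><title/></head><body><p><b/></p><div/></body></html>"
)
outline = walk(doc)
for depth, tag in outline:
    print("  " * depth + tag)
```

On the same sample document the breadth-first order is html, head, body, title, p, div, b, whereas this walk reaches title before body. The order of the tags matches what the built-in Element.iter() method yields.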
A Style Browser
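#!/usr/bin/env python
#-*- coding:utf-8 -*-
# Python 2 CGI script: fetch a page and show its element tree as nested boxes,
# along with any inline style attribute found on each element.
import cgi, urllib2, html5lib
import cgitb; cgitb.enable()

print "Content-type: text/html;charset=utf-8"
print
print """<style>
div {
    border: 1px solid black;
    padding: 20px;
}
</style>
"""

q = cgi.FieldStorage()
url = q.getvalue("url", "http://pajiba.com")
f = urllib2.urlopen(url)
ct = f.info().get("content-type", "")  # empty string if the header is missing

def walk(e):
    print "<div>"
    print e.tag
    style = e.get("style")
    if style:
        # escape the style before echoing it, both as an attribute and as text
        safe = cgi.escape(style, quote=True).encode("utf-8")
        print '<span style="' + safe + '">', safe, "</span>"
    if e.text:
        print cgi.escape(e.text).encode("utf-8")
    for child in e:
        walk(child)
    print "</div>"
    # the tail is the text that follows this element's closing tag
    if e.tail:
        print cgi.escape(e.tail).encode("utf-8")

# only parse and walk responses that are actually HTML
if ct.startswith("text/html"):
    html = html5lib.parse(f, treebuilder="etree", namespaceHTMLElements=False)
    walk(html)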
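The script above targets Python 2 and needs a web server and html5lib to run. As a rough sketch for experimenting offline, the same kind of walk can be written for Python 3 as a function that returns the markup as a string. The name render is hypothetical, and parsing with xml.etree.ElementTree instead of html5lib is an assumption that only holds for well-formed markup.

```python
import html
import xml.etree.ElementTree as ET

def render(e):
    """Return nested <div> boxes mirroring the element tree rooted at e."""
    parts = ["<div>", html.escape(e.tag)]
    style = e.get("style")
    if style:
        # escape once, then use the value both as attribute and as visible text
        safe = html.escape(style, quote=True)
        parts.append('<span style="%s">%s</span>' % (safe, safe))
    if e.text:
        parts.append(html.escape(e.text))
    for child in e:
        parts.append(render(child))
        if child.tail:
            # tail text belongs to the parent, right after the child's box
            parts.append(html.escape(child.tail))
    parts.append("</div>")
    return "".join(parts)

sample = ET.fromstring('<p style="color: red">Hi <b>there</b>!</p>')
boxes = render(sample)
print(boxes)
```

Returning a string rather than printing makes the function easy to test and to embed in whatever serves the page.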