Prototyping 21 May 2013: Difference between revisions

From XPUB & Lens-Based wiki

Latest revision as of 18:40, 21 May 2013

Tree

[Images: Yggdrasil; scheme of the universe according to Norse mythology (Esquema del universo segun la mitologia nordica); Norse Nine Worlds]

In computer science, a tree is a hierarchical data structure: a way of representing information and of accessing it. Trees are used for:

  • File systems (folders and files)
  • "Decision trees" used to classify / sort
  • 3D graphics (for efficiently drawing surfaces in a realistic way)
  • Documents, such as a web page (via ElementTree)

ElementTree

From the documentation:

Each element has a number of properties associated with it:

  • a tag which is a string identifying what kind of data this element represents (the element type, in other words).
  • a number of attributes, stored in a Python dictionary.
  • a text string.
  • an optional tail string.
  • a number of child elements, stored in a Python sequence

To create an element instance, use the Element constructor or the SubElement() factory function.
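
As a minimal sketch of the two ways of creating elements mentioned above, the following builds a tiny tree with the standard library's xml.etree.ElementTree. The tag names, the id value and the text are made up for illustration, and the code is written for Python 3 (the encoding="unicode" argument is not available in Python 2.7).

```python
import xml.etree.ElementTree as ET

# Element() creates a stand-alone element; SubElement() creates one
# and appends it to a parent in a single step.
body = ET.Element("body")
p = ET.SubElement(body, "p", {"id": "foo"})
p.text = "Hello "
em = ET.SubElement(p, "em")
em.text = "tree"
em.tail = "!"   # text that follows </em>, still inside <p>

# Serialise the whole tree back to a string to see the result.
markup = ET.tostring(body, encoding="unicode")
print(markup)
```

Note that the "!" is stored on the em element as its tail, not on p, even though it appears inside the paragraph in the serialised markup.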

http://docs.python.org/2/library/xml.etree.elementtree.html

ElementTree

In an ElementTree, the fundamental unit is the "Element".

An Element has:

  • .tag (a string representing the name of the tag, like "p" or "script")
  • .attrib (a "dictionary" with name=value pairs of the tag attributes, like id="foo", or style="color: blue")
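  • .text (a string with the text inside the element, up to its first child tag; None if there is no such text)
  • .tail (a string with the text that follows the element's closing tag, up to the next tag; None if there is no such text)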

In addition (and this is why it's a tree), each element can be iterated over, or treated like a list, to reach its direct sub-elements.

  • Iteration to access contained "child" elements
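
The properties listed above can be inspected on a small parsed fragment. This is only a sketch: the markup is invented, and it uses the standard library's XML parser (xml.etree.ElementTree.fromstring) rather than an HTML parser.

```python
import xml.etree.ElementTree as ET

# A made-up fragment, parsed with the standard library's XML parser.
p = ET.fromstring('<p id="foo" style="color: blue">Hello <b>big</b> world</p>')

print(p.tag)           # the tag name
print(p.attrib)        # the attributes as a dictionary
print(repr(p.text))    # text before the first child tag
b = p[0]               # children can be indexed like a list
print(b.tag, b.text)
print(repr(b.tail))    # text after </b>, stored on the <b> element
child_tags = [child.tag for child in p]   # iteration yields the direct children
print(len(p), child_tags)
```

The string " world" sits inside the paragraph in the markup, but it is found on b.tail and not on p.text, which is the usual surprise when extracting text from a node.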

Tree Traversal

http://en.wikipedia.org/wiki/Tree_traversal

[Figures: pre-order (depth-first) traversal of a sorted binary tree; breadth-first traversal of a sorted binary tree]

http://localhost/doc/python2.7/html/library/xml.etree.elementtree.html?highlight=element#xml.etree.ElementTree

Breadth-first

http://lxml.de/api.html

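from collections import deque

# root is an Element, for example the result of html5lib.parse(...)
queue = deque([root])
while queue:
    el = queue.popleft()  # take the oldest element off the front of the queue
    queue.extend(el)      # append its children to the back
    print(el.tag)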
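
To make the visiting order visible, here is a self-contained variant of the queue loop above that also records how deep each element is. The function name breadth_first and the sample markup are assumptions made for this sketch.

```python
from collections import deque
import xml.etree.ElementTree as ET

def breadth_first(root):
    """Return (depth, tag) pairs in breadth-first order."""
    order = []
    queue = deque([(0, root)])
    while queue:
        depth, el = queue.popleft()      # oldest entry first
        order.append((depth, el.tag))
        # children go to the back, so a whole level is finished
        # before the next level starts
        queue.extend((depth + 1, child) for child in el)
    return order

# Invented sample document
doc = ET.fromstring("<html><head><title/></head><body><p><b/></p><div/></body></html>")
levels = breadth_first(doc)
for depth, tag in levels:
    print(depth, tag)
```

Because a queue is first-in, first-out, head and body (depth 1) are both visited before title, p and div (depth 2).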

Depth-first traversal

Walking the tree

def walk(node):
    print(node.tag)       # visit the node first (pre-order)
    for child in node:    # then recurse into each child, left to right
        walk(child)
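
A sketch of the same recursive walk that keeps track of depth, so the output can be indented to show the structure. The extra depth and out parameters and the sample markup are additions made for illustration. The last lines compare the result with Element.iter(), which yields elements in document order.

```python
import xml.etree.ElementTree as ET

def walk(node, depth=0, out=None):
    """Recursive depth-first (pre-order) walk; collects (depth, tag) pairs."""
    if out is None:
        out = []
    out.append((depth, node.tag))
    for child in node:
        walk(child, depth + 1, out)
    return out

# Invented sample document
doc = ET.fromstring("<html><head><title/></head><body><p><b/></p><div/></body></html>")
visited = walk(doc)
for depth, tag in visited:
    print("  " * depth + tag)

# The built-in iterator visits the same elements in the same order.
iter_tags = [el.tag for el in doc.iter()]
print(iter_tags == [tag for _, tag in visited])
```

Compared with the breadth-first order, the walk here goes all the way down one branch (html, head, title) before moving on to the next (body, p, b, div).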

A Style Browser

#!/usr/bin/env python
#-*- coding:utf-8 -*-
# Python 2 CGI script: fetch a page and show its element tree as nested boxes.
import cgi, urllib2, html5lib
import cgitb; cgitb.enable()

print "Content-type: text/html;charset=utf-8"
print
print """<style>
div {
    border: 1px solid black;
    padding: 20px;
}
</style>
"""

q = cgi.FieldStorage()
url = q.getvalue("url","http://pajiba.com")
f = urllib2.urlopen(url)
ct = f.info().get("content-type") or ""   # the header may be missing
if ct.startswith("text/html"):
    html = html5lib.parse(f,treebuilder="etree",namespaceHTMLElements=False)
    def walk(e):
        print "<div>"
        print e.tag
        style = e.get("style")
        if style:
            # escape the value (double quotes included) before echoing it into the page
            style = cgi.escape(style, quote=True).encode("utf-8")
            print "<span style=\""+style+"\">",style,"</span>"
        if e.text:
            print cgi.escape(e.text).encode("utf-8")
        for child in e:
            walk(child)
        print "</div>"
        if e.tail:
            # tail text follows the element's closing tag, so it is printed after the box
            print cgi.escape(e.tail).encode("utf-8")
    walk(html)
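
The same walk can collect values instead of printing them. The sketch below gathers every style attribute it meets while walking the tree. It is written for Python 3 and, to stay self-contained, parses an invented, well-formed snippet with the standard library instead of fetching a page with html5lib; the function name collect_styles is hypothetical.

```python
import xml.etree.ElementTree as ET

def collect_styles(e, found=None):
    """Walk the tree depth-first and gather (tag, style) for styled elements."""
    if found is None:
        found = []
    style = e.get("style")      # None when the attribute is absent
    if style:
        found.append((e.tag, style))
    for child in e:
        collect_styles(child, found)
    return found

# Hypothetical stand-in for a fetched page
page = ET.fromstring(
    '<html><body style="margin: 0">'
    '<p style="color: blue">one</p><p>two</p>'
    '<div><span style="font-weight: bold">three</span></div>'
    '</body></html>'
)
styles = collect_styles(page)
for tag, style in styles:
    print(tag, style)
```

Passing the found list down through the recursion means every call appends to one shared list, which is returned once the whole tree has been visited.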