ElementTree
While more sophisticated (and faster) libraries for working with HTML/XML exist (namely lxml), python's standard ElementTree implementation is quite capable (and using lxml requires a separate C module to be compiled on your platform which can sometimes complicate installation / distribution of your script).
ElementTree supports a small subset of the (more extensive) xpath query language:
Syntax | Meaning |
---|---|
tag | Selects all child elements with the given tag.
For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam. |
* | Selects all child elements. For example, */egg
selects all grandchildren named egg. |
. | Selects the current node. This is mostly useful
at the beginning of the path, to indicate that it’s a relative path. |
// | Selects all subelements, on all levels beneath the
current element. For example, .//egg selects all egg elements in the entire tree. |
.. | Selects the parent element. |
[@attrib] | Selects all elements that have the given attribute. |
[@attrib='value'] | Selects all elements for which the given attribute
has the given value. The value cannot contain quotes. |
[tag] | Selects all elements that have a child named
tag. Only immediate children are supported. |
[position] | Selects all elements that are located at the given
position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1). |
Source: http://docs.python.org/2/library/xml.etree.elementtree.html#supported-xpath-syntax