BeautifulSoup: Difference between revisions
No edit summary |
|||
Line 23: | Line 23: | ||
print soup.prettify() | print soup.prettify() | ||
</source> | |||
A function to wrap one tag inside of another one. | |||
<source lang="python"> | |||
import BeautifulSoup | |||
def wraptag (tag, wrapper): | |||
# wraps tag with wrapper | |||
tagIndex = tag.parent.contents.index(tag) | |||
tag.parent.insert(tagIndex, wrapper) | |||
wrapper.insert(0, tag) | |||
soup = BeautifulSoup.BeautifulSoup("<ul><li>one</li><li>two</li></ul>") | |||
items = soup.findAll("li") | |||
for item in items: | |||
div = BeautifulSoup.Tag(soup, "div") | |||
wraptag(item, div) | |||
print soup.prettify() | |||
</source> | </source> | ||
Revision as of 12:24, 12 June 2008
Beautiful Soup is a Python library for manipulating HTML pages.
Code Examples
A function to replace the contents of a tag:
import BeautifulSoup
soup = BeautifulSoup.BeautifulSoup("<ul><li>one</li><li>two</li></ul>")
def setcontents (tag, val):
# remove previous contents
for c in tag.contents:
c.extract()
# insert the new
tag.insert(0, val)
items = soup.findAll("li")
for item in items:
setcontents(item, "foo")
print soup.prettify()
A function to wrap one tag inside of another one.
import BeautifulSoup
def wraptag (tag, wrapper):
# wraps tag with wrapper
tagIndex = tag.parent.contents.index(tag)
tag.parent.insert(tagIndex, wrapper)
wrapper.insert(0, tag)
soup = BeautifulSoup.BeautifulSoup("<ul><li>one</li><li>two</li></ul>")
items = soup.findAll("li")
for item in items:
div = BeautifulSoup.Tag(soup, "div")
wraptag(item, div)
print soup.prettify()
Code Questions
- absolutize function needs ability to patch url's in stylesheets.