BeautifulSoup: Difference between revisions

From XPUB & Lens-Based wiki
No edit summary
Line 23: Line 23:


print soup.prettify()
print soup.prettify()
</source>
A function to wrap one tag inside of another one.
<source lang="python">
import BeautifulSoup
def wraptag (tag, wrapper):
# wraps tag with wrapper
tagIndex = tag.parent.contents.index(tag)
tag.parent.insert(tagIndex, wrapper)
wrapper.insert(0, tag)
soup = BeautifulSoup.BeautifulSoup("<ul><li>one</li><li>two</li></ul>")
items = soup.findAll("li")
for item in items:
div = BeautifulSoup.Tag(soup, "div")
wraptag(item, div)
print soup.prettify()
</source>
</source>



Revision as of 12:24, 12 June 2008

Beautiful Soup is a Python library for manipulating HTML pages.

Code Examples

A function to replace the contents of a tag:

import BeautifulSoup
soup = BeautifulSoup.BeautifulSoup("<ul><li>one</li><li>two</li></ul>")

def setcontents (tag, val):
	# remove previous contents
	for c in tag.contents:
		c.extract()
	# insert the new
	tag.insert(0, val)

items = soup.findAll("li")
for item in items:
	setcontents(item, "foo")

print soup.prettify()

A function to wrap one tag inside of another one.

import BeautifulSoup

def wraptag (tag, wrapper):
	# wraps tag with wrapper
	tagIndex = tag.parent.contents.index(tag)
	tag.parent.insert(tagIndex, wrapper)
	wrapper.insert(0, tag)

soup = BeautifulSoup.BeautifulSoup("<ul><li>one</li><li>two</li></ul>")
items = soup.findAll("li")
for item in items:	
	div = BeautifulSoup.Tag(soup, "div")
	wraptag(item, div)

print soup.prettify()

Code Questions

  • absolutize function needs ability to patch url's in stylesheets.