Feedparser.py
Feedparser is a Python library that allows you to read RSS and Atom feeds. It helps to isolate your code from some of the differences (in format and version) between feeds.
http://feedparser.org/
<source lang="python">
import feedparser

d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
print(d.entries)
</source>
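To see the normalization in action without touching the network, note that <code>parse()</code> also accepts a raw XML string, not just a URL. The inline Atom document below is made up for illustration:

<source lang="python">
import feedparser

# parse() accepts a raw string as well as a URL or file path.
atom = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample Feed</title>
  <entry><title>First entry</title></entry>
</feed>"""

d = feedparser.parse(atom)
print(d.version)           # the detected format, e.g. "atom10"
print(d.feed.title)
print(d.entries[0].title)
</source>

The same <code>d.feed.title</code> / <code>d.entries[n].title</code> attributes work whatever the feed's underlying format, which is the point of the normalization.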
== Installing ==
Typically (on Debian/Ubuntu), installing is as easy as:
 sudo apt-get install python-feedparser
== Examples ==
<source lang="python">
import feedparser

url = "http://feeds.bbci.co.uk/news/rss.xml"
feed = feedparser.parse(url)
for e in feed.entries:
    print(e.title)
</source>
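Entries carry more than a title: feedparser normalizes common fields such as <code>link</code> and <code>summary</code> regardless of the feed format. The inline RSS below is a made-up example so the snippet runs offline:

<source lang="python">
import feedparser

rss = """<rss version="2.0"><channel>
  <title>Demo</title>
  <item>
    <title>Hello</title>
    <link>http://example.com/hello</link>
    <description>A first item</description>
  </item>
</channel></rss>"""

feed = feedparser.parse(rss)
for e in feed.entries:
    # The RSS <description> element is exposed under the
    # normalized name "summary".
    print(e.title, e.link, e.summary)
</source>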
<source lang="python">
import feedparser

url = "http://search.twitter.com/search.atom?q=tomato"
feed = feedparser.parse(url)
for e in feed.entries:
    for word in e.title.split():
        print(word)
</source>
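A natural extension of the word-splitting loop is counting word frequencies across all the titles. A sketch, using a small made-up inline feed so it runs offline:

<source lang="python">
import feedparser
from collections import Counter

atom = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>tomato soup recipe</title></entry>
  <entry><title>tomato salad</title></entry>
</feed>"""

# Tally every word appearing in an entry title.
counts = Counter()
for e in feedparser.parse(atom).entries:
    counts.update(e.title.lower().split())

print(counts.most_common(2))  # [('tomato', 2), ('soup', 1)]
</source>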
Read a feed whose URL is given on the command line, falling back to a default:
<source lang="python">
#!/usr/bin/env python
import sys
import feedparser

try:
    url = sys.argv[1]
except IndexError:
    url = "http://feeds.bbci.co.uk/news/rss.xml"

feed = feedparser.parse(url)
for e in feed.entries:
    print(e.title)
</source>
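One thing worth knowing for scripts like this: <code>parse()</code> does not raise an exception on a malformed feed. Instead it sets the <code>bozo</code> flag and records the underlying error, so the script can decide how strict to be. A sketch with a deliberately truncated input:

<source lang="python">
import feedparser

broken = "<rss><channel><title>Oops"  # truncated, not well-formed XML
feed = feedparser.parse(broken)
if feed.bozo:
    # bozo_exception holds the parser error that was swallowed
    print("problem reading feed:", feed.bozo_exception)
</source>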
== Reading feeds from local files ==
Feedparser supports reading from local files: simply give a (relative) path to the parse call. This is very useful when testing, as:
1. It's faster than loading the live feed every time.
2. You can keep working even if a feed is unavailable, or you have no network.
3. When you encounter a problem, you can keep testing against the same feed data over and over (keeping copies of "interesting" feeds as needed).
For example, when processing a feed from the New York Times Online, I might use the following to load a live feed:
<source lang="python">
import feedparser

feed = feedparser.parse("http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml")
</source>
To do this locally, I can first use wget to download the feed (note the -O option to pick the filename to save to):
<source lang="bash">
wget http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml -O nytimes.xml
</source>
... and then load the feed in Python with:
<source lang="python">
import feedparser

feed = feedparser.parse("nytimes.xml")
</source>
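The whole round trip can also be demonstrated offline: write a small feed to disk, then parse it by path. The filename sample.xml and the one-line feed are made up for illustration:

<source lang="python">
import feedparser

# Write a tiny RSS document to a local file.
rss = '<rss version="2.0"><channel><title>Local</title></channel></rss>'
with open("sample.xml", "w") as f:
    f.write(rss)

# The same parse() call handles URLs, local paths, and raw strings.
feed = feedparser.parse("sample.xml")
print(feed.feed.title)
</source>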
[[Category:Cookbook]]
Latest revision as of 22:12, 6 December 2010