Feedparser.py

From XPUB & Lens-Based wiki
Revision as of 22:12, 6 December 2010 by Michael Murtaugh

Feedparser is a Python library for reading RSS and Atom feeds. It hides many of the differences between feed formats and versions, so the same code can handle most feeds.

http://feedparser.org/

import feedparser

d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
print(d.entries)

Installing

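On Debian/Ubuntu, installing it is typically as easy as:

sudo apt-get install python3-feedparser

Older releases package the Python 2 version as python-feedparser. On other systems, pip install feedparser should work.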

Examples

Print the title of every entry in the BBC News feed:

import feedparser

url = "http://feeds.bbci.co.uk/news/rss.xml"
feed = feedparser.parse(url)
for e in feed.entries:
    print(e.title)
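
Print the individual words of each entry title from a Twitter search feed (a search for "tomato"). Note that Twitter has since retired this search.atom endpoint:

import feedparser

url = "http://search.twitter.com/search.atom?q=tomato"
feed = feedparser.parse(url)
for e in feed.entries:
    for word in e.title.split():
        print(word)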

Read a feed whose URL is given on the command line, falling back to the BBC News feed when no URL is given:

#!/usr/bin/env python3
import sys

import feedparser

# Use the first command-line argument as the feed URL, if there is one
try:
    url = sys.argv[1]
except IndexError:
    url = "http://feeds.bbci.co.uk/news/rss.xml"

feed = feedparser.parse(url)
for e in feed.entries:
    print(e.title)

Reading feeds from local files

Feedparser can also read from local files: simply give a (relative) path to feedparser.parse instead of a URL. This is very useful when testing, because:

1. It's faster than loading the live feed every time.
2. You can keep working, even if a feed is unavailable, or you have no network.
3. When you encounter a problem, you can keep testing the same feed data over and over (keeping copies of "interesting" feeds as needed).

For example, when processing a feed from the New York Times Online, I might use the following to load a live feed:

feed = feedparser.parse("http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml")

To work from a local copy instead, I can first download the feed with wget (the -O option sets the name of the file to save to):

wget http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml -O nytimes.xml

...and then load the feed in Python with:

feed = feedparser.parse("nytimes.xml")