Syllabus 20100119

Using XML can be compared to stuffing information into envelopes and labelling them. Just as a postal system allows letters and packages to be sent between people, XML provides a system for information to be shared between computers on the Internet. By itself, it's a pretty simple concept to get. Where things get interesting (and complicated) is when you start talking about a particular application (the actual contents of the envelope, and the ways of packaging them).
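
To make the envelope analogy concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The feed fragment is hand-written for illustration (it is not taken from a real feed), and the helper name read_items is an assumption of this example. Each tag acts as a label, and the text inside it is the contents of the envelope.

```python
import xml.etree.ElementTree as ET

# A hypothetical, hand-written RSS fragment (not from any real feed).
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <description>Hello from inside the envelope</description>
    </item>
    <item>
      <title>Second post</title>
      <description>Another message</description>
    </item>
  </channel>
</rss>"""


def read_items(xml_text):
    """Return a (title, description) pair for every <item> element."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("description"))
            for item in root.iter("item")]


# Open each "envelope" and show its label and contents.
for title, description in read_items(SAMPLE):
    print(title, "->", description)
```

The feedparser library used below does the same kind of unpacking, but it also copes with the many variations of RSS and Atom found in real feeds.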

Today, we will use the Python library feedparser to read in RSS feeds, and then use Python to translate the contents of each feed into another markup format: HTML, for use in an EPUB ebook.


# Looking inside the RSS feed
import sys
import feedparser

# Write UTF-8 even when the output is redirected to a file (Python 3.7+)
sys.stdout.reconfigure(encoding="utf-8")

# Fetch and parse the feed from the website, and give the result the name "newwork"
newwork = feedparser.parse("http://feeds.feedburner.com/newwork")

# Uncomment to inspect the parsed feed:
# import pprint
# pprint.pprint(newwork)
# print(newwork.entries[4]["title"])

# Start of the XHTML page
print("""
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>feedbook</title>
    <link type="text/css" rel="stylesheet" media="all" href="stylesheet.css" />
  </head>
  <body>""")

# One heading and one paragraph per feed entry; "&" is escaped as "&amp;"
for e in newwork.entries:
    print("<h1>")
    print(e["title"].replace("&", "&amp;"))
    print("</h1>")
    print("<p>")
    print(e["summary"].replace("&", "&amp;"))
    print("</p>")
    print()

# End of the XHTML page
print("""
  </body>
</html>""")
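
The loop above only replaces "&". A title that contains "<" or ">" would still break the XHTML. As a hedged alternative, the sketch below uses html.escape from the standard library. The function name entry_to_xhtml is hypothetical, and it assumes that titles are plain text while summaries may already contain markup (and are therefore passed through unchanged).

```python
from html import escape


def entry_to_xhtml(title, summary):
    """Build an <h1>/<p> fragment for one feed entry.

    Assumption: the title is plain text, so it is escaped in full;
    the summary is passed through as-is because it may contain markup.
    """
    safe_title = escape(title, quote=False)  # escapes &, < and >
    return "<h1>{}</h1>\n<p>{}</p>\n".format(safe_title, summary)


print(entry_to_xhtml("Cats & dogs <live>", "A <em>short</em> summary"))
```

Inside the loop, this could be called as entry_to_xhtml(e["title"], e["summary"]).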


Download a copy of the "epub" files (the zip at the bottom of the page), unzip it, and rename the folder to "feedbook" (or any name you like). Then, from the Terminal:

python feed.py > feedbook/OEBPS/content.html
cd feedbook
zip -0Xq feedbook.epub mimetype
zip -Xr9Dq feedbook.epub *
lucidor feedbook.epub
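
The two zip commands above can also be approximated in Python with the standard zipfile module. This is a rough sketch, not an exact equivalent of the command-line flags. The function name build_epub is hypothetical, and it assumes the folder contains a file called mimetype at its top level. The important detail is that mimetype goes in first and is stored without compression.

```python
import os
import zipfile


def build_epub(folder, epub_path):
    """Zip `folder` into `epub_path`, mimetype first and uncompressed."""
    target = os.path.abspath(epub_path)
    with zipfile.ZipFile(epub_path, "w") as book:
        # Roughly `zip -0X`: store the mimetype file without compression.
        book.write(os.path.join(folder, "mimetype"), "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        # Roughly `zip -r9D`: add all other files, compressed, no directory entries.
        for dirpath, dirnames, filenames in os.walk(folder):
            dirnames.sort()
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                arcname = os.path.relpath(full, folder).replace(os.sep, "/")
                # Skip the mimetype (already added) and the archive itself.
                if arcname == "mimetype" or os.path.abspath(full) == target:
                    continue
                book.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)

# Example call (assumes the "feedbook" folder from the steps above exists):
# build_epub("feedbook", "feedbook/feedbook.epub")
```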


Questions

In one case, the summary of a feed included a mismatched tag (an opening tag with no matching closing tag). How can we use Python to "clean up" the XML?
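
One possible starting point, offered as a sketch rather than a complete answer, is to re-emit the markup with Python's standard html.parser module while keeping a stack of open tags. The names TagBalancer, render_start and balance are hypothetical, and the list of void elements is deliberately short. The sketch closes tags that were left open and drops stray closing tags. It silently discards comments and does not repair a bare "&" in the text.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag (a deliberately short list).
VOID = {"br", "hr", "img", "input", "link", "meta"}


def render_start(tag, attrs, self_closing=False):
    """Rebuild a start tag with quoted, escaped attribute values."""
    parts = [tag]
    for name, value in attrs:
        value = name if value is None else value  # XHTML has no bare attributes
        parts.append('%s="%s"' % (name, escape(value, quote=True)))
    return "<%s%s>" % (" ".join(parts), " /" if self_closing else "")


class TagBalancer(HTMLParser):
    """Re-emit markup, closing open tags and dropping stray closing tags."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []    # pieces of cleaned-up output
        self.open = []   # stack of currently open tag names

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self.out.append(render_start(tag, attrs, self_closing=True))
        else:
            self.out.append(render_start(tag, attrs))
            self.open.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(render_start(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag not in self.open:
            return  # stray closing tag: drop it
        # Close everything opened after the matching start tag, then the tag itself.
        while self.open:
            top = self.open.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def balance(fragment):
    """Return `fragment` with every opened tag closed."""
    parser = TagBalancer()
    parser.feed(fragment)
    parser.close()
    while parser.open:
        parser.out.append("</%s>" % parser.open.pop())
    return "".join(parser.out)


print(balance("<p>one <i>two</p>"))
```

In the feed script, each summary could be passed through balance() before it is printed.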


Attachments