User:Riviera/Podcast rss: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 22:05, 2 October 2023

Podcasts, RSS feeds, grep

On Monday 2nd, we discussed podcasts as opposed to radio: radio is live whereas podcasts are prerecorded, and the modes of engaging with each differ. Podcasts are built upon Really Simple Syndication (RSS). In other words, in terms of code, there's little difference between the XML for a blog feed and the XML for a podcast feed. The following command combines a regular expression with the grep command to print every line that contains an opening XML tag in the RSS feed for the Call Me Mother podcast (here a local file named rss).

$ grep -E "<[[:alpha:]]+" rss

This command prints results such as

<pubDate>Fri, 02 Apr 2021 15:25:34 GMT</pubDate>
<title>Stephen Whittle</title>

These tags also appear in RSS feeds for written blogs. However, the command also prints results such as

<itunes:explicit>yes</itunes:explicit>

and

<acast:showId>62b087ec4f1d1f0014025b79</acast:showId>
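
To see at a glance which tags set a podcast feed apart from a blog feed, the namespaced tags can be listed on their own. The following is a minimal sketch, not part of the original experiment: sample.xml and namespaced.txt are hypothetical file names, and the sample lines are the four results quoted above.

```shell
# Work in a throwaway directory so nothing else is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical four-line sample built from the results quoted above.
cat > sample.xml <<'EOF'
<pubDate>Fri, 02 Apr 2021 15:25:34 GMT</pubDate>
<title>Stephen Whittle</title>
<itunes:explicit>yes</itunes:explicit>
<acast:showId>62b087ec4f1d1f0014025b79</acast:showId>
EOF

# -o prints only the matched text. The pattern requires a colon inside the
# tag name, so plain RSS tags such as <title> are skipped.
grep -Eo "<[[:alpha:]]+:[[:alpha:]]+" sample.xml | tr -d '<' | LC_ALL=C sort -u > namespaced.txt

cat namespaced.txt
# acast:showId
# itunes:explicit
```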


Editing

What I want to produce is a list of the tags only. At the moment, I'm not interested in what's in between the tags. Ideally, I'd like to use built-in commands to generate a text file containing an outline of the tags, along the lines of the following listing.

    1 <tag>
    2 <subtag>
    3 </subtag>
    4 </tag>

First Attempt

Can this be achieved by making two files, open.txt and close.txt? The first file should contain all the opening tags and the latter all the closing tags.

$ grep -Eon "<[[:alpha:]]+>" rss > open.txt

$ grep -Eon "</[[:alpha:]]+>" rss > close.txt

It should then be possible to cat both files and sort the result by line number producing the desired outcome.

$ cat open.txt close.txt | sort -n > sorted.txt

Second Attempt

Whilst reading through the output of the first attempt (sorted.txt), I discovered that

  1. the </guid> tags had no corresponding opening tags
  2. the closing tags were sorted before the opening tags

On the one hand, some information was missing. I had already deliberately omitted the acast and itunes tags. In doing so, I worked on the assumption that there were no other relevant, colon-separated tags in the rss file. The first pattern, however, only matches opening tags without attributes, which is presumably why the opening guid tags were absent. Fortunately, retrieving the additional data was a quick fix:

$ grep -Eon "<[[:alpha:]]+>|<[[:alpha:]]+ [^>]*>" rss > open.txt
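
As a sketch of what the extended pattern is after, the two alternatives can also be folded into one expression with an optional attribute part. The sample file and its guid value below are made up for illustration, and `[^>]*` (any run of characters other than `>`) is an assumption about how the attribute part is meant to be matched.

```shell
# Work in a throwaway directory so nothing else is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical two-line sample; the guid value is invented.
cat > sample.xml <<'EOF'
<title>Stephen Whittle</title>
<guid isPermaLink="false">example-id</guid>
EOF

# "( [^>]*)?" optionally matches a space followed by attributes, so both
# <title> and <guid isPermaLink="false"> count as opening tags.
grep -Eon "<[[:alpha:]]+( [^>]*)?>" sample.xml > open.txt

cat open.txt
# 1:<title>
# 2:<guid isPermaLink="false">
```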

On the other hand, figuring out a way to sort the file such that the closing tags followed the opening tags was another matter. The general outline was there, and if I could sort out the details, perhaps this could become a basis for a podcast RSS generator. That could potentially be useful in relation to Worm's sonic archive.
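
One possible way around the sorting problem, offered as a sketch rather than as the route taken here, is to avoid sorting altogether. When line numbers tie, sort -n falls back to comparing the whole line, which is one plausible reason the closing tags came first. A single grep whose pattern accepts an optional slash prints opening and closing tags in the order they occur in the file. The file names sample.xml and outline.txt and the sample content are hypothetical.

```shell
# Work in a throwaway directory so nothing else is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample; the guid value is invented.
cat > sample.xml <<'EOF'
<item>
<title>Stephen Whittle</title>
<guid isPermaLink="false">example-id</guid>
<itunes:explicit>yes</itunes:explicit>
</item>
EOF

# "</?" matches both opening and closing tags; the extra ":" in the bracket
# expression lets namespaced names such as itunes:explicit through.
grep -Eon "</?[[:alpha:]:]+[^>]*>" sample.xml > outline.txt

cat outline.txt
# 1:<item>
# 2:<title>
# 2:</title>
# 3:<guid isPermaLink="false">
# 3:</guid>
# 4:<itunes:explicit>
# 4:</itunes:explicit>
# 5:</item>
```

Markup inside CDATA sections would be picked up as well, and a tag broken across several lines would be missed, so this remains an outline rather than a parse.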