User:Andre Castro/prototyping/1.2/Archiveorg-seachTerm
Revision as of 17:15, 9 February 2012
Searching soundfiles per term
Fetching sound files from archive.org based on search terms
In order to do that I am making 2 API requests:
- 1 - searching for a given term within mediaType:Audio
- getting the identifier of the first search result (id_0)
- 2 - requesting details on identifier (id_0)
I use the 2nd (details) API query to look up the files the item contains.
From this list I get the first ogg file, or the first mp3 in case no ogg files are present.
Download soundfile: on archive.org, files are stored at http://www.archive.org/download/ + identifier + filename
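The three URLs involved can be put together without touching the network. The following is a minimal sketch in Python 3 (the script on this page targets Python 2). The helper names `search_url`, `details_url` and `download_url` are hypothetical and not part of any archive.org library, the identifier `some_item` is made up, and the URL patterns are simply the ones used on this page.

```python
from urllib.parse import quote, quote_plus

BASE = 'http://www.archive.org'

def search_url(term, rows=15):
    # advanced search for the term, restricted to mediatype:Audio, JSON output
    return (BASE + '/advancedsearch.php?q=' + quote_plus(term)
            + '+AND+mediatype:Audio&rows=' + str(rows) + '&output=json')

def details_url(identifier):
    # details on one item, JSON output (same pattern as this page's script)
    return BASE + '/details/' + identifier + '&output=json'

def download_url(identifier, filename):
    # download/ + identifier + filename; a leading '/' on filename is tolerated
    return BASE + '/download/' + identifier + '/' + quote(filename.lstrip('/'))

print(search_url('blood orange'))
print(download_url('some_item', '/track 01.ogg'))  # made-up identifier and file
```

Quoting the term and the file name keeps the URLs valid when either contains spaces or other reserved characters.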
#!/usr/bin/python
import urllib2, urllib, json, re

# ====API Query====
term = 'orange'
# search query: the term, restricted to mediatype:Audio, returned as JSON
url = 'http://www.archive.org/advancedsearch.php?q=' + urllib.quote_plus(term) + '+AND+mediatype:Audio&rows=15&output=json'
search = urllib2.urlopen(url)
search_result = json.load(search)
id_0 = search_result['response']['docs'][0]['identifier'] # identifier of the first search result

details_url = 'http://www.archive.org/details/' + id_0 + '&output=json' # details on identifier
details_search = urllib2.urlopen(details_url)
details_result = json.load(details_search)
files = details_result['files'].keys() # names of the contained files

# collect ogg and mp3 files separately so that ogg can be preferred
ogg_files = []
mp3_files = []
for i in files:
    if re.search(r'\.ogg$', i): # dot escaped so only a real '.ogg' suffix matches
        ogg_files.append(i)
    elif re.search(r'\.mp3$', i):
        mp3_files.append(i)

files_list = ogg_files + mp3_files # ogg first, mp3 as fallback
print files_list
extension = files_list[0][-4:] # '.ogg' or '.mp3', taken from the chosen file

# no '/' is added here: the keys in 'files' are presumed to start with one
audio_url = 'http://www.archive.org/download/' + id_0 + files_list[0]
urllib.urlretrieve(audio_url, term + extension)
print files_list[0]
print audio_url
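
The file selection step can be isolated into a small function that is easy to check without a network connection. This is a sketch in Python 3 rather than the Python 2 used in the script; `pick_audio_file` is a hypothetical name, and the sample file names are made up for illustration.

```python
import os

def pick_audio_file(filenames, preferred=('.ogg', '.mp3')):
    """Return the first file with the most preferred extension, or None."""
    for ext in preferred:
        for name in filenames:
            if name.lower().endswith(ext):
                return name
    return None

# made-up file names, written with a leading '/' like the keys the script reads
sample = ['/cover.jpg', '/track01.mp3', '/track01.ogg', '/track02.ogg']
chosen = pick_audio_file(sample)
print(chosen)                       # /track01.ogg
print(os.path.splitext(chosen)[1])  # .ogg
```

Returning None when nothing matches lets the caller skip an item that has no ogg or mp3 files, instead of failing on an empty list.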