User:Andre Castro/prototyping/1.2/Archiveorg-seachTerm: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 18:15, 9 February 2012

Searching soundfiles per term

Fetching sound files from archive.org based on search terms.

To do that, I make 2 API requests:

  • 1 - searching for a given term within mediatype:Audio
    • getting the identifier of the first search result (id_0)
  • 2 - requesting details on that identifier (id_0)
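
The two requests listed above can be sketched as small URL builders. This is only a sketch, not the script's own code: the helper names `search_url`, `details_url` and `first_identifier` are hypothetical, and the URL patterns are copied from the script on this page, so they reflect the archive.org API as it is used here and may not match what the service offers today. The import fallback is there so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib import quote_plus          # Python 2
except ImportError:
    from urllib.parse import quote_plus    # Python 3

BASE = 'http://www.archive.org'

def search_url(term, rows=15):
    # request 1: search for a term within mediatype:Audio, JSON output
    query = quote_plus(term) + '+AND+mediatype:Audio'
    return BASE + '/advancedsearch.php?q=' + query + '&rows=' + str(rows) + '&output=json'

def details_url(identifier):
    # request 2: details on one identifier, JSON output
    return BASE + '/details/' + identifier + '&output=json'

def first_identifier(search_result):
    # id_0: identifier of the first search result in the parsed JSON
    return search_result['response']['docs'][0]['identifier']
```

Escaping the term with `quote_plus` is an addition: it keeps multi-word terms such as 'blood orange' from breaking the query string.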


I use the 2nd (details) API query to list the files the item contains.

From this list I take the first ogg file, or the first mp3 if no ogg files are present.
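
One way to express that choice is a small helper. It is a sketch under the assumption that the intent is to prefer ogg and fall back to mp3. The name `pick_audio_file` is hypothetical, and the match is case-sensitive, like the regular expressions in the script.

```python
def pick_audio_file(filenames):
    # try each extension in order of preference: ogg first, then mp3
    for extension in ('.ogg', '.mp3'):
        for name in filenames:
            if name.endswith(extension):
                # first file with the preferred extension wins
                return name, extension
    # no ogg or mp3 file in the list
    return None, None
```

Returning the extension together with the file name keeps the two in sync, so the downloaded file always gets the extension of the file that was actually picked.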

Download the sound file. On archive.org, files are stored at http://www.archive.org/download/ + identifier + filename
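
The download address can be assembled as in the sketch below. The helper name `download_url` is hypothetical. Whether the file names returned by the details query carry a leading slash is treated as an unknown here, so the helper accepts both forms.

```python
def download_url(identifier, filename):
    # assumption: filename may or may not start with '/', so strip it
    # and insert exactly one separator between identifier and file name
    return 'http://www.archive.org/download/' + identifier + '/' + filename.lstrip('/')
```

The result can then be passed to a download call such as `urllib.urlretrieve` in Python 2.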


#!/usr/bin/python
import urllib2, urllib, json, re

# ====API Query====
term = 'orange'
url = 'http://www.archive.org/advancedsearch.php?q=' + term + '+AND+mediatype:Audio&rows=15&output=json' #api query

search = urllib2.urlopen(url)
search_result = json.load(search)
id_0 = search_result['response']['docs'][0]['identifier'] #look for the identifier in json dict

details_url = 'http://www.archive.org/details/' + id_0 + '&output=json' #details on identifier
details_search = urllib2.urlopen(details_url)
details_result = json.load(details_search)

files=details_result['files'].keys() #list the names of the files the item contains
files_list=[]


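# collect ogg and mp3 files separately so that ogg can be preferred
ogg_list = [i for i in files if re.search(r'\.ogg$', i)]
mp3_list = [i for i in files if re.search(r'\.mp3$', i)]
files_list = ogg_list + mp3_list # ogg files first, mp3 files as fallback
print files_list

# the keys of details_result['files'] are expected to begin with '/',
# so no extra separator is added between identifier and file name
audio_file = files_list[0]
extension = audio_file[-4:] # '.ogg' or '.mp3', taken from the chosen file
audio_url = 'http://www.archive.org/download/' + id_0 + audio_file
urllib.urlretrieve(audio_url, term + extension)
print audio_file
print audio_url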