= Radical Browsers =
Walk into any of the more "service oriented" stores in the United States and you're likely to be quickly "greeted" by a salesperson with something along the lines of "Is there something I can help you with today?" A simple, and oft-used, non-committal response is: "No, thanks. I'm just browsing."

A software "browser" is, for many people, their most frequently used piece of software. As a result of its very persistence and ubiquity, the browser as a particular piece of software threatens to fade into the background, becoming a "natural" and seemingly neutral part of one's daily (computing) experience.

The original conception of the world wide web supported a variety of means of viewing and interacting with online content.

By digging into the underlying network mechanisms, protocols, and markup languages, it's possible to create radically different kinds of "browsing" of the material made available via the world wide web.
Some examples:

* [http://bak.spc.org/iod/ WebStalker]
** [http://bak.spc.org/iod/mutation.html A Means of Mutation, Matthew Fuller]
* http://www.potatoland.org
* http://www.artisopensource.net/hacks
== Page mashups with Python & [[BeautifulSoup]] ==

Some useful tools built into Python:

* [[urllib2]]
* urlparse
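The urlparse module does the URL dissection and relative-link resolution that the scripts below rely on. A quick sketch (shown with Python 3's urllib.parse module name; in Python 2 the same functions live in urlparse):

```python
# urlparse in Python 2 became urllib.parse in Python 3; the functions
# used here (urlsplit, urljoin) behave the same in both.
from urllib.parse import urlsplit, urljoin

parts = urlsplit("http://news.bbc.co.uk/sport/index.html?edition=uk")
print(parts.netloc)   # "news.bbc.co.uk"
print(parts.path)     # "/sport/index.html"

# resolve a relative href against the page it appeared on
print(urljoin("http://news.bbc.co.uk/sport/index.html", "../weather/"))
# "http://news.bbc.co.uk/weather/"
```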
An issue with urllib and Wikipedia (setting the User-Agent header to "pretend" to be a "real" browser):

* http://bytes.com/forum/thread500417.html
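Without a browser-like User-Agent header, Wikipedia refuses plain urllib requests, so the header has to be attached before the request is sent. A minimal sketch of that step (shown with Python 3's urllib.request, the modern equivalent of the urllib2 calls used on this page; the request is only constructed here, not sent):

```python
# Attach a browser-like User-Agent to a request object; sites that block
# "bots" check this header before serving a page.
import urllib.request

user_agent = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14"
request = urllib.request.Request("http://en.wikipedia.org/wiki/Browser")
request.add_header("User-Agent", user_agent)

# the header is now attached to the request (keys are stored capitalized)
print(request.get_header("User-agent"))
```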
== Code ==

A simple function to open a "soup" from a given URL. The function sets the User-Agent header, which allows Python to "mimic" another browser; many sites refuse unrecognized browser tags in an attempt to block "bots".
<source lang="python">
import urllib2
import BeautifulSoup

def opensoup(url):
    """
    Returns (page, actualurl).
    actualurl may be different than url in the case of a redirect.
    """
    request = urllib2.Request(url)
    user_agent = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14"
    request.add_header("User-Agent", user_agent)
    pagefile = urllib2.urlopen(request)
    page = BeautifulSoup.BeautifulSoup(pagefile)
    realurl = pagefile.geturl()
    return (page, realurl)
</source>
The following is an example of a Python CGI script that uses Beautiful Soup to display only the link tags of a given page. Links are redirected so that the referred-to page is loaded through the same script, letting the user "browse" through the links of one or more sites via the script's filter. This basic framework could be used to produce alternative "browsers" viewable in a user's own regular browser, such as [[Firefox]].
<source lang="python">
#!/usr/bin/python
import os
import BeautifulSoup, cgi
import urllib, urllib2, urlparse
import cgitb; cgitb.enable()

inputs = cgi.FieldStorage()
pageurl = inputs.getvalue("url", "http://news.bbc.co.uk")

# set the User-Agent so the request looks like it comes from a "real" browser
request = urllib2.Request(pageurl)
user_agent = "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14"
request.add_header("User-Agent", user_agent)

pagefile = urllib2.urlopen(request)
page = BeautifulSoup.BeautifulSoup(pagefile)
realurl = pagefile.geturl()

def scriptURL():
    """ returns: current URL without query """
    httpHost = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
    scriptName = os.environ.get('SCRIPT_NAME')
    return "http://" + httpHost + scriptName

this = scriptURL()

print "Content-type: text/html"
print

# make all href attributes absolute
for r in page.findAll(True, {'href': True}):
    href = r['href']
    if not href.lower().startswith("http"):
        r['href'] = urlparse.urljoin(realurl, href)

# make all src attributes absolute
for r in page.findAll(True, {'src': True}):
    href = r['src']
    if not href.lower().startswith("http"):
        r['src'] = urlparse.urljoin(realurl, href)

title = ""
try:
    title = page.title.string
except AttributeError:
    pass

print "<h1>%s</h1>" % title
print "<h2>%s</h2>" % realurl
print "<ol>"
links = page.findAll("a")
for l in links:
    if not l.has_key("href"):
        continue
    href = l['href']
    if not href.lower().startswith("http"):
        href = urlparse.urljoin(realurl, href)
    label = l.renderContents()
    # route each link back through this script so the user keeps
    # "browsing" through the filter
    href = this + "?url=" + urllib.quote(href, "")
    print """<li><a href="%s">%s</a></li>""" % (href, label)
print "</ol>"
</source>
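The key trick above is that each outgoing link is wrapped back onto the script itself by URL-encoding the target address into a <code>?url=</code> query parameter. A minimal sketch of just that rewriting step, using Python 3's urllib.parse equivalents of the urllib.quote and urlparse.urljoin calls above (the script address below is an assumed placeholder, not a real URL):

```python
# Sketch of the link-rewriting step: resolve the link, then URL-encode it
# into a query parameter so it loads back through the filtering script.
from urllib.parse import quote, urljoin

this = "http://example.org/cgi-bin/linkbrowser.cgi"  # assumed script URL
realurl = "http://news.bbc.co.uk/"                   # page being filtered

def rewrite(href):
    # make relative links absolute first, as the script does
    if not href.lower().startswith("http"):
        href = urljoin(realurl, href)
    # quote with safe="" so that ":" and "/" are escaped as well
    return this + "?url=" + quote(href, "")

print(rewrite("/sport/football"))
# http://example.org/cgi-bin/linkbrowser.cgi?url=http%3A%2F%2Fnews.bbc.co.uk%2Fsport%2Ffootball
```

On the next request the script reads the parameter back with <code>inputs.getvalue("url", ...)</code>, closing the loop.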