Br browser
Latest revision as of 12:54, 6 June 2013
Br_browser detects and displays the URLs of broken (dead) links.
I am interested in anonymous or depersonalized transit spaces. I will explore the abandoned places on the web, starting from the wiki.
Browsing through broken links: internal and external server errors which eventually redirect you to no space...
"It is the vanishing point of now-here is the same time a no-where."
Top Server Errors
List of HTTP status codes: http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
400 Bad Request
403 Forbidden
404 Not Found ---> The requested resource could not be found but may be available again in the future.
500 Internal Server Error ---> A generic error message, given when no more specific message is suitable.
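As a small illustration of the codes listed above, the sketch below looks up their standard names and checks whether a code falls in an error range. It assumes Python 3 (the prototype on this page is Python 2), and the helper names `describe` and `is_error` are made up for this example. Note that 400, 403 and 404 belong to the 4xx client-error class, while 500 belongs to the 5xx server-error class.

```python
from http import HTTPStatus

# The four codes listed on this page.
CODES = [400, 403, 404, 500]

def describe(code):
    """Return 'code phrase' using the standard library's status table."""
    status = HTTPStatus(code)
    return "{0} {1}".format(status.value, status.phrase)

def is_error(code):
    """Treat every 4xx (client) and 5xx (server) status as an error."""
    return 400 <= code < 600

for code in CODES:
    print(describe(code))
```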
keywords: detect, direct, link, error, redirect
TEST TEST TEST
http://www.nytimes.com/
http://www.bbc.co.uk/news/
http://edition.cnn.com/
http://www.wikipedia.org/
http://www.mediawiki.org/
http://edition.cnn.com/wwweb/
http://www.bbc.co.uk/news/nettt/
http://www.bbbccc.co.uk/news/
http://www.google.nl/calendar
http://www.wikipediaaaa.org/
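The test URLs above can fail in two different ways: a wrong path on a real host (such as /news/nettt/) will typically come back with an HTTP error status, while a misspelled host name may never reach a server at all. A minimal sketch of telling the two apart, assuming Python 3 and its urllib.request module; the function name `check` and its `opener` parameter are hypothetical, and `opener` exists only so the function can be tried without network access.

```python
import urllib.error
import urllib.request

def check(url, opener=urllib.request.urlopen, timeout=10):
    """Return (ok, reason) for a single URL."""
    try:
        response = opener(url, timeout=timeout)
        response.close()
        return True, "OK"
    except urllib.error.HTTPError as e:
        # A server answered, but with an error status such as 404 or 500.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return False, "HTTP {0}".format(e.code)
    except urllib.error.URLError as e:
        # No usable answer at all, e.g. the host name does not resolve.
        return False, "unreachable: {0}".format(e.reason)
    except (OSError, ValueError) as e:
        # Other low-level failures (timeouts while reading, malformed URLs).
        return False, "failed: {0}".format(e)

# Example call (needs network access):
#   check("http://www.bbc.co.uk/news/nettt/")
```

Keeping the reason next to the verdict makes it possible to show a dead path and a dead host differently in the browser.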
PROTOTYPE
//CGI SCRIPT _ BrLinks
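#!/usr/bin/env python
#-*- coding:utf-8 -*-
# Python 2 script: urllib2 and the print statement do not exist in Python 3.
import cgi, urllib2, html5lib, urlparse
import cgitb; cgitb.enable()

# CGI response header, followed by a blank line
print "Content-type: text/html;charset=utf-8"
print

# Page to scan: the "url" query parameter, or this wiki page by default
q = cgi.FieldStorage()
url = q.getvalue("url","http://pzwart3.wdka.hro.nl/mediawiki/index.php?title=Br_browser&action=submit")
f = urllib2.urlopen(url)
ct = f.info().get("content-type", "")
# Only HTML pages are parsed for links
if ct.startswith("text/html"):
    t = html5lib.parse(f,treebuilder="etree",namespaceHTMLElements=False)
    for a in t.iter("a"):
        href = a.get("href")
        if not href:
            # <a> without href (a named anchor): nothing to follow
            continue
        # Resolve relative links against the page URL
        href = urlparse.urljoin(url, href)
        try:
            page = urllib2.urlopen(href)
            page.close()
        except IOError, e:
            # HTTP errors and unreachable hosts both end up here
            print '<a href="{0}">BROKEN {0}</a><br>'.format(cgi.escape(href, quote=True))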
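The link-collection step of the prototype can also be sketched with only the Python 3 standard library. This is an illustration under assumptions, not the script used on this page: it swaps html5lib for html.parser, which is less forgiving with malformed markup, and the names `LinkCollector` and `extract_links` are made up for the example. It also skips links that are not http or https, which the prototype does not do.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from the <a href="..."> tags of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            # <a> without href (a named anchor): nothing to follow
            return
        # Resolve relative links against the page URL
        absolute = urljoin(self.base_url, href)
        # Skip mailto:, javascript: and similar non-web links
        if urlsplit(absolute).scheme in ("http", "https"):
            self.links.append(absolute)

def extract_links(html_text, base_url):
    """Return the list of absolute links found in html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.links
```

Each URL returned by `extract_links` could then be requested in turn, and the ones that fail printed as BROKEN, as the prototype does.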