Roll your own google: Difference between revisions

From XPUB & Lens-Based wiki
(Created page with "or De/Reconstructing the Inter/Web == CGI == * Start with a simple form (HTTP/Post/Submit!) * Can respond with ANY type (image/audio/...) * Respond to browser (with audio?) *...")
 
No edit summary
Line 1: Line 1:
or De/Reconstructing the Inter/Web
Google dominates contemporary access to the Internet, having become virtually synonymous with search, online video, and, through Android, increasingly mobile.
 
BACK in the early daze, net sites were sparse, isolated islands, tethered together with [[webrings]] and a patchwork of amateur link lists and proto-[[portals]]. This is at once a simple exercise in CGI scripting and an earnest effort to take back the web. Restriction: all data that your CGI script uses must be local to the server, meaning your "results" will be purely algorithmic, and/or based only on input provided to it (via the search box), and/or drawn from data you have collected or crawled yourself. It's only fair; that's how Google works too.


== CGI ==
Line 5: Line 7:
* Can respond with ANY type (image/audio/...)
* Respond to browser (with audio?)
* Schmoogle
== AJAX ==
* Autocompletion
== Visual lies ==
* [[d3]]
== Tracking ==
* Cookies, sessions, tracking


== Text analysis ==
== Links ==
* [[Eliza]] and Weizenbaum's clever text and response
Eliza as a search engine ?!
== Spidering ==
* [[Scrapy]]


Line 27: Line 18:
* [[Lucene]]... [http://www.slideshare.net/otisg/lucene-introduction intro], tokenizer
* Crawling
== History ==
* [[git]]
== Other stuff ==
* What links here?
== Graphs ==
* Neo4J, Gremlin, Triples, and SPARQL ?!
== Embed-able ==
* (re)building a webring?
* PHP vs Python vs C?!
== Databases ==
* Flat-file
* BDB
* MySQL
* Mongo, ...
== Feeds ==
* [[RSS]]
== Realtime ==
* Image analysis
* Video installation ?!
* ?! Bayrle's video works
* Search by image... what does it mean with a fixed corpus
* Danya Vasiliev's Puppet piece ?!
* WebRTC -- image as a search ?!
== Aspen Movie Map ==
?!
CD-ROMs with video ?!
* Video/still collage / Hockney / Stefan Piat

Revision as of 19:29, 3 March 2014

Google dominates contemporary access to the Internet, having become virtually synonymous with search, online video, and, through Android, increasingly mobile.

BACK in the early daze, net sites were sparse, isolated islands, tethered together with webrings and a patchwork of amateur link lists and proto-portals. This is at once a simple exercise in CGI scripting and an earnest effort to take back the web. Restriction: all data that your CGI script uses must be local to the server, meaning your "results" will be purely algorithmic, and/or based only on input provided to it (via the search box), and/or drawn from data you have collected or crawled yourself. It's only fair; that's how Google works too.

CGI

  • Start with a simple form (HTTP/Post/Submit!)
  • Can respond with ANY type (image/audio/...)
  • Respond to browser (with audio?)
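
As a starting point for the form-and-response idea above, here is a minimal sketch of a CGI script that shows a search box and echoes back what was submitted. It assumes the web server runs it as a standard CGI program (request details in environment variables, POST body on standard input); the field name q and the helper names read_query and render are made up for illustration.

```python
#!/usr/bin/env python3
"""Minimal CGI search form: a sketch, not a hardened script."""
import html
import os
import sys
from urllib.parse import parse_qs

# The form posts back to the same script; "q" is a hypothetical field name.
FORM = """<form method="post" action="">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>"""


def read_query(environ, stdin):
    """Return the submitted 'q' value for a GET or POST request ('' if none)."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # Assumes a urlencoded ASCII body, so characters and bytes line up.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw).get("q", [""])[0]


def render(query):
    """Build the full CGI response: header, a blank line, then the body."""
    body = "<h1>Roll your own google</h1>" + FORM
    if query:
        # Escape user input before echoing it into HTML.
        body += "<p>You searched for: %s</p>" % html.escape(query)
    return "Content-Type: text/html; charset=utf-8\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(render(read_query(os.environ, sys.stdout if False else sys.stdin)))
```

The Content-Type header is what lets a CGI script respond with any type: sending, say, image/png or audio/mpeg instead, and writing the raw bytes to sys.stdout.buffer, should make the browser treat the response as an image or a sound rather than a page.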

Links

  • Eliza and Weizenbaum's clever text and response

Eliza as a search engine ?!
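
One way to read "Eliza as a search engine" is a script that answers a query with a reflected question instead of a list of results. The sketch below is only in the spirit of Eliza: the rules are invented for illustration and are not Weizenbaum's original script, and the function name respond is an assumption.

```python
import re

# Hypothetical (pattern, response template) rules; not Weizenbaum's script.
RULES = [
    (re.compile(r"\bi need (.+)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bwhere (?:is|are) (.+)", re.I), "What would finding {0} change?"),
    (re.compile(r"\b(?:search|find|look) for (.+)", re.I), "What do you hope {0} will tell you?"),
]
DEFAULT = "Tell me more about what you are looking for."


def respond(query):
    """Return the response of the first matching rule, or a default prompt."""
    # Drop surrounding whitespace and trailing punctuation before matching.
    text = query.strip().rstrip("?!.")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the captured fragment of the query in the reply.
            return template.format(match.group(1))
    return DEFAULT


if __name__ == "__main__":
    print(respond("I need a map of Rotterdam"))
```

Because the reply is built only from the rules and the text typed into the search box, this stays within the restriction that results be purely algorithmic or based on the input provided.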

Creating an index

  • How does an algorithm "see" a text, a sound, a video, a webpage
  • Beautiful Soup
  • Lucene... intro, tokenizer
  • Crawling
  • Danya Vasiliev's Puppet piece ?!
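
To make "how an algorithm sees a text" concrete, the following is a minimal sketch of a tokenizer and an inverted index over documents held locally. It is far simpler than what Lucene does; the function names and the sample documents are assumptions for illustration. For real webpages, the text would first have to be pulled out of the HTML (for example with Beautiful Soup), a step left out here to keep the sketch standard-library only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # index.get avoids creating empty entries for unknown tokens.
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical local corpus standing in for crawled pages.
    docs = {
        "a.html": "Webrings tethered the early web",
        "b.html": "Portals and link lists of the early web",
        "c.html": "Eliza answers questions",
    }
    print(search(build_index(docs), "early web"))
```

To the index, a document is nothing more than the bag of tokens the tokenizer produces, so changing the tokenizer changes what the search engine is able to "see".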