User:Lucia Dossin/Protyping/Assignment 5
Roll Your Own Google
From the exercise's page: 'This exercise is at once a simple exercise in CGI scripting and an opportunity to critically reflect on the state of the Web and the role of centralized commercial services such as Google.
Create a cgi "search engine". It needs to be self-contained, that is, contain its own index based on your own specific crawling of data. Part of the point of doing this is to reflect on the question of data centralization and specificity. What does it mean to create your own index? Your "results" could be "purely" algorithmic, and/or based only on input provided to it (via the search box), and/or using either collected or crawled data you've yourself gathered. It's only fair, that's how Google works too.'
Description
My search engine uses an index made by crossing two different texts. The input is a word typed into a search box; the output is a list of correlated words. Clicking a word in the results starts another search, with the clicked word as the search term.
The initial idea was a subjective dictionary of synonyms, in which I would make the correlations between words myself. In other words, the associations would be exactly the ones I could think of, for each and every word in the index. This idea quickly proved hard to execute: building such an index would take me a good number of years.
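The click-to-search loop described above can be sketched as follows. This is only an illustration, not the project's actual code: the script name search.cgi, the query parameter q and the function render_results are all assumed names.

```python
from html import escape
from urllib.parse import urlencode


def render_results(term, related_words, script_url="search.cgi"):
    """Return an HTML fragment listing related words as search links.

    script_url and the "q" parameter are hypothetical names chosen
    for this sketch; the real script may use different ones.
    """
    items = []
    for word in related_words:
        # Each link re-submits the clicked word as the next search term.
        href = script_url + "?" + urlencode({"q": word})
        items.append(
            '<li><a href="{}">{}</a></li>'.format(escape(href), escape(word))
        )
    heading = "<h2>Words correlated with {}</h2>".format(escape(term))
    return heading + "\n<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_results("apple", ["pie", "bread sauce"]))
```

Escaping both the link target and the visible word keeps the page well-formed even when an entry contains characters such as & or <.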
A much more viable way was to take input from existing text, such as a dictionary. But if I used only a dictionary, the results would be the 'real' meaning of each word. So I mixed two texts: one generated the titles (the entries) and the other generated the content (the correlations). For the titles, I used A Pocket Dictionary by William Richards; for the content, I used The Cook and Housekeeper's Complete and Universal Dictionary; Including a System of Modern Cookery, in all Its Various Branches, Adapted to the Use of Private Families.
The search mechanism runs on Whoosh, a Python search library created by Matt Chaput.
An initial version of the Free Association Index is available at
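One possible way to cross the two texts is sketched below. It rests on two assumptions that the description above does not confirm: that each source has one entry per line in the form 'HEADWORD, body text', and that titles and contents are paired by position. The function names split_entries and cross_texts and the sample lines are invented for illustration.

```python
import re


def split_entries(text):
    """Split a dictionary-like text into (headword, body) pairs.

    Assumes one entry per line, written as 'HEADWORD, body text'.
    Real scanned dictionaries would need messier parsing.
    """
    entries = []
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z][A-Za-z' -]*?),\s*(.+)", line)
        if match:
            entries.append((match.group(1).lower(), match.group(2).strip()))
    return entries


def cross_texts(title_text, content_text):
    """Pair headwords from one text with entry bodies from another."""
    titles = [head for head, _ in split_entries(title_text)]
    bodies = [body for _, body in split_entries(content_text)]
    # Pairing by position is an assumption; zip stops at the shorter list.
    return dict(zip(titles, bodies))


# Invented sample lines standing in for the two source books.
pocket = "ABBEY, a monastery.\nABLE, having power."
cookery = "ALE, brew with good malt.\nALMOND CAKE, beat a pound of almonds fine."

index = cross_texts(pocket, cookery)
print(index)
```

Each resulting entry keeps a title from the first book and takes its content from the second, which is what breaks the link to the word's 'real' meaning.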
http://headroom.pzwart.wdka.hro.nl/~ldossin/free-association-index/
There is still a lot to explore in this project, from changing the parameters that control the score of each word to adding new indexes. In the same fashion as Google's Images, News and Videos tabs, there could be several indexes where the same word would display different results, according to the 'nature' of the index.
Besides that, I would also like to allow the user to suggest a correlation for the term being searched, so that the results would display both 'internal, official' correlations and 'external' ones.
All of that would need a backend that allows me to manage the suggestions and, eventually, to add new correlations to existing terms myself.
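As a rough sketch of the suggestion feature described above, results from the index and from users could be merged while keeping track of where each one came from. Nothing here exists in the project yet; the function name merge_correlations and the labels 'official' and 'suggested' are assumptions made for the example.

```python
def merge_correlations(official, suggested):
    """Combine index results with user suggestions, labelling each source.

    Official correlations come first; a suggestion that repeats an
    existing word is skipped so each word appears only once.
    """
    results = [(word, "official") for word in official]
    seen = set(official)
    for word in suggested:
        if word not in seen:
            results.append((word, "suggested"))
            seen.add(word)
    return results


merged = merge_correlations(["malt", "almonds"], ["almonds", "monastery"])
print(merged)
```

Keeping the source label next to each word would let the results page show 'internal' and 'external' correlations differently.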