Revision as of 20:24, 28 March 2014

Roll Your Own Google

From the exercise's page: 'This exercise is at once a simple exercise in CGI scripting and an opportunity to critically reflect on the state of the Web and the role of centralized commercial services such as Google.

Create a CGI "search engine". It needs to be self-contained, that is, contain its own index based on your own specific crawling of data. Part of the point of doing this is to reflect on the question of data centralization and specificity. What does it mean to create your own index? Your "results" could be "purely" algorithmic, and/or based only on input provided to it (via the search box), and/or using either collected or crawled data you've yourself gathered. It's only fair, that's how Google works too.'

Description

Free Association Index

My own search engine uses an index made by crossing two different texts. The input is a word typed into a search box; the output is a list of correlated words. By clicking on a word in the results, the user starts another search, with the clicked word as the search term.
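The click-through behaviour described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name `render_results`, the script name `search.cgi` and the query parameter `q` are all assumptions made for the example.

```python
from html import escape
from urllib.parse import urlencode


def render_results(term, related, script_url="search.cgi"):
    """Return an HTML fragment that lists `related` words as links.

    Each link points back at the same script with the clicked word as
    the new query, so following it starts another search.
    `script_url` and the `q` parameter are hypothetical names.
    """
    items = []
    for word in related:
        href = script_url + "?" + urlencode({"q": word})
        items.append(
            '<li><a href="{}">{}</a></li>'.format(escape(href), escape(word))
        )
    heading = "<h2>{}</h2>".format(escape(term))
    return heading + "\n<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(render_results("salt", ["bread", "sea water"]))
```

Escaping both the link target and the visible word keeps odd characters in the source texts from breaking the page.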

The initial idea was to have a subjective dictionary of synonyms, where I would make the correlations between words myself. In other words, the associations would be exactly the ones I could think of, for each and every word in the index. Very quickly, this idea proved hard to execute, as it would take me a good number of years to build such an index.

A much more viable way would be to take input from existing text, such as a dictionary. But if I used only a dictionary, the results would be the 'real' meaning of each word. So I mixed two texts: one text generated the titles (the entries) and another text generated the content (the correlations). For the titles, I used A Pocket Dictionary by William Richards; for the content, I used The Cook and Housekeeper's Complete and Universal Dictionary; Including a System of Modern Cookery, in all Its Various Branches, Adapted to the Use of Private Families.
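One possible way to cross two texts is sketched below. The write-up does not say how entries and content were actually matched, so pairing them by position is purely an assumption here, as are the names `tokenize`, `cross_texts` and `words_per_entry`.

```python
import re


def tokenize(text):
    """Split text into lower-case words, ignoring punctuation and digits."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def cross_texts(headwords, content_text, words_per_entry=5):
    """Pair each headword with the next slice of words from content_text.

    Assumption: entries and content are matched purely by position, so
    the n-th headword receives the n-th group of `words_per_entry` words.
    """
    words = tokenize(content_text)
    index = {}
    for i, head in enumerate(headwords):
        start = i * words_per_entry
        chunk = words[start:start + words_per_entry]
        if not chunk:
            break  # the content text ran out before the headwords did
        index[head.lower()] = chunk
    return index


if __name__ == "__main__":
    entries = ["Abandon", "Abbey"]  # stand-ins for dictionary headwords
    recipe = "Boil the apples gently, then strain."  # stand-in content
    print(cross_texts(entries, recipe, words_per_entry=3))
```

Because the pairing ignores meaning entirely, the resulting associations are arbitrary, which suits the idea of a free association index.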

The search mechanism runs on Whoosh, a Python library created by Matt Chaput.

There's an initial version of the Free Association Index at:
http://headroom.pzwart.wdka.hro.nl/~ldossin/free-association-index/

There is still a lot to explore in this project: for example, changing the parameters that control the score of each word, recording the clicks and the association path generated during each visit (possibly displaying it to the user), and adding new indexes (in the same fashion as Google's Images, News and Videos tabs, there could be several indexes where the same word would display different results, according to the 'nature' of the index).
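As a rough sketch of how the association path might be recorded without any server-side storage, the path could travel along in the query string of each link. The parameter names `q` and `path` and the function `next_query_string` are hypothetical, chosen only for this example.

```python
from urllib.parse import parse_qs, urlencode


def next_query_string(current_qs, clicked_word):
    """Build the query string for a clicked result, extending the path.

    `current_qs` is the query string of the page being viewed. The term
    searched on that page is appended to a comma-separated `path`
    parameter before the clicked word becomes the new `q`.
    """
    params = parse_qs(current_qs)
    path = params.get("path", [""])[0]
    steps = [step for step in path.split(",") if step]
    current = params.get("q", [""])[0]
    if current:
        steps.append(current)
    return urlencode({"q": clicked_word, "path": ",".join(steps)})


if __name__ == "__main__":
    first = next_query_string("q=salt", "bread")
    second = next_query_string(first, "oven")
    print(first)
    print(parse_qs(second)["path"][0].split(","))
```

Keeping the path in the URL means it can be shown to the user on every page; logging it on the server would be a separate step.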

Some improvements could also be made, such as allowing searches for more than one word and building a more interesting index, with fewer pronouns and other non-meaningful words.
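Both improvements can be illustrated with a small sketch that works on a plain dictionary mapping each entry to its list of correlated words. The stop-word list is a tiny hand-picked sample and the names `drop_stop_words` and `search_many` are assumptions made for this example, not part of the project.

```python
# A small, hand-picked sample; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "it", "its", "he", "she",
              "they", "them", "of", "to", "in"}


def drop_stop_words(words, stop_words=STOP_WORDS):
    """Remove pronouns and other non-meaningful words from a word list."""
    return [w for w in words if w.lower() not in stop_words]


def search_many(index, query):
    """Look up every word of `query` and merge the correlations.

    The order of first appearance is kept and duplicates are removed.
    """
    seen = set()
    merged = []
    for term in query.lower().split():
        for word in index.get(term, []):
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


if __name__ == "__main__":
    index = {
        "abandon": drop_stop_words(["boil", "the", "apples"]),
        "abbey": drop_stop_words(["apples", "and", "sugar"]),
    }
    print(search_many(index, "Abandon abbey"))
```

Filtering at indexing time, as done here, keeps the stored index small; filtering at query time would instead allow the stop-word list to change without rebuilding the index.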

Besides that, I would also like to allow the user to suggest a correlation for the term being searched, so that the results would display both 'internal, official' correlations and 'external' ones.

All of that would require a backend where I could manage the suggestions and, eventually, add new correlations of my own to existing terms.

User:Lucia Dossin/Protyping/Assignment 5: Difference between revisions
