User:Manetta/media-objects/contemporary-encyclopedias

training set = "contemporary encyclopedia"

In the process of making an encyclopedia, categories are decided upon, and various objects are then placed into them. It is a search for a universal system to describe the world.

Training sets are used to train data-mining algorithms for pattern recognition. These training sets are the contemporary version of the traditional encyclopedia. From the famous Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, edited by Diderot and D'Alembert and first published in 1751, onward, the encyclopedia was a body of information for sharing knowledge from human to human. Contemporary encyclopedias, by contrast, are constructed to share structures and information from humans with machines.

In order to automate recognition processes, researchers are once again prompted to reconsider their categorization structures and to question how objects are classified into these categories. The training sets offer a glimpse into the process of constructing such a simplified model of reality, and reveal the difficulties that arise along the way.

steps in constructing such an encyclopedia:

1. The SUN group collects its training set by querying search engines for the scenes it needs. Result: the training set is built from typical low-quality digital images of the kind commonly found on the web.

2. The SUN group decides which categories to merge, drop, or add, by judging their visual and semantic strength.

3. The SUN group asks Mechanical Turk workers to annotate the images by outlining the objects that appear in a scene with a vector line. Small objects disappear, while common objects become the ones that will be recognized.

Furthermore, the most frequently annotated scene is 'living room', followed by 'bedroom' and 'kitchen'. The probability that 'living room' will be the outcome of the recognition algorithm is therefore much higher than for scenes that were annotated less often. These frequencies thus form a kind of categorization in themselves. Although not directly decided upon by the research group, this hierarchy originates from the selection of scenes that the annotators worked on.
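The effect of annotation frequency on the outcome can be sketched with Bayes' rule, where each scene's share of the annotated images acts as a prior probability. This is a minimal illustration under stated assumptions: the counts below are hypothetical numbers chosen for illustration, not the actual SUN annotation statistics; 'lighthouse' is a hypothetical rare category; and the sketch assumes a classifier that combines visual evidence with class frequencies explicitly (many real classifiers absorb this bias implicitly from imbalanced training data instead).

```python
from collections import Counter

# Hypothetical annotation counts per scene category (illustrative only,
# NOT the real SUN statistics).
annotated_scenes = (
    ["living room"] * 500
    + ["bedroom"] * 300
    + ["kitchen"] * 200
    + ["lighthouse"] * 20
)


def class_priors(labels):
    """Empirical prior P(scene) = count(scene) / total annotated images."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {scene: n / total for scene, n in counts.items()}


def posterior(priors, likelihoods):
    """Bayes' rule: P(scene | image) is proportional to P(image | scene) * P(scene)."""
    unnormalized = {s: likelihoods[s] * priors[s] for s in priors}
    z = sum(unnormalized.values())
    return {s: v / z for s, v in unnormalized.items()}


priors = class_priors(annotated_scenes)

# An ambiguous image: the visual evidence fits every scene equally well,
# so the prior alone decides the outcome.
equal_evidence = {scene: 1.0 for scene in priors}
post = posterior(priors, equal_evidence)
best = max(post, key=post.get)
print(best, round(post[best], 3))  # → living room 0.49

# Even if the image looks twice as much like a lighthouse as like any other
# scene, the frequently annotated category still wins on its larger prior.
lighthouse_evidence = {**equal_evidence, "lighthouse": 2.0}
post2 = posterior(priors, lighthouse_evidence)
print(max(post2, key=post2.get))  # → living room
```

In this toy setting the hierarchy of scenes is never stated as a rule; it emerges only from how many images of each scene happened to be annotated.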