User:Kimberley
Vernacular Language Processing (SI16)
Transcription
Selection Process
"Annotation Compass"
"Cloverleaf"
Cloverleaf is a tool for navigating a set of texts. Through generated shortcuts, it is meant to interrupt the linearity of a text. The result is a collage of excerpts that aims to open up unexpected reading paths. The tool can be used to stitch various voices together in a non-hierarchical manner, producing hybrid constructions where common points and divergences can co-exist.
In detail
For two consecutive texts in a set of texts following an index order, let’s consider a ‘preceding’ text and its ‘succeeding’ text. First, the function bridge() looks for the first identical word occurring in both texts (excluding stop words). Let’s name the position (index) of this word ‘i’ in the preceding text and ‘j’ in the succeeding text:
text, index 0: “Strawberries don’t grow tasty (i) in the Netherlands.”
text, index 1: “Pineapple is very tasty (j) with salt (i) and chilli powder.”
text, index 2: “Blocks of salt (j) distract cows (i).”
text, index 3: “There were many fields with cows (j) in this area.”
Result: “Strawberries don’t grow tasty with salt distract cows in this area.”
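The matching step described above could be sketched in Python as follows. This is only an illustration, not the tool's actual code: the text names bridge() and says stop words are excluded, but the function body, the helper words() and the STOP_WORDS list are assumptions. ‘First identical word’ is read here as the first word of the preceding text that also appears in the succeeding text.

```python
import re

# Hypothetical stop-word list; the list used by the original tool is not given.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "there", "this",
              "very", "was", "were", "with", "many"}

def words(text):
    """Split a text into words, dropping punctuation (assumed tokenisation)."""
    return re.findall(r"\w+(?:[’']\w+)*", text)

def bridge(preceding, succeeding):
    """Return (i, j) for the first shared non-stop word, or None if no match.

    i is the word's position in `preceding`, j its first position in
    `succeeding`. Both arguments are lists of words.
    """
    first_seen = {}
    for j, word in enumerate(succeeding):
        first_seen.setdefault(word.lower(), j)
    for i, word in enumerate(preceding):
        key = word.lower()
        if key not in STOP_WORDS and key in first_seen:
            return i, first_seen[key]
    return None

text_0 = words("Strawberries don’t grow tasty in the Netherlands.")
text_1 = words("Pineapple is very tasty with salt and chilli powder.")
print(bridge(text_0, text_1))  # (3, 3): "tasty" is word 3 in both texts
```

Comparison is case-insensitive in this sketch, which is also an assumption.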
Since every text, in a given set of at least four texts, alternately takes the ‘preceding’ and the ‘succeeding’ position, each text holds a word indexed as ‘i’ and a word indexed as ‘j’: ‘j’ marks the word it shares with the text before it, and ‘i’ the word it shares with the text after it (the first text of the set only holds an ‘i’ and the last only a ‘j’). These marks then determine the start and the end of each excerpt and open the ‘shortcut’ mentioned above.
As a result, the preceding text is printed from its index (j), attributed earlier when this text was in the ‘succeeding’ position, until (i), the word it shares with its current succeeding text. The function loops until it reaches the last two texts of the set (in index order).
TEXT 1 xxxxxxxxx(J)oooooooooooo(I)xxxxxxxxxxxxx
TEXT 2 xxxxxxxxxxxxxxxxx(J)oooooooooooo(I)xxxxxxxxxx
TEXT 3 xxxxxxxxxxx(J)ooooooo(I)xxxxxxxxxxxxxxxx
TEXT 4 xxxxxx(J)oooooooooooooooo(I)xxxxxxxxxxxxxx
o = printed text
x = rejected text
J = index of the word shared with the preceding text
I = index of the word shared with the succeeding text
If no match is found between two texts, the text in the succeeding position is printed from its first word to its last.
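The whole loop, including the no-match case, could look roughly like the Python sketch below. It is a reading of the description rather than the original implementation, and several points are assumptions: the name cloverleaf(), the compact bridge() body, the STOP_WORDS list and the tokenisation are all hypothetical. Each excerpt is taken from just after (j) up to and including (i), so that the shared word is printed once, as in the fruit and cows example. When two texts share no word, the preceding text is assumed to run to its last word and the succeeding one is printed whole.

```python
import re

# Hypothetical stop-word list; the original tool's list is not given.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "there", "this",
              "very", "was", "were", "with", "many"}

def words(text):
    """Split a text into words, dropping punctuation (assumed tokenisation)."""
    return re.findall(r"\w+(?:[’']\w+)*", text)

def bridge(preceding, succeeding):
    """Return (i, j) for the first shared non-stop word, or None."""
    lowered = [w.lower() for w in succeeding]
    for i, word in enumerate(preceding):
        key = word.lower()
        if key not in STOP_WORDS and key in lowered:
            return i, lowered.index(key)
    return None

def cloverleaf(texts):
    """Stitch one excerpt per text into a single string."""
    tokens = [words(t) for t in texts]
    start = [0] * len(tokens)           # the first text starts at its first word
    end = [len(t) for t in tokens]      # the last text runs to its last word
    whole = set()                       # texts printed in full (no match found)
    for k in range(len(tokens) - 1):
        match = bridge(tokens[k], tokens[k + 1])
        if match is None:
            whole.add(k + 1)            # fallback: print the succeeding text whole
            continue
        i, j = match
        end[k] = i + 1                  # stop after the shared word (i)
        start[k + 1] = j + 1            # resume just after the shared word (j)
    excerpts = []
    for k, text_words in enumerate(tokens):
        if k in whole:
            excerpts.extend(text_words)
        else:
            excerpts.extend(text_words[start[k]:end[k]])
    return " ".join(excerpts)

texts = [
    "Strawberries don’t grow tasty in the Netherlands.",
    "Pineapple is very tasty with salt and chilli powder.",
    "Blocks of salt distract cows.",
    "There were many fields with cows in this area.",
]
print(cloverleaf(texts))
# Strawberries don’t grow tasty with salt distract cows in this area
```

Punctuation is dropped by this tokenisation, which is why the printed line has no final full stop. If (i) falls before (j) within one text, the slice is empty and that text contributes nothing; the description does not say how the tool handles that case.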