User:Andre Castro/wip-independent001
Latest revision as of 20:31, 18 October 2011
TEXTS AS MATERIAL
I started tinkering with ideas related to text processing, the display of text content, and text re-generation based on texts found on the web.
At the moment I am taking stories from shortstories101.com as my source material.
I have written a script that creates a concordance: the number of times each word occurs in a text.
Give it a try, remembering that the $1 that follows the script name has to be the URL of a story from http://www.shortstories101.com/
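#!/bin/sh
# this script is meant to be used on the stories from the website http://www.shortstories101.com
# it prints a concordance: how many times each word occurs in the story
# $1 = url of the story
DATE=$(date +%H%M-%d%m)
NAME=${DATE}.txt
# dump the page as plain text, without the list of links
lynx -dump -nolist "$1" > "temp1-${NAME}"
# get rid of extra info: the story sits between the 'Written by' line
# and the '* Currently' line, so find the line numbers of both (first match only)
HEAD=$(grep -n 'Written by' "temp1-${NAME}" | sed -n 's/^\([0-9]*\)[:].*/\1/p' | head -n 1)
TAIL=$(grep -n '\* Currently' "temp1-${NAME}" | sed -n 's/^\([0-9]*\)[:].*/\1/p' | head -n 1)
# the story starts 3 lines below 'Written by'
HEAD=$((HEAD + 3))
# TAIL becomes relative to the file that already has its head cut off
TAIL=$((TAIL - HEAD))
awk -v head="$HEAD" 'FNR > head' "temp1-${NAME}" > "temp2-${NAME}"
awk -v tail="$TAIL" 'FNR < tail' "temp2-${NAME}" > "temp3-${NAME}"
# one word per line (runs of spaces and blank lines squeezed), then count and sort by frequency
tr -s ' ' '\n' < "temp3-${NAME}" | sort | uniq -c | sort -n -r > "concordance-${NAME}"
rm temp*-"${NAME}"
cat "concordance-${NAME}"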
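The script above counts words exactly as they appear in the page, so "The", "the" and "the," end up as separate entries. Below is a minimal sketch of a normalising variant, assuming that folding case and dropping punctuation is what is wanted. The function name `concordance` is made up for this example, and it reads plain text on standard input instead of fetching a URL.

```sh
#!/bin/sh
# concordance: hypothetical helper (not part of the script above).
# Reads text on stdin and prints one "count word" line per distinct word.
concordance() {
    tr '[:upper:]' '[:lower:]' |   # fold everything to lower case
    tr -cs '[:alpha:]' '\n' |      # turn every run of non-letters into one newline
    grep -v '^$' |                 # drop the empty line a leading non-letter can leave
    sort | uniq -c |               # count identical words
    sort -k1,1nr -k2               # most frequent first, ties in alphabetical order
}

# Example on an inline sentence instead of a downloaded story
printf 'The cat saw the dog. The dog ran.\n' | concordance
```

In the example "the" is counted three times and "dog" twice. Note that splitting on every non-letter also breaks words such as "don't" into "don" and "t", which may or may not be acceptable.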
It could be interesting to gather a lot of stories from this site and have a recombination algorithm mix them into seemingly coherent new stories. These could then be submitted to the same site.
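As a starting point only, here is a very crude sketch of one possible reading of "recombination": pool the sentences of several stories and pick a few at random. It makes no attempt at coherence, which is the hard part of the idea. The function name `recombine`, its arguments and the file names in the usage comment are all assumptions made for this example; the stories are expected to be already saved as plain text files.

```sh
#!/bin/sh
# recombine: hypothetical helper. Usage: recombine COUNT SEED FILE...
# Prints COUNT sentences picked at random, without repeats, from the given files.
recombine() {
    count=$1; seed=$2; shift 2
    cat "$@" |
    tr '\n' ' ' |                  # join all lines so sentences can span line breaks
    awk -v n="$count" -v seed="$seed" '
        BEGIN { srand(seed) }      # same seed, same selection
        {
            # mark the end of each sentence: ., ! or ? followed by spaces
            gsub(/[.!?] +/, "&\n")
            m = split($0, parts, "\n")
            for (k = 1; k <= m; k++) {
                sub(/^ +/, "", parts[k]); sub(/ +$/, "", parts[k])
                if (parts[k] != "") s[++total] = parts[k]
            }
        }
        END {
            if (n > total) n = total
            # partial Fisher-Yates shuffle: swap a random remaining sentence into slot i
            for (i = 1; i <= n; i++) {
                j = i + int(rand() * (total - i + 1))
                tmp = s[i]; s[i] = s[j]; s[j] = tmp
                print s[i]
            }
        }'
}

# Usage (story1.txt and story2.txt are placeholder names):
#   recombine 5 42 story1.txt story2.txt
```

Passing the seed explicitly keeps a result reproducible; something like `$(date +%s)` as the seed gives a different mix on each run.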
Some side-products of the process seem interesting and can point in other directions.
One example is a text condensed into a block with no spaces between the words:
[[Image:NoSpaceTxt.png|700px]]
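A text without spaces can be produced with a short pipeline. The sketch below is an assumption about how such a block could be made, not necessarily how the one mentioned here was produced; the function name `nospace` and the default width of 60 characters are invented for the example.

```sh
#!/bin/sh
# nospace: hypothetical helper. Usage: nospace [WIDTH]
# Removes every whitespace character (spaces, tabs, line breaks) from stdin
# and re-wraps the remaining characters into lines of WIDTH columns (default 60).
nospace() {
    tr -d '[:space:]' | fold -w "${1:-60}"
    echo    # tr removed the final line break, so add one back
}

# Example on an inline sentence, wrapped at 10 characters
printf 'Once upon a time there was a story\n' | nospace 10
```

Wrapping with `fold` is what turns the single long line into a rectangular block; without it the whole text comes out as one line.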
I also started thinking about encryption, and the stories of Alan Turing in WWII trying to decipher German communications. What if the letters in a text were replaced by numbers?
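One simple way to try this out is to replace each letter by its position in the alphabet (a=1, b=2, ... z=26). This is a plain substitution chosen for illustration and has nothing to do with the wartime ciphers. The function name `letters_to_numbers` and the choice of a hyphen between the numbers of one word are assumptions made for this sketch.

```sh
#!/bin/sh
# letters_to_numbers: hypothetical helper.
# Replaces each letter read from stdin with its position in the alphabet
# (a=1 ... z=26, case ignored). Numbers inside one word are joined with "-";
# every other character (spaces, punctuation, digits) is copied unchanged.
letters_to_numbers() {
    awk '
    BEGIN { alphabet = "abcdefghijklmnopqrstuvwxyz" }
    {
        line = tolower($0); out = ""; prev_letter = 0
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            p = index(alphabet, c)          # 0 when c is not a letter
            if (p > 0) {
                out = out (prev_letter ? "-" : "") p
                prev_letter = 1
            } else {
                out = out c
                prev_letter = 0
            }
        }
        print out
    }'
}

printf 'Alan Turing\n' | letters_to_numbers
# prints: 1-12-1-14 20-21-18-9-14-7
```

The separator matters: without it "112" could mean a-a-b, a-l or k-b. Digits already present in the source text are copied as they are and would be confused with encoded letters.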