User:Andre Castro/wip-independent001

From XPUB & Lens-Based wiki

TEXTS AS MATERIAL

I started tinkering with ideas related to text processing, the display of text content, and the regeneration of text from texts found on the web.

At the moment I am using stories from shortstories101.com as my source material.


I have written a script that creates a concordance, or more precisely a word-frequency list: the number of times each word occurs in a text. Give it a try; the first argument passed to the script ($1) has to be the URL of a story from http://www.shortstories101.com/

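#!/bin/sh

# this script is meant to be used on the stories from the website http://www.shortstories101.com
# it counts how many times each word occurs in a story
# $1 = url of the story

if [ -z "$1" ]; then
    echo "usage: $0 <story-url>" >&2
    exit 1
fi

DATE=$(date +%H%M-%d%m)
NAME=${DATE}.txt

# dump the page as plain text, without the numbered list of links
lynx -dump -nolist "$1" > "temp1-${NAME}"

# get rid of extra info: line numbers of the first "Written by" and "* Currently" lines
HEAD=$(awk '/Written by/ { print NR; exit }' "temp1-${NAME}")
TAIL=$(awk '/\* Currently/ { print NR; exit }' "temp1-${NAME}")
if [ -z "$HEAD" ] || [ -z "$TAIL" ]; then
    echo "could not find the start/end of the story in $1" >&2
    rm -f "temp1-${NAME}"
    exit 1
fi
# keep only the lines after "Written by" + 3 and before "* Currently"
HEAD=$((HEAD + 3))
awk -v h="$HEAD" -v t="$TAIL" 'NR > h && NR < t' "temp1-${NAME}" > "temp2-${NAME}"

# one word per line (letters and apostrophes only), lower-cased,
# drop empty lines, then count identical words, most frequent first
tr -cs "[:alpha:]'" '\n' < "temp2-${NAME}" \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v '^$' \
    | sort | uniq -c | sort -n -r > "concordance-${NAME}"

rm -f "temp1-${NAME}" "temp2-${NAME}"
cat "concordance-${NAME}"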
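Strictly speaking, a concordance lists every occurrence of a word together with the words around it (keyword in context, or KWIC), while the script above counts words. A minimal sketch of the KWIC form as a shell function, assuming plain text on stdin or in files (for example a saved lynx dump); the name kwic and its arguments are hypothetical choices of mine:

```shell
# kwic WORD WIDTH [FILE...]
# Hypothetical helper: prints every occurrence of WORD with up to WIDTH
# words on each side, the keyword itself in [brackets].
# Matching ignores case and punctuation; reads FILEs, or stdin if none given.
kwic() {
    word=$1
    width=$2
    shift 2
    awk -v w="$word" -v k="$width" '
        BEGIN { w = tolower(w); gsub(/[^a-z]/, "", w) }
        # collect all words of the text, in reading order
        { for (i = 1; i <= NF; i++) words[++n] = $i }
        END {
            for (i = 1; i <= n; i++) {
                t = tolower(words[i])
                gsub(/[^a-z]/, "", t)          # so "Cat," matches "cat"
                if (t != w) continue
                line = ""
                for (j = i - k; j <= i + k; j++) {
                    if (j < 1 || j > n) continue
                    tok = (j == i) ? ("[" words[j] "]") : words[j]
                    line = (line == "") ? tok : (line " " tok)
                }
                print line
            }
        }' "$@"
}
```

For example, `printf 'the cat sat on the mat\n' | kwic cat 1` prints `the [cat] sat`.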


It could be interesting to gather material from many of the stories on this site and have a recombination algorithm mix them into seemingly coherent new stories. These could then be submitted to the same site.
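One possible recombination algorithm (an assumption, not something from these notes) is a word-level Markov chain: record which word follows which in the source stories, then walk through those pairs at random. Neighbouring words still look plausible together while the story as a whole drifts. A rough sketch; the function name recombine and its arguments are hypothetical:

```shell
# recombine N [FILE...]
# Hypothetical helper: generates N words from a first-order (word bigram)
# Markov chain built from FILEs, or from stdin if no files are given.
recombine() {
    n=$1
    shift
    awk -v n="$n" '
        # collect every word of every story, in reading order
        { for (i = 1; i <= NF; i++) words[++total] = $i }
        END {
            if (total == 0) exit
            # successors: for each word, the list of words that came right after it
            for (i = 1; i < total; i++)
                succ[words[i], ++count[words[i]]] = words[i + 1]
            srand()                                  # seed from the time of day
            w = words[int(rand() * total) + 1]       # random starting word
            out = w
            for (i = 1; i < n; i++) {
                if (w in count)
                    w = succ[w, int(rand() * count[w]) + 1]
                else
                    w = words[int(rand() * total) + 1]   # dead end: jump anywhere
                out = out " " w
            }
            print out
        }' "$@"
}
```

For example, `recombine 200 story1.txt story2.txt` would print 200 words mixed from two saved stories (file names hypothetical). Because srand() is seeded from the clock, runs in different seconds give different texts; a higher-order chain (pairs of words as the state) would read more coherently but copy longer stretches of the sources.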


Some side products of the process seem interesting and can point in other directions, such as a text condensed into a space without spaces (NoSpaceTxt.png).
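One way to get such a block (not necessarily how the image was made) is to delete every whitespace character and wrap what is left at a fixed width. A sketch, assuming plain ASCII text, since fold may count bytes rather than characters; condense and its width argument are hypothetical:

```shell
# condense [WIDTH]
# Hypothetical helper: reads text on stdin, deletes all spaces, tabs and
# newlines, and wraps the remaining characters into lines of WIDTH (default 60).
condense() {
    tr -d '[:space:]' | fold -w "${1:-60}"
    echo    # tr also removed the final newline, so add one back
}
```

For example, `condense 40 < story.txt` (file name hypothetical) turns a saved story into a dense 40-column block.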

I have also started thinking about encryption, and the story of Alan Turing's work in WWII deciphering German communications. What if the letters in a text were replaced by numbers?
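A first, very naive step in that direction is to replace each letter by its position in the alphabet (a=1 ... z=26). This is a toy substitution that is trivially broken, nothing like the WWII machine ciphers; letters_to_numbers is a hypothetical name. The numbers of consecutive letters are joined with "-" so that words stay visible as groups:

```shell
# letters_to_numbers [FILE...]
# Hypothetical helper: replaces each letter with its position in the
# alphabet (case ignored), joins the numbers of a word with "-", and keeps
# everything else (spaces, punctuation, digits) as it is.
letters_to_numbers() {
    awk '
        BEGIN { alpha = "abcdefghijklmnopqrstuvwxyz" }
        {
            line = tolower($0)
            out = ""
            prev = 0                       # was the previous character a letter?
            for (i = 1; i <= length(line); i++) {
                c = substr(line, i, 1)
                p = index(alpha, c)        # 0 if c is not a letter a-z
                if (p > 0) {
                    out = out (prev ? "-" : "") p
                    prev = 1
                } else {
                    out = out c
                    prev = 0
                }
            }
            print out
        }' "$@"
}
```

For example, `echo "Hello, world" | letters_to_numbers` prints `8-5-12-12-15, 23-15-18-12-4`. Digits already present in the source text are left unchanged, so they become ambiguous once encoded.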