User:Andre Castro/wip-independent001
MATERIAL TEXTS
I have started tinkering with ideas related to text processing, the display of text content, and the re-generation of text from texts found on the web.
At the moment I am using stories from shortstories101.com as my source material.
I have written a script that creates a concordance: a count of the number of times each word occurs in a text.
Give it a try, remembering that the $1 argument that follows the script name has to be the URL of a story from http://www.shortstories101.com/
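#!/bin/sh
# this script is meant to be used on the stories from the website http://www.shortstories101.com
# $1 = url of a story
DATE=$(date +%H%M-%d%m)
NAME=${DATE}.txt
# dump the rendered page as plain text, without the list of links
lynx -dump -nolist "$1" > "temp1-${NAME}"
# get rid of extra info: the story sits between the 'Written by' line and the '* Currently' line
HEAD=$(grep -n 'Written by' "temp1-${NAME}" | sed -n 's/^\([0-9]*\)[:].*/\1/p' | head -n 1)
TAIL=$(grep -n -F '* Currently' "temp1-${NAME}" | sed -n 's/^\([0-9]*\)[:].*/\1/p' | head -n 1)
# skip the 'Written by' line and the 3 lines after it
HEAD=$((HEAD + 3))
# line number of '* Currently' relative to the trimmed file
TAIL=$((TAIL - HEAD))
awk -v head="$HEAD" 'FNR > head' "temp1-${NAME}" > "temp2-${NAME}"
awk -v tail="$TAIL" 'FNR < tail' "temp2-${NAME}" > "temp3-${NAME}"
# one word per line (runs of spaces squeezed), then count identical words, most frequent first
tr -s ' ' '\n' < "temp3-${NAME}" | sort | uniq -c | sort -n -r > "concordance-${NAME}"
rm temp*-"${NAME}"
cat "concordance-${NAME}"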
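The counting step at the end of the script can be tried on its own, without lynx or a network connection. A minimal sketch, assuming a hypothetical helper function called concordance (it is not part of the script above) that reads text on standard input. It uses tr -s so that runs of spaces do not end up counted as empty words:

```shell
#!/bin/sh
# concordance: hypothetical helper for illustration only.
# Reads text on stdin and prints "count word" lines, most frequent first.
concordance() {
    # put one word per line, group identical words, count them, sort by count
    tr -s ' ' '\n' | sort | uniq -c | sort -n -r
}

# sample sentence instead of a downloaded story
printf 'the cat and the dog and the bird\n' | concordance
```

With this sample sentence the first line of the output should be the count for "the", which occurs 3 times, followed by "and" with 2. The order of the words that occur only once depends on how sort breaks ties.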
It could be interesting to gather material from many of the stories on this site and have a recombination algorithm mix them into seemingly coherent new stories. These could be submitted to the same site.
Some side-products of the process seem interesting and could point in other directions.
One example is a text condensed into a space without spaces.
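One way a text without spaces could be produced, as a rough sketch: delete every whitespace character with tr. The function name condense is hypothetical, and the side-product mentioned above may well have come about in a different way.

```shell
#!/bin/sh
# condense: hypothetical helper for illustration only.
# Removes all spaces, tabs and newlines from the text read on stdin.
condense() {
    tr -d ' \t\n'
}

# two short lines standing in for a story
printf 'once upon a time\nthere was a story\n' | condense
```

The two sample lines come out as the single run of letters onceuponatimetherewasastory.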