User:Andre Castro/wip-independent001

MATERIAL TEXTS

I have started tinkering with ideas related to text processing, the display of text content, and the regeneration of text from texts found on the web.

At the moment I am using stories from shortstories101.com as my source material.

I have written a script that creates a concordance (more precisely a word-frequency list: the number of times each word occurs in a text). Give it a try, remembering that the argument following the script name ($1) must be the URL of a story from http://www.shortstories101.com/

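#!/bin/sh
#
# Word-frequency "concordance" for a story from http://www.shortstories101.com
# Usage: ./concordance.sh URL
#   $1 = URL of a single story page

if [ -z "$1" ]; then
    echo "usage: $0 URL" >&2
    exit 1
fi

DATE=$(date +%H%M-%d%m)
NAME=${DATE}.txt

# dump the page as plain text, without the numbered list of links
lynx -dump -nolist "$1" > "temp1-${NAME}"

# get rid of extra info: skip the "Written by" line and the 3 lines after it,
# and stop before the "* Currently" line (first match of each is used)
HEAD=$(grep -n 'Written by' "temp1-${NAME}" | head -n 1 | cut -d: -f1)
TAIL=$(grep -n -F '* Currently' "temp1-${NAME}" | head -n 1 | cut -d: -f1)
if [ -z "$HEAD" ] || [ -z "$TAIL" ]; then
    echo "could not find the story boundaries in $1" >&2
    rm -f "temp1-${NAME}"
    exit 1
fi
HEAD=$((HEAD + 3))
awk -v h="$HEAD" -v t="$TAIL" 'NR > h && NR < t' "temp1-${NAME}" > "temp2-${NAME}"

# one word per line: lowercase, treat anything but letters and apostrophes
# as a separator, drop the empty tokens left by lynx's indentation,
# then count and sort with the most frequent words first
tr '[:upper:]' '[:lower:]' < "temp2-${NAME}" \
    | tr -cs "[:alpha:]'" '\n' \
    | grep -v '^$' \
    | sort | uniq -c | sort -n -r > "concordance-${NAME}"

rm -f temp*-"${NAME}"
cat "concordance-${NAME}"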
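The counting part of the script does not depend on lynx or on the site's layout. A minimal sketch, assuming the story text has already been saved as local plain-text files, isolates it as a hypothetical `word_freq` function that also accepts several files at once, giving a combined concordance for many stories. Like the last pipeline of the script, it puts one word per line, then sorts and counts; treating upper and lower case as the same word and discarding punctuation is an assumption about what should count as "the same word".

```sh
#!/bin/sh
# Hypothetical helper: word_freq [FILE...]
# Prints "count word" lines for all given files combined (stdin if none),
# most frequent first.
word_freq() {
    cat "$@" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs "[:alpha:]'" '\n' \
        | grep -v '^$' \
        | sort | uniq -c | sort -n -r
}

# Example: two tiny "stories" written to a temporary directory
dir=$(mktemp -d)
printf 'The cat sat.\nThe dog sat too.\n' > "$dir/story1.txt"
printf 'A cat ran. The end.\n' > "$dir/story2.txt"
word_freq "$dir/story1.txt" "$dir/story2.txt"
rm -r "$dir"
```

With the two sample files, `the` (3 occurrences) comes first. Keeping the counting separate from the scraping means the same function could be pointed at a whole folder of dumped stories.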

It could be interesting to gather information from a large number of stories on this site and have a recombination algorithm mix them into seemingly coherent new stories. These could then be submitted to the same site.
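One possible recombination algorithm, swapped in here as an illustration rather than anything the page specifies, is a word-level Markov chain: record which words follow which in the source stories, then walk those transitions at random. A minimal sketch, assuming the stories are plain-text files; `markov_story`, its word-count and seed parameters, and the choice of a first-order (bigram) chain are all hypothetical. Punctuation is left attached to words so that sentence endings carry over into the output.

```sh
#!/bin/sh
# Hypothetical helper: markov_story NWORDS SEED [FILE...]
# Builds a first-order (bigram) word chain from the files (stdin if none)
# and prints a random walk of at most NWORDS words.
markov_story() {
    n=$1; seed=$2; shift 2
    cat "$@" | tr -s '[:space:]' '\n' | grep -v '^$' |
    awk -v n="$n" -v seed="$seed" '
        # store every observed successor of the previous word
        NR > 1 { succ[prev, ++cnt[prev]] = $0 }
        { words[NR] = $0; prev = $0 }
        END {
            if (NR == 0) exit
            srand(seed)
            w = words[int(rand() * NR) + 1]      # random starting word
            out = w
            # follow random successors until NWORDS or a dead end
            for (i = 1; i < n && cnt[w] > 0; i++) {
                w = succ[w, int(rand() * cnt[w]) + 1]
                out = out " " w
            }
            print out
        }'
}

# Example: mix two one-line "stories"
dir=$(mktemp -d)
printf 'The cat sat on the mat.\n' > "$dir/a.txt"
printf 'The dog sat on the rug.\n' > "$dir/b.txt"
markov_story 12 42 "$dir/a.txt" "$dir/b.txt"
rm -r "$dir"
```

Every adjacent word pair in the output also occurs somewhere in the sources, which is what makes the result locally plausible; a higher-order chain (keying on two or three previous words) would read more coherently but copy longer stretches verbatim.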

Some side-products of the process seem interesting in themselves and could point in other directions.