User:Laurier Rochon/prototyping/pythov

From XPUB & Lens-Based wiki

Building Markov chains with simple sentences.

The script reproduces the same number of sentences as the source text, always starting each new sentence with the first word of the corresponding original one.
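The "chain" here is simply a table that maps each word to every word that follows it in the source, with the period treated as a word of its own. Below is a minimal sketch of that table for the sample sentence used in the approaches below. The function name `build_chain` is hypothetical and not from the original code.

```python
def build_chain(text):
    """Map each word to the list of words that follow it (duplicates kept)."""
    # treat the period as a word and ignore caps, as approach 2 below does
    words = text.replace('.', ' . ').lower().split()
    chain = {}
    for word, following in zip(words, words[1:]):
        chain.setdefault(word, []).append(following)
    return chain


chain = build_chain('He saw the cat before he saw the potato.He knew it.I saw him.')
for word in sorted(chain):
    print(word, chain[word])
```

Because duplicates are kept, `saw` maps to `['the', 'the', 'him']`, so a random pick continues `saw` with `the` two times out of three.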

OBAMA Inaugural speech Remix

(original text)

The words have been spoken during rising tides of prosperity and the still waters of peace. Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forebears, and true to our founding documents.

So it has been. So it must be with this generation of Americans.

That we are in the midst of crisis is now well understood. Our nation is at war against a far-reaching network of violence and hatred. Our economy is badly weakened, a consequence of greed and irresponsibility on the part of some but also our collective failure to make hard choices and prepare the nation for a new age.

Homes have been lost, jobs shed, businesses shuttered. Our health care is too costly, our schools fail too many, and each day brings further evidence that the ways we use energy strengthen our adversaries and threaten our planet.

These are the indicators of crisis, subject to data and statistics. Less measurable, but no less profound, is a sapping of confidence across our land; a nagging fear that America's decline is inevitable, that the next generation must lower its sights.

Today I say to you that the challenges we face are real, they are serious and they are many. They will not be met easily or in a short span of time. But know this America: They will be met.

On this day, we gather because we have chosen hope over fear, unity of purpose over conflict and discord.

On this day, we come to proclaim an end to the petty grievances and false promises, the recriminations and worn-out dogmas that for far too long have strangled our politics.

We remain a young nation, but in the words of Scripture, the time has come to set aside childish things. The time has come to reaffirm our enduring spirit; to choose our better history; to carry forward that precious gift, that noble idea, passed on from generation to generation: the God-given promise that all are equal, all are free, and all deserve a chance to pursue their full measure of happiness.


REMIX


The nation for far too many, and they will be with this day, we the skill or in the skill or vision of crisis, subject to set aside childish things. Yet, every so it has come to choose our economy is a sapping of greed and false promises, the next generation must lower its sights.

At war against a consequence of prosperity and worn-out dogmas that all are serious and irresponsibility on this day, we remain a short span of violence and all are the midst of purpose over fear, unity of violence and prepare the ways we gather because of scripture, the god-given promise that america's decline is taken amidst gathering clouds and all deserve a nagging fear that precious gift, that the recriminations and threaten our schools fail too costly, our collective failure to choose our planet.

So it must lower its sights. So it must be met. That precious gift, that the midst of our forebears, and all are in high office, but in high office, but also our land; a far-reaching network of crisis, subject to reaffirm our economy is taken amidst gathering clouds and discord. Our planet. Our health care is too costly, our economy is too many, and raging storms. Homes have been spoken during rising tides of americans. Our enduring spirit; to reaffirm our founding documents. These are free, and threaten our enduring spirit; to generation: the time.

Less measurable, but no less profound, is now well understood. Today i say to the still waters of violence and statistics. They will not simply because we remain a consequence of americans. But no less profound, is inevitable, that we the time. On this america: they will be met.

On this generation of greed and irresponsibility on not simply because we are the still waters of americans. We use energy strengthen our land; a chance to choose our nation for far too costly, our better history; to make hard choices and each day brings further evidence that america's decline is a consequence of time has been spoken during rising tides of purpose over conflict and irresponsibility on not simply because we remain a short span of time has come to data and threaten our planet.

The ways we are equal, all are many.


Approach 1: using string/substring functions & regular expressions

Conclusion: a bit denser and harder to read, but it works well. It gets sloppy with multiple sentences (you have to remember where the last period was using counters, etc.). Its only big advantage over approach 2 is that it searches the raw text at every step instead of building a table first, so the text can change while the algorithm runs and the output adapts accordingly.

He knew it. He knew it. I saw the cat before he saw the cat before he saw him. 
He knew it. He knew it. I saw him. 
He saw the cat before he knew it. He saw the cat before he knew it. I saw the potato. 
He saw him. He saw him. I saw the potato. 
He saw the potato. He saw the cat before he saw him. I saw the cat before he saw the potato. 
He knew it. He saw him. I saw the potato. 
He knew it. He knew it. I saw the cat before he saw him. 
He saw the potato. He knew it. I saw him. 
He knew it. He knew it. I saw the cat before he saw the cat before he saw the potato. 
He knew it. He knew it. I saw him.
import random, re

# do this 10 times
for run in range(10):
    # start text
    text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
    # make the period a word... this is just easier
    text = text.replace('.', ' . ')
    # final string
    f = ''
    # number of sentences (yes, this could be done with split(), but I wanted to
    # try using exclusively string functions!)
    nbsents = text.count('.')
    # character counter: where the current sentence starts
    nbchars = 0
    # for every sentence... (i.e. keep the number of sentences intact)
    for s in range(nbsents):
        # now... the uglier part. the sentence starts at our character count
        start = nbchars
        # then, find the first space after our character count, since
        # this will change later. This is basically the end of the first word.
        end = text.find(" ", nbchars + 1)
        # find the first period (end of sentence), starting at our character count;
        # basically, make sure we're in the right sentence.
        dot = text.find(" . ", nbchars + 1)
        # put that position BACK into nbchars, this way we are moving forwards with the text
        nbchars = dot + 2
        # chosen is our word!
        chosen = text[start:end].strip(' \t\n\r')
        # add it to the final string, add a space too (for every sentence after
        # the first, the unstripped slice also carries the space that separates them)
        f = f + text[start:end] + " "
        # this is pretty self-explanatory... while the end of the sentence has not been reached...
        while chosen != '.':
            # reg ex: word boundary + last word + word boundary... that's it
            # (re.escape keeps punctuation inside a word from being read as regex syntax)
            searchstr = r"\b%s\b" % re.escape(chosen)
            # find this (or these! there may be more than 1 occurrence). ignore caps.
            pattern = re.compile(searchstr, re.IGNORECASE)
            # create a list of the 'following words' that match our reg ex
            nextwords = []
            # for every one of these matches
            for m in pattern.finditer(text):
                # find where the following word ends
                nextwordpos = text.find(" ", m.end() + 1)
                # append that word to the list
                nextwords.append(text[m.end() + 1:nextwordpos])
            # and choose one randomly!
            chosen = random.choice(nextwords)
            # also add it to our final string
            f = f + chosen
            # and unless we're at the end of the sentence, add a space
            # perhaps there's a better way to do this... hm.
            if chosen != '.':
                f = f + ' '
    # put the period back without the extra space
    f = f.replace(' .', '.')
    # blastoff!
    print(f)
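The advantage named in the conclusion above comes from approach 1 never building a table: every step searches the current text for the last word and collects whatever follows it. That lookup can be isolated as a small function. The name `next_words` is hypothetical, and periods are assumed to be spaced out as `' . '` the way the script does it. Editing `text` between two calls changes the candidates right away.

```python
import re


def next_words(text, word):
    """Collect every word that follows `word` (whole word, any case) in `text`."""
    followers = []
    # re.escape keeps punctuation inside `word` from being read as regex syntax
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    for m in pattern.finditer(text):
        # the follower is the first whitespace-separated token after the match
        rest = text[m.end():].split()
        if rest:
            followers.append(rest[0])
    return followers


text = 'He saw the cat before he saw the potato.He knew it.I saw him.'.replace('.', ' . ')
print(next_words(text, 'saw'))  # ['the', 'the', 'him']
# the text changes "while the algorithm runs"
text += 'She saw a dog . '
print(next_words(text, 'saw'))  # ['the', 'the', 'him', 'a']
```

Taking the first token of the remaining text, rather than searching for the next space, also avoids a bad slice when a word sits at the very end of a text with no trailing space.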


Approach 2: dictionaries/lists

Conclusion: more lightweight, a bit more modular, and much easier for multi-sentence text.

He knew it. He saw him. I saw the cat before he knew it.
He saw him. He saw him. I saw him.
He saw him. He saw the potato. I saw him.
He saw the cat before he knew it. He knew it. I saw him.
He saw him. He knew it. I saw him.
He knew it. He knew it. I saw the cat before he saw the potato.
He saw the cat before he saw him. He knew it. I saw him.
He knew it. He knew it. I saw the potato.
He saw the potato. He saw the potato. I saw him.
He knew it. He saw the potato. I saw him.
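The multi-sentence case named in the conclusion above gets easier because the table is built once. Each new sentence is then just a walk from its first word until a period is drawn, so nothing has to track character positions. Below is a sketch of the dictionary approach packaged as a function, assuming the input ends with a period. The `remix` name and its `seed` parameter are hypothetical additions; `seed` seeds a `random.Random` instance so that a run can be repeated.

```python
import random


def remix(text, seed=None):
    """Return one new sentence per sentence of `text`, each starting with that sentence's first word."""
    rng = random.Random(seed)
    text = text.replace('.', ' . ').lower()
    words = text.split()
    # successor table: word -> every word that follows it
    chain = {}
    for word, following in zip(words, words[1:]):
        chain.setdefault(word, []).append(following)
    sentences = []
    # the piece after the final period is empty, so skip it
    for sent in text.split('.')[:-1]:
        chosen = sent.split()[0]
        out = [chosen]
        # walk the chain until a period is drawn
        while chosen != '.':
            chosen = rng.choice(chain[chosen])
            out.append(chosen)
        # drop the period token and re-attach it without a space
        sentences.append((' '.join(out[:-1]) + '.').capitalize())
    return sentences


for s in remix('He saw the cat before he saw the potato.He knew it.I saw him.', seed=1):
    print(s)
```

The same seed gives the same remix, which makes it easier to compare outputs while tweaking the source text.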

EDIT: actually, this version is much better...

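import random

text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
# make the period a word of its own, and ignore caps
text = text.replace('.', ' . ').lower()
words = text.split()
sents = text.split('.')

# d maps every word to the list of words that follow it in the text
# (duplicates are kept, so frequent followers get picked more often)
d = {}
for w, nxt in zip(words, words[1:]):
    d.setdefault(w, []).append(nxt)

# print 5 remixes
for b in range(5):
    f = ''
    # one new sentence per original sentence (the piece after the last period is empty)
    for sent in sents[:-1]:
        # start from the first word of the original sentence
        chosen = sent.split()[0]
        f = f + chosen.capitalize() + ' '
        # follow the chain until we draw a period
        while chosen != '.':
            chosen = random.choice(d[chosen])
            f = f + chosen + ' '
    # put the periods back without the extra space
    f = f.replace(' .', '.')
    print(f)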
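Two things can go wrong once the input is longer than the test sentence. First, the walk can circle for a long time; the "saw the cat before he saw the cat before..." outputs above show such a loop starting. Second, a source that does not end with a period leaves its last word with no successor, and `d[chosen]` then raises a `KeyError`. Below is a hedged sketch of one way to guard against both. The `walk` helper and its `max_words` cap of 30 are assumptions, not part of the original.

```python
import random


def walk(d, first, max_words=30, rng=random):
    """Walk successor table `d` from `first` until a period, a dead end, or max_words tokens."""
    chosen = first
    out = [first]
    while chosen != '.' and len(out) < max_words:
        followers = d.get(chosen)
        if not followers:
            # dead end: this word is never followed by anything in the source
            break
        chosen = rng.choice(followers)
        out.append(chosen)
    # drop the period token; it is re-attached below without a space
    if out[-1] == '.':
        out.pop()
    return ' '.join(out).capitalize() + '.'


# the sample text without its final period, so 'him' has no successor
words = 'He saw the cat before he saw the potato.He knew it.I saw him'.replace('.', ' . ').lower().split()
d = {}
for w, nxt in zip(words, words[1:]):
    d.setdefault(w, []).append(nxt)
print(walk(d, 'i'))
```

Cutting a sentence at `max_words` trades a little fidelity to the chain for output that always terminates quickly. A dead end simply closes the sentence.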