User:Laurier Rochon/prototyping/pythov: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 21:20, 18 May 2011

Building Markov chains with simple sentences.

The script reproduces the same number of sentences as the source text, and each generated sentence always starts with the first word of the corresponding original sentence.
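As a rough illustration of the idea, the sketch below builds the word-to-next-word table that a chain like this walks through. The function name `build_transitions` is made up for this example and does not appear in the scripts on this page; the tokenizing (period treated as its own word, everything lowercased) follows what those scripts do.

```python
from collections import defaultdict


def build_transitions(text):
    """Map each token to the list of tokens that follow it in the text."""
    # Pad periods with spaces so that '.' becomes a token of its own.
    tokens = text.replace('.', ' . ').lower().split()
    transitions = defaultdict(list)
    # Pair every token with the one right after it.
    for current, following in zip(tokens, tokens[1:]):
        transitions[current].append(following)
    return dict(transitions)


table = build_transitions('He saw the cat before he saw the potato.He knew it.I saw him.')
for word in sorted(table):
    print(word, '->', table[word])
```

A follower that occurs several times stays in the list several times, so picking a random element from the list favours the more frequent continuation ("he" is followed by "saw" twice and "knew" once).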

OBAMA Inaugural speech Remix

(original text)

The words have been spoken during rising tides of prosperity and the still waters of peace. Yet, every so often the oath is taken amidst gathering clouds and raging storms. At these moments, America has carried on not simply because of the skill or vision of those in high office, but because We the People have remained faithful to the ideals of our forebears, and true to our founding documents.

So it has been. So it must be with this generation of Americans.

That we are in the midst of crisis is now well understood. Our nation is at war against a far-reaching network of violence and hatred. Our economy is badly weakened, a consequence of greed and irresponsibility on the part of some but also our collective failure to make hard choices and prepare the nation for a new age.

Homes have been lost, jobs shed, businesses shuttered. Our health care is too costly, our schools fail too many, and each day brings further evidence that the ways we use energy strengthen our adversaries and threaten our planet.

These are the indicators of crisis, subject to data and statistics. Less measurable, but no less profound, is a sapping of confidence across our land; a nagging fear that America's decline is inevitable, that the next generation must lower its sights.

Today I say to you that the challenges we face are real, they are serious and they are many. They will not be met easily or in a short span of time. But know this America: They will be met.

On this day, we gather because we have chosen hope over fear, unity of purpose over conflict and discord.

On this day, we come to proclaim an end to the petty grievances and false promises, the recriminations and worn-out dogmas that for far too long have strangled our politics.

We remain a young nation, but in the words of Scripture, the time has come to set aside childish things. The time has come to reaffirm our enduring spirit; to choose our better history; to carry forward that precious gift, that noble idea, passed on from generation to generation: the God-given promise that all are equal, all are free, and all deserve a chance to pursue their full measure of happiness.


REMIX


The nation for far too many, and they will be with this day, we the skill or in the skill or vision of crisis, subject to set aside childish things. Yet, every so it has come to choose our economy is a sapping of greed and false promises, the next generation must lower its sights.

At war against a consequence of prosperity and worn-out dogmas that all are serious and irresponsibility on this day, we remain a short span of violence and all are the midst of purpose over fear, unity of violence and prepare the ways we gather because of scripture, the god-given promise that america's decline is taken amidst gathering clouds and all deserve a nagging fear that precious gift, that the recriminations and threaten our schools fail too costly, our collective failure to choose our planet.

So it must lower its sights. So it must be met. That precious gift, that the midst of our forebears, and all are in high office, but in high office, but also our land; a far-reaching network of crisis, subject to reaffirm our economy is taken amidst gathering clouds and discord. Our planet. Our health care is too costly, our economy is too many, and raging storms. Homes have been spoken during rising tides of americans. Our enduring spirit; to reaffirm our founding documents. These are free, and threaten our enduring spirit; to generation: the time.

Less measurable, but no less profound, is now well understood. Today i say to the still waters of violence and statistics. They will not simply because we remain a consequence of americans. But no less profound, is inevitable, that we the time. On this america: they will be met.

On this generation of greed and irresponsibility on not simply because we are the still waters of americans. We use energy strengthen our land; a chance to choose our nation for far too costly, our better history; to make hard choices and each day brings further evidence that america's decline is a consequence of time has been spoken during rising tides of purpose over conflict and irresponsibility on not simply because we remain a short span of time has come to data and threaten our planet.

The ways we are equal, all are many.


Approach 1 : using strings/substring & regular expressions

Conclusion : A bit denser and harder to read, but it works well. It gets sloppy with multiple sentences (you have to remember where the last period was using counters, etc.). The only big advantage over approach 2 is that the text can change while the algorithm runs, and the output adapts accordingly.

He knew it. He knew it. I saw the cat before he saw the cat before he saw him. 
He knew it. He knew it. I saw him. 
He saw the cat before he knew it. He saw the cat before he knew it. I saw the potato. 
He saw him. He saw him. I saw the potato. 
He saw the potato. He saw the cat before he saw him. I saw the cat before he saw the potato. 
He knew it. He saw him. I saw the potato. 
He knew it. He knew it. I saw the cat before he saw him. 
He saw the potato. He knew it. I saw him. 
He knew it. He knew it. I saw the cat before he saw the cat before he saw the potato. 
He knew it. He knew it. I saw him.
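import random, re

# do this 10 times
for run in range(0, 10):
    text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
    # pad the periods with spaces so a period can be treated as a word
    text = text.replace('.', ' . ')
    f = ''
    nbsents = text.count('.')
    nbchars = 0
    for s in range(0, nbsents):
        # the sentence starts where the previous one ended
        start = nbchars
        end = text.find(" ", nbchars + 1)
        dot = text.find(" . ", nbchars + 1)
        # remember where this sentence's period is, for the next round
        nbchars = dot + 2
        chosen = text[start:end].strip(' \t\n\r')
        f = f + text[start:end] + " "
        while chosen != '.':
            # find every occurrence of the chosen word, whatever its case
            searchstr = "\\b%s\\b" % re.escape(chosen)
            pattern = re.compile(searchstr, re.IGNORECASE)
            nextwords = []
            for m in pattern.finditer(text):
                # collect the word that follows each occurrence
                nextwordpos = text.find(" ", m.end() + 1)
                nextwords.append(text[m.end() + 1:nextwordpos])
            chosen = random.choice(nextwords)
            f = f + chosen
            if chosen != '.':
                f = f + ' '
    # put back the period where it was
    f = f.replace(' .', '.')
    print(f)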
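The advantage mentioned in the conclusion (the text can change while the algorithm runs) comes from searching the text itself at every step instead of a table built in advance. A minimal sketch of that lookup, assuming a space-separated text like the one above; the helper name `successors` is hypothetical and not part of the script:

```python
import re


def successors(text, word):
    """Return every token that directly follows `word` in a space-separated text."""
    # \b keeps 'he' from matching inside 'the'; re.escape guards against
    # words that contain regex metacharacters.
    pattern = re.compile(r'\b%s\b\s+(\S+)' % re.escape(word), re.IGNORECASE)
    return pattern.findall(text)


text = 'He saw the cat . He knew it . '
print(successors(text, 'he'))

# The text grows between two lookups; the next search sees the new sentence
# without any table having to be rebuilt.
text = text + 'He ate the potato . '
print(successors(text, 'he'))
```

The price is a full regex scan of the text for every generated word, which is part of why this approach feels heavier than the dictionary one.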


Approach 2 : dictionaries/lists

Conclusion : More lightweight, a bit more modular, and much easier to extend to multiple sentences.

He knew it. He saw him. I saw the cat before he knew it.
He saw him. He saw him. I saw him.
He saw him. He saw the potato. I saw him.
He saw the cat before he knew it. He knew it. I saw him.
He saw him. He knew it. I saw him.
He knew it. He knew it. I saw the cat before he saw the potato.
He saw the cat before he saw him. He knew it. I saw him.
He knew it. He knew it. I saw the potato.
He saw the potato. He saw the potato. I saw him.
He knew it. He saw the potato. I saw him.
import random

# do this 10 times
for b in range(0, 10):
    # start text
    text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
    # just to make things easier...this way we can treat the period as a token
    text = text.replace('.', ' . ').lower()
    # split the text into words
    words = text.split()
    # dictionary: word -> list of the words that follow it
    d = {}
    # final string
    f = ''
    # split at the period. this will be useful later to count the # of sentences
    sents = text.split('.')
    # build the dictionary once, before generating anything
    for c in range(0, len(words) - 1):
        w = words[c]
        # if word is not already in our dictionary
        if w not in d:
            # create a list as value in the dictionary
            d[w] = []
        # and just append to the list (existing or newly created)
        d[w].append(words[c + 1])
    #
    # the last steps just created our dictionary...here is the action
    #
    # keep the same number of sentences
    for a in range(0, len(sents) - 1):
        # strip whitespace and then split into words
        allw = sents[a].strip(' \t\n\r').split()
        # start with the 1st word of the sentence
        chosen = allw[0]
        # capitalize it
        f = f + str(chosen).capitalize() + ' '
        # keep going until we hit a period
        while chosen != '.':
            # choose the next word randomly
            new = random.choice(d[chosen])
            f = f + str(new) + ' '
            # and set the new chosen word as the last one randomly chosen
            chosen = new
    # put back the period where it was
    f = f.replace(' .', '.')
    print(f)
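To make the "more modular" point concrete, here is one possible way to wrap the dictionary approach in a function that accepts any text. This is only a sketch: the name `remix` and the `rng` parameter are invented for this example, and it assumes every sentence in the input ends with a period (otherwise the last word has no follower and the lookup fails).

```python
import random


def remix(text, rng=random):
    """Regenerate `text` sentence by sentence using a word-to-followers table."""
    # Same tokenizing as above: the period is a token, everything is lowercased.
    tokens = text.replace('.', ' . ').lower().split()
    followers = {}
    for current, following in zip(tokens, tokens[1:]):
        followers.setdefault(current, []).append(following)

    # One list of words per original sentence; empty leftovers are dropped.
    sentences = [s.split() for s in text.lower().split('.') if s.strip()]
    out = []
    for sentence in sentences:
        # Start from the first word of the original sentence.
        chosen = sentence[0]
        words = [chosen.capitalize()]
        while True:
            chosen = rng.choice(followers[chosen])
            if chosen == '.':
                break
            words.append(chosen)
        out.append(' '.join(words) + '.')
    return ' '.join(out)


sample = 'He saw the cat before he saw the potato.He knew it.I saw him.'
# Passing a seeded generator makes a given remix reproducible.
print(remix(sample, random.Random(0)))
```

Calling `remix(sample)` without a second argument falls back to the shared `random` module, which matches the behaviour of the script above.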