User:Laurier Rochon/prototyping/pythov

From XPUB & Lens-Based wiki
Revision as of 19:13, 16 May 2011

Building Markov chains from simple sentences.

Each run reproduces the same number of sentences as the source text. Each generated sentence starts with the first word of the corresponding source sentence.
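The sketch below makes the idea concrete. It is Python 3 and is not the page's original code. It builds the chain from the sample sentence used below. Each word maps to the list of words that follow it, with the period treated as a word of its own. It also collects the first word of each source sentence, which seeds the generated sentences. The names `successors` and `starts` are only for illustration.

```python
text = 'He saw the cat before he saw the potato.He knew it.I saw him.'

# treat "." as its own word and ignore case, as Approach 2 below also does
words = text.replace('.', ' . ').lower().split()

# first-order chain: each word -> every word that follows it (duplicates kept)
successors = {}
for current, following in zip(words, words[1:]):
    successors.setdefault(current, []).append(following)

# one generated sentence per source sentence, each seeded by its first word
starts = [s.split()[0] for s in text.lower().split('.') if s.strip()]

print(successors['saw'])  # ['the', 'the', 'him']
print(starts)             # ['he', 'he', 'i']
```

`saw` is followed by `the` twice and by `him` once. Drawing uniformly from its list therefore picks `the` about two thirds of the time. Keeping the duplicates is what encodes the transition probabilities.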

Approach 1: using strings/substrings

Conclusion: a bit denser and harder to read, but it works well. It gets sloppy with multiple sentences (you have to track where the last period was using counters, etc.).

He knew it. He knew it. I saw the cat before he saw the cat before he saw him. 
He knew it. He knew it. I saw him. 
He saw the cat before he knew it. He saw the cat before he knew it. I saw the potato. 
He saw him. He saw him. I saw the potato. 
He saw the potato. He saw the cat before he saw him. I saw the cat before he saw the potato. 
He knew it. He saw him. I saw the potato. 
He knew it. He knew it. I saw the cat before he saw him. 
He saw the potato. He knew it. I saw him. 
He knew it. He knew it. I saw the cat before he saw the cat before he saw the potato. 
He knew it. He knew it. I saw him.
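
import random, re

for run in range(0, 10):
	text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
	# pad each period with spaces so "." becomes a word of its own
	text = text.replace('.', ' . ')
	f = ''
	nbsents = text.count('.')
	# nbchars marks where the current source sentence starts in text
	nbchars = 0
	for s in range(0, nbsents):
		# the first word of the source sentence seeds the generated one
		start = nbchars
		end = text.find(" ", nbchars + 1)
		# remember where this sentence's period is, to find the next sentence
		dot = text.find(" . ", nbchars + 1)
		nbchars = dot + 2
		chosen = text[start:end].strip(' \t\n\r')
		f = f + text[start:end] + " "
		while chosen != '.':
			# every whole-word, case-insensitive occurrence of the current word
			pattern = re.compile(r"\b%s\b" % re.escape(chosen), re.IGNORECASE)
			nextwords = []
			for m in pattern.finditer(text):
				# the following word starts one character after the match
				nextwordpos = text.find(" ", m.end() + 1)
				nextwords.append(text[m.end() + 1:nextwordpos])
			# pick a successor at random (repeated successors are more likely)
			chosen = random.choice(nextwords)
			f = f + chosen
			if chosen != '.':
				f = f + ' '
	# glue each period back onto the word before it
	f = f.replace(' .', '.')
	print(f)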
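Much of the bookkeeping in Approach 1 goes into locating the word after each match by hand (`m.end() + 1`, `text.find(" ", ...)`). One possible simplification is sketched below in Python 3. It is an assumption, not the original approach: a single regular expression with a capture group returns the following words directly through `re.findall`. `next_words` is a hypothetical helper name.

```python
import random
import re


def next_words(text, word):
    """Return every word that directly follows `word` in `text` (case-insensitive)."""
    # \b...\b matches `word` as a whole word; (\S+) captures the next
    # space-separated token, which may be "." once periods are padded
    pattern = r"\b%s\b\s+(\S+)" % re.escape(word)
    return re.findall(pattern, text, re.IGNORECASE)


text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
text = text.replace('.', ' . ')

print(next_words(text, 'saw'))  # ['the', 'the', 'him']
print(next_words(text, 'he'))   # ['saw', 'saw', 'knew']

# walk the chain from "He" until a period is drawn
chosen = 'He'
sentence = [chosen]
while chosen != '.':
    chosen = random.choice(next_words(text, chosen))
    sentence.append(chosen)
print(' '.join(sentence).replace(' .', '.'))
```

The sentence-start counters are still needed if several sentences are generated in one pass. This sketch only removes the per-match index arithmetic.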


Approach 2: dictionaries/lists

Conclusion: more lightweight, a bit more modular, and much easier with multiple sentences.

He knew it. He saw him. I saw the cat before he knew it.
He saw him. He saw him. I saw him.
He saw him. He saw the potato. I saw him.
He saw the cat before he knew it. He knew it. I saw him.
He saw him. He knew it. I saw him.
He knew it. He knew it. I saw the cat before he saw the potato.
He saw the cat before he saw him. He knew it. I saw him.
He knew it. He knew it. I saw the potato.
He saw the potato. He saw the potato. I saw him.
He knew it. He saw the potato. I saw him.
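
import random

for run in range(0, 10):
	text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
	# pad each period with spaces so "." becomes a word of its own; ignore case
	text = text.replace('.', ' . ').lower()
	words = text.split()
	# d maps each word to the list of words that follow it (duplicates kept,
	# so frequent successors are picked more often); built once per run
	d = {}
	for c in range(0, len(words) - 1):
		if words[c] not in d:
			d[words[c]] = []
		d[words[c]].append(words[c + 1])
	f = ''
	sents = text.split('.')
	# the last element of sents is the empty tail after the final period
	for a in range(0, len(sents) - 1):
		# start with the first word of the matching source sentence
		chosen = sents[a].split()[0]
		f = f + chosen.capitalize() + ' '
		while chosen != '.':
			new = random.choice(d[chosen])
			f = f + new + ' '
			chosen = new
	# glue each period back onto the word before it
	f = f.replace(' .', '.')
	print(f)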
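The conclusion for Approach 2 mentions modularity. Below is one way the same dictionary approach could be split into reusable functions. It is a hedged Python 3 sketch. `build_chain`, `generate_sentence`, `generate_text` and the optional `rng` parameter are hypothetical names. The seeded `random.Random` instance is only there to make runs repeatable.

```python
import random


def build_chain(text):
    """Map each lowercased word (periods included) to the list of words that follow it."""
    words = text.replace('.', ' . ').lower().split()
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain


def generate_sentence(chain, start, rng=random):
    """Random walk from `start` until a period is drawn; return the sentence as a string."""
    chosen = start
    out = [chosen.capitalize()]
    while chosen != '.':
        chosen = rng.choice(chain[chosen])
        out.append(chosen)
    return ' '.join(out).replace(' .', '.')


def generate_text(text, rng=random):
    """One generated sentence per source sentence, each starting with that sentence's first word."""
    chain = build_chain(text)
    starts = [s.split()[0].lower() for s in text.split('.') if s.strip()]
    return ' '.join(generate_sentence(chain, w, rng) for w in starts)


if __name__ == '__main__':
    source = 'He saw the cat before he saw the potato.He knew it.I saw him.'
    rng = random.Random(2011)  # fixed seed so a run can be reproduced
    for _ in range(10):
        print(generate_text(source, rng))
```

Passing a seeded `random.Random` keeps repeated runs identical, which makes it easier to compare outputs while tweaking the chain. Leaving `rng` out falls back to the module-level generator, as in the original scripts.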