User:Laurier Rochon/prototyping/pythov: Difference between revisions

Revision as of 19:13, 16 May 2011

Building Markov chains with simple sentences.

Each run reproduces the same number of sentences as the source text, always starting each new sentence with the first word of the corresponding source sentence.
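
To make the idea concrete, here is a minimal sketch (not the code used on this page, and written for Python 3) that counts which word follows which in the sample text and turns the counts into probabilities. The function name `transition_probabilities` is made up for illustration; the sketch assumes the text is lowercased and that each period counts as its own token.

```python
from collections import Counter, defaultdict

def transition_probabilities(text):
    """Return {word: {next_word: probability}} for a first-order word chain."""
    # Treat each period as its own token and ignore case.
    tokens = text.replace('.', ' . ').lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    probs = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        probs[word] = {nxt: n / total for nxt, n in followers.items()}
    return probs

text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
table = transition_probabilities(text)
print(table['he'])   # 'saw' follows 'he' twice, 'knew' once
print(table['saw'])  # 'the' follows 'saw' twice, 'him' once
```

Picking the next word at random from the raw list of followers, as both approaches on this page do, amounts to sampling from these probabilities.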

Approach 1 : using strings/substrings

Conclusion : A bit denser and harder to read, but it works well. It gets sloppy when handling multiple sentences (you have to remember where the last period was using counters, etc.).

He knew it. He knew it. I saw the cat before he saw the cat before he saw him. 
He knew it. He knew it. I saw him. 
He saw the cat before he knew it. He saw the cat before he knew it. I saw the potato. 
He saw him. He saw him. I saw the potato. 
He saw the potato. He saw the cat before he saw him. I saw the cat before he saw the potato. 
He knew it. He saw him. I saw the potato. 
He knew it. He knew it. I saw the cat before he saw him. 
He saw the potato. He knew it. I saw him. 
He knew it. He knew it. I saw the cat before he saw the cat before he saw the potato. 
He knew it. He knew it. I saw him.

import random, re

for a in range(0,10):
	text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
	text = text.replace('.',' . ')
	f = ''
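	nbsents = text.count('.')
	# nbchars marks where the current source sentence starts
	nbchars = 0
	for s in range(0,nbsents):
		start = nbchars
		end = text.find(" ",nbchars+1)
		dot = text.find(" . ",nbchars+1)
		# move past this sentence's period for the next pass
		nbchars = dot+2
		# seed the new sentence with the first word of the source sentence
		chosen = text[start:end].strip(' \t\n\r')
		f = f+text[start:end]+ " "
		while chosen!='.':
			# find every occurrence of the current word, ignoring case
			searchstr = "\\b%s\\b" % re.escape(chosen)
			pattern = re.compile(searchstr,re.IGNORECASE)
			nextwords = []
			for m in pattern.finditer(text):
				# the token after each match is a candidate next word
				nextwordpos = text.find(" ",m.end()+1)
				nextwords.append(text[m.end()+1:nextwordpos])
			chosen = random.choice(nextwords)
			f = f+chosen
			if chosen != '.':
				f = f+' '
	f = f.replace(' .','.')
	print(f)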
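
The regex lookup is the core of this approach, so it may help to see it in isolation. The sketch below is a hypothetical helper (`next_words` is not part of the code above) that returns every token following a given word. It assumes the same period padding as Approach 1 and uses a lookahead so that overlapping matches are not skipped.

```python
import re

def next_words(text, word):
    """List every token that directly follows `word` in `text` (case-insensitive)."""
    # Pad periods with spaces so they split off as separate tokens.
    spaced = text.replace('.', ' . ')
    # \b keeps 'he' from matching inside 'the'; re.escape guards against
    # regex metacharacters in the word. The lookahead captures the next
    # token without consuming it.
    pattern = re.compile(r'\b%s\b(?=\s+(\S+))' % re.escape(word), re.IGNORECASE)
    return pattern.findall(spaced)

text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
print(next_words(text, 'he'))   # ['saw', 'saw', 'knew']
print(next_words(text, 'saw'))  # ['the', 'the', 'him']
print(next_words(text, 'him'))  # ['.']
```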

Approach 2 : dictionaries/lists

Conclusion : more lightweight, a bit more modular, and much easier to extend to multiple sentences.

He knew it. He saw him. I saw the cat before he knew it.
He saw him. He saw him. I saw him.
He saw him. He saw the potato. I saw him.
He saw the cat before he knew it. He knew it. I saw him.
He saw him. He knew it. I saw him.
He knew it. He knew it. I saw the cat before he saw the potato.
He saw the cat before he saw him. He knew it. I saw him.
He knew it. He knew it. I saw the potato.
He saw the potato. He saw the potato. I saw him.
He knew it. He saw the potato. I saw him.

import random

for b in range(0,10):
	text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
	text = text.replace('.',' . ').lower()
	words = text.split()
	# d maps each word to the list of words that follow it
	d = {}
	for c in range(0,len(words)-1):
		if words[c] not in d:
			d[words[c]] = []
		d[words[c]].append(words[c+1])
	f = ''
	sents = text.split('.')
	# one generated sentence per source sentence
	for a in range(0,len(sents)-1):
		# seed with the first word of the source sentence
		allw = sents[a].strip(' \t\n\r').split()
		chosen = allw[0]
		f = f + str(chosen).capitalize()+' '
		# follow the chain until a period is drawn
		while chosen != '.':
			new = random.choice(d[chosen])
			f = f +str(new)+' '
			chosen = new
	f = f.replace(' .','.')
	print(f)
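
Since the dictionary version is the more modular one, a possible next step is to split it into small functions. This is only a sketch under that assumption: `build_chain`, `first_words` and `generate` are hypothetical names, and the optional `rng` argument is added here so a seeded `random.Random` can make a run repeatable.

```python
import random

def build_chain(text):
    """Map each lowercased token to the list of tokens that follow it."""
    words = text.replace('.', ' . ').lower().split()
    chain = {}
    for current, following in zip(words, words[1:]):
        chain.setdefault(current, []).append(following)
    return chain

def first_words(text):
    """Return the lowercased first word of every sentence in the text."""
    return [s.split()[0].lower() for s in text.split('.') if s.strip()]

def generate(text, rng=random):
    """Generate one sentence per source sentence, seeded with its first word."""
    chain = build_chain(text)
    sentences = []
    for seed in first_words(text):
        out = [seed]
        # Follow the chain until a period is drawn.
        while out[-1] != '.':
            out.append(rng.choice(chain[out[-1]]))
        # Join the words, then attach the period without a leading space.
        sentences.append(' '.join(out[:-1]).capitalize() + '.')
    return ' '.join(sentences)

text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
print(generate(text, random.Random(0)))  # a fixed seed makes the run repeatable
```

Keeping the chain building separate from the generation means the same table can be reused for several runs, or built from a different text, without touching the sentence loop.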

@@ Line 8: / Line 8: @@
 <source lang="text">
-He saw the cat before he knew it. He saw the cat before he saw the potato. I was sure.
+He knew it. He knew it. I saw the cat before he saw the cat before he saw him.
-He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure.
+He knew it. He knew it. I saw him.
-He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure.
+He saw the cat before he knew it. He saw the cat before he knew it. I saw the potato.
-He saw the cat before he knew it. He knew it. I was sure.
+He saw him. He saw him. I saw the potato.
-He knew it. He saw the cat before he knew it. I was sure.
+He saw the potato. He saw the cat before he saw him. I saw the cat before he saw the potato.
-He saw the potato. He knew it. I was sure.
+He knew it. He saw him. I saw the potato.
-He knew it. He saw the potato. I was sure.
+He knew it. He knew it. I saw the cat before he saw him.
-He saw the potato. He saw the potato. I was sure.
+He saw the potato. He knew it. I saw him.
-He saw the cat before he saw the potato. He knew it. I was sure.
+He knew it. He knew it. I saw the cat before he saw the cat before he saw the potato.
-He knew it. He saw the potato. I was sure.
+He knew it. He knew it. I saw him.
 </source>
@@ Line 24: / Line 24: @@
 for a in range(0,10):
-	text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
+	text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
 	text = text.replace('.',' . ')
 	f = ''
@@ Line 57: / Line 57: @@
 <source lang="text">
-He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. He knew it. I was sure.
+He knew it. He saw him. I saw the cat before he knew it.
-He saw the potato. He saw the cat before he saw the cat before he knew it. I was sure.
+He saw him. He saw him. I saw him.
-He saw the cat before he knew it. He saw the potato. I was sure.
+He saw him. He saw the potato. I saw him.
-He saw the cat before he knew it. He saw the potato. I was sure.
+He saw the cat before he knew it. He knew it. I saw him.
-He knew it. He knew it. I was sure.
+He saw him. He knew it. I saw him.
-He saw the potato. He saw the potato. I was sure.
+He knew it. He knew it. I saw the cat before he saw the potato.
-He knew it. He saw the potato. I was sure.
+He saw the cat before he saw him. He knew it. I saw him.
-He knew it. He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. I was sure.
+He knew it. He knew it. I saw the potato.
-He knew it. He saw the cat before he saw the cat before he saw the potato. I was sure.
+He saw the potato. He saw the potato. I saw him.
-He saw the cat before he saw the potato. He saw the potato. I was sure.
+He knew it. He saw the potato. I saw him.
 </source>
@@ Line 73: / Line 73: @@
 for b in range(0,10):
-	text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
+	text = 'He saw the cat before he saw the potato.He knew it.I saw him.'
 	text = text.replace('.',' . ').lower()
 	words = text.split()