User:Laurier Rochon/prototyping/pythov: Difference between revisions

Revision as of 20:11, 16 May 2011

Building Markov chains with simple sentences.

Each run reproduces the same number of sentences as the source text, and every generated sentence starts with the first word of the corresponding source sentence.
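To make the idea concrete, here is a minimal sketch of the successor table that a word-level Markov chain relies on, built from the same sample text used in both approaches. The names `tokens` and `successors` are illustrative and do not come from the scripts on this page.

```python
# Illustrative sketch: the successor table behind a word-level Markov chain.
text = 'He saw the cat before he saw the potato.He knew it.I was sure.'

# Treat each period as its own token and ignore case when comparing words.
tokens = text.replace('.', ' . ').lower().split()

# successors[w] lists every token that directly follows w in the text.
successors = {}
for current, following in zip(tokens, tokens[1:]):
    successors.setdefault(current, []).append(following)

print(successors['he'])   # ['saw', 'saw', 'knew']
print(successors['saw'])  # ['the', 'the']
print(successors['the'])  # ['cat', 'potato']
```

Generating a sentence then amounts to starting from a first word and repeatedly picking a random entry from the current word's list until a period comes up. Because "saw" appears twice in the list for "he", it is picked twice as often as "knew".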

Approach 1 : using strings/substring

Conclusion: denser and harder to read, but it works well. Handling multiple sentences gets messy, because counters are needed to remember where the last period was.

He saw the cat before he knew it. He saw the cat before he saw the potato. I was sure. 
He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure. 
He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure. 
He saw the cat before he knew it. He knew it. I was sure. 
He knew it. He saw the cat before he knew it. I was sure. 
He saw the potato. He knew it. I was sure. 
He knew it. He saw the potato. I was sure. 
He saw the potato. He saw the potato. I was sure. 
He saw the cat before he saw the potato. He knew it. I was sure. 
He knew it. He saw the potato. I was sure.

import random, re

for run in range(10):
	text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
	# pad periods with spaces so each one becomes a separate token
	text = text.replace('.',' . ')
	f = ''
	nbsents = text.count('.')
	nbchars = 0
	for s in range(nbsents):
		# the first word of the current source sentence seeds the chain
		start = nbchars
		end = text.find(" ",nbchars+1)
		dot = text.find(" . ",nbchars+1)
		# remember where this sentence's period ends, for the next pass
		nbchars = dot+2
		chosen = text[start:end].strip()
		f = f+chosen+" "
		while chosen!='.':
			# collect every word that follows an occurrence of the current word
			pattern = re.compile(r"\b%s\b" % re.escape(chosen),re.IGNORECASE)
			nextwords = []
			for m in pattern.finditer(text):
				nextwordpos = text.find(" ",m.end()+1)
				nextwords.append(text[m.end()+1:nextwordpos])
			chosen = random.choice(nextwords)
			f = f+chosen+' '
	# re-attach periods to the preceding word
	f = f.replace(' .','.').strip()
	print(f)
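
The period-tracking bookkeeping is the part of this approach that gets messy. As a rough sketch, it can be isolated into a helper that walks the text with a single position counter and returns the first word of each sentence. The function name `first_words` is hypothetical and is not part of the script above.

```python
def first_words(text):
    """Return the first word of each period-terminated sentence in text."""
    firsts = []
    pos = 0  # index just past the previous period
    while True:
        dot = text.find('.', pos)
        if dot == -1:
            break
        words = text[pos:dot].split()
        if words:
            firsts.append(words[0])
        pos = dot + 1
    return firsts

print(first_words('He saw the cat before he saw the potato.He knew it.I was sure.'))
# ['He', 'He', 'I']
```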

Approach 2 : dictionaries/lists

Conclusion: more lightweight, a bit more modular, and much easier to extend to multiple sentences.

He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. He knew it. I was sure.
He saw the potato. He saw the cat before he saw the cat before he knew it. I was sure.
He saw the cat before he knew it. He saw the potato. I was sure.
He saw the cat before he knew it. He saw the potato. I was sure.
He knew it. He knew it. I was sure.
He saw the potato. He saw the potato. I was sure.
He knew it. He saw the potato. I was sure.
He knew it. He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. I was sure.
He knew it. He saw the cat before he saw the cat before he saw the potato. I was sure.
He saw the cat before he saw the potato. He saw the potato. I was sure.

import random

for run in range(10):
	text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
	# pad periods so they become tokens; lowercase so 'He' and 'he' share one entry
	text = text.replace('.',' . ').lower()
	words = text.split()
	# map each word to the list of words that follow it in the text
	d = {}
	for i in range(len(words)-1):
		d.setdefault(words[i],[]).append(words[i+1])
	f = ''
	sents = text.split('.')
	for s in range(len(sents)-1):
		# seed each generated sentence with the first word of the source sentence
		chosen = sents[s].split()[0]
		f = f+chosen.capitalize()+' '
		while chosen != '.':
			chosen = random.choice(d[chosen])
			f = f+chosen+' '
	# re-attach periods to the preceding word
	f = f.replace(' .','.').strip()
	print(f)
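
The "more modular" point could be pushed a step further by splitting the dictionary approach into small functions. The following is only a sketch under that assumption: `build_chain`, `sentence_starts`, `generate`, and the optional `rng` parameter are hypothetical names, not part of the original script. Passing a seeded `random.Random` makes a run repeatable.

```python
import random

def build_chain(text):
    """Map each lowercased token to the list of tokens that follow it."""
    tokens = text.replace('.', ' . ').lower().split()
    chain = {}
    for current, following in zip(tokens, tokens[1:]):
        chain.setdefault(current, []).append(following)
    return chain

def sentence_starts(text):
    """Return the lowercased first word of every non-empty sentence."""
    return [s.split()[0].lower() for s in text.split('.') if s.strip()]

def generate(text, rng=random):
    """Generate one sentence per source sentence, each ending at a period."""
    chain = build_chain(text)
    sentences = []
    for word in sentence_starts(text):
        out = [word.capitalize()]
        while word != '.':
            word = rng.choice(chain[word])
            out.append(word)
        sentences.append(' '.join(out).replace(' .', '.'))
    return ' '.join(sentences)

SOURCE = 'He saw the cat before he saw the potato.He knew it.I was sure.'
print(generate(SOURCE, random.Random(0)))
```

With this sample text the last sentence is always "I was sure.", since "i", "was", and "sure" each have only one possible successor.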
