User:Laurier Rochon/prototyping/pythov: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 19:11, 16 May 2011

Building Markov chains with simple sentences.

Reproduces the same number of sentences as the source text, always starting each one with the first word of the corresponding source sentence.
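
As a rough illustration of what the chain looks like for the sample text used on this page, the sketch below builds a table from each word to the words that follow it. The names `tokens` and `successors` are picked for this example only; treating the period as its own token is an assumption carried over from the scripts further down.

```python
from collections import defaultdict

text = 'He saw the cat before he saw the potato.He knew it.I was sure.'

# Treat each period as its own token so it can end a chain.
tokens = text.replace('.', ' . ').lower().split()

# Map every token to the list of tokens that follow it in the text.
# Repeats are kept, so frequent successors are picked more often.
successors = defaultdict(list)
for current, following in zip(tokens, tokens[1:]):
    successors[current].append(following)

for word in ('he', 'saw', 'the', 'before'):
    print(word, '->', successors[word])
# he -> ['saw', 'saw', 'knew']
# saw -> ['the', 'the']
# the -> ['cat', 'potato']
# before -> ['he']
```

Generating a sentence then amounts to starting at a word and repeatedly picking a random entry from its list until a period comes up.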

Approach 1 : using strings/substring

Conclusion : A bit denser and harder to read, but it works well. It gets sloppy when handling multiple sentences (you have to remember where the last period was using counters, etc.).

He saw the cat before he knew it. He saw the cat before he saw the potato. I was sure. 
He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure. 
He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure. 
He saw the cat before he knew it. He knew it. I was sure. 
He knew it. He saw the cat before he knew it. I was sure. 
He saw the potato. He knew it. I was sure. 
He knew it. He saw the potato. I was sure. 
He saw the potato. He saw the potato. I was sure. 
He saw the cat before he saw the potato. He knew it. I was sure. 
He knew it. He saw the potato. I was sure.
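import random, re

text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
# Pad every period with spaces so it can be found as a separate token.
text = text.replace('.', ' . ')
nbsents = text.count('.')

for run in range(10):
    f = ''
    nbchars = 0  # offset at which the current source sentence starts
    for s in range(nbsents):
        # Pick the first word of the current source sentence.
        start = nbchars
        end = text.find(' ', nbchars + 1)
        # Remember where the next sentence starts (the space after the period).
        dot = text.find(' . ', nbchars + 1)
        nbchars = dot + 2
        chosen = text[start:end].strip()
        f = f + text[start:end] + ' '
        while chosen != '.':
            # Collect every word that follows an occurrence of the chosen word.
            pattern = re.compile(r'\b%s\b' % re.escape(chosen), re.IGNORECASE)
            nextwords = []
            for m in pattern.finditer(text):
                nextwordpos = text.find(' ', m.end() + 1)
                nextwords.append(text[m.end() + 1:nextwordpos])
            # Pick one of them at random; duplicates weight the choice.
            chosen = random.choice(nextwords)
            f = f + chosen
            if chosen != '.':
                f = f + ' '
    f = f.replace(' .', '.')
    print(f)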
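
The counter bookkeeping mentioned in the conclusion could probably be avoided by locating every sentence start up front. The sketch below stays with plain string offsets; `sentence_starts` is a hypothetical helper, not part of the script above, and it assumes the periods have already been padded as ' . '.

```python
import re

def sentence_starts(spaced_text):
    """Return the offset of the first character of every sentence.

    Hypothetical helper: assumes periods were padded as ' . ' beforehand.
    """
    starts = [0] + [m.end() for m in re.finditer(r' \. ', spaced_text)]
    # The match for the final period ends at the end of the text,
    # where no sentence starts, so that offset is dropped.
    return [s for s in starts if s < len(spaced_text)]

text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
text = text.replace('.', ' . ')

starts = sentence_starts(text)
first_words = [text[s:text.find(' ', s)] for s in starts]
print(starts)        # [0, 42, 55]
print(first_words)   # ['He', 'He', 'I']
```

With the offsets in a list, the sentence loop can iterate over `starts` directly instead of updating a running position after each period.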


Approach 2 : dictionaries/lists

Conclusion : More lightweight, a bit more modular, and much easier to extend to multiple sentences.

He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. He knew it. I was sure.
He saw the potato. He saw the cat before he saw the cat before he knew it. I was sure.
He saw the cat before he knew it. He saw the potato. I was sure.
He saw the cat before he knew it. He saw the potato. I was sure.
He knew it. He knew it. I was sure.
He saw the potato. He saw the potato. I was sure.
He knew it. He saw the potato. I was sure.
He knew it. He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. I was sure.
He knew it. He saw the cat before he saw the cat before he saw the potato. I was sure.
He saw the cat before he saw the potato. He saw the potato. I was sure.
import random

text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
# Pad periods with spaces so each one becomes its own token, then lowercase.
text = text.replace('.', ' . ').lower()
words = text.split()

# Map each word to the list of words that follow it in the text.
d = {}
for c in range(len(words) - 1):
    if words[c] not in d:
        d[words[c]] = []
    d[words[c]].append(words[c + 1])

sents = text.split('.')

for run in range(10):
    f = ''
    # The last piece of the split is the whitespace left after the final period.
    for a in range(len(sents) - 1):
        # Start from the first word of the source sentence.
        chosen = sents[a].split()[0]
        f = f + chosen.capitalize() + ' '
        # Follow random successors until a period closes the sentence.
        while chosen != '.':
            chosen = random.choice(d[chosen])
            f = f + chosen + ' '
    f = f.replace(' .', '.')
    print(f)
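
To make the "more modular" point concrete, the dictionary approach above could be split into two small functions. This is only a sketch: `build_chain`, `generate` and the `rng` parameter are names invented here, and the fixed seed is an assumption added so that repeated runs give the same output.

```python
import random

def build_chain(tokens):
    """Map each token to the list of tokens that follow it."""
    chain = {}
    for current, following in zip(tokens, tokens[1:]):
        chain.setdefault(current, []).append(following)
    return chain

def generate(text, rng=random):
    """Regenerate `text` sentence by sentence, keeping each first word."""
    spaced = text.replace('.', ' . ').lower()
    chain = build_chain(spaced.split())
    output = []
    for sentence in spaced.split('.'):
        words = sentence.split()
        if not words:              # whitespace left after the final period
            continue
        chosen = words[0]
        parts = [chosen.capitalize()]
        while chosen != '.':
            chosen = rng.choice(chain[chosen])
            parts.append(chosen)
        output.append(' '.join(parts).replace(' .', '.'))
    return ' '.join(output)

text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
rng = random.Random(2011)          # fixed seed so runs are repeatable
for _ in range(3):
    print(generate(text, rng))
```

Passing a `random.Random` instance instead of using the module-level functions keeps the generator testable, since the same seed always yields the same sentences.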