User:Laurier Rochon/prototyping/pythov: Difference between revisions

From XPUB & Lens-Based wiki

Revision as of 20:09, 16 May 2011

Building Markov chains from simple sentences. The output contains the same number of sentences as the input, and each generated sentence begins with the first word of the corresponding input sentence.

Approach 1 : using strings/substrings

Conclusion : denser and harder to read, but it works well. It gets sloppy once the text contains more than one sentence.

He saw the cat before he knew it. He saw the cat before he saw the potato. I was sure. He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure. He saw the cat before he saw the potato. He saw the cat before he knew it. I was sure. He saw the cat before he knew it. He knew it. I was sure. He knew it. He saw the cat before he knew it. I was sure. He saw the potato. He knew it. I was sure. He knew it. He saw the potato. I was sure. He saw the potato. He saw the potato. I was sure. He saw the cat before he saw the potato. He knew it. I was sure. He knew it. He saw the potato. I was sure.

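import random, re

for run in range(10):
	text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
	# pad every full stop with spaces so it behaves like a word of its own
	text = text.replace('.',' . ')
	f = ''
	nbsents = text.count('.')
	nbchars = 0
	for s in range(nbsents):
		# the first word of the sentence runs from start to the next space
		start = nbchars
		end = text.find(" ",nbchars+1)
		# remember where this sentence ends, so the next one starts after its dot
		dot = text.find(" . ",nbchars+1)
		nbchars = dot+2
		chosen = text[start:end].strip(' \t\n\r')
		# the unstripped slice keeps the leading space that separates sentences
		f = f+text[start:end]+ " "
		while chosen!='.':
			# find every occurrence of the current word, ignoring case
			pattern = re.compile(r"\b%s\b" % re.escape(chosen),re.IGNORECASE)
			nextwords = []
			for m in pattern.finditer(text):
				# the following word runs from just past the match to the next space
				nextwordpos = text.find(" ",m.end()+1)
				nextwords.append(text[m.end()+1:nextwordpos])
			# pick one of the possible following words at random
			chosen = random.choice(nextwords)
			f = f+chosen
			if chosen != '.':
				f = f+' '
	# glue the full stops back onto the preceding word
	f = f.replace(' .','.')
	print(f)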
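
The core step of the loop above, collecting every word that follows a given word, can be looked at in isolation. The sketch below is only an illustration: the helper name `followers` is hypothetical and not part of the original script, but it uses the same padded text and the same case-insensitive word-boundary search.

```python
import re

def followers(text, word):
    """Return every token that directly follows `word` in `text` (case-insensitive)."""
    # Same preprocessing as the script: full stops become standalone tokens.
    padded = text.replace('.', ' . ')
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    found = []
    for m in pattern.finditer(padded):
        # The next token starts one character past the match and ends at the next space.
        next_end = padded.find(" ", m.end() + 1)
        found.append(padded[m.end() + 1:next_end])
    return found

text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
print(followers(text, 'he'))
print(followers(text, 'the'))
```

For 'he' this gives ['saw', 'saw', 'knew']. Because the random pick is made from the list with its duplicates, 'saw' is twice as likely as 'knew' to come after 'he'.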


Approach 2 : dictionaries/lists

Conclusion : lighter, a bit more modular, and much easier to extend to multiple sentences.

He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. He knew it. I was sure. He saw the potato. He saw the cat before he saw the cat before he knew it. I was sure. He saw the cat before he knew it. He saw the potato. I was sure. He saw the cat before he knew it. He saw the potato. I was sure. He knew it. He knew it. I was sure. He saw the potato. He saw the potato. I was sure. He knew it. He saw the potato. I was sure. He knew it. He saw the cat before he saw the cat before he saw the cat before he saw the cat before he knew it. I was sure. He knew it. He saw the cat before he saw the cat before he saw the potato. I was sure. He saw the cat before he saw the potato. He saw the potato. I was sure.

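import random

for run in range(10):
	text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
	# pad full stops so each one becomes a word, and lowercase so "He" and "he" share an entry
	text = text.replace('.',' . ').lower()
	words = text.split()
	# map every word to the list of words that follow it in the text
	d = {}
	for w, nxt in zip(words, words[1:]):
		d.setdefault(w, []).append(nxt)
	f = ''
	sents = text.split('.')
	# the last element of sents is only the whitespace left after the final full stop
	for s in range(len(sents)-1):
		# start each generated sentence with the first word of the source sentence
		chosen = sents[s].split()[0]
		f = f + chosen.capitalize()+' '
		while chosen != '.':
			# pick one of the recorded following words at random
			chosen = random.choice(d[chosen])
			f = f + chosen+' '
	# glue the full stops back onto the preceding word
	f = f.replace(' .','.')
	print(f)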
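
Since the dictionary approach keeps building the table separate from walking it, the two steps are easy to pull into functions. The following is a minimal sketch under that assumption: the names `build_chain` and `generate` are illustrative and do not appear in the original script, and the optional `rng` parameter is an addition so that a run can be made repeatable.

```python
import random

def build_chain(text):
    """Map each lowercase token to the list of tokens that follow it."""
    words = text.replace('.', ' . ').lower().split()
    chain = {}
    for w, nxt in zip(words, words[1:]):
        chain.setdefault(w, []).append(nxt)
    return chain

def generate(text, rng=random):
    """Produce one sentence per source sentence, starting from its first word."""
    chain = build_chain(text)
    out = []
    for sent in text.lower().split('.'):
        tokens = sent.split()
        if not tokens:
            continue  # skip the empty remainder after the last full stop
        chosen = tokens[0]
        sentence = [chosen.capitalize()]
        while True:
            chosen = rng.choice(chain[chosen])
            if chosen == '.':
                break  # a full stop ends the sentence
            sentence.append(chosen)
        out.append(' '.join(sentence) + '.')
    return ' '.join(out)

text = 'He saw the cat before he saw the potato.He knew it.I was sure.'
print(generate(text, random.Random(0)))
```

Passing a seeded `random.Random` instance makes the output reproducible, which helps when comparing the two approaches on the same input.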