User:Laurier Rochon/prototyping/graphov

From XPUB & Lens-Based wiki
Revision as of 15:30, 21 May 2011 by Laurier Rochon

Visualizing Markov chains

(work in progress)

Some notes

  • I'm skeptical of 'data visualization', 'infographics' and the like. Furthermore, I don't feel completely qualified to draw conclusions from the more extensive graphs.
  • I guess I see this as a quick prototyping tool, just to get a general overview of what's going on in a large set of data.
  • Graphs are ugly? Yes. My computer is slow.
  • All done using Gephi, which is free and works pretty well. It mainly uses the '.gexf' format (GEXF, an XML-based format) to describe the graphs. Unfortunately, the documentation for this software blows.
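Since a .gexf file is plain XML, it can also be built with Python's standard `xml.etree.ElementTree` instead of by gluing strings together, which takes care of escaping odd labels (quotes, ampersands). A minimal sketch, assuming GEXF 1.2 and directed edges; `build_gexf` is a hypothetical helper name, not part of Gephi or any library:

```python
import xml.etree.ElementTree as ET

GEXF_NS = 'http://www.gexf.net/1.2draft'


def build_gexf(labels, edges):
    """Return a GEXF 1.2 document as a string.

    labels: node labels; each node's id is its position in the list.
    edges:  (source_id, target_id) pairs, written as directed edges.
    """
    # Declaring the namespace as a plain attribute keeps the tags unprefixed
    root = ET.Element('gexf', {'xmlns': GEXF_NS, 'version': '1.2'})
    graph = ET.SubElement(root, 'graph',
                          {'mode': 'static', 'defaultedgetype': 'directed'})
    nodes = ET.SubElement(graph, 'nodes')
    for i, label in enumerate(labels):
        # ElementTree escapes &, <, > and double quotes in attribute values
        ET.SubElement(nodes, 'node', {'id': str(i), 'label': label})
    edge_list = ET.SubElement(graph, 'edges')
    for i, (source, target) in enumerate(edges):
        ET.SubElement(edge_list, 'edge',
                      {'id': str(i), 'source': str(source), 'target': str(target)})
    body = ET.tostring(root, encoding='unicode')
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_gexf(['i', 'saw', 'him'], [(0, 1), (1, 2)]))
```

The trade-off is that the output comes out on one line, which Gephi doesn't care about but is harder to eyeball than the hand-built version.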

Ex1: the original Markov chain

Using the following sentences: They ate the cat before he saw the potato. He knew it. I saw him.
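What the graph draws is a transition table: each word points to the words that follow it somewhere in the text. A small sketch that computes that table, plus the transition probabilities a first-order Markov chain would use, tokenizing the same way as the script below (full stops become words, everything lower case). `transitions` and `probabilities` are hypothetical helper names:

```python
from collections import Counter, defaultdict


def transitions(text):
    """Count word -> next-word transitions (a first-order Markov chain)."""
    # Full stops become separate tokens and case is ignored
    words = text.replace('.', ' . ').lower().split()
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def probabilities(table):
    """Normalize each word's successor counts into probabilities."""
    probs = {}
    for word, counts in table.items():
        total = sum(counts.values())
        probs[word] = {nxt: n / total for nxt, n in counts.items()}
    return probs


table = transitions('They ate the cat before he saw the potato.He knew it.I saw him.')
print(dict(table['saw']))            # {'the': 1, 'him': 1}
print(probabilities(table)['.'])     # {'he': 0.5, 'i': 0.5}
```

The words with two outgoing edges in the graph ('the', 'he', 'saw' and '.') are exactly the ones with two successors here, so in the chain each of them moves to either successor with probability 0.5.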

Click HERE for the full image (not much bigger)

Gephi1.gif

Ex2: using Obama's inaugural speech

Using the remix of Obama's speech (User:Laurier_Rochon/prototyping/pythov)
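The remix itself comes from the pythov prototype, whose code isn't reproduced on this page. For illustration only: a common way to remix text with a first-order Markov chain is a random walk, starting at a word and repeatedly jumping to a randomly chosen successor. A minimal sketch under that assumption; `build_chain` and `generate` are hypothetical names, not pythov's actual code:

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it.

    Repeats are kept, so frequent transitions are proportionally more
    likely to be picked by random.choice.
    """
    words = text.replace('.', ' . ').lower().split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng=random):
    """Random walk of at most `length` words through the chain from `start`.

    Stops early if it reaches a word that has no recorded successor.
    """
    out = [start]
    word = start
    # .get avoids defaultdict creating empty entries for unknown words
    while len(out) < length and chain.get(word):
        word = rng.choice(chain[word])
        out.append(word)
    return ' '.join(out)


chain = build_chain('They ate the cat before he saw the potato.He knew it.I saw him.')
print(generate(chain, 'he', 12, random.Random(1)))
```

Every consecutive pair in the output is an edge of the graph above, which is why a remix and its graph look so alike.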

Click HERE for the full image (PDF)

Gephi2.gif

Soft stuff

This script generates the .gexf file to load into Gephi:

from xml.sax.saxutils import quoteattr  # quotes and escapes XML attribute values

text = 'They ate the cat before he saw the potato.He knew it.I saw him.'
# Make each full stop a word of its own, and ignore case
text = text.replace('.', ' . ').lower()
words = text.split()

data = '<?xml version="1.0" encoding="UTF-8"?>\n<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">\n    <meta lastmodifieddate="2009-03-20">  \n      <creator>lolo</creator>  \n      <description>markov chains</description>    </meta>  \n  <graph mode="static">'

# Nodes: one per distinct word; ids maps each word to its node id
ids = {}
data += '<nodes>\n'
for w in words:
    if w not in ids:
        ids[w] = len(ids)
        data += '<node id="%d" label=%s />\n' % (ids[w], quoteattr(w))
data += '</nodes>\n'

# Edges: one per pair of consecutive words, i.e. one per transition in the text
data += '<edges>\n'
for e in range(len(words) - 1):
    data += '<edge id="%d" source="%d" target="%d" />\n' % (e, ids[words[e]], ids[words[e + 1]])
data += '</edges>\n'

data += '</graph>\n</gexf>\n'

with open('mc.gexf', 'w', encoding='utf-8') as f:
    f.write(data)
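The script writes one edge per occurrence, so a word pair that shows up twice in a longer text (very likely with the Obama speech) becomes two parallel edges, and since the file doesn't set `defaultedgetype`, the GEXF 1.2 spec treats the edges as undirected even though Markov transitions have a direction. A sketch of a variant, assuming GEXF 1.2's edge `weight` attribute and `defaultedgetype="directed"`; `markov_gexf` is a hypothetical name, and the meta block is left out for brevity:

```python
from collections import Counter
from xml.sax.saxutils import quoteattr


def markov_gexf(text):
    """Build a GEXF 1.2 string: one node per distinct word, one directed edge
    per distinct pair of consecutive words, weighted by how often it occurs."""
    # Same tokenizing as the script above: full stops are words, case ignored
    words = text.replace('.', ' . ').lower().split()
    ids = {}
    for w in words:
        ids.setdefault(w, len(ids))  # first occurrence gets the next free id
    pairs = Counter(zip(words, words[1:]))  # (word, next word) -> count

    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">',
             '<graph mode="static" defaultedgetype="directed">',
             '<nodes>']
    for w, i in ids.items():
        lines.append('<node id="%d" label=%s />' % (i, quoteattr(w)))
    lines += ['</nodes>', '<edges>']
    for e, ((a, b), n) in enumerate(pairs.items()):
        lines.append('<edge id="%d" source="%d" target="%d" weight="%d" />'
                     % (e, ids[a], ids[b], n))
    lines += ['</edges>', '</graph>', '</gexf>']
    return '\n'.join(lines) + '\n'


print(markov_gexf('The cat saw the cat.'))
```

Writing the result to a file works the same way as in the script above. With weights, Gephi can scale edge thickness by how often a transition occurs instead of stacking duplicate edges.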

Specifically, it writes this to mc.gexf:

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
    <meta lastmodifieddate="2009-03-20">  
      <creator>lolo</creator>  
      <description>markov chains</description>    </meta>  
  <graph mode="static"><nodes>
<node id="0" label="they" />
<node id="1" label="ate" />
<node id="2" label="the" />
<node id="3" label="cat" />
<node id="4" label="before" />
<node id="5" label="he" />
<node id="6" label="saw" />
<node id="7" label="potato" />
<node id="8" label="." />
<node id="9" label="knew" />
<node id="10" label="it" />
<node id="11" label="i" />
<node id="12" label="him" />
</nodes>
<edges>
<edge id="0" source="0" target="1" />
<edge id="1" source="1" target="2" />
<edge id="2" source="2" target="3" />
<edge id="3" source="3" target="4" />
<edge id="4" source="4" target="5" />
<edge id="5" source="5" target="6" />
<edge id="6" source="6" target="2" />
<edge id="7" source="2" target="7" />
<edge id="8" source="7" target="8" />
<edge id="9" source="8" target="5" />
<edge id="10" source="5" target="9" />
<edge id="11" source="9" target="10" />
<edge id="12" source="10" target="8" />
<edge id="13" source="8" target="11" />
<edge id="14" source="11" target="6" />
<edge id="15" source="6" target="12" />
<edge id="16" source="12" target="8" />
</edges>
</graph>
</gexf>
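To sanity-check a generated file before opening it in Gephi, it can be parsed back with `xml.etree.ElementTree`. Every tag lives in the GEXF namespace, so lookups need the `{http://www.gexf.net/1.2draft}` prefix. A small sketch; `summarize_gexf` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

GEXF = '{http://www.gexf.net/1.2draft}'  # namespace prefix for tag lookups


def summarize_gexf(source):
    """Parse a GEXF file (path or binary file object) and return the number
    of nodes and the edges as (source label, target label) pairs."""
    root = ET.parse(source).getroot()
    labels = {n.get('id'): n.get('label') for n in root.iter(GEXF + 'node')}
    edges = [(labels[e.get('source')], labels[e.get('target')])
             for e in root.iter(GEXF + 'edge')]
    return len(labels), edges
```

For the listing above, `summarize_gexf('mc.gexf')` should report 13 nodes and 17 edges, the first being ('they', 'ate'). A parse error here usually means a label broke the XML.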