User:Laurier Rochon/prototyping/graphov

From XPUB & Lens-Based wiki

Visualizing Markov chains

(work in progress)

Some notes

  • I'm skeptical of 'data visualization', 'infographics' and the like. Furthermore, I don't feel completely qualified to draw conclusions from the more extensive graphs.
  • I consider this a quick prototyping tool, just a way to get a general overview of what's going on in a large data set.
  • Graphs are ugly? Yes. My computer is slow.
  • All done using Gephi, which is free and works pretty well. It mainly reads '.gexf' (GEXF) files, an XML-based format that describes the nodes and edges of a graph. Unfortunately, the documentation for this software blows.

Ex1 : the original Markov chain

Using the following sentences : They ate the cat before he saw the potato. He knew it. I saw him.

Gephi1.gif
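To make the graph above easier to read, here is a minimal sketch of the table a Markov chain is built from: each word mapped to the words that follow it somewhere in the text. The function name `build_chain` is made up for this illustration, and the tokenizing (lowercase, full stop as its own token) is assumed to match the script further down.

```python
from collections import defaultdict

def build_chain(text):
    """Map each token to the list of tokens that follow it in the text."""
    # Same tokenizing as the generator script: lowercase, '.' is its own token
    words = text.replace('.', ' . ').lower().split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

chain = build_chain('They ate the cat before he saw the potato. He knew it. I saw him.')
print(chain['the'])  # ['cat', 'potato']
print(chain['saw'])  # ['the', 'him']
print(chain['.'])    # ['he', 'i']
```

Every key is a node in the graph, and every entry in its list is an arrow leaving that node.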

Ex2 : using Obama's inaugural speech

Using the remix from Obama's speech (here)

HERE for the full image (PDF)

Gephi2.gif

Soft stuff

This Python script generates the file to throw into Gephi:

from xml.sax.saxutils import quoteattr

text = 'They ate the cat before he saw the potato. He knew it. I saw him.'
# Lowercase everything and treat each full stop as its own token
words = text.replace('.', ' . ').lower().split()

# GEXF header
data = '<?xml version="1.0" encoding="UTF-8"?>\n<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">\n    <meta lastmodifieddate="2009-03-20">  \n      <creator>lolo</creator>  \n      <description>markov chains</description>    </meta>  \n  <graph mode="static">'

# One node per distinct word; ids maps word -> node id
ids = {}
data += '<nodes>\n'
for w in words:
    if w not in ids:
        ids[w] = len(ids)
        # quoteattr adds the quotes and escapes &, < and > in the label
        data += '<node id="%d" label=%s />\n' % (ids[w], quoteattr(w))
data += '</nodes>\n'

# One edge per pair of consecutive words
data += '<edges>\n'
for e, (a, b) in enumerate(zip(words, words[1:])):
    data += '<edge id="%d" source="%d" target="%d" />\n' % (e, ids[a], ids[b])
data += '</edges>\n'

data += '</graph>\n</gexf>\n'

with open('mc.gexf', 'w') as f:
    f.write(data)
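The script above emits one edge for every pair of consecutive words, so on a longer text (like the speech in Ex2) the same pair shows up many times. A possible variant, sketched here and not what was used for the graphs on this page, collapses repeats into a single edge with a count. The helper name `weighted_edges` is hypothetical, and the `weight` attribute on `<edge>` is my reading of the GEXF 1.2 format, so check it against the spec before relying on it.

```python
from collections import Counter

def weighted_edges(words, ids):
    """Return GEXF <edge> lines, one per distinct word pair, with a weight."""
    # Count how often each (word, next word) pair occurs
    counts = Counter(zip(words, words[1:]))
    lines = []
    for e, ((a, b), n) in enumerate(counts.items()):
        lines.append('<edge id="%d" source="%d" target="%d" weight="%d" />'
                     % (e, ids[a], ids[b], n))
    return lines

words = 'the cat saw the cat .'.split()
ids = {}
for w in words:
    ids.setdefault(w, len(ids))  # first occurrence decides the node id

for line in weighted_edges(words, ids):
    print(line)
```

Here "the cat" occurs twice, so the first edge carries a weight of 2 and the other three pairs a weight of 1.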

Specifically, it writes this to mc.gexf :

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
    <meta lastmodifieddate="2009-03-20">  
      <creator>lolo</creator>  
      <description>markov chains</description>    </meta>  
  <graph mode="static"><nodes>
<node id="0" label="they" />
<node id="1" label="ate" />
<node id="2" label="the" />
<node id="3" label="cat" />
<node id="4" label="before" />
<node id="5" label="he" />
<node id="6" label="saw" />
<node id="7" label="potato" />
<node id="8" label="." />
<node id="9" label="knew" />
<node id="10" label="it" />
<node id="11" label="i" />
<node id="12" label="him" />
</nodes>
<edges>
<edge id="0" source="0" target="1" />
<edge id="1" source="1" target="2" />
<edge id="2" source="2" target="3" />
<edge id="3" source="3" target="4" />
<edge id="4" source="4" target="5" />
<edge id="5" source="5" target="6" />
<edge id="6" source="6" target="2" />
<edge id="7" source="2" target="7" />
<edge id="8" source="7" target="8" />
<edge id="9" source="8" target="5" />
<edge id="10" source="5" target="9" />
<edge id="11" source="9" target="10" />
<edge id="12" source="10" target="8" />
<edge id="13" source="8" target="11" />
<edge id="14" source="11" target="6" />
<edge id="15" source="6" target="12" />
<edge id="16" source="12" target="8" />
</edges>
</graph>
</gexf>
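As a quick sanity check on output like the above, the file can be read back and the edges turned into word pairs again. This is only a sketch using Python's standard `xml.etree.ElementTree`; the function name `read_transitions` and the shortened sample document are made up for the example.

```python
import xml.etree.ElementTree as ET

# GEXF elements live in this namespace, so lookups need the prefix
NS = {'g': 'http://www.gexf.net/1.2draft'}

def read_transitions(gexf_text):
    """Return (source label, target label) pairs from a GEXF string."""
    root = ET.fromstring(gexf_text)
    labels = {n.get('id'): n.get('label') for n in root.iterfind('.//g:node', NS)}
    return [(labels[e.get('source')], labels[e.get('target')])
            for e in root.iterfind('.//g:edge', NS)]

sample = '''<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static">
    <nodes>
      <node id="0" label="i" />
      <node id="1" label="saw" />
      <node id="2" label="him" />
    </nodes>
    <edges>
      <edge id="0" source="0" target="1" />
      <edge id="1" source="1" target="2" />
    </edges>
  </graph>
</gexf>'''

print(read_transitions(sample))  # [('i', 'saw'), ('saw', 'him')]
```

If the pairs read back in the same order as the words of the input text, the node ids and edges line up.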