User:Laurier Rochon/prototyping/graphov: Difference between revisions

From XPUB & Lens-Based wiki
Revision as of 15:38, 21 May 2011

Visualizing Markov chains

(work in progress)

Some notes

  • I'm skeptical of 'data visualization', 'infographics' and the like. Furthermore, I don't feel completely qualified to draw conclusions from the more extensive graphs.
  • I consider this a quick prototyping tool, just to get a general overview of what's going on in a large set of data.
  • Are the graphs ugly? Yes. My computer is slow.
  • All done using Gephi, which is free and works pretty well. It mainly uses the '.gexf' format, which is XML-based, to describe the graphs. Unfortunately, the documentation for this software blows.

Ex1 : the original Markov chain

Using the following sentences : They ate the cat before he saw the potato. He knew it. I saw him.

Gephi1.gif
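
For reference, the chain behind this graph can be sketched as a table mapping each word to the words that follow it. This is only a rough sketch, not the original remix script: the names `build_transitions` and `transitions` are made up for illustration, and it assumes the full stop counts as a word of its own and that case is ignored, as the generator script further down does.

```python
from collections import defaultdict

def build_transitions(text):
    # Assumption: '.' is its own token and case is ignored
    words = text.replace('.', ' . ').lower().split()
    transitions = defaultdict(list)
    # Record, for every word, each word that directly follows it
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)
    return dict(transitions)

text = 'They ate the cat before he saw the potato. He knew it. I saw him.'
transitions = build_transitions(text)
print(transitions['the'])   # ['cat', 'potato']
print(transitions['.'])     # ['he', 'i']
```

Words with more than one follower ("the", "he", "saw" and ".") are the nodes where the graph branches.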

Ex2 : using Obama's inaugural speech

Using the remix from Obama's speech (here)

HERE for the full image (PDF)

Gephi2.gif

Soft stuff

This script generates the file to throw into Gephi:

from xml.sax.saxutils import quoteattr

text = 'They ate the cat before he saw the potato.He knew it.I saw him.'
# Treat the full stop as a word of its own, and ignore case
words = text.replace('.', ' . ').lower().split()

data = '<?xml version="1.0" encoding="UTF-8"?>\n<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">\n    <meta lastmodifieddate="2009-03-20">  \n      <creator>lolo</creator>  \n      <description>markov chains</description>    </meta>  \n  <graph mode="static">'

# One node per distinct word; d maps each word to its node id
d = {}
data += '<nodes>\n'
for w in words:
    if w not in d:
        d[w] = len(d)
        # quoteattr adds the quotes and escapes characters like & and <
        data += '<node id="%d" label=%s />\n' % (d[w], quoteattr(w))
data += '</nodes>\n'

# One edge per pair of consecutive words
data += '<edges>\n'
for e, (src, dst) in enumerate(zip(words, words[1:])):
    data += '<edge id="%d" source="%d" target="%d" />\n' % (e, d[src], d[dst])
data += '</edges>\n'

data += '</graph>\n</gexf>\n'

# The with block closes the file even if the write fails
with open('mc.gexf', 'w') as f:
    f.write(data)
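
The script above emits one edge per pair of consecutive words, so a pair that occurs several times in a longer text (like the speech) gives several edges between the same two nodes. One possible variation, sketched here and not part of the original script, is to count each pair once and store the count in the optional `weight` attribute that GEXF allows on edges. The function name `weighted_edges` and the sample sentence are made up for illustration.

```python
from collections import Counter

def weighted_edges(words):
    # Count how often each (word, next word) pair occurs
    pair_counts = Counter(zip(words, words[1:]))
    # Node ids in order of first appearance, like the script above
    ids = {}
    for w in words:
        ids.setdefault(w, len(ids))
    lines = []
    for e, ((src, dst), n) in enumerate(pair_counts.items()):
        lines.append('<edge id="%d" source="%d" target="%d" weight="%d" />'
                     % (e, ids[src], ids[dst], n))
    return lines

# Hypothetical sample: "the cat" occurs twice, so its edge gets weight 2
words = 'the cat saw the cat .'.split()
lines = weighted_edges(words)
for line in lines:
    print(line)
```

Whether the weights show up as thicker edges depends on how the graph is styled in Gephi, so treat this as a starting point.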

Specifically, it writes this to mc.gexf :

<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
    <meta lastmodifieddate="2009-03-20">  
      <creator>lolo</creator>  
      <description>markov chains</description>    </meta>  
  <graph mode="static"><nodes>
<node id="0" label="they" />
<node id="1" label="ate" />
<node id="2" label="the" />
<node id="3" label="cat" />
<node id="4" label="before" />
<node id="5" label="he" />
<node id="6" label="saw" />
<node id="7" label="potato" />
<node id="8" label="." />
<node id="9" label="knew" />
<node id="10" label="it" />
<node id="11" label="i" />
<node id="12" label="him" />
</nodes>
<edges>
<edge id="0" source="0" target="1" />
<edge id="1" source="1" target="2" />
<edge id="2" source="2" target="3" />
<edge id="3" source="3" target="4" />
<edge id="4" source="4" target="5" />
<edge id="5" source="5" target="6" />
<edge id="6" source="6" target="2" />
<edge id="7" source="2" target="7" />
<edge id="8" source="7" target="8" />
<edge id="9" source="8" target="5" />
<edge id="10" source="5" target="9" />
<edge id="11" source="9" target="10" />
<edge id="12" source="10" target="8" />
<edge id="13" source="8" target="11" />
<edge id="14" source="11" target="6" />
<edge id="15" source="6" target="12" />
<edge id="16" source="12" target="8" />
</edges>
</graph>
</gexf>
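
A quick way to check a file like this before throwing it into Gephi is to read it back and list the word pairs it encodes. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The helper name `summarize` is hypothetical, and the sample is a shortened copy of the listing above.

```python
import xml.etree.ElementTree as ET

# The GEXF elements live in this namespace (taken from the xmlns attribute)
NS = {'g': 'http://www.gexf.net/1.2draft'}

def summarize(gexf_text):
    # Parse from bytes so an XML declaration with an encoding is accepted
    root = ET.fromstring(gexf_text.encode('utf-8'))
    labels = {n.get('id'): n.get('label')
              for n in root.findall('.//g:node', NS)}
    # Translate each edge back into a (source word, target word) pair
    pairs = [(labels[e.get('source')], labels[e.get('target')])
             for e in root.findall('.//g:edge', NS)]
    return labels, pairs

sample = '''<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph mode="static"><nodes>
<node id="0" label="they" />
<node id="1" label="ate" />
<node id="2" label="the" />
</nodes>
<edges>
<edge id="0" source="0" target="1" />
<edge id="1" source="1" target="2" />
</edges>
</graph>
</gexf>'''

labels, pairs = summarize(sample)
print(pairs)   # [('they', 'ate'), ('ate', 'the')]
```

For the real file, the same function can be fed the contents of mc.gexf. A `KeyError` here would mean an edge points at a node id that was never declared.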