User:Laurier Rochon/prototyping/graphov
Visualizing markov chains
(work in progress)
Some notes
- I'm skeptical of 'data visualization', 'infographics' and the like. Furthermore, I don't feel completely qualified to draw conclusions from the more extensive graphs.
- I consider this a quick prototyping tool, just a way to get a general overview of what's going on in a large set of data.
- Graphs are ugly? Yes. My computer is slow.
- All done using Gephi, which is free and works pretty well. It mainly uses the '.gexf' format, which is XML-based, to describe the graphs. Unfortunately, the documentation for this software blows.
- The learning curve is very gentle, and the basic associative graph is very simple. It goes a little something like this:
Declare the things you are graphing, and give each one a unique id (the numbers 1, 2, 3, 4).
1 : "morning"
2 : "midday"
3 : "evening"
4 : "night"
Then create the relationships, and give each relationship an id of its own (1, 2, 3, 4, 5, 6, 7, etc.). Important: these ids number the relationships, not the things, so they mean something very different from the ids in the previous step.
1 : source=1 target=2 #(which means, go from "morning" to "midday")
2 : source=2 target=3 #(midday to evening)
3 : source=3 target=4 #(etc.)
4 : source=4 target=1 #and over and over again
5 : source=3 target=2 #...
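The two steps above can be sketched in Python. This is only an illustration, not how the graphs on this page were made: the names `nodes`, `edges` and `to_gexf` are made up for the example, and the serialization relies on the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Step 1: the things being graphed, keyed by a unique id.
nodes = {1: "morning", 2: "midday", 3: "evening", 4: "night"}

# Step 2: the relationships, keyed by their own (unrelated) id.
# Each value is a (source node id, target node id) pair.
edges = {1: (1, 2), 2: (2, 3), 3: (3, 4), 4: (4, 1), 5: (3, 2)}

def to_gexf(nodes, edges):
    """Return a GEXF 1.2 document, as a string, for the given nodes and edges."""
    gexf = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft",
                               "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"mode": "static"})
    nodes_el = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(nodes_el, "node", {"id": str(node_id), "label": label})
    edges_el = ET.SubElement(graph, "edges")
    for edge_id, (source, target) in edges.items():
        ET.SubElement(edges_el, "edge", {"id": str(edge_id),
                                         "source": str(source),
                                         "target": str(target)})
    return ET.tostring(gexf, encoding="unicode")

print(to_gexf(nodes, edges))
```

Keeping the node ids and the edge ids in two separate dictionaries makes it harder to confuse the two kinds of numbers.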
Ex1 : the original markov chain
Using the following sentences : They ate the cat before he saw the potato. He knew it. I saw him.
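To make explicit what the graph of these sentences encodes, here is a small sketch that lists which words follow which. It assumes the same tokenization as the script under "Soft stuff" (lowercase, with each full stop as its own token). The name `followers` is chosen for this example only.

```python
from collections import defaultdict

text = 'They ate the cat before he saw the potato. He knew it. I saw him.'
# Pad full stops with spaces so that '.' becomes a token of its own.
words = text.replace('.', ' . ').lower().split()

# Map each word to the list of words seen directly after it.
followers = defaultdict(list)
for current, nxt in zip(words, words[1:]):
    followers[current].append(nxt)

for word, nexts in followers.items():
    print(word, '->', nexts)
```

Words with more than one follower ("the", "he", "saw" and the full stop) are the points where the graph branches.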
Ex2 : using Obama's inaugural speech
Using the remix from Obama's speech (here)
Soft stuff
This generates the file to throw into Gephi
# Build a GEXF file: one node per distinct word, and one edge from
# each word to the word that follows it in the text.
text = 'They ate the cat before he saw the potato.He knew it.I saw him.'
# Pad full stops with spaces so that '.' becomes a token of its own.
text = text.replace('.', ' . ').lower()
words = text.split()

data = '<?xml version="1.0" encoding="UTF-8"?>\n<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">\n <meta lastmodifieddate="2009-03-20"> \n <creator>lolo</creator> \n <description>markov chains</description> </meta> \n <graph mode="static">'

# Nodes: number each distinct word in order of first appearance.
d = {}
c = 0
data = data + '<nodes>\n'
for w in words:
    if w not in d:
        d[w] = c
        data = data + '<node id="' + str(c) + '" label="' + str(w) + '" />\n'
        c = c + 1
data = data + '</nodes>\n'

# Edges: one per pair of consecutive words, using the node ids above.
data = data + '<edges>\n'
for e in range(len(words) - 1):
    data = data + '<edge id="' + str(e) + '" source="' + str(d[words[e]]) + '" target="' + str(d[words[e + 1]]) + '" />\n'
data = data + '</edges>\n'
data = data + '</graph>\n</gexf>\n'

with open('mc.gexf', 'w') as f:
    f.write(data)
Specifically, it writes this to mc.gexf:
<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
<meta lastmodifieddate="2009-03-20">
<creator>lolo</creator>
<description>markov chains</description> </meta>
<graph mode="static"><nodes>
<node id="0" label="they" />
<node id="1" label="ate" />
<node id="2" label="the" />
<node id="3" label="cat" />
<node id="4" label="before" />
<node id="5" label="he" />
<node id="6" label="saw" />
<node id="7" label="potato" />
<node id="8" label="." />
<node id="9" label="knew" />
<node id="10" label="it" />
<node id="11" label="i" />
<node id="12" label="him" />
</nodes>
<edges>
<edge id="0" source="0" target="1" />
<edge id="1" source="1" target="2" />
<edge id="2" source="2" target="3" />
<edge id="3" source="3" target="4" />
<edge id="4" source="4" target="5" />
<edge id="5" source="5" target="6" />
<edge id="6" source="6" target="2" />
<edge id="7" source="2" target="7" />
<edge id="8" source="7" target="8" />
<edge id="9" source="8" target="5" />
<edge id="10" source="5" target="9" />
<edge id="11" source="9" target="10" />
<edge id="12" source="10" target="8" />
<edge id="13" source="8" target="11" />
<edge id="14" source="11" target="6" />
<edge id="15" source="6" target="12" />
<edge id="16" source="12" target="8" />
</edges>
</graph>
</gexf>
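Before throwing a generated file into Gephi, it can be sanity-checked with Python's standard library. A minimal sketch, assuming the 1.2draft namespace used in the file above. `read_gexf` is a hypothetical helper name, and the `sample` string is a cut-down version of the output, not the full file.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <gexf> root element.
NS = {"g": "http://www.gexf.net/1.2draft"}

def read_gexf(xml_text):
    """Return (labels keyed by node id, list of (source label, target label))."""
    root = ET.fromstring(xml_text)
    labels = {n.get("id"): n.get("label")
              for n in root.iterfind(".//g:node", NS)}
    # Looking up each endpoint raises KeyError if an edge points at a missing node.
    pairs = [(labels[e.get("source")], labels[e.get("target")])
             for e in root.iterfind(".//g:edge", NS)]
    return labels, pairs

sample = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
<graph mode="static"><nodes>
<node id="0" label="they" />
<node id="1" label="ate" />
</nodes>
<edges>
<edge id="0" source="0" target="1" />
</edges>
</graph>
</gexf>"""

labels, pairs = read_gexf(sample)
print(pairs)  # [('they', 'ate')]
```

Reading the edges back as word pairs is a quick way to confirm that every source and target id refers to a declared node.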