User:Manetta/scripts/python-translate-to-computer-phonemes
translating text into the phonemes used by Sphinx,
using the CMU dictionary file that ships with the Sphinx software package (cmu07a.dic)
available for download here: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
more information about the dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
cmu07a.dic looks like:
abso AE B S OW
absolom AE B S AH L AH M
absolut AE B S AH L UW T
absolut's AE B S AH L UW T S
absolute AE B S AH L UW T
absolutely AE B S AH L UW T L IY
absoluteness AE B S AH L UW T N AH S
absolutes AE B S AH L UW T S
absolution AE B S AH L UW SH AH N
absolutism AE B S AH L UW T IH Z AH M
absolutist AE B S IH L UW T IH S T
absolve AH B Z AA L V
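Each entry is a word followed by its phonemes, separated by whitespace. A minimal sketch of reading that format in Python: the function name `parse_entry` is hypothetical, and the handling of alternate pronunciations written as `word(2)` is an assumption about the file that the excerpt above does not show.

```python
import re

def parse_entry(line):
    """Split one dictionary line into (word, [phonemes]).

    Assumption: alternate pronunciations are marked as word(2), word(3), ...
    The marker is stripped so that all variants share one headword.
    """
    parts = line.split()
    if not parts:
        return None  # blank line
    word = re.sub(r'\(\d+\)$', '', parts[0])
    return word, parts[1:]

print(parse_entry("absolve AH B Z AA L V"))
# ('absolve', ['AH', 'B', 'Z', 'AA', 'L', 'V'])
```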
this is its alphabet:
(from: speech.cs.cmu.edu)
Phoneme Example Translation
------- ------- -----------
AA      odd     AA D
AE      at      AE T
AH      hut     HH AH T
AO      ought   AO T
AW      cow     K AW
AY      hide    HH AY D
B       be      B IY
CH      cheese  CH IY Z
D       dee     D IY
DH      thee    DH IY
EH      Ed      EH D
ER      hurt    HH ER T
EY      ate     EY T
F       fee     F IY
G       green   G R IY N
HH      he      HH IY
IH      it      IH T
IY      eat     IY T
JH      gee     JH IY
K       key     K IY
L       lee     L IY
M       me      M IY
N       knee    N IY
NG      ping    P IH NG
OW      oat     OW T
OY      toy     T OY
P       pee     P IY
R       read    R IY D
S       sea     S IY
SH      she     SH IY
T       tea     T IY
TH      theta   TH EY T AH
UH      hood    HH UH D
UW      two     T UW
V       vee     V IY
W       we      W IY
Y       yield   Y IY L D
Z       zee     Z IY
ZH      seizure S IY ZH ER
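The alphabet is small enough to keep as a set, which makes it easy to check a pronunciation for symbols that do not belong to it. A rough sketch; the names `PHONEMES` and `unknown_symbols` are made up for this example.

```python
# The 39 phoneme symbols listed in the alphabet above.
PHONEMES = set("""
AA AE AH AO AW AY B CH D DH EH ER EY F G HH IH IY JH K L M N NG
OW OY P R S SH T TH UH UW V W Y Z ZH
""".split())

def unknown_symbols(pronunciation):
    """Return the symbols of a pronunciation string that are not in the alphabet."""
    return [p for p in pronunciation.split() if p not in PHONEMES]

print(len(PHONEMES))                  # 39
print(unknown_symbols("S IY ZH ER"))  # []
print(unknown_symbols("S IY XX ER"))  # ['XX']
```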
cmu07a.dic (released in 2008) is a file in which "the pronunciation is encoded using a modified form of the Arpabet system,
with the addition of stress marks on vowels of levels 0, 1, and 2."
from: wikipage about the cmu07a.dic
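The stress marks mentioned in the quote are digits appended to vowel symbols. The cmu07a.dic excerpt on this page shows no such digits, so the sketch below is only relevant for dictionary versions that carry them. The input string is an illustrative example, not a line copied from the file, and the function names are hypothetical.

```python
import re

def split_stress(symbol):
    """Split a symbol such as 'AH0' into ('AH', 0); symbols without a digit give (symbol, None)."""
    m = re.fullmatch(r'([A-Z]+)([012])?', symbol)
    if m is None:
        raise ValueError("not an Arpabet-style symbol: %r" % symbol)
    stress = int(m.group(2)) if m.group(2) else None
    return m.group(1), stress

def strip_stress(pronunciation):
    """Drop the stress digits from every symbol in a pronunciation string."""
    return " ".join(split_stress(s)[0] for s in pronunciation.split())

print(strip_stress("AE1 B S AH0 L UW2 T"))  # AE B S AH L UW T
```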
"Arpabet is a phonetic transcription code developed by Advanced Research Projects Agency (ARPA)
as a part of their Speech Understanding Project (1971–1976). It represents each phoneme of
General American English with a distinct sequence of ASCII characters."
from: Wikipage on Arpabet
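import re

# Read the text to translate; only the first line of input.txt is used.
with open('input.txt', 'r') as x:
    searchlines = x.readlines()
print(searchlines)

# split() without arguments also drops the trailing newline of the last word.
search = searchlines[0].split()
print(search[0])

with open('output.txt', 'w') as txt:
    for searchitem in search:
        print(searchitem)
        # Scan the dictionary for the entry whose headword is exactly this word.
        with open('cmu07a.dic', 'r') as dic:
            for line in dic:
                # re.escape keeps characters such as "'" literal; the trailing \s
                # stops a word like "my" from matching a longer entry that starts with it.
                if re.match(re.escape(searchitem) + r'\s', line):
                    print(line)
                    # Write only on a match; the line already ends with a newline.
                    txt.write(line)
                    break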
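The script above re-opens and scans cmu07a.dic once per word. A possible variant, sketched here under a few assumptions, loads the dictionary into memory once and looks words up directly. The function names are hypothetical, input words are lowercased on the assumption that all headwords in the file are lowercase (as in the excerpt), and words missing from the dictionary are simply kept without phonemes.

```python
def load_dictionary(lines):
    """Build {word: pronunciation} from dictionary lines; the first entry for a word wins."""
    table = {}
    for line in lines:
        parts = line.split(None, 1)  # headword, then the rest of the line
        if len(parts) == 2:
            table.setdefault(parts[0], parts[1].strip())
    return table

def translate(sentence, table):
    """Follow each known word with its phonemes; unknown words are kept as they are."""
    out = []
    for word in sentence.lower().split():
        out.append(word)
        if word in table:
            out.append(table[word])
    return " ".join(out)

# A four-entry stand-in for cmu07a.dic, using pronunciations shown on this page.
sample = ["call K AO L\n", "echo EH K OW\n", "me M IY\n", "my M AY\n"]
table = load_dictionary(sample)
print(translate("call me echo", table))  # call K AO L me M IY echo EH K OW
```

With the real file, the table would be built from the open file object instead of the sample list, for example `with open('cmu07a.dic') as dic: table = load_dictionary(dic)`.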
call K AO L me M IY echo EH K OW
my M AY wife W AY F is IH Z echo EH K OW
my M AY brother B R AH DH ER is IH Z echo EH K OW
echo EH K OW is IH Z my M AY mom M AA M
my M AY boss B AA S name N EY M is IH Z echo EH K OW
my M AY dad D AE D is IH Z echo EH K OW