User:Angeliki/Exhibition ttssr
::::::::::::::__TOC__
<br />
<br />
:::::::::::<big>'''ttssr: Reading and speech recognition in loop''' </big>
:::::::::::::::<big>Angeliki Diakrousi</big>
<div style="background:#FFFFFF">
My collection of texts ''[[Reader6/Angeliki#6.2FAngeliki|From Tedious Tasks to Liberating Orality: Practices of the Excluded on Sharing Knowledge]]'' refers to orality in relation to programming, as a way of sharing knowledge that includes our individually embodied position and voice. The emphasis on the role of personal positioning is often supported by feminist theorists. Similarly, and in contrast to scanning, reading out loud is a way of distributing knowledge in a shared space with other people, and this is the core principle behind the ''ttssr -> Reading and speech recognition in loop'' software. Using speech recognition software and Python scripts, I invite the audience to participate in a system that highlights how each voice bears the personal story of an individual. Here the involvement of a machine adds another layer of reflection on the reading process.
<br /><br />
== The story behind ==
As in oral cultures, and in contrast to literate cultures, I engage with two essential ingredients of knowledge production: the presence of a speaker and oral speech. Oral narratives build on previous ones, keeping a movable line to the past by adjusting to the history of the performer, but only insofar as they remain important and enjoyable for the present audience and moment. The structure of the oral poem rests on rhythmic formulas, memory and verbal interaction. Oral cultures exist without the need for writing, texts and dictionaries; they need no library in which their texts are stored, looked up and composed. The learning process is shared from individual positions, but with the participation of the community.
Both my reader and my software also highlight another aspect of knowledge production, this time from literate cultures, concerning repetition and text-processing tasks. From weaving to typewriting and programming, women, largely hidden from the public, explored the realm of writing beyond its conventional form. According to Kittler (1999, p. 221), “A desexualized writing profession, distant from any authorship, only empowers the domain of text processing. That is why so many novels written by recent women writers are endless feedback loops making secretaries into writers”. But aren’t these endless feedback loops similar to the rhythmic narratives of anonymous oral cultures? How is this knowledge produced through repetitive formulas that are easily memorized?
In the context of the technologies available today, such as speech recognition software, and in relation to other projects that used the technology of their own time, these questions can be explored on a more practical basis. With the aid, and the errors, of [https://en.wikipedia.org/wiki/CMU_Sphinx Pocketsphinx], I aim to create new oral and reading experiences. My work draws on two earlier projects: [https://www.youtube.com/watch?v=8z32JTnRrHc Boomerang (1974)] and [https://www.youtube.com/watch?v=fAxHlLK3Oyk I Am Sitting In A Room (1981)]. The first forms a tape by continuously recording and broadcasting the artist speaking. The second exploits the imperfections of recording tape machines and captures the echoes of the room as musical qualities in a tape-delay system. It is striking that repetition becomes an instrument both in these two projects and in the process of typewriting. In all cases the machine takes repetition as its basic element (eternal loops and repeated processes). My software transcribes the voices of people reading a text. This process recalls the typists who transcribed the speech of writers, here replaced by contemporary speech recognition software. Focusing on this anonymous labour, carried out together with a machine, one cannot forget the anonymous people who worked on the Pocketsphinx models: the training of this software is based on recorded human voices reading words, and the acoustic model created from them is structured in phonemes. I am reversing this procedure by giving Pocketsphinx back to human voices.
</div>
<br />
::::::::::::::::''Link of the project: '''https://pzwiki.wdka.nl/mediadesign/Ttssr-Speech_Recognition_Iterations'''''<br />
::::::::::::::::''Link of this page: '''https://pzwiki.wdka.nl/mediadesign/User:Angeliki/Exhibition_ttssr#Ubuntu.2FMac'''''
== Input text ==
<syntaxhighlight lang="text">
Any one is one having been that one Any one is such a one.
Any one having been that one is one remembering something oi such a thing, is one
remembering having been that one.
Each one having been one is being one having been that one. Each one haying been
one is remembering something of this thing, is remembering something or haying been
that one
Each one is one. Each one has been one. Each one being one, each one havrng been
one is remembering something or that thing.
</syntaxhighlight>
== Necessary Equipment ==
=== for one oral poet: ===
* 1 laptop
* 2 sets of headphones with microphone
* 1 USB audio interface
=== for >1 oral poets: ===
* 1 laptop
* 1 loudspeaker
* 1 microphone
* 1 set of headphones
* 1 USB audio interface
== Set-up ==
Print this page and put it in the installation<br/>
::::::::::::[[File:Ttssr installation.JPG|300px]]
== Software installation ==
=== Ubuntu/Mac ===
'''README.md'''<br />
This project is part of OuNuPo, a project of the Experimental Publishing course of the Master of Media Design at the Piet Zwart Institute. <br />
https://issue.xpub.nl/05/<br/>
https://git.xpub.nl/OuNuPo-make/<br/>
https://xpub.nl/<br/>
This is a different, separate version of my own proposal within that project.
==== Author ====
Angeliki Diakrousi
==== Description ====
My collection of texts 'From Tedious Tasks to Liberating Orality: Practices of the Excluded on Sharing Knowledge' refers to orality in relation to programming, as a way of sharing knowledge that includes our individually embodied position and voice. The emphasis on the role of personal positioning is often supported by feminist theorists. Similarly, and in contrast to scanning, reading out loud is a way of distributing knowledge in a shared space with other people, and this is the core principle behind the ttssr -> Reading and speech recognition in loop software. Using speech recognition software and Python scripts, I invite the audience to participate in a system that highlights how each voice bears the personal story of an individual. Here the involvement of a machine adds another layer of reflection on the reading process.
===== output.txt =====
According to Kittler (1999, p. 221), “A desexualized writing profession, distant from any authorship, only empowers the domain of text processing. That is why so many novels written by recent women writers are endless feedback loops making secretaries into writers”.
My choice of input for this software is an extract from Gertrude Stein's 'Many Many Women'. The input can be any text of your choice.
===== References =====
Kittler, F.A. (1999) 'Typewriter', in: Winthrop-Young, G., Wutz, M. (trans.), Gramophone, Film, Typewriter. Stanford University Press, Stanford, Calif., pp. 214–221 <br/>
Stein, G. (2005) 'Many Many Women', in: Matisse Picasso and Gertrude Stein with Two Shorter Stories
==== Installation ====
===== Requirements =====
* Python3
* GNU make
===== Install Dependencies =====
* pip3: ''sudo apt install python3-pip''
* PocketSphinx package: ''sudo apt-get install pocketsphinx pocketsphinx-en-us''
* PocketSphinx Python library: ''sudo pip3 install PocketSphinx''
* Other software packages: ''sudo apt-get install gcc automake autoconf libtool bison swig python-dev libpulse-dev''
* Speech Recognition Python library: ''sudo pip3 install SpeechRecognition''
* TermColor Python library: ''sudo pip3 install termcolor''
* PyAudio Python library: ''sudo pip3 install pyaudio'', ''sudo apt-get install python-pyaudio python3-pyaudio''
* Sox: ''sudo apt-get install sox''
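Before cloning the repository you may want to confirm that the Python side of the toolchain is in place. The following snippet is not part of the repository; it is a minimal sketch that only tries to import the libraries installed above and to list the microphones that SpeechRecognition can see. Save it in a file of your choice and run it with ''python3''.
<syntaxhighlight lang="python">
#!/usr/bin/env python3
# Optional dependency check (not part of the ttssr repository):
# it only verifies that the libraries installed above can be imported
# and that SpeechRecognition can see at least one microphone.
import speech_recognition as sr  # SpeechRecognition
import pocketsphinx              # PocketSphinx Python library
import pyaudio                   # PyAudio, needed for the Microphone class
import termcolor                 # termcolor, used for the coloured terminal output

print("SpeechRecognition version:", sr.__version__)
print("Microphones found:", sr.Microphone.list_microphone_names())
</syntaxhighlight>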
===== Clone Repository =====
''git clone https://gitlab.com/nglk/ttssr.git''
===== Make command =====
Sitting inside a pocket(sphinx)<br/>
Speech recognition feedback loops using the first sentence of a scanned text as input<br/>
run: <br/>
''cd ttssr'' <br/>
''make ttssr-human-only''
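The ''make'' target simply calls the shell script in ''src/'' with ''ocr/output.txt'' as its argument (see the Makefile in the Scripts section below). To try the loop with another text, the script can presumably also be called directly, for example ''bash src/ttssr-loop-human-only.sh my-text.txt'', where ''my-text.txt'' stands for any plain-text file of your choice; only its first line is used as the seed of the loop.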
==== Licenses ====
© 2018 WTFPL – Do What the Fuck You Want to Public License. <br/>
© 2018 BSD 3-Clause – Berkeley Software Distribution
=== Windows ===
* Install Ubuntu in Windows: https://tutorials.ubuntu.com/tutorial/tutorial-ubuntu-on-windows#0
* Open the start menu and search for CMD (Command Prompt). Open it and type ''bash''. Now you are in the Ubuntu environment.
* Install git: ''sudo apt install git''
* Install python3:
*:''sudo apt-get update''
*:''sudo apt-get install python3.6''
* Install pip:
*:''sudo apt-get install python-pip python-dev build-essential''
*:''sudo pip install --upgrade pip''
*:''sudo pip install --upgrade virtualenv''
* Install GNU make:
*:''sudo apt-get install build-essential''
* ''sudo apt install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg libav-tools''
* Follow the instructions of the Ubuntu version above: https://pzwiki.wdka.nl/mediadesign/User:Angeliki/Exhibition_ttssr#Install_Dependencies
== Scripts ==
=== Makefile ===
<syntaxhighlight lang="bash">
ttssr-human-only: ocr/output.txt ## Loop: text to speech-speech recognition. Dependencies: espeak, pocketsphinx
	bash src/ttssr-loop-human-only.sh ocr/output.txt
</syntaxhighlight>
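Note that ''ocr/output.txt'' is listed as a prerequisite of the target, so ''make ttssr-human-only'' will fail if that file is missing. When running this version outside the OuNuPo make setup, the input presumably has to be put in place by hand, for example with ''mkdir -p ocr'' followed by ''cp my-text.txt ocr/output.txt'', where ''my-text.txt'' is a placeholder for the text to be read.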
=== ttssr-loop-human-only.sh ===
<syntaxhighlight lang="bash">
#!/bin/bash
i=0;
echo "Read every new sentence out loud!"
# take the first line of the input text as the starting sentence of the loop
head -n 1 "$1" > output/input0.txt
while [[ $i -le 6 ]]
do echo $i
    # show the current sentence to the reader
    cat output/input$i.txt
    # record the reader's voice into a WAV file (stderr suppressed)
    python3 src/ttssr_write_audio.py src/sound$i.wav 2> /dev/null
    # play the recording in the background; without '&' the sounds would play one after another
    play src/sound$i.wav 2> /dev/null &
    # transcribe the recording with Pocketsphinx; the transcription becomes the next sentence to read
    python3 src/ttssr_transcribe.py sound$i.wav > output/input$((i+1)).txt 2> /dev/null
    sleep 1
    (( i++ ))
done
# archive this session's texts and recordings in a timestamped folder
today=$(date +%Y%m%d.%H-%M);
mkdir -p "output/ttssr.$today"
mv -v output/input* output/ttssr.$today;
mv -v src/sound* output/ttssr.$today;
# DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
# Version 2, December 2004
# Copyright (C) 2018 Angeliki Diakrousi <diakaggel@gmail.com>
# Everyone is permitted to copy and distribute verbatim or modified
# copies of this license document, and changing it is allowed as long
# as the name is changed.
# DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
# TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
# 0. You just DO WHAT THE FUCK YOU WANT TO.
</syntaxhighlight>
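In outline: the first line of the input text is written to ''output/input0.txt''; in each of the seven iterations the current sentence is printed for the reader, the reading is recorded to ''src/soundN.wav'', played back in the background, and transcribed by Pocketsphinx into the next input file, which becomes the sentence to read in the following iteration; at the end all texts and recordings are archived in a timestamped folder inside ''output/''.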
=== ttssr_transcribe.py ===
<syntaxhighlight lang="python">
#!/usr/bin/env python3
import speech_recognition as sr
import sys
from termcolor import cprint, colored
from os import path
import random

a1 = sys.argv[1]
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), a1)

# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)

color = ["white", "yellow"]
on_color = ["on_red", "on_magenta", "on_blue", "on_grey"]

# recognize speech using Sphinx
try:
    cprint(r.recognize_sphinx(audio), random.choice(color), random.choice(on_color))
except sr.UnknownValueError:
    print("unknown")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
# Copyright (c) 2018, Angeliki Diakrousi <diakaggel@gmail.com>
# Copyright (c) 2014-2017, Anthony Zhang <azhang9@gmail.com>
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</syntaxhighlight>
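The script joins its argument to its own directory, so when the loop calls ''python3 src/ttssr_transcribe.py sound0.wav'' it opens ''src/sound0.wav'', transcribes it with Pocketsphinx, and prints the result in a randomly chosen colour; the loop redirects this printed text into the next input file. It can presumably also be run on its own in the same way, provided the recording already exists in ''src/''.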
=== ttssr_write_audio.py ===
<syntaxhighlight lang="python">
#!/usr/bin/env python3
# NOTE: this example requires PyAudio because it uses the Microphone class
import speech_recognition as sr
import sys
from time import sleep

a1 = sys.argv[1]

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    # print("Read every new sentence out loud!")
    audio = r.listen(source)

# write audio to a WAV file
with open(a1, "wb") as f:
    f.write(audio.get_wav_data())
# Copyright (c) 2018, Angeliki Diakrousi <diakaggel@gmail.com>
# Copyright (c) 2014-2017, Anthony Zhang <azhang9@gmail.com>
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</syntaxhighlight>
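This script is called by the loop as ''python3 src/ttssr_write_audio.py src/soundN.wav'': it listens on the default microphone until SpeechRecognition detects the end of the phrase and then writes the recording to the WAV file given as its argument, so it can presumably also be used on its own to capture a single reading.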