User:Angeliki/Exhibition ttssr

From XPUB & Lens-Based wiki




ttssr: Reading and speech recognition in loop


Angeliki Diakrousi



My collection of texts From Tedious Tasks to Liberating Orality: Practices of the Excluded on Sharing Knowledge, refers to orality in relation to programming, as a way of sharing knowledge including our individually embodied position and voice. The emphasis on the role of personal positioning is often supported by feminist theorists. Similarly, and in contrast to scanning, reading out loud is a way of distributing knowledge in a shared space with other people, and this is the core principle behind the ttssr-> Reading and speech recognition in loop software. Using speech recognition software and python scripts I propose to the audience to participate in a system that highlights how each voice bears the personal story of an individual. In this case the involvement of a machine provides another layer of reflection of the reading process.


The story behind

As in oral cultures, in contrast to the literate cultures, I am getting engaged with two important ingredients for knowledge production; the presence of a speaker and the oral speech. The oral narratives are based on the previous ones keeping a movable line to the past by adjusting to the history of the performer, but only if they are important and enjoyable for the present audience and time. The structure of the oral poem is based on rhythmic formulas, the memory and verbal interaction. Oral cultures exist without the need of writing, texts and dictionaries. They don't need a library to be stored, to look up and create their texts. The learning process is shared from individual positions but with the participation of the community. Both my reader and my software highlight also another aspect of knowledge production, from literate cultures, regarding repetition and text processing tasks. From weaving to typewriting and programming, women, mainly hidden from the public, were exploring the realm of writing beyond its conventional form. According to Kittler (1999, pg. 221) “A desexualized writing profession, distant from any authorship, only empowers the domain of text processing. That is why so many novels written by recent women writers are endless feedback loops making secretaries into writers”. But aren’t these endless feedback loops similar to the rhythmic narratives of the anonymous oral cultures? How this knowledge is produced through repetitive formulas that are easily memorized? In the context of the present available technologies, like speech recognition software, and in relation with other projects, using the technology of their time, these questions can be explored on a more practical base. With the aid and the errors of Pocketsphinx, I aim to create new oral and reading experiences. My work derives from the examples of two other projects coming from 80s: Boomerang (1974) and I Am Sitting In A Room (1981). The first one is about forming a tape by recording and broadcasting continuously the artist speaking. The latter is exploiting the imperfections of the system of recording tape machines and grabs the room echoes as musical qualities in the tape-delay system. It is noticeable that repetition seems to become an interesting instrument in the two projects and the process of typewriting. In all cases the machine includes repetition as its basic element (eternal loops and repeated processes). My software transcribes the voice of people reading a text. This process can be related to the typists transcribing the speech of writers, replacing the contemporary speech recognition software. Focusing on this anonymous labour, worked together with a machine, one cannot forget the anonymous people, who worked for the Pocketsphinx models. The training of this software is based on recorded human voices reading words. The acoustic model, being created after, is structured with phonemes. I am reversing this procedure by giving back Pocketsphinx to human voices.


Link of the project: https://pzwiki.wdka.nl/mediadesign/Ttssr-Speech_Recognition_Iterations
Link of this page: https://pzwiki.wdka.nl/mediadesign/User:Angeliki/Exhibition_ttssr#Ubuntu.2FMac

Input text

Any one is one having been that one Any one is such a one.

Any one having been that one is one remembering something oi such a thing, is one
remembering having been that one.

Each one having been one is being one having been that one. Each one haying been

one is remembering something of this thing, is remembering something or haying been
that one

Each one is one. Each one has been one. Each one being one, each one havrng been

one is remembering something or that thing.

Necessary Equipment

for one oral poet:

  • 1 laptop
  • 2 sets of headphones with microphone
  • 1 USB audio interface

for >1 oral poets:

  • 1 laptop
  • 1 loudspeaker
  • 1 microphone
  • 1 set of headphones
  • 1 USB audio interface

Set-up

Print this page and put it in the installation

Ttssr installation.JPG

Software installation

Ubuntu/Mac

README.md
This project is part of the OuNuPo project of the Experimental Publishing course in Master of Media Design in Piet Zwart Institute.
https://issue.xpub.nl/05/
https://git.xpub.nl/OuNuPo-make/
https://xpub.nl/

This is a different and separated version of my own proposal within the project.

Author

Angeliki Diakrousi

Description

My collection of texts 'From Tedious Tasks to Liberating Orality: Practices of the Excluded on Sharing Knowledge', refers to orality in relation to programming, as a way of sharing knowledge including our individually embodied position and voice. The emphasis on the role of personal positioning is often supported by feminist theorists. Similarly, and in contrast to scanning, reading out loud is a way of distributing knowledge in a shared space with other people, and this is the core principle behind the ttssr-> Reading and speech recognition in loop software. Using speech recognition software and python scripts I propose to the audience to participate in a system that highlights how each voice bears the personal story of an individual. In this case the involvement of a machine provides another layer of reflection of the reading process.

output.txt

According to Kittler (1999, pg. 221) “A desexualized writing profession, distant from any authorship, only empowers the domain of text processing. That is why so many novels written by recent women writers are endless feedback loops making secretaries into writers”. My choice for input to this software is an extract of the text 'Many Many Women' of Gertrude Stein. The input can be any text of your choice.

References

Kittler, F.A., (1999) Typewriter, in: Winthrop-Young, G., Wutz, M. (Trans.), Gramophone, Film, Typewriter. Stanford University Press, Stanford, Calif, pp. 214–221
Stein, G., 2005. Many Many Women, in: Matisse Picasso and Gertrude Stein With Two Shorter Stories

Installation

Requirements
  • Python3
  • GNU make
Install Dependencies
  • pip3: sudo apt install python3-pip
  • PocketSphinx package: sudo apt-get install pocketsphinx pocketsphinx-en-us
  • PocketSphinx Python library: sudo pip3 install PocketSphinx
  • Other software packages:sudo apt-get install gcc automake autoconf libtool bison swig python-dev libpulse-dev
  • Speech Recognition Python library: sudo pip3 install SpeechRecognition
  • TermColor Python library: sudo pip3 install termcolor
  • PyAudio Python library: sudo pip3 install pyaudio, sudo apt-get install python-pyaudio python3-pyaudio
  • Sox: sudo apt-get install sox
Clone Repository

git clone https://gitlab.com/nglk/ttssr.git

Make command

Sitting inside a pocket(sphinx) Speech recognition feedback loops using the first sentence of a scanned text as input

run:
cd ttssr
make ttssr-human-only

Licenses

© 2018 WTFPL – Do What the Fuck You Want to Public License.
© 2018 BSD 3-Clause – Berkeley Software Distribution

Windows

Scripts

Makefile

ttssr-human-only: ocr/output.txt ## Loop: text to speech-speech recognition. Dependencies: espeak, pocketsphinx
	bash src/ttssr-loop-human-only.sh ocr/output.txt


ttssr-loop-human-only.sh

#!/bin/bash
i=0;
echo "Read every new sentence out loud!"
head -n 1 $1 > output/input0.txt
while [[ $i -le 6 ]]
	do echo $i
	cat output/input$i.txt 
	python3 src/ttssr_write_audio.py src/sound$i.wav 2> /dev/null
	play src/sound$i.wav 2> /dev/null & #in the background the sound, without it all the sounds play one by one//2 is stderr
	python3 src/ttssr_transcribe.py sound$i.wav > output/input$((i+1)).txt 2> /dev/null
	sleep 1
	(( i++ ))
done
today=$(date +%Y%m%d.%H-%M);
mkdir -p "output/ttssr.$today"
mv -v output/input* output/ttssr.$today;
mv -v src/sound* output/ttssr.$today;



 #            DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
 #                    Version 2, December 2004

 # Copyright (C) 2018 Angeliki Diakrousi <diakaggel@gmail.com>

 # Everyone is permitted to copy and distribute verbatim or modified
 # copies of this license document, and changing it is allowed as long
 # as the name is changed.

 #            DO WHAT THE FUCK YOU WANT TO PUBLIC LICENSE
 #   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

 #  0. You just DO WHAT THE FUCK YOU WANT TO.

ttssr_transcribe.py

#!/usr/bin/env python3

import speech_recognition as sr
import sys
from termcolor import cprint, colored
from os import path
import random

a1 = sys.argv[1] 
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), a1) 


# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source) 

color = ["white", "yellow"]
on_color = ["on_red", "on_magenta", "on_blue", "on_grey"]

# recognize speech using Sphinx
try:
    cprint( r.recognize_sphinx(audio), random.choice(color), random.choice(on_color))
except sr.UnknownValueError:
    print("uknown")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))


# Copyright (c) 2018, Angeliki Diakrousi <diakaggel@gmail.com>
# Copyright (c) 2014-2017, Anthony Zhang <azhang9@gmail.com>
# All rights reserved.

# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:

# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.

# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.

# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.

# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


src/ttssr_write_audio.py

#!/usr/bin/env python3
# NOTE: this example requires PyAudio because it uses the Microphone class

import speech_recognition as sr
import sys
from time import sleep

a1 = sys.argv[1]

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    # print("Read every new sentence out loud!")
    audio = r.listen(source)

# write audio to a WAV file
with open(a1, "wb") as f:
	f.write(audio.get_wav_data())



# Copyright (c) 2018, Angeliki Diakrousi <diakaggel@gmail.com>
# Copyright (c) 2014-2017, Anthony Zhang <azhang9@gmail.com>
# All rights reserved.

# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:

# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.

# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.

# 3. Neither the name of the copyright holder nor the names of its contributors
# may be used to endorse or promote products derived from this software without
# specific prior written permission.

# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.