User:Angeliki/Grad project speech analysis: Difference between revisions

From XPUB & Lens-Based wiki
== Re- humanizing voice samples ==
I want to highlight, and stay aware of, the politics surrounding speech recognition and analysis tools, which are changing how we perceive our embodied voices. Beyond their experimental and commercial uses, these tools are also used to control our access to countries and institutions, and they affect how we behave with our bodies and with technology. By extension, the way they are used affects our relations with others, estranges us from other people and from our surroundings, and dehumanizes our lives. These tools, which have recently come into very broad use, are trained on databases of 'real' voice samples. The samples come from different sources around the world [research projects, frequencies, radio], sometimes with the permission of the people donating their voices. The more accents the dataset contains and the bigger it becomes, the better the tool can be trained. At the same time, these tools (such as automatic dialect analysis) are used by states [Germany] to verify refugees' claims of origin. This process can often go wrong, because "Identifying the region of origin for anyone based on their speech is an extremely complex task" and depends on "a wide range of factors".<ref>https://www.dw.com/en/automatic-speech-analysis-software-used-to-verify-refugees-dialects/a-37980819</ref>


I will do this by relating the two sides of the tool, and by rethinking/hacking it in a way that opens a conversation around this issue and radicalises the tool.


Our voice is personal  
| in the case of real samples, with or without the users' permission || a matter of privacy; de-humanizing automated processes regarding control of the body
|-
| Trained data/ speech recognition
samples:
{|
|-
|{{#Widget:Audio|mp3=https://audio.tatoeba.org/sentences/eng/2544351.mp3}}<br />
I wish I had your strength.
|| tatoeba.org for Common Voice of Mozilla
|-
| {{#Widget:Audio|mp3=https://pzwiki.wdka.nl/mw-mediadesign/images/7/7f/LDC93S1.mp3}}<br />
LDC93S1 0 46797 She had your dark suit in greasy wash water all year.
|| catalog.ldc.upenn.edu for Pocketsphinx
|-
| {{#Widget:Audio|flac=https://catalog.ldc.upenn.edu/desc/addenda/LDC2018S11.flac }}<br />
por que al fin y al cabo el miedo de la mujer a la violencia del hombre es el espejo del miedo del hombre a la mujer sin miedo CMPB_M_32_01IVN_00004
|| catalog.ldc.upenn.edu for Pocketsphinx (broadcast conversation)
|-
| {{#Widget:Audio|flac=https://catalog.ldc.upenn.edu/desc/addenda/LDC2017S17.flac}}<br />
<pre style="white-space: pre-wrap;">
Interview 15
(A=Interviewer; B=Interviewee)
A: So we are recording.  Awesome.  So how long have you lived in Flint? (unclear)
B: 38 years.
A: Is that your whole life?  Wow you look really young.
B: Thank you!
A: So what can you tell me about what it’s like to have grown up around here?
B: Normal, just very- Working class, nice people.  Good values, good heart.  Did all the usual.  Rode my bike.  Played outside.  Brownies. Family-oriented.  Just very- I mean- very Midwestern.  You know.  Cliché. I mean really.
A: Okay.  Have you- have you traveled around to other places to see how things are like in comparison to Flint?
B: Um…do you mean throughout the county or the state or the country or-
A: Any. (unclear)
B: Any…um… unfortunately I haven’t been able to travel that much.  My traveling has been just basically through the state.  A couple trips to Canada.  Ontario.  Some through the- like- I guess you’d say upper Midwest.  Ohio.  Illinois.  Indiana.  Iowa. And it’s- I find it very similar.  I mean different.  I mean.  What their economy was based in.  Because it wasn’t GM.  They were really sim- It was- felt like really similar.  Just the way people are- small communities are nice and down to earth.  What I come across but just similar.  But I would like to travel more in my life. 
A: So would you say, like it- you kind of had like a typical American experience growing up in Flint?
B: Yeah, I would say so, just um … like I said, bike riding, Brownies. I was fortunate I had both my parents stay together.  Most of my friends got divorced, their parents got divorced, so we were atypical in that sense. And just...um, normal neighborhood, just close but not in each other’s business, so to speak, and just family oriented. Nothing too exciting. </pre>
|| catalog.ldc.upenn.edu for Pocketsphinx (microphone conversation)
|-
|}
|}
|}

Revision as of 11:40, 12 November 2018

Re- humanizing voice samples

I want to highlight, and stay aware of, the politics surrounding speech recognition and analysis tools, which are changing how we perceive our embodied voices. Beyond their experimental and commercial uses, these tools are also used to control our access to countries and institutions, and they affect how we behave with our bodies and with technology. By extension, the way they are used affects our relations with others, estranges us from other people and from our surroundings, and dehumanizes our lives. These tools, which have recently come into very broad use, are trained on databases of 'real' voice samples. The samples come from different sources around the world [research projects, frequencies, radio], sometimes with the permission of the people donating their voices. The more accents the dataset contains and the bigger it becomes, the better the tool can be trained. At the same time, these tools (such as automatic dialect analysis) are used by states [Germany] to verify refugees' claims of origin. This process can often go wrong, because "Identifying the region of origin for anyone based on their speech is an extremely complex task" and depends on "a wide range of factors".[1]


I will do this by relating the two sides of the tool, and by rethinking/hacking it in a way that opens a conversation around this issue and radicalises the tool.

Our voice is personal

Voice samples for training speech analysis software (LDC). Tracing the samples || Using speech analysis software to verify voice samples


what data: ordered samples or real samples (broadcast conversations, broadcast news, field recordings [air traffic, walking/noise background], meeting speech, microphone conversation, microphone speech, telephone conversations, telephone speech, transcribed speech, video) || examples of verification: diagnostic tools (for disease, depression), personal assistants (humanizing the software voice), refugees seeking asylum (verification of claims of origin, Germany), banks


from where: universities (of linguistics) around the world, research projects or satellites, radio Example


extracts of descriptions of the samples: "Transcripts have been made of all recordings in this publication, manually time aligned to the phrasal level, annotated to identify boundaries between news stories, speaker turn boundaries and gender information about the speakers.", "The audio files are 8 KHz, 16-bit linear sampled data, representing continuous monitoring, without squelch or silence elimination, of a single FAA frequency for one to two hours.", "The Air Traffic Control Corpus (ATC0) is comprised of recorded speech for use in supporting research and development activities in the area of robust speech recognition in domains similar to air traffic control (several speakers, noisy channels, relatively small vocabulary, constrained languaged, etc.) The audio data is composed of voice communication traffic between various controllers and pilots." Example

in the case of real samples, with or without the users' permission || a matter of privacy; de-humanizing automated processes regarding control of the body
Trained data/ speech recognition

samples:


I wish I had your strength.

tatoeba.org for Common Voice of Mozilla

LDC93S1 0 46797 She had your dark suit in greasy wash water all year.

catalog.ldc.upenn.edu for Pocketsphinx

por que al fin y al cabo el miedo de la mujer a la violencia del hombre es el espejo del miedo del hombre a la mujer sin miedo CMPB_M_32_01IVN_00004

catalog.ldc.upenn.edu for Pocketsphinx (broadcast conversation)

Interview 15
(A=Interviewer; B=Interviewee)
A: So we are recording.  Awesome.  So how long have you lived in Flint? (unclear)
B: 38 years.
A: Is that your whole life?  Wow you look really young.
B: Thank you!
A: So what can you tell me about what it’s like to have grown up around here?
B: Normal, just very- Working class, nice people.  Good values, good heart.  Did all the usual.  Rode my bike.  Played outside.  Brownies. Family-oriented.  Just very- I mean- very Midwestern.  You know.  Cliché. I mean really.
A: Okay.  Have you- have you traveled around to other places to see how things are like in comparison to Flint?
B: Um…do you mean throughout the county or the state or the country or-
A: Any. (unclear)
B: Any…um… unfortunately I haven’t been able to travel that much.  My traveling has been just basically through the state.  A couple trips to Canada.  Ontario.  Some through the- like- I guess you’d say upper Midwest.  Ohio.  Illinois.  Indiana.  Iowa. And it’s- I find it very similar.  I mean different.  I mean.  What their economy was based in.  Because it wasn’t GM.  They were really sim- It was- felt like really similar.  Just the way people are- small communities are nice and down to earth.  What I come across but just similar.  But I would like to travel more in my life.  
A: So would you say, like it- you kind of had like a typical American experience growing up in Flint?
B: Yeah, I would say so, just um … like I said, bike riding, Brownies. I was fortunate I had both my parents stay together.  Most of my friends got divorced, their parents got divorced, so we were atypical in that sense. And just...um, normal neighborhood, just close but not in each other’s business, so to speak, and just family oriented. Nothing too exciting. 
catalog.ldc.upenn.edu for Pocketsphinx (microphone conversation)
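
As a side note on how such samples are described for the machine: the LDC93S1 line listed above ("LDC93S1 0 46797 She had your dark suit...") can be read as a small data record. What follows is only a minimal sketch in Python. It assumes that the two numbers are the start and end sample offsets of the utterance, and that the audio was sampled at 16 kHz; neither is stated on this page. `Utterance`, `parse_sample_line` and `ASSUMED_SAMPLE_RATE_HZ` are hypothetical names used only for illustration, not part of Pocketsphinx or of any LDC tool.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed voice sample: an id, a sample range, and the spoken text."""
    utt_id: str
    start_sample: int
    end_sample: int
    text: str

    def duration_seconds(self, sample_rate_hz: int) -> float:
        # Number of samples divided by samples per second.
        return (self.end_sample - self.start_sample) / sample_rate_hz


def parse_sample_line(line: str) -> Utterance:
    """Split 'ID START END TEXT...' into its four parts (assumed layout)."""
    utt_id, start, end, text = line.strip().split(maxsplit=3)
    return Utterance(utt_id, int(start), int(end), text)


# Assumption: 16 kHz audio. The page itself does not give the sampling rate.
ASSUMED_SAMPLE_RATE_HZ = 16_000

utt = parse_sample_line(
    "LDC93S1 0 46797 She had your dark suit in greasy wash water all year."
)
print(utt.text)
print(round(utt.duration_seconds(ASSUMED_SAMPLE_RATE_HZ), 2), "seconds")
```

Under these assumptions the sentence lasts a little under three seconds. The record also shows how little of the speaker is kept in the transcript: an identifier, two numbers and a sentence.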