User:Angeliki/Grad project speech analysis
Revision as of 12:19, 12 November 2018
Re-humanizing voice samples
I want to highlight/be aware of the politics surrounding the speech recognition and analysis tools that are changing our perspective on our embodied voices. Beyond their experimental or commercial uses, they are also used to control our access to a country or institution, and they affect how we behave with our bodies and with technology. These tools, now very broadly used, are trained on databases of 'real' voice samples. The samples come from different sources around the world [research projects, frequencies, radio], sometimes with the permission of the people donating their voice. The more accents are represented and the bigger the dataset becomes, the better the tool can be trained. At the same time, such tools (like automatic dialect analysis) are used by states [Germany] to verify refugees' claims of origin. This process often goes wrong, because "Identifying the region of origin for anyone based on their speech is an extremely complex task" and depends on "a wide range of factors".[1]
I will do this by relating the two sides of the tool and rethinking/hacking it in a way that opens a conversation around this issue / radicalises these tools.
I want to do this project because we are gradually and rapidly donating our personal data to big organisations for the sake of the "public good". In turn, the way these data are used affects our relations with others, estranges us from them and from our surroundings, and dehumanizes our lives. Our voice is personal.
This week Guardian Money was told that if there are clips of your voice out there on the web – on a podcast, say – there is technology that can create a very convincing imitation of your voice. And should we be worried about the large-scale harvesting of our voiceprints?
The companies behind this technology say that a voiceprint includes more than 100 unique physical and behavioural characteristics of each individual, such as length of the vocal tract, nasal passage, pitch, accent and so on. They claim it is as unique to an individual as a fingerprint, and that their systems even recognise people if they have a cold or sore throat.
{| border="1" style="border-collapse:collapse"
|-
! Voice samples for training speech analysis software ([https://catalog.ldc.upenn.edu/search LDC]). Tracing the samples !! Using speech analysis software to verify voice samples
|-
| what data: ordered samples or real samples (broadcast conversations, broadcast news, field recordings [air traffic, walking/noise background], meeting speech, microphone conversation, microphone speech, telephone conversations, telephone speech, transcribed speech, video) || examples of verification: diagnostic tools (for disease, depression), personal assistants (humanizing the software voice), refugees seeking asylum / verification of claims of origin [Germany], banks
|-
| from where: universities (of linguistics) around the world, research projects or satellites, radio ||
|-
| extracts of descriptions of the samples: "Transcripts have been made of all recordings in this publication, manually time aligned to the phrasal level, annotated to identify boundaries between news stories, speaker turn boundaries and gender information about the speakers.", "The audio files are 8 KHz, 16-bit linear sampled data, representing continuous monitoring, without squelch or silence elimination, of a single FAA frequency for one to two hours.", "The Air Traffic Control Corpus (ATC0) is comprised of recorded speech for use in supporting research and development activities in the area of robust speech recognition in domains similar to air traffic control (several speakers, noisy channels, relatively small vocabulary, constrained language, etc.) The audio data is composed of voice communication traffic between various controllers and pilots." ||
|-
| with permission from the users or not, in the case of real samples || matter of privacy, de-humanizing automated processes regarding control of the body
|-
| Some audio samples with their transcriptions:
|-
| {{#Widget:Audio|flac=https://catalog.ldc.upenn.edu/desc/addenda/LDC2017S17.flac}}
(microphone conversation)
<pre style="white-space: pre-wrap;">
Interview 15
B: 38 years.
A: Is that your whole life? Wow you look really young.
B: Thank you! (...)
</pre>
|-
| (air traffic)
<pre style="white-space: pre-wrap;">
((TAPE-HEADER "TAPE02; LOGAN, BOSTON ATCT; FINAL ONE, F1; 126.5 MHz; 26 JUNE 1991, 2012 TO 2212 UTC; TRANSCRIBER FR"))
((COMMENT
</pre>
|}
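The corpus descriptions above mention audio stored as 8 kHz, 16-bit linear sampled data, monitored continuously for one to two hours. As a minimal sketch of what that format implies (assuming headerless, mono raw PCM; the function name is my own, not LDC's), the duration of such a recording follows directly from the byte count:

```python
def raw_pcm_duration(data: bytes, sample_rate: int = 8000, sample_width: int = 2) -> float:
    """Seconds of audio in headerless linear PCM (e.g. 8 kHz, 16-bit mono)."""
    n_samples = len(data) // sample_width  # each 16-bit sample occupies 2 bytes
    return n_samples / sample_rate

# One hour of continuous monitoring at 8 kHz, 16-bit mono:
one_hour = bytes(8000 * 2 * 3600)  # 57,600,000 bytes of silence
print(raw_pcm_duration(one_hour))  # 3600.0
```

So a two-hour tape of a single FAA frequency, with no squelch or silence elimination, comes to roughly 115 MB of raw samples.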