User:Angeliki/Grad project speech analysis

From XPUB & Lens-Based wiki

Re-humanizing voice samples

I want to highlight the politics and current situation of speech recognition and analysis tools, which are changing how we perceive our own voices and are used to control our access and behaviour. These tools, which have recently become very widely used, are trained on databases of 'real' voice samples. The samples come from different sources around the world, sometimes with the permission of the people donating their voice. The more accents a database covers, the better the tool can become. At the same time these tools are used to verify

I will do this by relating the two sides of the tool.

Our voice is personal

Voice samples for training speech analysis software (LDC).

Tracing the samples.

Using speech analysis software to verify voice samples.


what data: ordered samples or real samples (broadcast conversations, broadcast news, field recordings [air traffic, walking/noise background], meeting speech, microphone conversation, microphone speech, telephone conversations, telephone speech, transcribed speech, video)

examples of verification: diagnostic tools (for disease, depression), personal assistants (humanize the software voice), refugees seeking asylum (verification of claims of origin, Germany), banks


from where: universities (of linguistics) around the world, research projects or satellites, radio Example


extracts of descriptions of the samples:

"Transcripts have been made of all recordings in this publication, manually time aligned to the phrasal level, annotated to identify boundaries between news stories, speaker turn boundaries and gender information about the speakers."

"The audio files are 8 KHz, 16-bit linear sampled data, representing continuous monitoring, without squelch or silence elimination, of a single FAA frequency for one to two hours."

"The Air Traffic Control Corpus (ATC0) is comprised of recorded speech for use in supporting research and development activities in the area of robust speech recognition in domains similar to air traffic control (several speakers, noisy channels, relatively small vocabulary, constrained languaged, etc.) The audio data is composed of voice communication traffic between various controllers and pilots."

Example

with permission from the users or not: in the case of real samples this is a matter of privacy; de-humanizing automated processes regarding control of the body
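
To get a feeling for how much raw voice one of these recordings holds, the audio description quoted above ("8 KHz, 16-bit linear sampled data ... for one to two hours") can be turned into a rough size estimate. This is only a sketch: it assumes a single (mono) channel and no compression, neither of which is stated in the extract, and the function name is made up for illustration.

```python
# Rough size estimate for one recording as described in the quoted extract.
# Assumptions (not stated in the extract): one channel, uncompressed data.

SAMPLE_RATE_HZ = 8_000    # "8 KHz"
BYTES_PER_SAMPLE = 2      # "16-bit linear" = 2 bytes per sample
CHANNELS = 1              # assumption: mono


def recording_size_bytes(duration_seconds: int) -> int:
    """Return the uncompressed size in bytes of a recording of this length."""
    return duration_seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


if __name__ == "__main__":
    for hours in (1, 2):
        size = recording_size_bytes(hours * 3600)
        print(f"{hours} h of continuous monitoring: {size / 1_000_000:.1f} MB")
```

Under these assumptions one hour comes to 57.6 MB and two hours to 115.2 MB of continuous, unedited voice per file.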