User:Angeliki/Grad project loop

Human always in the loop

What do I want to make?

I am interested in the detachment of the voice from its physical owner when it is mediated through different kinds of media. This detachment has created a new relationship with our own bodies and with others; it has opened up possibilities of communication and extended ourselves. At the same time, we approach these media as users who forget their embodied presence in the digital landscape. In our post-human era, as Zizek says, “we lose the basic distance which makes us human” in “the prospect of the direct link between our brain and the digital network” (The Economist, 2018). It seems that we are more than distant users of a communication tool: we are part of a communication circuit that is altered and controlled by commercial and state structures. Our voices are transferred, used and controlled within this massive flow of information. Thinking back to Claude Shannon's communication diagram^[1], I made a similar annotated diagram that shows how complex the process has become. The new data practices of social media and machine learning have intervened in the communication system, creating endless loops. We are already part of it. I want to make the human presence in the loop of communication systems visible. We are part of the loop, we experience it every day and we can intervene in it. As Zizek observes, it is very difficult for the common person to understand how algorithms work, “but we can easily understand how we are controlled by the digital grid” (The Economist, 2018).
It is not widely known that the most important part of making an intelligent ‘assistant’ (based on speech recognition) is the contribution of huge amounts of human voice data. Speech recognition and analysis tools are trained on databases of 'real' voice samples. These samples come from different sources and communication systems around the world [imitations of conversations, radio broadcasts, telephone conversations, field recordings, online readings], sometimes with the permission of the people donating their voice. At the same time, such tools (like automatic dialect analysis) are used by the German state to verify refugees' claims of origin. They can often get it wrong, because "Identifying the region of origin for anyone based on their speech is an extremely complex task" and depends on "a wide range of factors" (Sputnik, no date). Google also uses voice recordings of users from its other apps to train its speech recognition tool.

I want to highlight several qualities of this process, such as materiality, time (delay) and embodiment (the involvement of physical bodies), that are fading out and being altered in this massive, fast and automated process. I also want to define our new relation to our voices and bodies with respect to those systems. I aim to accomplish this by engaging with older media, like radio, which depends on physical factors, and with speech recognition tools, which are trained on mediated voices. I want to engage with these tools and at the same time try out practices that interrogate our contribution to the loop.

How do I plan to make it?

I will make this condition (of the human always in the loop) visible through a collection of material related to the training data of speech recognition tools [voice samples, ways of collecting samples] and through practices that re-enact the loop with the human activities involved in it. More specifically, these practices will relate to the human processes of the loop, like the presence of human annotators [transcription], listening to and identifying voices, decoding and encoding, and donating voice samples, but also to other elements like the delay in the circuit and the spatial or hardware qualities that distort the voice. Below I present some examples I am already involved in that can support my approach:
Here are some examples of voice samples that I found from an organisation involved in training a speech recognition tool [pocketsphinx]:

(microphone conversation)

Interview 15
(A=Interviewer; B=Interviewee)
A: So we are recording.  Awesome.  So how long have you lived in Flint? (unclear)
B: 38 years.
A: Is that your whole life?  Wow you look really young.
B: Thank you!(...)

(telephone conversation/ giving directions on spot while walking)

And this is an example of a description accompanying these voice samples, which shows how a voice sample was made: ...The number of interviewees in a single program varies from one to three, but typically, one interviewer and two interviewees appear in the program. The material includes passages of interactive dialogue, but longer stretches of monologue-like speech comprise the majority of the collected data...
To start grasping and controlling a reality that we are already in [human-in-the-loop], we should also understand the processes happening in relation to our bodies. Exercises like the deep listening sessions of Pauline Oliveros can be a daily experiment in which I can find a place between our bodies and the technology of mediation. They are about collectively sharing and following instructions through which we connect with the inner functions of the body, like circulation, electromagnetism and vibrations, by reading, listening and moving. In my personal experience of it, I became a medium by repeating a video with instructions.

What is my timetable?

Until the end of January: I will conduct interviews with people involved in collecting voice data or engaged with and re-appropriating communication technologies. I will participate in workshops or lectures related to these topics throughout the year, like CCC (Chaos Communication Congress). I will research recordings/databases and ways of collecting voice samples for training. At the same time I will experiment with mediation in private and public spaces. I will also do more experiments with collective reading, listening, repeating and walking. I will document my observations and the audio, video, photos and text produced.
February: I will continue my experiments and involve more people in the process. I will invite them to share their personal relation to these experiments [how they perceive the mediation of their voice and their presence in the loop] and use this material in the process. I will merge my process with facts related to the people involved (where that process is important or not).
March-April: Considering the documentation and outcome of my processes, I will rethink my next steps and select the practices that have the most impact.
May-June: Wrap up

Why do I want to make it?

This era of post-humanity is about the “expanding role of science, machines and digital media in social control and regulation” (The Economist, 2018). We may think that we keep a safe distance from our mediated communication, and we perceive ourselves mostly as users, but we don’t really understand ourselves as part of it. I think that shifting this perception towards our embodied presence in the communication loop can open up possibilities for the relation we have with it. Our communication and our relation with these tools have mostly been shaped by big corporations or states, and their character is mostly military, male and scientific. It seems to me that communication platforms are estranged^[2] realities where the personal body [cultural, physical, political, gender] disappears^[3]. They are technically complex and there is a mystery around them. My purpose is to appropriate these systems in a way that follows the "situation"/position of a person, and to redefine them more conceptually. I am especially interested in the voice because our voice is a personal and unique element. For oral cultures the voice was a medium for spreading knowledge, in a way that differs greatly from that of writing cultures: "When auditory experiences are shared, histories too are shared, and not only from mouth to ear: they are perceived by and encoded in the body through the physicality of sound waves and passed on from one generation to another." (Public Radio - documenta 14, no date). Machine learning and speech recognition tools entering our daily lives demand a huge database of voices for training. "(S)hould we be worried about the large-scale harvesting of our voiceprints? (…) The companies behind this technology say that a voiceprint includes more than 100 unique physical and behavioural characteristics of each individual, such as length of the vocal tract, nasal passage, pitch, accent and so on. They claim it is as unique to an individual as a fingerprint, and that their systems even recognise people if they have a cold or sore throat." (Jones, 2018).
Technology becomes an extension of this desire to reach the invisible and distant, something beyond the limitations of your own body. But when I talk about detachment I don’t mean it in a negative sense. At first there is alienation, frustration and distancing, but if we approach it through our body, we understand this communication; it is a way of understanding media through simple techniques such as just ‘repeating’ a YouTube video or transcribing the voice of our interlocutor.

Who can help me and how?

People I interview, like Reni Hofmüller, can help me experiment with specific technologies of mediation and imagine other aspects of them. Raadio Caargo can also help me imagine potential futures [feminist futurotopias, as they call them] of these media by engaging with different related methodologies and practices. Joana, a former student, can help me with prototyping, references and discussion of the embodied and distant voice. My tutors Amy and Clara can help with deep listening exercises.

Previously

I have been moving in this direction over the last few years. I have worked with collective writing, reading, speech recognition, collective annotation and collective reading. In my work I am often interested in parallel presence through the voice and in tools that relate the embodied and the distant voice.

Relation to a larger context

In general there are several approaches by artists and theorists to the transmission and distortion of the mediated voice. All this theory starts with the invention of the telephone, when we could listen to our own voice outside of our bodies. I relate this context to data practices. There is a term related to AI and machine learning, “Human-in-the-loop” or HITL, that I refer to in the title of my project. It “is defined as a model that requires human interaction. Human-in-the-loop allows the user to change the outcome of an event or process.”^[4]
The way we perceive ourselves in the loop is how we perceive the messages in between the channels. Stuart Hall's encoding/decoding model of communication is relevant here: “In contrast to other media theories that disempower audiences, Hall proposed that audience members can play an active role in decoding messages as they rely on their own social contexts, and might be capable of changing messages themselves through collective action.” The message can be “interpreted differently from person to person.”^[5]
There are several attempts [by feminists, artists, programmers, sociologists] to approach hacking and technological cultures from a more feminist perspective that involves the body and the vulnerabilities of the individual.
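
To make the quoted definition of “Human-in-the-loop” a little more concrete, here is a minimal sketch in Python of a loop in which a human can change the outcome of an automated transcription. Everything in it is an assumption made for illustration: `machine_transcribe` is a hypothetical stand-in for a speech recognition tool (it does not call pocketsphinx or any real library), the confidence values and the threshold are invented, and `ask_human` stands for whatever a human annotator would do.

```python
from typing import Callable, List, Tuple


def machine_transcribe(sample: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a speech recogniser.

    Returns a guessed transcript and an invented confidence score.
    Samples marked "(unclear)" get a low score; no real model is involved.
    """
    if "(unclear)" in sample:
        return sample.replace("(unclear)", "").strip(), 0.4
    return sample, 0.9


def human_in_the_loop(
    samples: List[str],
    ask_human: Callable[[str, str], str],
    threshold: float = 0.5,
) -> List[str]:
    """Pass low-confidence guesses to a human, who may change the outcome."""
    results = []
    for sample in samples:
        guess, confidence = machine_transcribe(sample)
        if confidence < threshold:
            # The human intervenes here and can alter the machine's result.
            guess = ask_human(sample, guess)
        results.append(guess)
    return results


if __name__ == "__main__":
    samples = [
        "38 years.",
        "So how long have you lived in Flint? (unclear)",
    ]
    # A human annotator is simulated by a function that marks its corrections.
    annotated = human_in_the_loop(samples, lambda s, g: "[human] " + g)
    for line in annotated:
        print(line)
```

In this sketch the human is only consulted when the machine is unsure; lowering or raising the hypothetical `threshold` changes how present the human is in the loop.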

Bibliography

The Economist (2018) ‘Are liberals and populists just searching for a new master?’, 8 October. Available at: https://www.economist.com/open-future/2018/10/08/are-liberals-and-populists-just-searching-for-a-new-master (Accessed: 25 October 2018).

Sputnik (no date) Germany to Use Dialect Recognition Software to Verify Origins of Refugees. Available at: https://sputniknews.com/europe/201703181051711403-germany-software-dialects-refugees/ (Accessed: 11 November 2018).

Jones, R. (2018) ‘Voice recognition: is it really as secure as it sounds?’, The Guardian, 22 September. Available at: https://www.theguardian.com/money/2018/sep/22/voice-recognition-is-it-really-as-secure-as-it-sounds (Accessed: 11 November 2018).

Public Radio - documenta 14 (no date). Available at: https://www.documenta14.de/en/public-radio/ (Accessed: 7 November 2018).

[1] https://monoskop.org/Information_theory#mediaviewer/File:Shannon_Claude_E_1948_General_communication_system_diagram.jpg
[2] estranged= 1. To make hostile, unsympathetic, or indifferent; 2. To remove from an accustomed place or set of associations
[3] The movie "The phantom of the operator" that I will watch talks about it. https://underbelly.nu/product/the-phantom-of-the-operator/
[4] https://en.wikipedia.org/wiki/Human-in-the-loop
[5] https://en.wikipedia.org/wiki/Encoding/decoding_model_of_communication
