We know how you feel (better than you do)

Facial expression analysis and an emotion ontology

'Emotion generator' seems a fitting name for the medium of film. Unsurprisingly, many scholars have attempted, and are still attempting, to theorise spectatorial emotions. They try to tackle the paradox of fiction: why does the audience laugh, cry, or feel a chill down their spine when they know that what happens on screen is not 'real'? What is their relation to the characters on screen? And how do they relate to the story presented to them? Why do they experience emotions, or at least say they experience emotions?

As a film director I have always been intrigued by these questions, and have wondered how I could elicit the maximum emotional response from my audience. Theorists have often approached these questions from a psychological and philosophical angle, attempting to explain how spectators relate to films and how their in-cinema emotions relate to their awareness of reality and the characters on the screen.^[1] What seems to be addressed less often is how the cueing of emotions affects human behaviour, and what the ethical consequences of that are.

With recent technological developments in the analysis of facial expressions related to emotions, however, this question is becoming more and more relevant. That emotions have an impact on behaviour is acknowledged both by the social sciences and (primarily?) by commercial parties. For example, it is claimed that more than 50% of our consumer behaviour is influenced by our emotions (Langeveld, 2014). So from a commercial perspective, a general theory of human emotions and how to target them is more than welcome. That leaves one wondering: are we indeed machines that, when the right button is pressed, all exhibit similar behaviour, or even experience similar emotions? Or, rephrased: do we all experience emotions in a similar way?

Ed Atkins seems to ask us that question in his exhibition Recent Ouija (2015, Stedelijk Museum Amsterdam). His video works are roller coasters of emotion, triggering one emotion after the other without requiring us to discern any plot or develop any sympathy for the character(s).

It leaves you wondering: on what grounds do we experience those emotions? How 'sincere' are the emotions we experience when we watch films or television? The first question that needs answering is one of definition. We need to distinguish between emotion, affect and feelings, and might eventually need to include mood in the equation as well. This distinction seems absent from the communication of companies involved in facial expression analysis. They prefer the term emotion, probably because it is simple and readily understood. A first distinction between emotion and affect is made by Brian Massumi (1995). He states that emotions operate on a conscious level, whereas affect is a reflexive, unconscious response. In the research that follows, however, I will have to expand on these distinctions in order to enter the debate more precisely.

As commercial parties are interested in maximising emotional response (which they believe enhances brand engagement), they turn to 'hard facts' on emotional yield: numbers and statistics with which to shield themselves from responsibility. They find themselves supported by some psychologists who are indexing and classifying human emotions and their corresponding facial expressions. This work started off as a study for the benefit of people with conditions on the autistic spectrum. The researchers ended up with 412 distinct emotions, grouped in 12 categories (Baron-Cohen, 2004). Their research was grounded in the notion that the previously hailed six universal basic emotions (anger, disgust, fear, joy, sadness, and surprise) were not sufficient to address the various emotions encountered in human interactions. For The Mindreading Emotion Library they had six actors play each of the 412 emotions, in both sound and video.

This research has recently taken another turn: it now provides the basis for engagement studies of commercials, carried out under the name Affectiva. Using data on facial movements, the company tries to determine which version of an ad provides the 'optimal' emotional response. These are Affectiva's first steps. Later, so they tell us, they believe their computer technology can be used by autistic people not only to recognise emotions, but also to learn how to mimic the right emotion in a certain situation. It seems to me that one then tries to teach people to smile without knowing how to empathise with happiness, and to put on a grieving face without knowing how to grieve. Where the research of Baron-Cohen set a standard for the recognition and classification of those emotions, Affectiva wants to set a standard for the expression of emotions, based on its models.

As Alfred Korzybski stated in the early 20th century: "the map is not the territory". He meant that models can give us some insight into what we might discover around the corner, but they lack detail and do not explain how things got there. The same holds for models of facial expressions and emotions: they can give some insight into what response we can likely expect from somebody, but they are too shallow to account for the rich sources of the emotions experienced. The associations that form the foundation of emotions are not taken into account.^[2] Affectiva's system (like similar systems by competitors such as RealEyes and Emotient) is looking for hard facts: it tries to render visible the emotions that those who experience them do not even notice they have (Langeveld, 2014). It tries to come up with a single answer to the seemingly simple question "what do you feel right now?", and in doing so apparently neglects the ambiguous nature of human emotions.

When exploring the nature of human emotions in cinema, theatre and other forms of (narrative) art, Kendall Walton (1978) compares our emotional responses with those of a child playing a game of make-believe.

Compare [a spectator] with a child playing an ordinary game of make-believe with his father. The father, pretending to be a ferocious monster, cunningly stalks the child and at a crucial moment, lunges viciously at him. The child flees, screaming, to the next room. The scream is more or less involuntary, and so is the flight. But the child has a delighted grin on his face even while he runs, and he unhesitatingly comes back for more. He is perfectly aware that his father is only "playing," that the whole thing is "just a game," and that only make-believedly is there a vicious monster after him. He is not really afraid.

It is as if the child is acting, playing a role. But he is not acting for a spectator; he is mainly playing the role for himself. He simultaneously experiences and expresses two emotions, both of which are sincere. Walton puts it more elegantly when he says:

Rather than somehow fooling ourselves into thinking fictions are real, we become fictional.

He argues that, not unlike in dreams and fantasies, we cross over into a fictional realm when we watch a film or read a novel. We pursue a (therapeutic) desire towards the fiction. (Is it a desire for aura or otherness?)

Expanding on that, I would like to argue that most of our human interactions have a form of play in them. Do we not take on different 'roles' in different interactions, whether consciously or not? Does 'being yourself' not lead to different behaviour in distinct social contexts? Are we not theatrical beings?

Following this line of thought, it is intriguing to consider what happens when videos of B-actors acting out over 400 emotions are used as a basis for training others to express emotions properly. A feedback loop is created in which the stereotypical expression of an emotion (as played out by the actor) becomes the norm.

It seems similar to what Georges Didi-Huberman describes in Invention de l'hystérie (1982, translated into English in 2003), on a prison for four thousand incurable or mad women. In this "feminine inferno" Charcot studies hysteria by photographing women in certain poses. But through his methods, he encourages the women to pose in stranger and stranger ways. The women had previously seen their hysteria almost as an art form, like theatre. But as they pushed themselves further, the illness worsened. Oddly, this might be compared to children copying behaviour (or words) they see on television. It often starts as a joke, but then becomes embedded in their behaviour and acquires its own connotations.

As Friedrich Kittler states, each new technology influences our self-image. Similarly, training facial expressions of emotion with data analysis will only lead to a heightened awareness of one's self-performance. Facial emotion analysis makes a double promise: on the one hand, companies claim to be able to find 'the real' emotion behind the face; on the other hand, they want to train people to exhibit the facial expressions that fit a certain situation. It seems to me that both promises are grounded in a form of normalisation. While companies like Affectiva and Emotient state that their research is based on facts, the normalisation seems to be based on a human classification and annotation of facial expressions and their link to emotions.

In the following research I want to look into emotions and affect. What is 'sincerity' in emotional expression? How is it influenced by facial expression analysis? How does the premise of these technologies relate to the notion of sincerity? And what are their assumptions about, and implications for, human interaction?

Since film is a, or maybe even the, medium of emotions, it seems only fitting to research this topic of normative emotions using this medium. Even more so because actors are trained to express emotions using various techniques, which might add interesting insights into the subject. On the other hand, the questions raised in this text only exist because of the computer technologies used to analyse and quantify facial expressions. I therefore believe it would be good to investigate these technologies as well.

References

Langeveld, Nick (2014), "How Facial Coding Will Help Marketers Better Understand Consumers Neuromarketing", presentation by Nick Langeveld, President and CEO of Affectiva, at the 2014 iMedia Summit in Austin, TX. https://www.youtube.com/watch?v=yBMc5xAmOWg
Walton, Kendall (1978), Fearing Fictions
Massumi, Brian (1995), "The Autonomy of Affect", Cultural Critique, No. 31, The Politics of Systems and Environments, Part II (Autumn, 1995), pp. 83-109.
Didi-Huberman, Georges (1982, translated into English in 2003), Invention de l'hystérie

[1] E.g. G. M. Smith (2003), Film Structure and the Emotion System; M. Smith (1995), Engaging Characters; A. Coplan (2009), Empathy and character engagement.

[2] In my graduation thesis at the Utrecht School of the Arts, I have written about how associations provide the basis of emotions and how this relates to moods (2013).
