Is AI ready to understand human emotions?
Does this person necessarily express anger because they are frowning? For most current emotion detection algorithms, the answer is yes. But it's not that simple...
In recent years, affective computing has become a hot topic and is increasingly present in many fields: security, marketing, entertainment, tourism, and even in the health sector.
Indeed, because of its great economic and social potential, affective computing has become a flourishing multidisciplinary research field with a wide range of applications:
- mass emotional analysis (general mood of a population, level of well-being generated by a recreational or tourist facility, etc.);
- security (risks of aggression in a stadium or public transport, detection of drowsy drivers, etc.);
- marketing and entertainment (emotional reactions to a product or film, for example);
- assistants and personal companions;
- health (pain detection, supporting people with communication disorders, and even assisting medical and psychopathological diagnosis).
Over the last 10 years, artificial intelligence has made spectacular progress in facial expression analysis, thanks in particular to the increase in available computing power and the arrival of deep learning techniques.
Deep learning has enabled many technical advances, chiefly because its algorithms, unlike those of traditional machine learning, require little or no human intervention, provided they are given enough training examples (which can be counted in millions).
However, while current algorithms excel at recognizing the 6 so-called basic emotions (joy, fear, anger, sadness, surprise and disgust) on databases of facial expressions that are often exaggerated and posed in the laboratory (with accuracy rates of over 99%), performance drops dramatically when AI is faced with more natural situations.
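To make the setup concrete, here is a minimal sketch of the kind of classifier these systems implement: a function mapping face features to one of the six basic-emotion labels. Everything here is illustrative (the random linear layer stands in for a deep network trained on millions of labelled images; the feature dimension is an arbitrary assumption):

```python
import numpy as np

# The six basic-emotion labels used by most current classifiers.
BASIC_EMOTIONS = ["joy", "fear", "anger", "sadness", "surprise", "disgust"]

rng = np.random.default_rng(0)

# Stand-in for a learned model: a single linear layer with random
# weights. A real system would be a deep convolutional network.
W = rng.normal(size=(128, len(BASIC_EMOTIONS)))

def classify(face_features: np.ndarray) -> str:
    """Map a face-feature vector to one of the six basic emotions."""
    logits = face_features @ W
    probs = np.exp(logits - logits.max())  # softmax over the 6 labels
    probs /= probs.sum()
    return BASIC_EMOTIONS[int(np.argmax(probs))]

# A hypothetical 128-dimensional embedding of a face image.
features = rng.normal(size=128)
predicted = classify(features)
```

The forced choice among six fixed categories is precisely what the rest of this article questions: whatever the input, the model must answer with one of these six labels.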
What could explain these difficulties, and how can the relevance of these algorithms be improved? A short tour of the psychology of emotions can help us identify the mechanisms that still resist current systems.
The subtle and complex alchemy of emotions
Beyond the 6 basic emotions
Current automated systems, despite their increasing performance, rely on and require a universal conception of emotions and facial expressions.
The classical theory of basic emotions (represented by Darwin and Paul Ekman, to name only the best known) provides, through its simple, categorical description, an intelligible account of this very heterogeneous phenomenon, but it is now considered outdated.
Recent research has shown that there are many more categories of emotions, and that they are separated by fuzzier boundaries than conventional theories suggest. To get as close as possible to reality, it therefore seems necessary to develop new theoretical models and to train our systems to recognize a wider range of expressions: complex or compound emotions, cognitive states, and even non-emotional facial expressions, which appear to be far more common than emotional expressions in interactions between individuals.
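A quick back-of-the-envelope sketch shows how fast the label space grows once compound emotions (such as "happily surprised") are admitted alongside the basic six. The specific extra labels below are illustrative examples, not an established taxonomy:

```python
from itertools import combinations

BASIC = ["joy", "fear", "anger", "sadness", "surprise", "disgust"]

# Compound emotions combine two basic categories (e.g. joy+surprise
# for "happily surprised"); pairing alone multiplies the label space.
COMPOUND = [f"{a}+{b}" for a, b in combinations(BASIC, 2)]

# Illustrative cognitive states that a broader system would also
# need to recognize (assumed examples, not an exhaustive list).
COGNITIVE = ["confusion", "concentration", "pain"]

EXTENDED_LABELS = BASIC + COMPOUND + COGNITIVE
```

Even this crude enumeration takes the label set from 6 to 24 categories, which gives a sense of why databases of posed basic emotions cover only a small corner of real expressive behaviour.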
Great variability depending on the context
Directly linked to the problem of emotion categories, the question of the variability of expressions is essential to account for the richness of human emotional life. In the basic emotions model, the variability of facial expressions is often attributed to methodological errors or to secondary factors, such as social display rules, which lead individuals to inhibit their emotional reactions.
This classic view is now being replaced by theories that seek to capture the complexity of the emotions that emerge from the interaction between an individual and their environment. Indeed, all observations in ecological settings show that facial expressions are highly variable and depend on multiple factors.
First of all, there can be several types of facial expression for the same emotion. Depending on many factors, such as personality, previous experience, culture, the different social roles they are called upon to play, but also the situation itself, individuals do not experience or express their emotions in the same way or with the same intensity.
Likewise, the same facial expression can have several meanings. For example, depending on the situation, a frown can mean anger, disgust or sadness, but also more complex mental states such as confusion or intense thinking. The notion of context therefore plays a crucial role in correctly interpreting facial expressions.
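The point can be made with a deliberately naive sketch: the very same facial signal maps to different mental states once a context cue is available, whereas a context-blind system can only ever return one answer. The context labels below are invented for illustration:

```python
# One facial signal, several meanings depending on context.
# All (expression, context) pairs here are illustrative assumptions.
INTERPRETATION = {
    ("frown", "heated argument"): "anger",
    ("frown", "bad smell"): "disgust",
    ("frown", "sad news"): "sadness",
    ("frown", "hard puzzle"): "intense thinking",
    ("frown", "unclear instructions"): "confusion",
}

def interpret(expression: str, context: str) -> str:
    """Resolve an expression using context; 'unknown' when context
    gives no answer, roughly what a context-blind system cannot do."""
    return INTERPRETATION.get((expression, context), "unknown")
```

A real system would of course infer context from the scene, the body, the voice or the preceding events rather than from a hand-written table; the table only shows why the extra input is needed at all.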
Emotions that are expressed in all modalities
In their daily interactions with the physical and social environment, human beings use many modalities to express and decode emotional states: facial expressions, but also tone of voice, direction of gaze, gestures, postures, etc. For example, it is known that the intensity of an emotion is more easily perceived through non-visual modalities, such as tone of voice or bodily reactions.
It therefore seems natural and necessary for automated systems to exploit several modalities: coupling audio and visual signals to distinguish emotional expressions from facial distortions caused by speech, analysing vocal characteristics or emotional discourse, or even fusing more than two modalities in order to increase precision.
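One common way to combine modalities is late fusion: each modality produces its own probability distribution over emotions, and the distributions are averaged with per-modality weights. The sketch below assumes hypothetical face-model and voice-model outputs for a single clip:

```python
import numpy as np

EMOTIONS = ["joy", "fear", "anger", "sadness", "surprise", "disgust"]

def late_fusion(per_modality_probs, weights):
    """Weighted average of per-modality emotion distributions."""
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()  # normalize so the result stays a distribution
    fused = np.zeros(len(EMOTIONS))
    for w, probs in zip(weights, per_modality_probs):
        fused += w * np.asarray(probs, dtype=float)
    return fused

# Hypothetical single-modality outputs: the face model leans "anger",
# the voice model leans "sadness".
face_probs  = [0.10, 0.05, 0.60, 0.10, 0.10, 0.05]
voice_probs = [0.05, 0.10, 0.30, 0.40, 0.10, 0.05]

fused = late_fusion([face_probs, voice_probs], weights=[0.5, 0.5])
label = EMOTIONS[int(np.argmax(fused))]
```

With equal weights the fused distribution still favours "anger", but the voice evidence substantially raises "sadness"; an early-fusion design would instead merge the raw features before classification, at the cost of needing synchronized inputs.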
Is emotional AI for tomorrow?
Affective computing is a constantly evolving field, and researchers have developed increasingly sophisticated and powerful systems in recent years. However, the large majority of these systems have been tested under conditions far removed from natural contexts, and it can be argued that AI is still far from matching human capabilities in decoding emotional states.
Emotion is, according to the most recent theories, a subtle and changing phenomenon, which varies with many factors that theory has not yet managed to formalize in full. Its theoretical definition, which reflects this complexity, is still a work in progress.
However, this lack of consensus and the diversity of theoretical models should not discourage researchers. There is a need to continue research along the four promising axes highlighted in this review (broadening the range of emotions, taking variability and context into account, moving towards multimodality, and generating data from natural conditions), so that artificial intelligence can understand the subtle and complex way in which humans feel, express and exchange their emotions on a daily basis.
For a systematic review of the literature, see: Masson, A. et al. (2020). The Current Challenges of Automatic Recognition of Facial Expressions: A Systematic Review. AI Communications, pre-press, pp. 1-26. DOI: 10.3233/AIC-200631.