Image: A robot with a brain and heart, illustrating the idea that LLMs such as ChatGPT can reason about emotions. Credit: Neuroscience News

AI Shows Higher Emotional IQ than Humans

Summary: A new study tested whether artificial intelligence can demonstrate emotional intelligence by evaluating six generative AIs, including ChatGPT, on standard emotional intelligence (EI) assessments. The AIs achieved an average score of 82%, significantly higher than the 56% scored by human participants.

These systems not only excelled at selecting emotionally intelligent responses but were also able to generate new, reliable EI tests in record time. The findings suggest that AI could play a role in emotionally sensitive domains like education, coaching, and conflict resolution, when supervised appropriately.

Key Facts:

  • AI Emotional IQ: Generative AIs outperformed humans in emotional intelligence tests, scoring 82% vs. 56%.
  • Test Creation: ChatGPT-4 created new EI tests that matched expert-designed assessments in clarity and realism.
  • Real-World Use: Findings suggest potential for AI in coaching, education, and conflict management.

Source: University of Geneva

Is artificial intelligence (AI) capable of suggesting appropriate behaviour in emotionally charged situations?

A team from the University of Geneva (UNIGE) and the University of Bern (UniBE) put six generative AIs — including ChatGPT — to the test using emotional intelligence (EI) assessments typically designed for humans.

The outcome: these AIs outperformed average human performance and were even able to generate new tests in record time. These findings open up new possibilities for AI in education, coaching, and conflict management.

The study is published in Communications Psychology.

Large Language Models (LLMs) are artificial intelligence (AI) systems capable of processing, interpreting and generating human language. The ChatGPT generative AI, for example, is based on this type of model. LLMs can answer questions and solve complex problems.

But can they also suggest emotionally intelligent behaviour?

Emotionally charged scenarios

To find out, a team from UniBE, Institute of Psychology, and UNIGE’s Swiss Center for Affective Sciences (CISA) subjected six LLMs (ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku and DeepSeek V3) to emotional intelligence tests.

“We chose five tests commonly used in both research and corporate settings. They involved emotionally charged scenarios designed to assess the ability to understand, regulate, and manage emotions,” says Katja Schlegel, lecturer and principal investigator at the Division of Personality Psychology, Differential Psychology, and Assessment at the Institute of Psychology at UniBE, and lead author of the study.

For example: One of Michael’s colleagues has stolen his idea and is being unfairly congratulated. What would be Michael’s most effective reaction?

a) Argue with the colleague involved

b) Talk to his superior about the situation

c) Silently resent his colleague

d) Steal an idea back

Here, option b) was considered the most appropriate.

In parallel, the same five tests were administered to human participants. “In the end, the LLMs achieved significantly higher scores — 82% correct answers versus 56% for humans. This suggests that these AIs not only understand emotions, but also grasp what it means to behave with emotional intelligence,” explains Marcello Mortillaro, senior scientist at the UNIGE’s Swiss Center for Affective Sciences (CISA), who was involved in the research.

New tests in record time

In a second stage, the scientists asked ChatGPT-4 to create new emotional intelligence tests, with new scenarios. These automatically generated tests were then taken by over 400 participants.

“They proved to be as reliable, clear and realistic as the original tests, which had taken years to develop,” explains Katja Schlegel.

“LLMs are therefore not only capable of finding the best answer among the various available options, but also of generating new scenarios adapted to a desired context. This reinforces the idea that LLMs, such as ChatGPT, have emotional knowledge and can reason about emotions,” adds Marcello Mortillaro.


These results pave the way for AI to be used in contexts thought to be reserved for humans, such as education, coaching or conflict management, provided it is used and supervised by experts.

About this AI and Emotional IQ research news

Author: Antoine Guenot
Source: University of Geneva
Contact: Antoine Guenot – University of Geneva
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Large language models are proficient in solving and creating emotional intelligence tests” by Marcello Mortillaro et al. Communications Psychology


Abstract

Large language models are proficient in solving and creating emotional intelligence tests

Large Language Models (LLMs) demonstrate expertise across diverse domains, yet their capacity for emotional intelligence remains uncertain.

This research examined whether LLMs can solve and generate performance-based emotional intelligence tests.

Results showed that ChatGPT-4, ChatGPT-o1, Gemini 1.5 Flash, Copilot 365, Claude 3.5 Haiku, and DeepSeek V3 outperformed humans on five standard emotional intelligence tests, achieving an average accuracy of 81%, compared to the 56% human average reported in the original validation studies.

In a second step, ChatGPT-4 generated new test items for each emotional intelligence test.

These new versions and the original tests were administered to human participants across five studies (total N = 467). Overall, original and ChatGPT-generated tests demonstrated statistically equivalent test difficulty.

Perceived item clarity and realism, item content diversity, internal consistency, correlations with a vocabulary test, and correlations with an external ability emotional intelligence test were not statistically equivalent between original and ChatGPT-generated tests.

However, all differences were smaller than Cohen’s d ± 0.25, and none of the 95% confidence interval boundaries exceeded a medium effect size (d ± 0.50). Additionally, original and ChatGPT-generated tests were strongly correlated (r = 0.46).
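To illustrate the effect-size criterion the abstract uses, a pooled-standard-deviation Cohen’s d can be computed and checked against the |d| < 0.25 bound. This is a minimal sketch; the sample statistics below are invented for illustration and are not from the study:

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Hypothetical clarity ratings: original tests vs. ChatGPT-generated tests
d = cohens_d(mean1=4.1, sd1=0.8, n1=230, mean2=4.0, sd2=0.9, n2=237)

# The abstract treats differences with |d| below 0.25 as small
print(abs(d) < 0.25)  # True for these illustrative numbers
```

With these made-up inputs, d is roughly 0.12, well inside the ±0.25 bound the authors report for all observed differences.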

These findings suggest that LLMs can generate responses that are consistent with accurate knowledge about human emotions and their regulation.
