Reinforcing Stereotypes of Anger: Emotion AI on African American Vernacular English

Generative AI & LLMs
Published on arXiv: 2511.10846v1
Authors

Rebecca Dorn, Christina Chance, Casandra Rusti, Charles Bickham, Kai-Wei Chang, Fred Morstatter, Kristina Lerman

Abstract

Automated emotion detection is widely used in applications ranging from well-being monitoring to high-stakes domains like mental health and hiring. However, models often rely on annotations that reflect dominant cultural norms, limiting their ability to recognize emotional expression in dialects often excluded from training data distributions, such as African American Vernacular English (AAVE). This study examines emotion recognition model performance on AAVE compared to General American English (GAE). We analyze 2.7 million tweets geo-tagged within Los Angeles. Texts are scored for strength of AAVE using computational approximations of dialect features. Annotations of emotion presence and intensity are collected on a dataset of 875 tweets with both high and low AAVE densities. To assess model accuracy on a task as subjective as emotion perception, we calculate community-informed "silver" labels where AAVE-dense tweets are labeled by African American, AAVE-fluent (ingroup) annotators. On our labeled sample, GPT- and BERT-based models exhibit false positive rates for anger on AAVE more than double those on GAE. SpanEmo, a popular text-based emotion model, increases its false positive rate for anger from 25 percent on GAE to 60 percent on AAVE. Additionally, a series of linear regressions reveals that model predictions and non-ingroup annotations are significantly more correlated with profanity-based AAVE features than ingroup annotations are. Linking Census tract demographics, we observe that neighborhoods with higher proportions of African American residents are associated with higher predictions of anger (Pearson's r = 0.27) and lower predictions of joy (r = -0.10). These results reveal an emergent safety issue: emotion AI reinforcing racial stereotypes through biased emotion classification. We emphasize the need for culturally and dialect-informed affective computing systems.
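To make the headline numbers concrete, here is a minimal sketch of how the two reported quantities could be computed: the per-dialect false positive rate for anger, and the tract-level Pearson correlations. This is not the authors' code; the file names and column names (`dialect`, `anger_true`, `anger_pred`, `pct_black`, `mean_anger_pred`, `mean_joy_pred`) are placeholders for illustration.

```python
import pandas as pd
from scipy.stats import pearsonr

# Placeholder file: one row per tweet, with a dialect-group flag,
# the community-informed "silver" anger label, and a model prediction.
df = pd.read_csv("labeled_tweets.csv")

# False positive rate for anger per dialect group:
# P(model predicts anger | silver label says not anger).
fpr = (
    df[df["anger_true"] == 0]          # restrict to tweets whose silver label is not anger
    .groupby("dialect")["anger_pred"]  # dialect in {"AAVE", "GAE"}
    .mean()                            # share of false positives in each group
)
print(fpr)  # the paper reports SpanEmo rising from ~0.25 (GAE) to ~0.60 (AAVE)

# Tract-level association: proportion of African American residents vs. mean
# predicted emotion, mirroring the reported r = 0.27 (anger), r = -0.10 (joy).
tracts = pd.read_csv("tract_scores.csv")  # placeholder per-tract aggregates
r_anger, _ = pearsonr(tracts["pct_black"], tracts["mean_anger_pred"])
r_joy, _ = pearsonr(tracts["pct_black"], tracts["mean_joy_pred"])
print(f"anger r = {r_anger:.2f}, joy r = {r_joy:.2f}")
```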

Paper Summary

Problem
Emotion AI systems, which use Natural Language Processing (NLP) to recognize emotions in text, often struggle to accurately interpret emotional expressions in dialects spoken by historically marginalized communities, such as African American Vernacular English (AAVE). This can lead to biased or inaccurate results, amplifying harmful stereotypes and undermining the reliability of emotion AI.
Key Innovation
This research investigates how text-based emotion detection systems handle AAVE, focusing on how individual sociolinguistic features influence automated affect predictions. The authors use a dataset of 875 tweets from greater Los Angeles County and find that models assign anger and disgust to texts with AAVE features at disproportionately high false positive rates, while struggling to identify joy. They also identify two distinctive African American communication practices, augmentation and performativity, which may partially explain differences between annotator groups in emotion perception and sensitivity to profanity. A sketch of this kind of model probing follows below.
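As a rough illustration, the snippet below runs an off-the-shelf emotion classifier from Hugging Face over a pair of texts. The model name is a stand-in (the paper evaluates SpanEmo and GPT/BERT-based models, whose interfaces differ), and the example tweets are placeholders rather than items from the authors' dataset.

```python
from transformers import pipeline

# Stand-in emotion classifier; not one of the models evaluated in the paper.
clf = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
    top_k=None,  # return a score for every emotion label, not just the top one
)

# Placeholders: in the study, these would be tweets scored as low vs. high
# AAVE density by the computational dialect-feature approximations.
texts = [
    "a tweet with low AAVE density goes here",
    "a tweet with high AAVE density goes here",
]

for text, result in zip(texts, clf(texts)):
    scores = {d["label"]: round(d["score"], 3) for d in result}
    print(text, "->", scores)  # compare anger/joy scores across the pair
```

Comparing anger and joy scores across such pairs, while holding the silver labels fixed, is what surfaces the false positive gaps reported in the abstract.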
Practical Impact
This research has important implications for building culturally sensitive emotion AI. By understanding how AAVE is perceived and interpreted by emotion models, developers can build more accurate and reliable systems that account for the nuances of language use across communities, reducing bias and stereotyping in applications such as mental health support and therapy chatbots.
Analogy / Intuitive Explanation
Imagine trying to understand a joke that relies heavily on cultural references or idioms unfamiliar to you. You might misinterpret it, or conclude it isn't funny, even though it's intended to be humorous. Similarly, emotion AI systems can miss the nuances of AAVE and misread emotional expressions, producing biased or inaccurate results. By measuring exactly where and how these misreadings occur, this research points toward models that handle the complexities of language use across communities.
Paper Information
Categories: cs.CL, cs.AI, cs.CY
arXiv ID: 2511.10846v1
