SenBen: Sensitive Scene Graphs for Explainable Content Moderation

Published: arXiv:2604.08819v1
Authors

Fatih Cagatay Akyon, Alptekin Temizel

Abstract

Content moderation systems classify images as safe or unsafe but lack spatial grounding and interpretability: they cannot explain what sensitive behavior was detected, who is involved, or where it occurs. We introduce the Sensitive Benchmark (SenBen), the first large-scale scene graph benchmark for sensitive content, comprising 13,999 frames from 157 movies annotated with Visual Genome-style scene graphs (25 object classes, 28 attributes including affective states such as pain, fear, aggression, and distress, 14 predicates) and 16 sensitivity tags across 5 categories. We distill a frontier VLM into a compact 241M student model using a multi-task recipe that addresses vocabulary imbalance in autoregressive scene graph generation through suffix-based object identity, Vocabulary-Aware Recall (VAR) Loss, and a decoupled Query2Label tag head with asymmetric loss, yielding a +6.4 percentage point improvement in SenBen Recall over standard cross-entropy training. On grounded scene graph metrics, our student model outperforms all evaluated VLMs except Gemini models and all commercial safety APIs, while achieving the highest object detection and captioning scores across all models, at $7.6\times$ faster inference and $16\times$ less GPU memory.
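The abstract mentions a Vocabulary-Aware Recall (VAR) Loss for countering vocabulary imbalance in autoregressive scene graph generation. The paper's exact formulation is not given here; the following is a minimal sketch of the general idea, reweighting token-level cross-entropy by inverse vocabulary frequency so that rare object and predicate tokens are not drowned out by frequent ones. The function name and the inverse-frequency weighting scheme are illustrative assumptions, not the authors' definition.

```python
import torch
import torch.nn.functional as F

def vocab_aware_ce(logits, targets, token_freq, eps=1.0):
    """Cross-entropy with per-token inverse-frequency weights (sketch).

    logits:     (N, V) unnormalized scores over a vocabulary of size V
    targets:    (N,) gold token indices
    token_freq: (V,) corpus frequency of each vocabulary token
    """
    # Rare tokens (infrequent object classes, predicates) get larger
    # weight; eps avoids division by zero for unseen tokens.
    weights = 1.0 / (token_freq + eps)
    weights = weights / weights.mean()  # normalize around 1
    return F.cross_entropy(logits, targets, weight=weights)
```

A standard cross-entropy baseline corresponds to `weights` being all ones; the reweighting trades a little precision on common tokens for recall on rare ones, which is the behavior the paper's +6.4 point SenBen Recall gain targets.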

Paper Summary

Problem
Content moderation systems classify images as safe or unsafe, but they lack spatial grounding and interpretability: they cannot explain what sensitive behavior was detected, who is involved, or where it occurs. This opacity makes such systems difficult to audit, hard to adapt to different content policies, and resistant to meaningful human oversight.
Key Innovation
The authors introduce the Sensitive Benchmark (SenBen), the first large-scale scene graph benchmark for sensitive content. SenBen comprises 13,999 frames from 157 movies annotated with Visual Genome-style scene graphs, which include object classes, attributes, and predicates. The authors also propose a novel training recipe to distill a frontier VLM into a compact 241M student model using multi-task knowledge distillation with vocabulary-aware optimization.
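To make the Visual Genome-style annotation concrete, here is a hypothetical example of what one annotated frame could look like: localized objects with affective attributes, subject-predicate-object relationships, and frame-level sensitivity tags. The field names and values are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical SenBen-style annotation for a single frame.
# All field names, classes, predicates, and tags below are
# illustrative; the released dataset's schema may differ.
frame_annotation = {
    "frame_id": "movie_042_frame_0137",
    "objects": [
        {"id": 0, "class": "person", "bbox": [120, 64, 310, 400],
         "attributes": ["fear"]},          # affective state attribute
        {"id": 1, "class": "person", "bbox": [330, 50, 560, 410],
         "attributes": ["aggression"]},
        {"id": 2, "class": "knife", "bbox": [500, 180, 545, 240],
         "attributes": []},
    ],
    "relationships": [  # subject/object fields reference object ids
        {"subject": 1, "predicate": "holding", "object": 2},
        {"subject": 1, "predicate": "threatening", "object": 0},
    ],
    "sensitivity_tags": ["violence", "weapon"],
}
```

A structure like this is what gives the moderation decision its grounding: the tags say *that* the frame is sensitive, while the boxes and relationships say *who*, *what*, and *where*.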
Practical Impact
The SenBen dataset and the proposed training recipe enable content moderation systems that not only classify images as safe or unsafe but also explain the classification: what sensitive behavior was detected, who is involved, and where it occurs. Such grounded outputs improve transparency and accountability, and make it easier to audit moderation decisions and adapt systems to different content policies.
Analogy / Intuitive Explanation
Imagine a content moderation system as a detective solving a crime: the detective must identify the perpetrator, the act, and the location. In the same way, a grounded moderation system must identify the sensitive behavior, the people involved, and where in the image it occurs. SenBen and the proposed training recipe train this "detective" to make those identifications accurately and to explain the resulting classification, rather than simply declaring the case open or closed.
Paper Information
Categories: cs.CV, cs.AI, cs.LG, cs.MM
arXiv ID: 2604.08819v1
