When Text Is Not Enough: Structural Limits of Text-Only Transformer-Based Emotion Classification

Authors

Szymon Chirowski
Breda University of Applied Sciences
Maciej Czerniak
Breda University of Applied Sciences
Ondrej Mitas
Breda University of Applied Sciences, Academy for Tourism
Maks Burchard
Breda University of Applied Sciences

Abstract

This study investigates whether the limitations observed in text-only, transformer-based emotion classification pipelines reflect implementation shortcomings or structural constraints inherent to unimodal modeling. A pipeline was built on unscripted dialogue from MasterChef Polska, combining automated speech-to-text transcription, neural machine translation, and benchmarking across SVM, Bi-LSTM, and RoBERTa architectures. Although the fine-tuned RoBERTa model achieved substantially higher accuracy (0.755) than the other architectures, confusion matrix analysis and explainable AI techniques revealed persistent structural asymmetries, including uneven performance across emotion categories, confusion between the high-arousal emotions anger and joy, and translation-induced distortions. Evaluation against automated labels further exposed a “Ground Truth Paradox,” in which models validate one another's outputs rather than being checked against a human-verified reference. Increased architectural capacity improves performance but does not resolve the structural limitations of text-only emotion classification.

Author Biographies

Szymon Chirowski, Breda University of Applied Sciences

Breda, Netherlands. Email: 242621@buas.nl

Maciej Czerniak, Breda University of Applied Sciences

Breda, Netherlands. Email: 243552@buas.nl

Ondrej Mitas, Breda University of Applied Sciences, Academy for Tourism

Breda, Netherlands. Email: mitas.o@buas.nl

Maks Burchard, Breda University of Applied Sciences

Breda, Netherlands. Email: 240894@buas.nl

Published

5 June 2026