When Text Is Not Enough: Structural Limits of Text-Only Transformer-Based Emotion Classification

Authors

Szymon Chirowski
Breda University of Applied Sciences
Maciej Czerniak
Breda University of Applied Sciences
Ondrej Mitas
Breda University of Applied Sciences, Academy for Tourism
Maks Burchard
Breda University of Applied Sciences

Synopsis

This study investigates whether the limitations observed in text-only, transformer-based emotion classification pipelines reflect implementation shortcomings or structural constraints inherent to unimodal modeling. A pipeline was constructed from unscripted dialogue in MasterChef Polska, combining automated speech-to-text transcription, neural machine translation, and benchmarking across SVM, Bi-LSTM, and RoBERTa architectures. Although the fine-tuned RoBERTa model achieved substantially higher accuracy (0.755) than the SVM and Bi-LSTM models, confusion matrix analysis and explainable AI techniques revealed persistent structural asymmetries, including uneven performance across emotion categories, confusion between the high-arousal categories of anger and joy, and translation-induced distortions. Evaluation against automated labels further exposed a “Ground Truth Paradox,” in which models validate one another rather than being checked against a human-verified ground truth. Increased architectural capacity improves performance but does not resolve the structural limitations of text-only emotion classification.

Author Biographies

Szymon Chirowski, Breda University of Applied Sciences

Breda, the Netherlands. E-mail: 242621@buas.nl

Maciej Czerniak, Breda University of Applied Sciences

Breda, the Netherlands. E-mail: 243552@buas.nl

Ondrej Mitas, Breda University of Applied Sciences, Academy for Tourism

Breda, the Netherlands. E-mail: mitas.o@buas.nl

Maks Burchard, Breda University of Applied Sciences

Breda, the Netherlands. E-mail: 240894@buas.nl

Published

June 5, 2026

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Chirowski, S., Czerniak, M., Mitas, O., & Burchard, M. (2026). When Text Is Not Enough: Structural Limits of Text-Only Transformer-Based Emotion Classification. In D. Vidmar, A. Pucihar, M. Kljajić Borštnar, R. W. H. Bons, M. Glowatz, & H.-D. Zimmermann (Eds.), 39th Bled eConference: Co-Creating Human-Centred and Responsible Digital Futures; Conference Proceedings (Vol. 39, pp. 705-720). University of Maribor Press. https://press.um.si/index.php/ump/catalog/book/1128/chapter/1212