Prednosti in slabosti dvotirnega zapisovanja govora v slovenskih govornih virih: Darinka Verdonik, Mitja Trojar, Andreja Bizjak

Synopsis

Advantages and Disadvantages of Two-Tier Speech Transcription in Slovenian Speech Resources. Transcribing speech is undoubtedly the largest time investment in creating a speech corpus and an important reason why speech corpora are considerably smaller than written ones. Speech transcription is a translation from an originally multimodal channel of communication, in which verbally expressed meaning is shaped by voice, manner of speaking, body language, etc., into a single, written modality. Because speech varies at all linguistic levels, the transcriber constantly faces the question of how to transcribe what they hear. To make the transcription as exact as possible while keeping it feasible for large amounts of data, Slovenian speech corpora introduced a pronunciation-based transcription alongside the standardized one. However, two-tier transcription requires additional effort. This paper therefore critically assesses its rationale by comparing it with practices used elsewhere and weighing estimates of the additional effort against its advantages. We also assess other challenging aspects of speech transcription.

Published

July 18, 2024

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Prednosti in slabosti dvotirnega zapisovanja govora v slovenskih govornih virih: Darinka Verdonik, Mitja Trojar, Andreja Bizjak. (2024). In Stanje in perspektive uporabe govornih virov v raziskavah govora (pp. 63-80). University of Maribor Press. https://press.um.si/index.php/ump/catalog/book/898/chapter/48