Prednosti in slabosti dvotirnega zapisovanja govora v slovenskih govornih virih: Darinka Verdonik, Mitja Trojar, Andreja Bizjak
Synopsis
Advantages and Disadvantages of Two-Tier Speech Transcription in Slovenian Speech Resources. Transcribing speech is undoubtedly the largest time investment in creating a speech corpus and an important reason why speech corpora are considerably smaller than written ones. Speech transcription translates an originally multimodal channel of communication, in which verbally expressed meaning is shaped by the voice, manner of speaking, body language, etc., into a single, written modality. Because speech varies at all linguistic levels, the transcriber constantly faces the question of how to transcribe what they hear. To make the transcription as exact as possible while keeping it feasible for large amounts of data, Slovenian speech corpora introduced a pronunciation-based transcription alongside the standardized transcription. However, two-tier transcription requires additional effort. This paper therefore critically assesses its rationale by comparing it with practices used elsewhere, estimating the additional effort, and weighing its advantages. We also assess other challenging aspects of speech transcription.