Language Models for Spoken Corpus Preparation: Speech Recognition Software (Jezikovni modeli za pripravo govornega korpusa: programi za prepoznavanje govora): Teodor Petrič

Authors

Teodor Petrič

Synopsis

Language Models for Spoken Corpus Preparation: Speech Recognition Software. Over the last decade, and particularly in the five years since large language models based on transformer architectures emerged, a number of software tools have been developed that accelerate the creation of multi-layered corpora. We tested tools for speech recognition and transcription (Razpoznavalnik, Microsoft Word Dictate, Vosk/Kaldi and OpenAI Whisper), which are crucial for speeding up the creation of spoken corpora. We evaluated them against criteria covering ease of use, time savings, potential costs, speaker anonymity and several aspects of transcription quality (e.g. word error rate and the number of substitutions, insertions and deletions). Although speech-to-text tools have made considerable progress, it would be desirable to be able to customize their output formats to meet individual research needs, e.g. by including discourse markers (such as so-called ‘fillers’) or the contracted word forms actually spoken in the transcription.
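
The quality measures named in the synopsis can be illustrated with a short sketch. It assumes the conventional definition of word error rate, WER = (S + I + D) / N, where S, I and D are the substitutions, insertions and deletions needed to turn the reference transcript into the recognizer output and N is the number of words in the reference. The chapter's own evaluation procedure is not described here, so the function name `word_error_rate`, the `WerResult` type and the normalization (lowercasing and whitespace tokenization only) are illustrative assumptions, not the author's method.

```python
from typing import NamedTuple


class WerResult(NamedTuple):
    """Error counts for one reference/hypothesis pair (hypothetical helper type)."""
    substitutions: int
    insertions: int
    deletions: int
    reference_length: int

    @property
    def wer(self) -> float:
        # Conventional definition: (S + I + D) / N
        errors = self.substitutions + self.insertions + self.deletions
        return errors / self.reference_length


def word_error_rate(reference: str, hypothesis: str) -> WerResult:
    """Align two transcripts word by word using Levenshtein distance."""
    # Assumption: lowercase and split on whitespace; punctuation is not stripped.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # table[i][j] = (total edits, S, I, D) for ref[:i] versus hyp[:j]
    table = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        table[i][0] = (i, 0, 0, i)  # only deletions
    for j in range(1, len(hyp) + 1):
        table[0][j] = (j, 0, j, 0)  # only insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, n, d = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (t, s, n, d)          # words match, no cost
            else:
                diagonal = (t + 1, s + 1, n, d)  # substitution
            t, s, n, d = table[i][j - 1]
            insertion = (t + 1, s, n + 1, d)     # extra word in hypothesis
            t, s, n, d = table[i - 1][j]
            deletion = (t + 1, s, n, d + 1)      # word missing from hypothesis
            # Keep the cheapest path; ties favour the diagonal move.
            table[i][j] = min(diagonal, insertion, deletion, key=lambda c: c[0])

    _, s, n, d = table[len(ref)][len(hyp)]
    return WerResult(s, n, d, len(ref))


result = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(result, round(result.wer, 3))
```

When several alignments have the same minimal cost, the split between substitutions, insertions and deletions can differ from one alignment to another, while the total and therefore the WER stay the same. Fillers or contracted forms that a recognizer omits or normalizes would count as deletions or substitutions under this measure.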

Published

July 18, 2024

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Petrič, T. (2024). Jezikovni modeli za pripravo govornega korpusa: programi za prepoznavanje govora. In Stanje in perspektive uporabe govornih virov v raziskavah govora (pp. 169-194). University of Maribor Press. https://press.um.si/index.php/ump/catalog/book/898/chapter/53