Skladenjska drevesnica govorjene slovenščine: stanje in perspektive (Spoken Slovenian Treebank: Current Situation and Perspectives)

Authors

Kaja Dobrovoljc

Synopsis

Spoken Slovenian Treebank: Current Situation and Perspectives. In this paper we present the Spoken Slovenian Treebank (SST), the first syntactically annotated corpus of spoken Slovene. It contains a balanced and representative set of transcriptions from the Gos reference corpus of spoken Slovene, manually annotated for lemmas, morphological features and syntactic dependencies. The treebank follows the Universal Dependencies (UD) annotation scheme, which aims at harmonised corpus annotation across languages and is increasingly applied to spoken data owing to its interoperability, its flexibility and its coverage of a wide range of grammatical structures, including speech-specific phenomena. After summarising the design, content and accessibility of the current version of the SST, we describe in the second part of the paper the first results of its ongoing development, which includes extending the corpus with new data and improving the speech-specific annotation guidelines.
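UD treebanks are distributed in the CoNLL-U format, in which each token line carries ten tab-separated fields (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). As a minimal sketch of how the annotation layers named in the synopsis (lemmas, morphological features and syntactic dependencies) sit together on one token, the Python snippet below parses a single hypothetical two-token sentence. The sentence, its simplified annotation and the helper names (`SAMPLE`, `Token`, `parse_feats`, `parse_conllu_sentence`) are illustrative assumptions and are not taken from the SST.

```python
from dataclasses import dataclass

# Hypothetical, simplified CoNLL-U sentence (NOT taken from the SST).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = """# text = grem domov
1\tgrem\titi\tVERB\t_\tMood=Ind|Number=Sing|Person=1|Tense=Pres|VerbForm=Fin\t0\troot\t_\t_
2\tdomov\tdomov\tADV\t_\tDegree=Pos\t1\tadvmod\t_\t_
"""


@dataclass
class Token:
    id: int
    form: str
    lemma: str             # lemma layer
    upos: str              # universal part-of-speech tag
    feats: dict[str, str]  # morphological features layer
    head: int              # ID of the governing token, 0 = root
    deprel: str            # dependency relation to the head


def parse_feats(field: str) -> dict[str, str]:
    """Turn 'Key=Val|Key=Val' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_conllu_sentence(block: str) -> list[Token]:
    """Parse the basic token lines of one CoNLL-U sentence.

    Comment lines are skipped; multiword-token ranges (e.g. '1-2') and
    empty nodes (e.g. '1.1') are ignored in this simplified sketch.
    """
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append(Token(
            id=int(cols[0]),
            form=cols[1],
            lemma=cols[2],
            upos=cols[3],
            feats=parse_feats(cols[5]),
            head=int(cols[6]),
            deprel=cols[7],
        ))
    return tokens


if __name__ == "__main__":
    sentence = parse_conllu_sentence(SAMPLE)
    for tok in sentence:
        # Token IDs start at 1 and are contiguous here, so ID n is at index n - 1.
        head = "ROOT" if tok.head == 0 else sentence[tok.head - 1].form
        print(f"{tok.form} ({tok.lemma}, {tok.upos}) --{tok.deprel}--> {head}")
```

Running the script prints each token with its lemma, part of speech and the word it depends on. For real UD data, an established CoNLL-U reader would cover the format's edge cases (enhanced dependencies in DEPS, MISC attributes, multiword tokens) more fully than this sketch.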

Published

July 18, 2024

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Dobrovoljc, K. (2024). Skladenjska drevesnica govorjene slovenščine: stanje in perspektive. In Stanje in perspektive uporabe govornih virov v raziskavah govora (pp. 41-62). University of Maribor Press. https://press.um.si/index.php/ump/catalog/book/898/chapter/47