Skladenjska drevesnica govorjene slovenščine: stanje in perspektive [Spoken Slovenian Treebank: Current Situation and Perspectives]: Kaja Dobrovoljc
Synopsis
Spoken Slovenian Treebank: Current Situation and Perspectives. This paper presents the Spoken Slovenian Treebank (SST), the first syntactically annotated corpus of spoken Slovene. It contains a balanced and representative set of transcriptions from the Gos reference corpus of spoken Slovene, manually annotated with lemmas, morphological features and syntactic dependencies. The treebank follows the Universal Dependencies (UD) annotation scheme, which aims at harmonised corpus annotation across languages and is increasingly applied to spoken data because of its interoperability, flexibility and coverage of a wide range of grammatical structures, including speech-specific phenomena. The first part of the paper summarises the design, content and accessibility of the existing version of the SST; the second part describes the first results of its ongoing development, which includes extending the corpus with new data and improving the speech-specific annotation guidelines.