
“Universal Dependencies” for Syntactic Analysis of the Kyrgyz Language: Current State and Prospects

https://doi.org/10.55491/2411-6076-2025-2-153-167

Abstract

Kyrgyz, a Turkic language with over 4.4 million speakers concentrated primarily in Kyrgyzstan and adjacent regions of Central Asia, faces a significant disparity in computational linguistic resources compared to languages with similar or even smaller speaker populations. Despite its status as a state language and cultural cornerstone, Kyrgyz remains underrepresented in the digital linguistic landscape. This investigation examines the application of the Universal Dependencies (UD) framework, an annotation system engineered to facilitate cross-linguistic syntactic comparability, to the structural complexities of Kyrgyz. We seek to identify annotation strategies that faithfully represent Kyrgyz-specific syntactic phenomena while adhering to the principled constraints of the UD paradigm. The establishment of standardized syntactic resources for Kyrgyz carries dual significance: it advances linguistic typology by incorporating data from an underrepresented language family, while simultaneously laying groundwork for practical natural language processing applications crucial for Kyrgyz speakers’ participation in the digital sphere. Our methodological approach encompasses analysis of nascent Kyrgyz treebanks, comparative evaluation of annotation strategies employed for genetically related Turkic languages, and systematic examination of four fundamental annotation challenges: the representation of Kyrgyz’s defective copula system, the classification of multifunctional grammatical particles, the annotation of constructions with implicit heads, and the demarcation between inflectional and derivational morphology in this highly agglutinative language. Our analysis reveals that achieving the dual objectives of linguistic fidelity and cross-linguistic consistency necessitates judicious adaptation of UD guidelines to accommodate Kyrgyz-specific structures.
We advance unified annotation solutions that preserve the integrity of Kyrgyz linguistic patterns while facilitating meaningful cross-linguistic comparison. This research not only contributes substantively to computational resources for Kyrgyz but also establishes annotation principles with broader applicability to typologically similar agglutinative languages. The practical implications extend to enhanced guidelines for Kyrgyz treebank development, which will consequently improve parser accuracy and catalyze the development of essential language technology tools for Kyrgyz speakers.

About the Authors

M. Ryspakova
I. Arabaev Kyrgyz State University
Kyrgyzstan

Meerim Ryspakova, Doctoral student

Bishkek



A. Tursunova
I. Arabaev Kyrgyz State University
Kyrgyzstan

Aigul Tursunova, Doctoral student

Bishkek




Review

For citations:


Ryspakova M., Tursunova A. “Universal Dependencies” for Syntactic Analysis of the Kyrgyz Language: Current State and Prospects. Tiltanym. 2025;(2):153-167. https://doi.org/10.55491/2411-6076-2025-2-153-167



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2411-6076 (Print)
ISSN 2709-135X (Online)