
Technology of Distinguishing Homonyms Using Probabilistic and Statistical Methods in Compiling a Frequency Dictionary of Texts of the National Corpus of the Kazakh Language

https://doi.org/10.55491/2411-6076-2025-2-183-193

Abstract

The article presents a probabilistic-statistical method for distinguishing homonyms when compiling a frequency dictionary from the National Corpus of the Kazakh Language. The research focuses on words that share an identical written form but differ in meaning and grammatical function. The agglutinative structure of Kazakh makes such homonyms difficult to recognize and classify accurately. To address this, an automatic morphological analyzer was used to lemmatize word forms and assign parts of speech. In addition, a specialized probability distribution table was developed to disambiguate homonyms on the basis of contextual analysis and frequency estimation. The article details the stages of constructing the frequency dictionary, the algorithms applied, and the stylistic diversity of the corpus texts. Comparative examples show how often homonyms occur in each grammatical category, which makes it possible to build semantic usage models supported by statistical data. Combining frequency indicators with contextual analysis yields a more precise interpretation of lexical units. The proposed technology improves the accuracy and reliability of frequency dictionaries, supports the development of Kazakh language learning materials, and provides a foundation for applications in AI-based linguistic systems. The approach is also promising for linguostatistics, semantic modeling, and the integration of corpus resources into digital education. The results offer a methodological framework for corpus linguistics, applied linguistics, and digital lexicography of the Kazakh language.
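The idea of a probability distribution table for homonyms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the part-of-speech labels, the choice of the preceding word's part of speech as the only context feature, the fallback rule, and all counts in `TAGGED` are assumptions invented for illustration. The example uses the Kazakh homonyms "жаз" ("summer" as a noun, "write" as a verb) and "ат" ("horse" or "name" as a noun, "shoot" as a verb).

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged observations:
# (POS of the preceding word, homonym, POS of the homonym).
# These counts are invented for illustration, not taken from the corpus.
TAGGED = [
    ("ADJ", "жаз", "NOUN"),
    ("ADJ", "жаз", "NOUN"),
    ("NOUN", "жаз", "VERB"),
    ("NOUN", "жаз", "VERB"),
    ("NOUN", "жаз", "VERB"),
    ("NOUN", "жаз", "NOUN"),
    ("ADJ", "ат", "NOUN"),
    ("NOUN", "ат", "VERB"),
]


def build_tables(observations):
    """Count homonym readings per context and per word form."""
    context = defaultdict(Counter)   # (word, prev_pos) -> Counter of POS
    unigram = defaultdict(Counter)   # word -> Counter of POS
    for prev_pos, word, pos in observations:
        context[(word, prev_pos)][pos] += 1
        unigram[word][pos] += 1
    return context, unigram


def distribution(counts):
    """Turn raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}


def disambiguate(word, prev_pos, context, unigram):
    """Pick the most probable reading; fall back to context-free counts
    when the (word, prev_pos) pair was never observed."""
    counts = context.get((word, prev_pos)) or unigram.get(word)
    if not counts:
        return None, {}
    dist = distribution(counts)
    return max(dist, key=dist.get), dist


def frequency_dictionary(observations):
    """Frequency list in which each homonym reading is a separate entry."""
    return Counter((word, pos) for _, word, pos in observations)


context_table, unigram_table = build_tables(TAGGED)
best, dist = disambiguate("жаз", "NOUN", context_table, unigram_table)
print(best, dist)
print(frequency_dictionary(TAGGED).most_common())
```

Keeping `(word, part of speech)` pairs as separate keys is what lets the frequency dictionary report each reading of a homonym on its own line instead of merging them under one written form.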

About the Authors

Ye. Bessirov
Al-Farabi Kazakh National University
Kazakhstan

Yerkin Bessirov, Doctoral student

Almaty



A. Zhanabekova
A. Baitursynuly Institute of Linguistics
Kazakhstan

Aiman Zhanabekova, Doctor of Philological Sciences

Almaty





For citations:


Bessirov Ye., Zhanabekova A. Technology of Distinguishing Homonyms Using Probabilistic and Statistical Methods in Compiling a Frequency Dictionary of Texts of the National Corpus of the Kazakh Language. Tiltanym. 2025;(2):183-193. (In Kazakh) https://doi.org/10.55491/2411-6076-2025-2-183-193



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2411-6076 (Print)
ISSN 2709-135X (Online)