One of our recent projects is supported by the Russian Science Foundation.
In addition to several KFU institutes, researchers from the University of Oslo and Izhevsk State Technical University are taking part in the grant, titled “Distributional-quantitative analysis of semantic changes based on large diachronic corpora.”
For the first time, scientists will describe how the meanings of hundreds of thousands of words in Russian and English have changed from the sixteenth century to the present. Based on the mathematical models of meaning change that they build, a general theory of the evolution of the language lexicon will be developed.
“The meanings of words change over time, and such changes are not always conspicuous. Here is a simple example. One hundred years ago, the word ‘satellite’ meant ‘traveling companion.’ In the middle of the 20th century, the word began to be used in the sense of ‘artificial satellite of the Earth,’ and some time later in the sense of ‘communications satellite.’ One hundred years ago, that phrase would have seemed simply absurd,” says Valery Solovyov, Professor, Chief Research Associate, and a participant in the project.
“Researchers now have gigantic collections of texts. For example, the Google Books collection contains texts with a total volume of more than 67 billion words in Russian alone, and more than 500 billion in English. For each word that interests us, one can almost instantly find all its uses: thousands and millions of examples. This opportunity will be used in our study. For the first time, we will identify and describe the evolution of a large number of words in the Russian language. The study concerns both individual words and the vocabulary of the language as a whole. Many interesting questions remain unexplored. Does the meaning of words change faster or slower over time? Are synonyms converging or diverging in meaning? Within the framework of the project, we expect to answer these and many other questions and to build a general theory of the evolution of the language lexicon that approaches natural science theories in rigor,” continues Solovyov, an experienced specialist in mathematical and computational linguistics.
The project head, Professor Oleg Zholobov (Institute of Philology and Intercultural Communication), emphasizes that semantics is a rather labor-intensive research area and is therefore less developed than phonetics or grammar.
“New opportunities for studying semantic changes have appeared in recent years thanks to the creation of large and extra-large diachronic corpora covering texts over time spans of 100 years or more, as well as to the ever-increasing volume of texts on social networks. The use of computer methods (including artificial intelligence methods) makes it possible not only to detect changes in the meanings of specific words, but also to study the dynamics of semantic changes quantitatively and to discover new patterns in them,” comments Zholobov.
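As a rough illustration of what detecting a change in meaning quantitatively can look like, the sketch below compares the words that surround a target word in two time periods. It is not the project's method: the functions `context_vector` and `cosine_distance`, the window size, and the four toy sentences are all assumptions invented for this example, and real studies work with corpora of billions of words.

```python
from collections import Counter
from math import sqrt


def context_vector(sentences, target, window=2):
    """Count the words that occur within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two count vectors (0 = same contexts)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no contexts observed: treat as maximally different
    return 1.0 - dot / (norm_a * norm_b)


# Toy, invented sentences standing in for two periods of a diachronic corpus.
early_period = [
    "the prince arrived with a satellite of loyal attendants",
    "each satellite followed his master on the journey",
]
late_period = [
    "the communication satellite entered orbit",
    "engineers launched a satellite into orbit around the earth",
]

early_vec = context_vector(early_period, "satellite")
late_vec = context_vector(late_period, "satellite")
shift = cosine_distance(early_vec, late_vec)
print(f"semantic shift score for 'satellite': {shift:.2f}")
```

A score near 0 means the word keeps the same neighbors in both periods, while a score near 1 means its typical contexts have almost nothing in common. Raw co-occurrence counts are used here only to keep the sketch short and dependency-free.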
Recently, lexical change has accelerated because of intensive communication on social networks. The new word meanings that the researchers identify can be added to lexicographic resources, such as the WordNet and RuWordNet thesauri and new-generation dictionaries, which are used in a variety of NLP applications, improving search quality and the understanding of questions in recommendation systems.
The research will draw on materials in Russian and English from the main accessible large diachronic corpora: Google Books Ngram, the Russian National Corpus, the Manuscript Corpus of Ancient and Medieval Slavic and Russian texts, the Kazan electronic collection of written heritage of the 12th to 14th centuries, the General Internet Corpus of the Russian language (including data from VKontakte and other social networks), and the Corpus of Historical American English.
Source text: Larisa Busil
Translation: Yury Nurmeev