Form of presentation | Articles in international journals and collections |
Year of publication | 2023 |
Language | English |
|
Nevzorova Olga Avenirovna, author
|
|
Gizatullin Bulat Timurovich, author
|
Bibliographic description in the original language |
O. A. Nevzorova, and B. T. Gizatullin Analysis of the cluster structure of collections of mathematical papers with different UDC codes // Lobachevskii Journal of Mathematics, 2022, Vol. 43, No. 12, pp. 3597–3604. |
Annotation |
Lobachevskii Journal of Mathematics |
Keywords |
clustering, universal decimal classification, UDC code, mathematical paper |
The name of the journal |
Lobachevskii Journal of Mathematics
|
URL |
https://link.springer.com/article/10.1134/S1995080222150239#citeas |
Please use this ID to quote from or refer to the card |
https://repository.kpfu.ru/eng/?p_id=282670&p_lang=2 |
Full metadata record |
Field DC |
Value |
Language |
dc.contributor.author |
Nevzorova Olga Avenirovna |
ru_RU |
dc.contributor.author |
Gizatullin Bulat Timurovich |
ru_RU |
dc.date.accessioned |
2023-01-01T00:00:00Z |
ru_RU |
dc.date.available |
2023-01-01T00:00:00Z |
ru_RU |
dc.date.issued |
2023 |
ru_RU |
dc.identifier.citation |
O. A. Nevzorova, and B. T. Gizatullin Analysis of the cluster structure of collections of mathematical papers with different UDC codes // Lobachevskii Journal of Mathematics, 2022, Vol. 43, No. 12, pp. 3597–3604. |
ru_RU |
dc.identifier.uri |
https://repository.kpfu.ru/eng/?p_id=282670&p_lang=2 |
ru_RU |
dc.description.abstract |
Lobachevskii Journal of Mathematics |
ru_RU |
dc.description.abstract |
Clustering is the task of dividing data objects into groups of similar objects. How the specific features of scientific article texts from a single subject area affect clustering has so far been little studied. This article addresses the problem of clustering collections of mathematical papers that have different Universal Decimal Classification (UDC) codes. The study was carried out on a collection of mathematical papers published in the journal "Izvestiya VUZov. Matematika" over 10 years. The collection contains about 1000 original papers with different UDC codes.
The collection contains subcollections of papers that have the same UDC code. The objective of our research is to analyze the cluster structure of representative subcollections of papers with the same UDC code, which will allow us to evaluate various parameters of the constructed clusters in the future.
We have performed standard pre-processing (tokenization, lemmatization) of Russian mathematical texts. The following text vectorization methods were investigated: tf-idf, doc2vec trained on the original data, and a pretrained word2vec model. All vectors were normalized using the L2 norm.
To identify optimal hyperparameters and check the quality of clustering, we have used internal quality measures such as the silhouette coefficient and the Calinski-Harabasz index. We also used the elbow method for hyperparameter tuning. The following clustering algorithms have been investigated: k-means, agglomerative clustering, affinity propagation, DBSCAN, and spectral clustering. The optimal hyperparameters were selected for each method, and then the clustering results were compared. As a result, we have selected the optimal methods for clustering mathematical papers.
|
ru_RU |
dc.language.iso |
ru |
ru_RU |
dc.subject |
clustering |
ru_RU |
dc.subject |
universal decimal classification |
ru_RU |
dc.subject |
UDC code |
ru_RU |
dc.subject |
mathematical paper |
ru_RU |
dc.title |
Analysis of the cluster structure of collections of mathematical papers with different UDC codes |
ru_RU |
dc.type |
Articles in international journals and collections |
ru_RU |
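
The abstract mentions tf-idf vectors normalized with the L2 norm and the silhouette coefficient as an internal quality measure. The following is a minimal sketch of those two steps only, not the authors' implementation. Several things here are assumptions for illustration: the toy English token lists, the smoothed idf formula, the use of cosine distance, and the function names `tfidf_l2`, `cosine_distance`, and `silhouette`, none of which come from the paper.

```python
import math
from collections import Counter


def tfidf_l2(docs):
    """Return L2-normalized tf-idf vectors (as dicts) for tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf; this particular formula is an assumption of the sketch.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def cosine_distance(u, v):
    # Both vectors have unit length, so their dot product is the cosine similarity.
    return 1.0 - sum(w * v.get(t, 0.0) for t, w in u.items())


def silhouette(vectors, labels):
    """Mean silhouette coefficient under cosine distance (needs >= 2 clusters)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(cosine_distance(vectors[i], vectors[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, already lemmatized toy documents (two algebra-like, two analysis-like).
docs = [
    ["ring", "ideal", "module"],
    ["ideal", "ring", "homomorphism"],
    ["integral", "measure", "limit"],
    ["limit", "integral", "series"],
]
vectors = tfidf_l2(docs)
score_by_topic = silhouette(vectors, [0, 0, 1, 1])  # grouping that follows the topics
score_mixed = silhouette(vectors, [0, 1, 0, 1])     # grouping that mixes the topics
```

In this toy setting the topic-aligned labeling scores higher than the mixed one, which is the sense in which such a measure can be used to compare clusterings or hyperparameter choices without reference labels.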
|