I.V. Konnov*, O.A. Kashina**, E.I. Gilmanova***
Kazan Federal University, Kazan, 420008 Russia
E-mail: *konn-igor@yandex.ru, **olga.kashina@mail.ru, ***elgilm21@gmail.com
Received October 17, 2018
DOI: 10.26907/2541-7746.2019.3.423-437
For citation: Konnov I.V., Kashina O.A., Gilmanova E.I. Solution of clusterization problem by graph optimization methods. Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, 2019, vol. 161, no. 3, pp. 423–437. doi: 10.26907/2541-7746.2019.3.423-437. (In Russian)
Abstract
The rapid growth in the volume of processed information makes it urgent to develop methods for reducing the dimension of computational problems. One approach to dimensionality reduction is clustering, i.e., grouping the data into maximally homogeneous groups. At the same time, representatives of different clusters should be as dissimilar from each other as possible. Beyond dimension reduction, clustering procedures are valuable in their own right: examples include the market segmentation problem in economics, the construction of feature typologies in sociology, and facies diagnostics in geology.
Despite the large number of known clustering methods, the development and study of new ones remain relevant, because no algorithm surpasses all others on every criterion (speed, insensitivity to cluster size and shape, number of input parameters, etc.).
In this paper, we propose a clustering algorithm based on notions of graph theory (namely, the max-flow min-cut theorem) and compare its results with those of four other algorithms that belong to different classes of clustering techniques.
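To illustrate the idea behind flow-based clustering, the following minimal sketch splits a small similarity graph into two groups along a minimum cut. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The function name `min_cut_bipartition`, the similarity matrix `w`, and the choice of source and sink vertices are assumptions made for illustration only. The maximum flow is found with the Edmonds–Karp shortest augmenting path scheme.

```python
from collections import deque


def min_cut_bipartition(capacity, source, sink):
    """Split the vertices of an undirected weighted graph along a minimum
    source-sink cut. `capacity` is a symmetric matrix of edge weights
    (hypothetical similarity values). Returns (max_flow, source_side)."""
    n = len(capacity)
    residual = [row[:] for row in capacity]  # residual capacities
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break  # no augmenting path is left: the flow is maximal
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount of flow along the path.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # Vertices still reachable from the source in the residual graph
    # form the source side of a minimum cut.
    source_side = {v for v in range(n) if parent[v] != -1}
    return flow, source_side


# Two tight triangles {0, 1, 2} and {3, 4, 5} joined by one weak edge 2-3.
w = [
    [0, 5, 5, 0, 0, 0],
    [5, 0, 5, 0, 0, 0],
    [5, 5, 0, 1, 0, 0],
    [0, 0, 1, 0, 5, 5],
    [0, 0, 0, 5, 0, 5],
    [0, 0, 0, 5, 5, 0],
]
flow, cluster = min_cut_bipartition(w, source=0, sink=5)
print(flow, sorted(cluster))
```

In this toy graph the only cheap way to separate vertex 0 from vertex 5 is to cut the weak edge 2-3, so the cut recovers the two triangles as clusters. By the max-flow min-cut theorem, the value of the maximum flow equals the total weight of the cut edges.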
Keywords: clustering, maximal flow, minimal cut, Ford–Fulkerson theorem, labeling method, k-means, hierarchical clusterization, Ward’s procedure, DBSCAN method, MaxFlow algorithm
Acknowledgments. The research was funded by the subsidy allocated to Kazan Federal University for the state assignment in the sphere of scientific activities (project no. 1.12878.2018/12.1).
The work of the first two authors was supported by the Russian Foundation for Basic Research (project no. 16-01-00109a).
The work of the first author was carried out as part of the state assignment of the Ministry of Science and Higher Education (task no. 1.460.2016/1.4).
The content is available under the Creative Commons Attribution 4.0 License.