A.P. Kuleshova, A.V. Bernsteina,b, Yu.A. Yanovicha,b,c
aSkolkovo Institute of Science and Technology, Moscow, 143026 Russia
bKharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, 127051 Russia
cNational Research University Higher School of Economics, Moscow, 101000 Russia
Abstract
The problem of unknown high-dimensional density estimation has been considered. It has been suggested that the support of its measure is a low-dimensional data manifold. This problem arises in many data mining tasks. The paper proposes a new geometrically motivated solution to the problem in the framework of manifold learning, including estimation of an unknown support of the density.
Firstly, the problem of tangent bundle manifold learning has been solved, which resulted in the transformation of high-dimensional data into their low-dimensional features and estimation of the Riemann tensor on the data manifold. Following that, an unknown density of the constructed features has been estimated with the use of the appropriate kernel approach. Finally, using the estimated Riemann tensor, the final estimator of the initial density has been constructed.
Keywords: dimensionality reduction, manifold learning, manifold valued data, density estimation on manifold
Acknowledgements. The study by A.V. Bernstein and Yu.A. Yanovich was supported by the Russian Science Foundation (project no. 14-50-00150).
References
1. Seung H.S. Cognition: The manifold ways of perception. Science, 2000, vol. 290, no. 5500, pp. 2268–2269. doi: 10.1126/science.290.5500.2268.
2. Huo X., Ni X.S., Smith A.K. A survey of manifold-based learning methods. In: Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications. Singapore, World Sci., 2008, pp. 691–745. doi: 10.1142/9789812779861_0015.
3. Ma Y., Fu Y. Manifold Learning Theory and Applications. London, CRC Press, 2011. 314 p.
4. Müller E., Assent I., Krieger R., Gnnemann S., Seidl T. DensEst: Density estimation for data mining in high dimensional spaces. Proc. 2009 SIAM Int. Conf. on Data Mining. Philadelphia, Soc. Ind. Appl. Math., 2009. pp. 175–186, doi: 10.1137/1.9781611972795.16.
5. Kriegel H.P., Kroger P., Renz M., Wurst S. A generic framework for efficient subspace clustering of high-dimensional data. Proc. 5th IEEE Int. Conf. on Data Mining (ICDM'05). Houston, TX, IEEE, 2005, pp. 250–257. doi: 10.1109/ICDM.2005.5.
6. Zhu F., Yan X., Han J., Yu P.S., Cheng H. Mining colossal frequent patterns by core pattern fusion. Proc. 23rd IEEE Int. Conf. on Data Engineering. Istanbul, IEEE, 2007, pp. 706–715. doi: 10.1109/ICDE.2007.367916.
7. Bradley P., Fayyad U., Reina C. Scaling clustering algorithms to large databases. KDD-98 Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining. New York, Am. Assoc. Artif. Intell., 1998, pp. 9–15.
8. Weber R., Schek H.J., Blott S. A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. Proc. 24th VLDB Conf.. New York, 1998, pp. 194–205.
9. Domeniconi C., Gunopulos D. An efficient density-based approach for data mining tasks. Knowl. Inf. Syst., 2004, vol. 6, no. 6, pp. 750–770. doi: 10.1007/s10115-003-0131-8.
10. Bennett K.P., Fayyad U., Geiger D. Density-based indexing for approximate nearest-neighbor queries. KDD-99 Proc. 5th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining. New York, 1999, pp. 233–243. doi: 10.1145/312129.312236.
11. Scott D.W. Multivariate density estimation and visualization. In: Gentle J.E., Hrdle W.K., Mori Yu. (Eds.) Handbook of Computational Statistics. Berlin, Heidelberg, Springer, 2012, pp. 549–569. doi: 10.1007/978-3-642-21551-3.
12. Niyogi P., Smale S., Weinberger S. Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom., 2008, vol. 39, nos. 1–3, pp. 419–441. doi: 10.1007/s00454-008-9053-2.
13. Jost J. Riemannian Geometry and Geometric Analysis. Berlin, Heidelberg, Springer, 2005. xiii, 566 p. doi: 10.1007/3-540-28891-0.
14. Lee J. Manifolds and Differential Geometry. Vol. 107: Graduate Studies in Mathematics. Am. Math. Soc., 2009. 671 p.
15. Pennec X. Probabilities and statistics on Riemannian manifolds: Basic tools for geometric measurements. Int. Workshop on Nonlenear Signal and Image Processing (NSIP-99). Antalya, 1999, pp. 194–198.
16. Pelletier B. Kernel density estimation on Riemannian manifolds. Stat. Probab. Lett., 2005, vol. 73, no. 3, pp. 297–304. doi: 10.1016/j.spl.2005.04.004.
17. Guillermo H., Munoz A., Rodriguez D. Locally adaptive density estimation on Riemannian manifolds. Sort: Stat. Oper. Res. Trans., 2013, vol. 37, no. 2, pp. 111–130.
18. Freedman D. Efficient simplicial reconstructions of manifolds from their samples. IEEE Trans. Pattern Anal. Mach. Intell., 2002, vol. 24, no. 10, pp. 1349–1357. doi: 10.1109/TPAMI.2002.1039206.
19. Kramer M.A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J., 1991, vol. 37, no. 2, pp. 233–243. doi: 10.1002/aic.690370209.
20. Dinh L., Sohl-Dickstein J., Bengio S. Density estimation using Real NVP. arXiv:1605.08803, 2016, pp. 1–32.
21. Zhang Z., Zha H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput., 2004, vol. 26, no. 1, pp. 313–338. doi: 10.1137/S1064827502419154.
22. Bengio Y., Paiement J.-F., Vincent P. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. Proc. 16th Int. Conf. on Neural Information Processing Systems, 2003, pp. 177–184. doi: 10.1.1.5.1709.
23. Zhang P., Qiao H., Zhang B. An improved local tangent space alignment method for manifold learning. Pattern Recognit. Lett., 2011, vol. 32, no. 2, pp. 181–189. doi: 10.1016/j.patrec.2010.10.005.
24. Bernstein A.V., Kuleshov A.P. Tangent bundle manifold learning via Grassmann & Stiefel eigenmaps. arXiv:1212.6031, 2012, pp. 1–25.
25. Bernstein A., Kuleshov A.P. Manifold Learning: Generalization ability and tangent proximity. Int. J. Software Inf., 2013, vol. 7, no. 3, pp. 359–390.
26. Genovese C.R., Perone-Pacifico M., Verdinelli I., Wasserman L. Minimax manifold estimation. J. Mach. Learn. Res., 2012, vol. 13, pp. 1263–1291.
27. Yanovich Yu. Asymptotic properties of local sampling on manifold. J. Math. Stat., 2016, vol. 12, no. 3, pp. 157–175. doi: 10.3844/jmssp.2016.157.175.
28. Yanovich Yu. Asymptotic properties of nonparametric estimation on manifold. Proc. 6th Workshop on Conformal and Probabilistic Prediction and Applications, 2017, vol. 60, pp. 18–38.
29. Rozza A., Lombardi G., Rosa M., Casiraghi E., Campadelli P. IDEA: Intrinsic dimension estimation algorithm. Proc. Int. Conf. ``Image Analysis and Processing (ICIAP 2011)''. Berlin, Heidelberg, Springer, 2011, pp. 433–442. doi: 10.1007/978-3-642-24085-0_45.
30. Campadelli P., Casiraghi E., Ceruti C., Rozza A. Intrinsic dimension estimation: Relevant techniques and a benchmark framework. Math. Probl. Eng., 2015, vol. 2015, art. 759567, pp. 1–21. doi: 10.1155/2015/759567.
31. Rosenblatt M. Remarks on some nonparametric estimates of a density function. Ann. Math. Stat., 1956, vol. 27, no. 3, pp. 832–837.
32. Parzen E. On estimation of a probability density function and mode. Ann. Math. Stat., 1962, vol. 33, no. 3, pp. 1065–1076.
33. Wagner T.J. Nonparametric estimates of probability densities. IEEE Trans. Inf. Theory, 1975, vol. 21, no. 4, pp. 438–440.
34. Henry G., Rodriguez D. Kernel density estimation on Riemannian manifolds: Asymptotic results. J. Math. Imaging Vis., 2009, vol. 34, no. 3, pp. 235–239. doi: 10.1007/s10851-009-0145-2.
35. Hendriks H. Nonparametric estimation of a probability density on a Riemannian manifold using Fourier expansions. Ann. Stat., 1990, vol. 18, no. 2, pp. 832–849.
36. Ozakin A., Gray A. Submanifold density estimation. Proc. Conf. ``Neural Information Processing Systems''(NIPS 2009), 2009, pp. 1–8.
37. Kuleshov A., Bernstein A., Yanovich Yu. High-dimensional density estimation for data mining tasks. Proc. 2017 IEEE Int. Conf. on Data Mining (ICDMW). New Orleans, LA, IEEE, 2017, pp. 523–530. doi: 10.1109/ICDMW.2017.74.
38. Bernstein A., Kuleshov A., Yanovich Yu. Asymptotically optimal method for manifold estimation problem. Proc. XXIX Eur. Meet. of Statisticians. Budapest, 2013, pp. 8–9.
39. Kuleshov A., Bernstein A. Manifold learning in data mining tasks. Proc. MLDM 2014: Machine Learning and Data Mining in Pattern Recognition, 2014, pp. 119–133. doi: 10.1007/978-3-319-08979-9_10.
40. Bernstein A., Kuleshov A., Yanovich Yu. Information preserving and locally isometric & conformal embedding via Tangent Manifold Learning. Proc. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA). Paris, IEEE, 2015, pp. 1–9. doi: 10.1109/DSAA.2015.7344815.
41. Xiong Y., Chen W., Apley D., Ding X. A non-stationary covariance-based Kriging method for metamodelling in engineering design. Int. J. Numer. Methods Eng., 2007, vol. 71, no. 6, pp. 733–756. doi: 10.1002/nme.1969.
Received
October 17, 2017
For citation: Kuleshov A.P., Bernstein A.V., Yanovich Yu.A. Manifold learning based on kernel density estimation. Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki, 2018, vol. 160, no. 2, pp. 327–338.
The content is available under the license Creative Commons Attribution 4.0 License.