A.I. Khayrullinaa*, T.I. Madzhidova**, R.I. Nugmanova***, V.A. Afoninaa****, I.I. Baskinb*****, A.A. Varneka,c******
aKazan Federal University, Kazan, 420008 Russia
bMoscow State University, Moscow, 119991, Russia
cUniversity of Strasbourg, Strasbourg, 67084 France
E-mail: *adelihajrullina@kpfu.ru, **Timur.Madzhidov@kpfu.ru, ***rainugmanov@kpfu.ru, ****ValAAfonina@kpfu.ru, *****igbaskin@gmail.com, ******varnek@unistra.fr
Received November 16, 2017
Full text PDF
Abstract
The key step in the computer analysis of chemical reactions is the determination of the correspondence between the atoms of reagents and products. This procedure is called atom-to-atom mapping (AAM). The presence of AAM is a key factor for establishing the mechanism and type of reaction, searching for similarities and substructures, modeling, checking the quality of data. A new approach has been proposed to the search for optimal atomic-atom mapping in chemical reactions based on the use of machine learning methods. The learning task is formulated as a classification: for each pair of the reagent-product atom, it is necessary to establish their assignment to the correct/incorrect mapping. We have used a simple naive Bayesian classifier. The approach described in this paper is the first example of a self-learning algorithm for AAM.
Keywords: atom-to-atom mapping, chemical reactions, machine learning, classification, chemoinformatics
Acknowledgments. The study was supported by the Russian Science Foundation (project no. 14-43-00024).
Figure Captions
Fig. 1. Atom-to-atom mapping in reactions. Numbers indicate the correspondence of atoms. The figure shows the correct AAM, with the broken C8–I7 bond and the created C8–N4 bond.
Fig. 2. Generation of pairs and corresponding values of AAM.
Fig. 3. Generation of fragment descriptors for CS6.
Fig. 4. Forming a bit string of hashed molecular fingerprints (X) based on fragment descriptors: a) CS6; b) CP6.
Fig. 5. Operations on bit strings of hashed molecular fingerprints.
Fig. 6. Table with the values of the probabilities of atom-atom mapping (pij,%), where i – row corresponding to product atoms, j – column corresponding to reagent atoms. Selected elements correspond to the pairs with the correct AAM.
Fig. 7. Dependence of the accuracy of predictions on the fragmentation length. Model quality when minimal fragment length is equal to one atom is shown by solid line, the one when minimal fragment length is equal to two atoms is shown by dotted line.
Fig. 8. Dependence of the accuracy of predictions on the length of the bit string.
Fig. 9. Comparison of the accuracy of predictions using different ways of atomic fingerprint combinations to form atomic pair bitstring (A – reagent atom fingerprint, B – product atom fingerprint).
Fig. 10. Reactions with correct AAM (a) and erroneously predicted AAM (b).
References
- Varnek A., Fourches D., Hoonakker F., Solov'ev V.P. Substructural fragments: An universal language to encode reactions, molecular and supramolecular structures. J. Comput.-Aided Mol. Des., 2005, vol. 19, nos. 9–10, pp. 693–703. doi: 10.1007/s10822-005-9008-0.
- Chen W.L., Chen D.Z., Taylor K.T. Automatic reaction mapping and reaction center detection. Wiley Interdiscip. Rev.: Comput. Mol. Sci., 2013, vol. 3, no. 6, pp. 560–593. doi: 10.1002/wcms.1140.
- Raymond J.W., Willett P. Maximum common subgraph isomorphism algorithms for the matching of chemical structures. J. Comput.-Aided Mol. Des., 2002, vol. 16, no. 7, pp. 521–533. doi: 10.1023/A:1021271615909.
- Madzhidov T.I., Baskin I.I., Varnek A.A. Khemoinformatika. Konspekt lektsii [Chemoinformatics. Lecture Notes]. Kazan, Izd. Kazan. Univ., 2014. 137 p. (In Russian)
- Madzhidov T.I., Nugmanov R.I., Gimadiev T.R., Lin A.I., Antipin I.S., Varnek A. Consensus approach to atom-to-atom mapping in chemical reactions. Butlerovskie Chteniya, 2015, vol. 44, no. 12, pp. 170–176. (In Russian)
- Lynch M.F., Willett P. The automatic detection of chemical reaction sites. J. Chem. Inf. Comput. Sci., 1978, vol. 18, no. 3, pp. 154–159.
- Vleduts G.E. Development of a combined WLN/CTR multilevel approach to the algorithmic analysis of chemical reactions in view of their automatic indexing. Report no 5399. London, Br. Libr. Res. Dev. Dep., 1977. 5399 p.
- McGregor J.J. Backtrack search algorithms and the maximal common subgraph problem. Software Pract. Exper., 1982, vol. 12, no. 1, pp. 23–34. doi: 10.1002/spe.4380120103.
- Funatsu K., Endo T., Kotera N., Sasaki S.I. Automatic recognition of reaction site in organic chemical reactions. Tetrahedron Comput. Methodol., 1988, vol. 1, no. 1, pp. 53–69. doi: 10.1016/0898-5529(88)90008-5.
- Jochum C., Gasteiger J., Ugi I. The principle of minimum chemical distance (PMCD). Angew. Chem., Int. Ed. Engl., 1980, vol. 19, no. 7, pp. 495–505. doi: 10.1002/anie.198004953.
- Akutsu T. Efficient extraction of mapping rules of atoms from enzymatic reaction data. J. Comput. Biol., 2004, vol. 11, nos. 2–3, pp. 449–462. doi: 10.1089/1066527041410337.
- Heinonen M., Lappalainen S., Mielikäinen T., Rousu J. Computing atom mappings for biochemical reactions without subgraph isomorphism. J. Comput. Biol., 2011, vol. 18, no. 1, pp. 43–58. doi: 10.1089/cmb.2009.0216.
- First E.L., Gounaris C.E., Floudas C.A. Stereochemically consistent reaction mapping and identification of multiple reaction mechanisms through integer linear optimization. J. Chem. Inf. Model., 2012, vol. 52, no. 1, pp. 84–92. doi: 10.1021/ci200351b.
- Mann M., Nahar F., Schnorr N., Backofen R., Stadler P., Flamm C. Atom mapping with constraint programming. Algorithms Mol. Biol., 2014, vol. 9, no. 23, pp. 1–12. doi: 10.1186/s13015-014-0023-3.
- Fontain E. The problem of atom-to-atom mapping. An application of genetic algorithms. Anal. Chim. Acta, 1992, vol. 265, no. 2, pp. 227–232. doi: 10.1016/0003-2670(92)85028-5.
- ICMAP. InfoChem GmbH. Available at: http://www.infochem.de/products/software/icmap.shtml.
- Moock T.E., Nourse J.G., Grier D., Hounshell W.D. The implementation of atom-atom mapping and related features in the reaction access system (REACCS). In: Chemical Structures: The International Language of Chemistry. Berlin, Heidelberg, Springer-Verlag, 1988, pp. 303–313.
- JChem Base. ChemAxon. Available at: https://www.chemaxon.com/products/jchem-base/.
- First E.L., Gounaris C.E., Floudas C.A. DREAM – Determination of Reaction Mechanisms. 2012. Available at: http://selene.princeton.edu/dream.
- Indigo Toolkit. EPAM. Life Sciences Open Source. Available at: http://epam.github.io/lifescience/ indigo/index.html.
- Domingos P., Pazzani M. On the optimality of the simple Bayesian classifier under zero-one loss. J. Mach. Learn., 1997, vol. 29, pp. 103–130. doi: 10.1023/A:1007413511361.
- Rish I. An empirical study of the naive Bayes classifier. Proc. IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence. Vol. 3. New York, 2001, pp. 41–46.
- Madzhidov T.I., Baskin I.I., Antipin I.S., Varnek A.A. Vvedenie v khemoinformatiku. Komp'uyternoe predstavlenie khimicheskikh struktur [Introduction to Chemoinformatics. Computer Representation of Chemical Structures]. Kazan, Izd. Kazan. Univ., 2013. 174 p. (In Russian)
- Kuhn H.W. Variants of the Hungarian method for assignment problems. J. Nav. Res. Logistics Q., 1956, vol. 3, no. 4, pp. 253–258. doi: 10.1002/nav.3800030404.
- Munkres J. Algorithms for the assignment and transportation problems. J. Soc. Ind. Appl. Math., 1957, vol. 5, no. 1, pp. 32–38. doi: 10.1137/0105003.
For citation: Khayrullina A.I., Madzhidov T.I., Nugmanov R.I., Afonina V.A., Baskin I.I., Varnek A.A. A new approach to atom-to-atom mapping using the naive Bayesian classifier. Uchenye Zapiski Kazanskogo Universiteta. Seriya Estestvennye Nauki, 2018, vol. 160, no. 2, pp. 200–213. (In Russian)
The content is available under the license Creative Commons Attribution 4.0 License.