Applying molecular similarity to evaluate the accuracy of retention index predictions in gas chromatography using deep learning

Abstract

When retention indices are predicted using deep learning, there is usually no way to assess the reliability of the prediction for a particular molecule. In this work, using polyethylene glycol-based stationary phases and the NIST 17 database as an example, it is shown that, on average, the more similar the closest training-set molecule is to the compound being predicted, the more accurate the prediction. Of the four molecular similarity algorithms considered, Tanimoto similarity of ECFP molecular fingerprints is the most appropriate for this problem. It is also shown that, for a number of transformation products of unsymmetrical dimethylhydrazine whose structures were established with the help of such predictions, the predictions could be very unreliable.
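
To make the similarity measure concrete, the following is a minimal sketch of Tanimoto similarity and of the maximum similarity to the training set (called Smax in the figure captions). It assumes that each fingerprint is represented as a set of integer feature identifiers. The function names `tanimoto` and `s_max` and the toy fingerprints are illustrative assumptions, not the authors' code; real ECFP fingerprints would be generated with a cheminformatics toolkit.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared features divided by features present in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints (an assumption)
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def s_max(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Largest similarity between the query and any molecule of the training set."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


# Toy fingerprints with made-up feature ids (not real ECFP output).
training_set = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8, 9}]
close_query = {1, 2, 3, 6}     # shares three features with the first training molecule
distant_query = {7, 10, 11}    # shares one feature with the last training molecule

s_close = s_max(close_query, training_set)
s_far = s_max(distant_query, training_set)
print(f"Smax (close query):   {s_close:.2f}")
print(f"Smax (distant query): {s_far:.2f}")
```

Under the trend reported in the abstract, a prediction for a molecule with a low Smax would, on average, deserve less trust than one for a molecule with a high Smax.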

About the authors

D. Matyushin

A. N. Frumkin Institute of Physical Chemistry and Electrochemistry of the Russian Academy of Sciences

Email: shonastya@yandex.ru
Russia, Moscow, 119071

A. Sholokhova

A. N. Frumkin Institute of Physical Chemistry and Electrochemistry of the Russian Academy of Sciences

Corresponding author
Email: shonastya@yandex.ru
Russia, Moscow, 119071

M. Khrisanfov

A. N. Frumkin Institute of Physical Chemistry and Electrochemistry of the Russian Academy of Sciences; M. V. Lomonosov Moscow State University

Email: shonastya@yandex.ru
Russia, Moscow, 119071; Moscow, 119991

S. Borovikova

A. N. Frumkin Institute of Physical Chemistry and Electrochemistry of the Russian Academy of Sciences

Email: shonastya@yandex.ru
Russia, Moscow, 119071

References

  1. Tarján G., Nyiredy S., Györ M. et al. // J. of Chromatography A. 1989. V. 472. P. 1. https://doi.org/10.1016/S0021-9673(00)94099-8
  2. Franke J.-P., Wijsbeek J., De Zeeuw R.A. // J. of Forensic Sciences. 1990. V. 35. № 4. P. 813. https://doi.org/10.1520/JFS12893J
  3. Zellner B.A., Bicchi C., Dugo P. et al. // Flavour and Fragrance J. 2008. V. 23. № 5. P. 297–314. https://doi.org/10.1002/ffj.1887
  4. Milman B.L., Zhurkovich I.K. // TrAC Trends in Analytical Chemistry. 2016. V. 80. P. 636–640. https://doi.org/10.1016/j.trac.2016.04.024
  5. Vinaixa M., Schymanski E.L., Neumann S. et al. // TrAC Trends in Analytical Chemistry. 2016. V. 78. P. 23. https://doi.org/10.1016/j.trac.2015.09.005
  6. Matyushin D.D., Sholokhova A.Yu., Karnaeva A.E. et al. // Chemometrics and Intelligent Laboratory Systems. 2020. V. 202. P. 104042. https://doi.org/10.1016/j.chemolab.2020.104042
  7. Schymanski E.L., Meringer M., Brack W. // Analytical Chemistry. 2011. V. 83. № 3. P. 903. https://doi.org/10.1021/ac102574h
  8. Dossin E., Martin E., Diana P. et al. // Analytical Chemistry. 2016. V. 88. № 15. P. 7539–7547. https://doi.org/10.1021/acs.analchem.6b00868
  9. Sholokhova A.Yu., Matyushin D.D., Grinevich O.I. et al. // Molecules. 2023. V. 28. № 8. P. 3409. https://doi.org/10.3390/molecules28083409
  10. Su Q.-Z., Vera P., Salafranca J. et al. // Resources, Conservation and Recycling. 2021. V. 171. P. 105640. https://doi.org/10.1016/j.resconrec.2021.105640
  11. Su Q.-Z., Vera P., Nerín C. et al. // Resources, Conservation and Recycling. 2021. V. 167. P. 105365. https://doi.org/10.1016/j.resconrec.2020.105365
  12. Sholokhova A.Yu., Grinevich O.I., Matyushin D.D. et al. // Chemosphere. 2022. V. 307. P. 135764. https://doi.org/10.1016/j.chemosphere.2022.135764
  13. Matyushin D.D., Buryak A.K. // IEEE Access. 2020. V. 8. P. 223140. https://doi.org/10.1109/ACCESS.2020.3045047
  14. Debus B., Parastar H., Harrington P. et al. // TrAC Trends in Analytical Chemistry. 2021. V. 145. P. 116459. https://doi.org/10.1016/j.trac.2021.116459
  15. Dong S., Wang P., Abbas K. // Computer Science Review. 2021. V. 40. P. 100379. https://doi.org/10.1016/j.cosrev.2021.100379
  16. Matyushin D.D., Sholokhova A.Yu., Buryak A.K. // Intern. J. of Molecular Sciences. 2021. V. 22. № 17. P. 9194. https://doi.org/10.3390/ijms22179194
  17. Matyushin D.D., Sholokhova A.Yu., Buryak A.K. // J. of Chromatography A. 2019. V. 1607. P. 460395. https://doi.org/10.1016/j.chroma.2019.460395
  18. Anjum A., Liigand J., Milford R. et al. // Ibid. 2023. V. 1705. P. 464176. https://doi.org/10.1016/j.chroma.2023.464176
  19. Qu C., Schneider B.I., Kearsley A.J. et al. // Ibid. 2021. V. 1646. P. 462100. https://doi.org/10.1016/j.chroma.2021.462100
  20. Vrzal T., Malečková M., Olšovská J. // Analytica Chimica Acta. 2021. V. 1147. P. 64. https://doi.org/10.1016/j.aca.2020.12.043
  21. Geer L.Y., Stein S.E., Mallard W.G. et al. // J. of Chemical Information and Modeling. 2024. V. 64. № 3. P. 690–696. https://doi.org/10.1021/acs.jcim.3c01758
  22. Raymond J.W., Gardiner E.J., Willett P. // The Computer J. 2002. V. 45. № 6. P. 631–644. https://doi.org/10.1093/comjnl/45.6.631
  23. Bender A., Glen R.C. // Organic & Biomolecular Chemistry. 2004. V. 2. № 22. P. 3204. https://doi.org/10.1039/B409813G
  24. Morehouse N.J., Clark T.N., McMann E.J. et al. // Nature Communications. 2023. V. 14. № 1. P. 308. https://doi.org/10.1038/s41467-022-35734-z
  25. Rogers D., Hahn M. // J. of Chem. Inform. and Modeling. 2010. V. 50. № 5. P. 742. https://doi.org/10.1021/ci100050t
  26. Hoo Z.H., Candlish J., Teare D. // Emergency Medicine J. 2017. V. 34. № 6. P. 357. https://doi.org/10.1136/emermed-2017-206735
  27. Polo T.C.F., Miot H.A. // J. Vascular Brasileiro. 2020. V. 19. P. e20200186. https://doi.org/10.1590/1677-5449.200186
  28. Popov M.S., Ul’yanovskii N.V., Kosyakov D.S. // Microchemical J. 2024. V. 197. P. 109833. https://doi.org/10.1016/j.microc.2023.109833

Supplementary files

1. JATS XML
2. Fig. 1. Distribution of the number of molecules N in the NIST 17 retention index database (polar stationary phases) according to Smax values (maximum molecular similarity value for all pairs including the molecule in question and molecules from the training set) for the four molecular similarity calculation methods. Dark gray indicates “poorly predicted molecules” (absolute prediction error greater than 100), light gray indicates the remaining molecules.

3. Fig. 2. Dependence of the total number of molecules N (solid circles and lines) and the fraction of “poorly predicted molecules” (absolute prediction error greater than 100) F (rectangles) on the value of Smax (maximum molecular similarity value for all pairs including the considered molecule and molecules from the training set).

4. Fig. 3. Distribution of the number of molecules N over the absolute prediction error for different values of Smax (maximum molecular similarity value for all pairs including the molecule under consideration and molecules from the training set) for the two molecular similarity calculation methods.

5. Fig. 4. ROC curves (specificity-sensitivity curves) for predicting whether a molecule is “poorly predicted” (absolute prediction error greater than 100) using different molecular similarity calculation algorithms. Curves for algorithms for which the area under the curve differs by no more than 0.02 are labeled with a single line type for readability.

6. Fig. 5. Structures of the transformation products of unsymmetrical dimethylhydrazine proposed in [9] and Smax values (molecular similarity value between the molecule under consideration and the closest molecule from the training set) for each of them. Similarity was calculated using the ECFP method.

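
The ROC analysis described in the caption of Fig. 4 can be illustrated with a short sketch. It treats 1 − Smax as a score for flagging "poorly predicted" molecules (absolute prediction error greater than 100, as in the captions) and computes the area under the ROC curve in its rank-based form. The data pairs and the helper name `roc_auc` are made up for illustration and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


ERROR_THRESHOLD = 100  # "poorly predicted" cutoff quoted in the figure captions

# Made-up (Smax, absolute prediction error) pairs, for illustration only.
toy = [(0.95, 12), (0.80, 40), (0.60, 150), (0.55, 30),
       (0.30, 220), (0.25, 90), (0.20, 310)]

poorly_predicted = [err > ERROR_THRESHOLD for _, err in toy]
dissimilarity = [1.0 - s for s, _ in toy]  # low Smax should signal a poor prediction

auc = roc_auc(dissimilarity, poorly_predicted)
print(f"AUC on toy data: {auc:.2f}")
```

An AUC of 0.5 would mean that Smax carries no information about prediction quality, while values closer to 1 would mean that low similarity reliably marks poorly predicted molecules.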

Copyright © Russian Academy of Sciences, 2025