Automated Prediction of the Endocrine Disruptive Potency of Chemicals detected with LC/ESI/HRMS based on Mass Spectral Networks

Supervisors: Kruve A.², Hawkes J.¹, Rebane R.³

Subject Specialist: Globisch D.¹ Examiner: Sjöberg P.¹

¹Uppsala University, ²Stockholm University, ³University of Tartu

Abstract

Widespread exposure to chemicals has raised concerns about their toxic impact on public health and the environment. Identifying and quantifying these chemicals in complex samples is not always possible, which makes assessing their toxicity difficult.

In an effort to quickly screen chemicals for potential risks to human health, this study aims to predict toxicity from tandem mass spectrometry (MS²) data. To this end, endocrine-disrupting activity data and other relevant human endpoints from the Tox21 Challenge were collected and combined with mass spectra from MassBank Europe. A k-nearest neighbors (k-NN) algorithm and a spectral network-based algorithm were implemented to predict activity from MS² spectra. With k-NN under 5-fold cross-validation, the highest recall and precision were 47.1% and 44.4%, respectively (both for NR.AR). The spectral similarity network raised recall and precision to 81.8% and 75.0%, respectively (both for NR.AR). The spectral networks showed active clusters for the NR.AR, NR.ER, NR.AR.LBD, and NR.ER.LBD endpoints.

The approach was applied to the retrospective analysis of MS² spectra from a wastewater sample, showing potential for toxicity alerts. The predictive capabilities of the model could benefit further from feature selection techniques, network optimization, and integration with datasets from other domains.
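The k-NN baseline mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the study's actual pipeline: the fixed-width binning, the plain cosine score, the majority vote, and the helper names `bin_spectrum`, `cosine_similarity`, `knn_predict`, and `recall_precision` are all hypothetical, and the toy peaks are invented.

```python
import math

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (sparse dict: bin -> intensity)."""
    binned = {}
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        binned[idx] = binned.get(idx, 0.0) + intensity
    return binned

def cosine_similarity(a, b):
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_predict(query, library, k=3):
    """Majority vote over the k most similar library spectra (1 = active, 0 = inactive)."""
    ranked = sorted(library,
                    key=lambda item: cosine_similarity(query, item[0]),
                    reverse=True)
    votes = [label for _, label in ranked[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0

def recall_precision(y_true, y_pred):
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Toy library of (m/z, intensity) peak lists with invented activity labels.
library = [
    (bin_spectrum([(100.0, 1.0), (150.0, 0.5)]), 1),
    (bin_spectrum([(100.2, 0.9), (150.1, 0.6)]), 1),
    (bin_spectrum([(200.0, 1.0), (250.0, 0.4)]), 0),
    (bin_spectrum([(200.3, 0.8), (251.0, 0.5)]), 0),
]
query = bin_spectrum([(100.1, 1.0), (150.3, 0.4)])
prediction = knn_predict(query, library, k=3)
```

In a cross-validated setting, `recall_precision` would be computed on the held-out fold of each split; the sketch leaves the fold splitting out.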

Mass Spectra Networks

Cosine similarity threshold

The network changes with the cosine similarity threshold: raising the threshold keeps only edges between more similar spectra, while lowering it adds edges.
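The effect of the threshold can be shown with a small sketch, assuming spectra are already represented as equal-length intensity vectors and that an edge is drawn whenever the cosine similarity is at or above the threshold. The function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_network(spectra, threshold):
    """Connect every pair of spectra whose cosine similarity is >= threshold.

    Returns an adjacency dict: spectrum name -> set of neighbour names.
    """
    adjacency = {name: set() for name in spectra}
    for a, b in combinations(spectra, 2):
        if cosine(spectra[a], spectra[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

# Invented binned spectra, for illustration only.
spectra = {
    "s1": [1.0, 0.5, 0.0, 0.0],
    "s2": [0.9, 0.6, 0.1, 0.0],
    "s3": [0.0, 0.1, 1.0, 0.7],
    "s4": [0.5, 0.5, 0.5, 0.5],
}
loose = build_network(spectra, threshold=0.6)    # denser network
strict = build_network(spectra, threshold=0.95)  # only near-identical spectra stay linked
```

With these toy vectors the looser threshold links `s4` to all other spectra, whereas the stricter one leaves only the `s1`/`s2` pair connected.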

Prediction

Using the network, you can predict the toxicity of unknown mass spectra (white nodes). In this example, red nodes are active and green nodes are inactive for the NR.AR endpoint. A (+) sign marks a node predicted as active and a (-) sign marks one predicted as inactive.
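One simple way to label unknown nodes from such a network is a majority vote over their labeled neighbours. The page does not state the exact rule, so the vote, the tie handling (ties count as inactive), and the function name `predict_unknown` are assumptions for illustration only.

```python
def predict_unknown(adjacency, labels):
    """Predict activity for unlabeled nodes from their labeled neighbours.

    adjacency: node -> set of neighbouring nodes
    labels:    node -> True (active) or False (inactive); absent nodes are unknown
    Returns node -> "+" (predicted active), "-" (predicted inactive),
    or None when the node has no labeled neighbour to vote.
    """
    predictions = {}
    for node, neighbours in adjacency.items():
        if node in labels:
            continue  # already known, nothing to predict
        votes = [labels[n] for n in neighbours if n in labels]
        if not votes:
            predictions[node] = None
        else:
            # Assumption: strict majority is needed for "+"; ties fall to "-".
            predictions[node] = "+" if sum(votes) * 2 > len(votes) else "-"
    return predictions

# Toy network: a1/a2 are active, i1/i2 inactive, u1..u3 are unknown spectra.
adjacency = {
    "a1": {"a2", "u1"},
    "a2": {"a1", "u1"},
    "i1": {"i2", "u1", "u2"},
    "i2": {"i1", "u2"},
    "u1": {"a1", "a2", "i1"},
    "u2": {"i1", "i2"},
    "u3": set(),
}
labels = {"a1": True, "a2": True, "i1": False, "i2": False}
predictions = predict_unknown(adjacency, labels)
```

Here `u1` sits in a mostly active neighbourhood and `u2` in an inactive one, while the isolated `u3` receives no prediction.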

Last remarks

This approach shows potential for the automated screening of mass spectra features in non-targeted analysis and the prioritization of samples.

We hope that further expansion of mass spectral databases, development of algorithms, and integration of multi-domain databases will improve performance and broaden the covered chemical space, making studies of contaminated areas more feasible.

Similar network approaches to this one could be beneficial in other fields as well, for example, in the pharmacognosy of bioactive compounds from natural sources.

MNTox pinpoints MS² features with endocrine disruptive potency.
