Preliminary resources for applying cheminformatics in search of treatments for
COVID-19. The target modelling is based on an early release by Diamond/XChem,
which provides about a thousand unique structures for small fragment-like
molecules. About 10% of these were found to bind to the Mpro protein (the
SARS-CoV-2 main protease), which can be used as a rough true/false activity
readout.

A second model has been built by combining a set of antivirals released by
Chemical Abstracts with presumed negatives from ChEMBL 26. The latter was
stripped of molecules with a Tanimoto similarity (on ECFP6 fingerprints) over
0.5 to any of the antivirals. The actives do not necessarily have any useful
effect against COVID-19, and the inactives are based on circular reasoning, but
the model can be used as a crude measure of antiviral-likeness.
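
The similarity screen described above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: it assumes each molecule has already been reduced to a set of integer fingerprint feature hashes (such as ECFP6 would produce), and the names `tanimoto` and `strip_similar` and the toy fingerprints are hypothetical.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of fingerprint feature hashes."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def strip_similar(candidates: List[Set[int]],
                  actives: List[Set[int]],
                  threshold: float = 0.5) -> List[Set[int]]:
    """Keep candidates whose similarity to every active is at most `threshold`."""
    return [
        fp for fp in candidates
        if all(tanimoto(fp, act) <= threshold for act in actives)
    ]


# Toy fingerprints (made-up feature hashes, for illustration only).
antivirals = [{1, 2, 3, 4}, {10, 11, 12}]
chembl = [{1, 2, 3, 9}, {20, 21, 22}, {10, 11, 30, 31}]

# The first candidate scores 0.6 against the first antiviral and is dropped.
presumed_negatives = strip_similar(chembl, antivirals)
```

Because every remaining negative is by construction dissimilar to every active, a fingerprint-based model can separate the two classes almost trivially, which is the circularity noted above.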
- fragments.bayesian: Bayesian model built from the XChem fragments, with
binding as the activity. Uses an open source implementation of ECFP4
fingerprints. 5-fold cross-validation ROC AUC = 0.801.
- fragments.ds: the fragments converted into structures, in the XML datasheet
format.
- fragments.sdf: the same fragments, in SDfile format.
- antiviral.bayesian: Bayesian model of general antivirals vs. presumed
inactives, using ECFP6 fingerprints. The ROC is near perfect, which is an
artifact of the similarity screen used for its construction.
- predictions.ds: molecules generated procedurally from accessible reaction
transforms using available reagents, and ranked based on their model prediction score.
- predictions.sdf: the same predictions, in SDfile format.
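
For readers unfamiliar with the ROC figures quoted for the two models above, the sketch below shows one generic way to compute the area under the ROC curve from prediction scores and true/false activity labels. It uses the pairwise (Mann-Whitney) definition and is not taken from the code that produced the models; the function name `roc_auc` and the toy scores are assumptions for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Fraction of (active, inactive) pairs in which the active scores higher.

    Ties count as half a win. A value of 0.5 means random ranking and 1.0
    means every active outranks every inactive.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: two actives and three inactives with made-up scores.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [True, False, True, False, False]
auc = roc_auc(scores, labels)  # 5 of the 6 active/inactive pairs are ordered correctly
```

In a 5-fold cross-validation, a value like this would be computed on the held-out fold each time, using scores from a model trained on the other four folds.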
Questions, comments, bugs or requests to email@example.com