Preliminary resources for applying cheminformatics to the search for treatments for COVID-19. The target modelling is based on an early release by Diamond/XChem, which provides about a thousand unique structures of small fragment-like molecules. About 10% of these were found to bind to the Mpro protein (the SARS-CoV-2 main protease), so binding can serve as a rough true/false activity readout.

A second model was built by combining a set of antivirals released by Chemical Abstracts with presumed negatives drawn from ChEMBL 26. The ChEMBL set was first stripped of every molecule whose Tanimoto similarity (on ECFP6 fingerprints) to any of the antivirals exceeds 0.5. The actives do not necessarily have any useful effect against COVID-19, and the inactives are circular by construction (they count as inactive only because they are dissimilar to the actives), but the model can still serve as a crude measure of antiviral-likeness.
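The similarity screen described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build the model: it assumes each molecule's ECFP6 fingerprint has already been computed by some cheminformatics toolkit and stored as a set of integer feature identifiers, and the function names `tanimoto` and `presumed_inactives` are hypothetical. A ChEMBL candidate is discarded if its similarity to any antiviral is above 0.5.

```python
from typing import FrozenSet, Iterable, List

# Assumed representation (hypothetical): a sparse fingerprint stored as a
# set of integer feature IDs, e.g. the hashed ECFP6 environments of a molecule.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| for set-based fingerprints."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / union


def presumed_inactives(
    candidates: Iterable[Fingerprint],
    actives: List[Fingerprint],
    threshold: float = 0.5,
) -> List[Fingerprint]:
    """Keep only candidates whose similarity to every active is at most `threshold`.

    Anything scoring above the threshold against even one active is removed,
    mirroring the "similarity over 0.5 to any of the antivirals" rule.
    """
    return [
        fp for fp in candidates
        if all(tanimoto(fp, act) <= threshold for act in actives)
    ]
```

Every candidate is compared against every antiviral, which is simple but quadratic in cost.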

Downloadable Resources

  • fragments.bayesian: Bayesian model built from the XChem fragments, with binding as the activity. Uses an open-source implementation of ECFP4 fingerprints. 5-fold cross-validation ROC AUC = 0.801.
  • fragments.ds: the fragments converted into structures, in XML datasheet format.
  • fragments.sdf: the same fragments, in SDfile format.
  • antiviral.bayesian: Bayesian model of general antivirals vs. presumed inactives, using ECFP6 fingerprints. The ROC AUC is near perfect, which is an artifact of the similarity screen used to construct the training set.
  • predictions.ds: molecules generated procedurally from accessible reaction transforms using available reagents, and ranked by their model prediction score.
  • predictions.sdf: the same predictions, in SDfile format.
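The internals of the .bayesian files are not described here. A common formulation for fingerprint-based Bayesian activity models is the Laplacian-corrected naive Bayesian, where each fingerprint feature receives an additive log weight and a molecule's score is the sum of the weights of its features. The sketch below assumes that formulation and assumes fingerprints are precomputed sets of integer feature IDs; it does not read the .bayesian file format, and all function names are hypothetical. It also shows one way to estimate a 5-fold cross-validation ROC AUC like the figure quoted for fragments.bayesian, and how candidates can be ranked by score as in predictions.ds.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # assumed: set of integer feature IDs


def train_bayesian(fps: Sequence[Fingerprint], active: Sequence[bool]) -> Dict[int, float]:
    """Laplacian-corrected naive Bayesian: weight(f) = log((A + 1) / (T * P + 1)).

    A = actives containing f, T = molecules containing f, P = overall active rate.
    """
    p_base = sum(active) / len(fps)
    total_count: Counter = Counter()   # T per feature
    active_count: Counter = Counter()  # A per feature
    for fp, is_active in zip(fps, active):
        total_count.update(fp)
        if is_active:
            active_count.update(fp)
    return {
        f: math.log((active_count[f] + 1) / (t * p_base + 1))
        for f, t in total_count.items()
    }


def score(weights: Dict[int, float], fp: Fingerprint) -> float:
    """Sum of feature weights; features unseen in training contribute 0."""
    return sum(weights.get(f, 0.0) for f in fp)


def rank(weights: Dict[int, float], candidates: List[Fingerprint]) -> List[Fingerprint]:
    """Order candidates from highest to lowest model score."""
    return sorted(candidates, key=lambda fp: score(weights, fp), reverse=True)


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as the chance a random active outscores a random inactive (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(fps: Sequence[Fingerprint], active: Sequence[bool], k: int = 5) -> float:
    """k-fold CV: each molecule is scored by a model that never saw it.

    Folds are assigned round-robin by index for simplicity; the held-out
    scores from all folds are pooled into a single AUC.
    """
    scores = [0.0] * len(fps)
    for fold in range(k):
        train = [i for i in range(len(fps)) if i % k != fold]
        weights = train_bayesian([fps[i] for i in train], [active[i] for i in train])
        for i in range(fold, len(fps), k):
            scores[i] = score(weights, fps[i])
    return roc_auc(scores, active)
```

Round-robin folds keep the sketch short; a real evaluation would normally shuffle, and possibly stratify by activity, before splitting. Pooling held-out scores versus averaging per-fold AUCs is another choice that can shift the reported number slightly.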

Questions, comments, bugs or requests to