A variety of structure-based property predictions are available to apps, both for single molecules and whole collections. These consist of calculated numbers, and properties that change or apply to selected portions of the molecule.
Given a molecular structure, it is possible to use the constituent atoms and bonds to predict a variety of properties about the molecule, such as its physical behaviour, or its affinity for biological targets. A number of calculated properties are available from mobile apps, in particular MMDS and MolPrime+.
Predicted properties vary considerably with regard to the ease with which computer software can make a prediction from the incoming structure. Some properties are very easy to calculate (e.g. molecular weight) and require negligible computing resources. Others are more difficult, and require more complicated algorithms. For example, properties such as the water/octanol partitioning coefficient (log P) can be estimated by counting up the occurrences of particular fragments within the molecule, which requires a substructure searching algorithm. Some calculations are very fast and can easily be done on a mobile device without the user ever noticing any kind of delay, while others may need to dwell in the background for as long as a few seconds in order to complete a property calculation.
Properties come in two main flavours: scalar properties are essentially numbers or short strings that describe a feature of the overall molecule (e.g. number of hydrogen bond donors, or molecular formula), while molecular properties draw attention to portions of the molecule (e.g. PAINS filters), or transformations into different kinds of molecules (e.g. tautomers). Some calculations can produce both: for example, viewing all of the R/S and E/Z stereocentres of a molecular structure is highly informative, but it is also often quite useful to tabulate how many stereocentres a molecule has.
The source materials for calculating structure-based properties are sometimes concise, but often they are derived by building a model from a huge collection of sample data, and distilling out only what is necessary to apply the predictions. For example, the logic for counting up the number of "rotatable" bonds follows simple rules (e.g. non-terminal single bonds that are not in a ring), whereas deriving models such as the PAINS filters involves hundreds of thousands of experimental datapoints and a great deal of analysis by scientists in order to identify a small list of common matches. In most cases, the amount of data required to apply the model is a tiny fraction of what was needed to build it, making inclusion of this functionality into a mobile app practical.
A significant list of calculated properties is available for both the Mobile Molecular DataSheet (MMDS) and MolPrime+ apps, which are described in this article. Because MolPrime+ is a simpler app that is designed to work with one molecule at a time, it provides single molecule property visualisation on demand. MMDS is designed to work with collections of molecules, so in addition to single molecule property visualisation, it can also calculate properties for a whole collection of molecules as a single operation.
MMDS and MolPrime+ share the same property display panel for single molecules, which is invoked in a similar way in both apps. MolPrime+ also has a preview screen as a precursor, which shows most of the same content in a more condensed form.
From the main screen, pick one of the molecular structures, e.g. caffeine:
Bring up the menu bank, either by tapping on it a second time, or by touch-and-hold:
As shown above, command buttons are shown to the left, and on the right are partially obscured properties. These can be viewed by swiping to the left:
Swiping to reveal more of the properties collapses the menu bank, and uses the full width of the screen.
When the buttons and properties are first shown, the app will start calculating the properties in a background thread. The first few properties (e.g. molecular weight) appear instantly, since they take almost no time to compute. Other properties will appear after a short delay.
The fullscreen view can be brought up either by pressing the properties action button (if the menu bank has not yet been collapsed), or the fullscreen button at the bottom right. The fullscreen view is identical to the properties panel for MMDS, which is described below.
From the main screen, select a molecule and touch-and-hold to bring up the menu bank, then select the properties button:
This will bring up a fullscreen panel that shows calculated properties for the structure:
This panel is the same for both MMDS and MolPrime+. Note that it is also possible to launch the panel from the Interoperativity sub-bank within the sketcher:
As with the preview panel of MolPrime+, the property calculations are performed in a background thread. When the panel is first shown, the fast-to-calculate properties are shown immediately. The panel is updated with more content each time new properties become available.
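The progressive update described above can be sketched in a few lines. This is only an illustration of the pattern, not the apps' actual code: the function names, the `update` callback and the stand-in "slow" calculation are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def heavy_atom_count(atoms):
    """Fast property: count the non-hydrogen atoms."""
    return sum(1 for el in atoms if el != "H")

def distinct_elements(atoms):
    """Hypothetical stand-in for an expensive calculation (e.g. a filter set)."""
    time.sleep(0.05)  # simulate a noticeable delay
    return len(set(atoms))

def show_properties(atoms, fast, slow, update):
    """Call update(name, value) each time a property becomes available."""
    for name, func in fast.items():
        update(name, func(atoms))  # cheap: shown immediately
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(func, atoms): name for name, func in slow.items()}
        for done in as_completed(futures):
            update(futures[done], done.result())  # refresh as results arrive

panel, order = {}, []

def update(name, value):
    panel[name] = value
    order.append(name)

# Caffeine, C8H10N4O2, reduced to a flat list of element symbols
caffeine = ["C"] * 8 + ["H"] * 10 + ["N"] * 4 + ["O"] * 2
show_properties(
    caffeine,
    fast={"Heavy Atoms": heavy_atom_count},
    slow={"Distinct Elements": distinct_elements},
    update=update,
)
```

In a real app the `update` call would be dispatched to the user interface thread; here it simply records the order in which values arrive.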
Most of the graphical properties shown below are omitted when there is no information to display, e.g. if the molecule has no stereocentres, no tautomers, etc., the corresponding sections will not appear.
A scalar property can easily be represented as a number or a short formatted string. These are displayed in the first block of the property viewing panel.
- Molecular Formula: The atoms in the molecular structure are added up, and converted into a condensed molecular formula. Implied or specified hydrogen atoms are included (according to the molecule definition), with atom counts shown as subscripts. Specific isotopes (e.g. deuterium: 2H) are split out and displayed separately. Properly defined atom abbreviations are added up as the sum of their parts. Non-element labels are treated as if they were separate element types.
- Molecular Weight: The exact molecular weight is determined by adding up atoms in the same way as for the formula. By default, atoms are assumed to have natural abundance, but this can be overridden with a specific single isotope, which is then taken into account. Any atom with a non-element label that has no corresponding abbreviation definition is considered to have a weight of 0.
- Heavy Atoms: The total number of non-hydrogen atoms is summed. Note that structures can contain non-element labels (e.g. R, X, etc.), and these are also excluded from the total, unless they correspond to abbreviations.
- H-Acceptors: Hydrogen bond acceptors are summed according to a very simple formula. Any nitrogen, oxygen or sulfur atom that does not bear a positive charge is considered to be a potential H-bond acceptor.
- H-Donors: As with hydrogen bond acceptors, a very simple formula is used. Any nitrogen, oxygen or sulfur atom that has at least one hydrogen atom (whether explicit or implicit) is considered to be a potential H-bond donor. This crude method does not take into account more subtle effects, such as polar C-H bonds.
- Rotatable Bonds: Counts the number of bonds between two heavy atoms that are single, do not occur in a ring, and are not terminal (with regard to heavy atoms, e.g. a methyl group is considered to be terminal). This provides a rough indication of how bendable an organic molecule is, e.g. flat aromatic/resonating molecules like caffeine that are composed mainly of ring blocks have few or no rotatable bonds and are extremely rigid, whereas molecules with hydrocarbon connectors have many rotatable bonds and are very flexible.
- log P: The octanol/water partitioning coefficient, also known as log P, is a valuable property for medicinal chemistry because it correlates strongly with desirable properties, such as the ability to be physiologically absorbed. It is one of the components of the famous, and deceptively simple, Lipinski rule of five. Many algorithms have been designed to calculate this property. In this case the method used is that described by: Gordon M. Crippen; Scott A. Wildman: "Prediction of Physicochemical Parameters by Atomic Contributions", Journal of Chemical Information and Computer Sciences 39, 868-873 (1999).
- Molar Refractivity: Calculated using the same method described above (Crippen et al).
- Stereoambiguity: In addition to creating a visual overlay of stereochemistry labels (see below), a scalar field is also created which shows the total number of stereocentres, and the proportion of them that are ambiguous (which includes unlabelled chiral centres, as well as chiral centres and stereo-active double bonds that are explicitly marked as unknown).
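The simpler counting rules in the list above (heavy atoms, H-acceptors, H-donors and rotatable bonds) can be sketched as follows. The data layout is an assumption made for illustration: each atom is a tuple of element, formal charge and attached hydrogen count, each bond is a tuple of two atom indices and a bond order, and every label is taken to be a real element. It is not the representation used by the apps.

```python
from collections import defaultdict

def adjacency(bonds):
    """Map each atom index to the set of its heavy-atom neighbours."""
    adj = defaultdict(set)
    for i, j, _order in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def in_ring(i, j, adj):
    """A bond i-j is in a ring if j can still be reached from i without it."""
    stack, seen = [i], {i}
    while stack:
        a = stack.pop()
        for b in adj[a]:
            if a == i and b == j:
                continue  # do not use the bond being tested
            if b == j:
                return True
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return False

def scalar_counts(atoms, bonds):
    adj = adjacency(bonds)
    polar = ("N", "O", "S")
    return {
        "Heavy Atoms": sum(1 for el, _, _ in atoms if el != "H"),
        # N, O or S without a positive charge
        "H-Acceptors": sum(1 for el, chg, _ in atoms if el in polar and chg <= 0),
        # N, O or S carrying at least one hydrogen
        "H-Donors": sum(1 for el, _, nh in atoms if el in polar and nh >= 1),
        # single, non-terminal (both ends have another heavy neighbour), not in a ring
        "Rotatable Bonds": sum(
            1 for i, j, order in bonds
            if order == 1 and len(adj[i]) > 1 and len(adj[j]) > 1
            and not in_ring(i, j, adj)
        ),
    }

# Ethanolamine, HO-CH2-CH2-NH2
ethanolamine_atoms = [("O", 0, 1), ("C", 0, 2), ("C", 0, 2), ("N", 0, 2)]
ethanolamine_bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
ethanolamine = scalar_counts(ethanolamine_atoms, ethanolamine_bonds)

# Cyclopropanol: a three-membered ring with a terminal OH group
cyclopropanol_atoms = [("C", 0, 1), ("C", 0, 2), ("C", 0, 2), ("O", 0, 1)]
cyclopropanol_bonds = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (0, 3, 1)]
cyclopropanol = scalar_counts(cyclopropanol_atoms, cyclopropanol_bonds)
```

For ethanolamine only the central C-C bond counts as rotatable, because the C-O and C-N bonds are terminal with regard to heavy atoms. Cyclopropanol has no rotatable bonds at all: its ring bonds are excluded, and the C-O bond is terminal.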
Sites for atom- and bond-centred stereochemistry are determined by examining the molecular graph. Main group atoms with nominally tetrahedral geometry that have 4 different substituents are potentially chiral centres, and alkene-like double bonds for which the cis/trans forms represent different structures are considered to be bond stereocentres. The chiral parity is determined by examining the coordinates and bond geometry. For the most part molecules are 2D sketches, which need to use the up/down (wedge/hash) notation to show geometry. While MMDS and MolPrime+ are mainly intended for 2D sketches, it is also possible to encode 3D coordinates, and in this case the geometry is used as-specified. For double bond stereochemistry, either 2D or 3D coordinates are sufficient to fully specify the state.
In cases where the stereochemistry is specified and can be determined, a label is computed (R/S for chiral atoms, E/Z for restricted rotation stereobonds). If a chiral centre does not have enough clues to infer a single enantiomer, or the site is explicitly marked as racemic using the wavy bond type, it is marked as unknown. Double bond stereochemistry is almost always implied by the structure, unless it is explicitly marked as unknown, or drawn with a very odd geometry.
Note that the chirality determination does not currently distinguish between linked stereocentres, or take action regarding meso planes of symmetry. Other stereochemistry that is common in inorganic structures, such as square planar or octahedral centres, is not marked. These may be implemented at a later date.
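To illustrate why 2D coordinates are enough to specify double bond stereochemistry, the sketch below tests whether two substituents lie on the same side of a double bond, using the sign of a 2D cross product. The function names are hypothetical and this is not necessarily how the apps do it. If the two substituents passed in are the highest-priority ones on each end of the bond, "same side" corresponds to Z and "opposite sides" to E; the priority assignment itself is not shown.

```python
def side(p, q, r):
    """Sign of the 2D cross product (q - p) x (r - p): +1, -1, or 0 if collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)

def double_bond_geometry(sub1, a1, a2, sub2):
    """Classify substituents sub1 (on a1) and sub2 (on a2) around the bond a1=a2."""
    s1 = side(a1, a2, sub1)
    s2 = side(a1, a2, sub2)
    if s1 == 0 or s2 == 0:
        return "unknown"  # a substituent is collinear with the bond: odd geometry
    return "same side" if s1 == s2 else "opposite sides"

# Double bond drawn along the x axis, substituents at roughly 120 degrees
a1, a2 = (0.0, 0.0), (1.0, 0.0)
cis = double_bond_geometry((-0.5, 0.87), a1, a2, (1.5, 0.87))
trans = double_bond_geometry((-0.5, 0.87), a1, a2, (1.5, -0.87))
odd = double_bond_geometry((-0.5, 0.87), a1, a2, (2.0, 0.0))
```

The third case mimics a double bond drawn with a linear substituent, which is one example of a geometry from which no label can be inferred.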
Many organic molecules of biological interest exist in aqueous solution as more than one distinct molecular structure, due to the ability of certain functional groups to transfer hydrogen atoms to another part of the molecule, and adjust the single/double bond pattern to satisfy valence rules. While the exact mechanisms that occur in the reality of a chemical ensemble are due to a balance of kinetics and thermodynamics that is too nuanced to be perfectly predicted by a small set of rules, it is possible to obtain a very useful approximation. The apps implement a slightly modified version of a literature recipe:
Frank Oellien; Jörg Cramer; Carsten Beyer; Wolf-Dietrich Ihlenfeldt; Paul M. Selzer: "The Impact of Tautomer Forms on Pharmacophore-Based Virtual Screening",
Once all of the tautomer transformation rules have been applied, the set is reduced so that only unique compounds are retained. Shown below is the collection of states displayed for guanine, which is a popular example used to demonstrate a diabolical case of numerous plausible tautomeric forms:
Because there are many tautomers, the collection can be scrolled from left to right by swiping. Each of the molecules is highlighted with a green backdrop to show which of the atoms were affected by at least one tautomer shift which, in the case of guanine, is all of them.
As well as showing all of the unique molecules induced by tautomeric transformations, the individual molecular representations are also adjusted to show the loss of distinct stereochemistry. Consider the chiral centre in the acetylacetone derivative below:
Because the chiral centre interconverts with an sp2 planar tautomer, the chirality is not preserved, and the original use of the wedge bond to indicate the (R) enantiomer is revealed to be misleading, at least in aqueous solution.
There are many ways to draw a molecule incorrectly, and for the most part it is not for the software algorithm to decide what a chemist is allowed to describe, but for first row p-block elements, there are hard rules that are generally considered inviolable. Essentially, the valence counts of C, N, O and F must add up to 8, while S, P, Cl, Br and I are slightly more flexible (due to d-electrons) and are granted the option of having an extended shell. Any molecule that has violations of these rules is worth drawing attention to, because it is more likely to be a mistake than anything else. There may be a legitimate reason, but judicious use of the zero-order bond and control over implicit hydrogens makes it possible to represent almost any real world molecule with a plausible and compliant valence state for its light main group constituents.
As shown above, the highlighted nitrogen atom has 4 substituents, which is invalid: the valence adds up to 9. In order to be correct, the nitrogen centre should have a positive charge.
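The octet rule for the strict first-row elements can be written as a small check. This is a sketch under simplifying assumptions: unpaired electrons and zero-order bonds are ignored, the heavier elements that may have an extended shell are simply not checked, and the function names are illustrative.

```python
# Valence electrons of the neutral atom for the strict first-row elements
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6, "F": 7}

def shell_count(element, charge, bond_order_sum, hydrogens):
    """Own valence electrons, adjusted for charge, plus one shared electron
    per unit of bond order and per attached hydrogen."""
    return VALENCE_ELECTRONS[element] - charge + bond_order_sum + hydrogens

def violates_octet(element, charge, bond_order_sum, hydrogens):
    """True for C, N, O or F whose valence shell does not add up to 8."""
    if element not in VALENCE_ELECTRONS:
        return False  # extended shells allowed; not checked in this sketch
    return shell_count(element, charge, bond_order_sum, hydrogens) != 8

# Neutral nitrogen with 4 single-bonded substituents: 5 + 4 = 9, invalid
neutral_n = shell_count("N", 0, 4, 0)
neutral_n_bad = violates_octet("N", 0, 4, 0)

# The same nitrogen with a positive charge: 5 - 1 + 4 = 8, valid
cationic_n_bad = violates_octet("N", 1, 4, 0)
```

The first case reproduces the invalid nitrogen described above, and the second shows that adding the positive charge brings the count back to 8.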
For any molecule that is being postulated as a possible bioactive molecule, the PAINS filters are likely to be of interest:
Jonathan B. Baell; Georgina A. Holloway: "New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays",
In short, the filters are a collection of substructure fragment queries that have been identified by cherry picking from molecules that were found to hit a large number of targets in high throughput screens, for the wrong reasons. If a molecule contains one of the PAINS fragments, then it is highly likely that there is something wrong with it, e.g. it causes a false alarm by triggering the detection event without binding, or it binds to everything and is therefore useless for therapeutic purposes, or it is reactive, or a covalent binder, etc. Matching one of these PAINS filters should be considered as a strong warning, and at the very least any kind of positive activity measurement should be considered with an extra helping of scepticism.
PAINS filter matches are shown graphically, with each hit showing the substructure overlap:
These filters are relatively slow to calculate, since there are a large number of queries to consider, each of which requires the app to fire up a considerable amount of computational machinery. For this reason there is usually a delay of several seconds before any PAINS matches are shown.
Collections of Molecules
The Mobile Molecular DataSheet (MMDS) app uses the datasheet as its basic unit, which is a tabular collection of molecules and other data. Browsing calculated properties for one molecule at a time is useful, especially since it is convenient to show a variety of properties that are best interpreted graphically, but there are many workflows that require a new column to be created for each property. In this case a value is calculated for each property and each molecule.
To calculate properties, open the menu for a regular user-defined datasheet (i.e. not the scratch sheet or structure templates), and select the property calculation action button (which uses the same icon as for individual molecules):
This will bring up the preparation dialog, which inquires as to which properties should be calculated. The default selection involves calculating all properties except for the very slowest, which are only determined if requested:
The calculation can be interrupted without consequences. Also note that by default, properties will only be calculated if they are new, which is convenient for when new molecules are added to a collection: the missing properties can be updated without having to force the entire datasheet to be recalculated. By activating the corresponding checkbox, it is possible to force the recalculation, which may be useful if some of the structures have been changed.
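The "only calculate what is missing" behaviour can be sketched as below. The datasheet is modelled here as a plain list of dictionaries, and `calculate_column` is a hypothetical helper, not an MMDS function.

```python
def heavy_atom_count(molecule):
    """Toy property: molecules are flat lists of element symbols."""
    return sum(1 for el in molecule if el != "H")

def calculate_column(rows, name, func, force=False):
    """Fill rows[i][name], skipping existing values unless force is set.
    Returns the number of values that were actually computed."""
    computed = 0
    for row in rows:
        if force or row.get(name) is None:
            row[name] = func(row["molecule"])
            computed += 1
    return computed

rows = [
    {"molecule": ["C", "C", "O"], "Heavy Atoms": 3},  # already calculated
    {"molecule": ["C", "N"]},                         # newly added molecule
]

# Default: only the missing value is filled in
first_pass = calculate_column(rows, "Heavy Atoms", heavy_atom_count)

# A structure is edited, so its stored value becomes stale
rows[0]["molecule"] = ["C", "C", "C", "O"]
second_pass = calculate_column(rows, "Heavy Atoms", heavy_atom_count)

# Forcing recalculation refreshes every row
forced_pass = calculate_column(rows, "Heavy Atoms", heavy_atom_count, force=True)
```

The second pass computes nothing, which leaves the edited molecule with a stale value. This is the situation in which forcing the recalculation is useful.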
Only scalar versions of property calculations are determined. These can be viewed in detail mode:
While the results from the more exotic calculations lack the visual representations, these can be easily called up one molecule at a time, should the short-hand form indicate that there are properties of interest. Perhaps more importantly, once these properties are calculated, it is possible to share the data in a variety of ways, which makes it possible to use MMDS as a calculation engine that can feed data to other software packages.
MMDS and MolPrime+ both provide a selection of useful property calculations, some of which are quite advanced. All properties are calculated on the device itself. Some of these properties are numbers or text descriptions, while others are visualised by molecular overlays. Properties can be calculated for one molecule at a time, or for whole datasheets, where each property is stored in a separate column.