Given a molecular structure, it is possible to use the constituent atoms and bonds to predict a variety of properties about the molecule, such as its physical behaviour, or its affinity for biological targets. A number of calculated properties are available from mobile apps, in particular MMDS and MolPrime+.
Predicted properties vary considerably with regard to the ease with which computer software can make a prediction from the incoming structure. Some properties are very easy to calculate (e.g. molecular weight) and require negligible computing resources. Others are more difficult, and require more complicated algorithms. For example, properties such as the water/octanol partitioning coefficient (log P) can be estimated by counting up the occurrences of particular fragments within the molecule, which requires a substructure searching algorithm. Some calculations are very fast and can easily be done on a mobile device without the user ever noticing any kind of delay, while others may need to dwell in the background for as long as a few seconds in order to complete a property calculation.
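As an illustration of how cheap the simplest properties are, the following minimal sketch sums atomic weights for a plain molecular formula. It is not the code used by either app: the function name `molecular_weight`, the small `ATOMIC_WEIGHTS` table and the formula parser (which handles only simple formulas without brackets) are assumptions made for this example.

```python
import re

# Abridged standard atomic weights for a few common elements (illustrative subset).
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

def molecular_weight(formula):
    """Sum atomic weights for a simple formula such as 'C8H10N4O2' (no brackets)."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total

print(round(molecular_weight("C8H10N4O2"), 2))  # caffeine
```

A single pass over the atoms is all that is needed, which is why this kind of property appears with no perceptible delay.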
Properties come in two main flavours: scalar properties are essentially numbers or short strings that describe a feature of the overall molecule (e.g. number of hydrogen bond donors, or molecular formula), while molecular properties draw attention to portions of the molecule (e.g. PAINS filters), or transformations into different kinds of molecules (e.g. tautomers). Some calculations can produce both: for example, viewing all of the R/S and E/Z stereocentres of a molecular structure is highly informative, but it is also often quite useful to tabulate how many stereocentres a molecule has.
The source materials for calculating structure-based properties are sometimes concise, but often they are derived by building a model from a huge collection of sample data, and distilling out only what is necessary to apply the predictions. For example, the logic for counting up the number of "rotatable" bonds follows simple rules (e.g. non-terminal single bonds that are not in a ring), whereas deriving models such as the PAINS filters involves hundreds of thousands of experimental datapoints and a great deal of analysis by scientists in order to identify a small list of common matches. In most cases, the amount of data required to apply the model is a tiny fraction of what was needed to build it, making inclusion of this functionality into a mobile app practical.
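The simple rule quoted above for rotatable bonds (non-terminal single bonds that are not in a ring) can be sketched in a few lines. This is a hedged illustration rather than the apps' implementation: the bond-list representation, the function name `count_rotatable_bonds` and the ring test are assumptions, and practical definitions often add further exceptions that are not covered here.

```python
from collections import defaultdict

def count_rotatable_bonds(bonds):
    """Count non-terminal, non-ring single bonds.

    bonds: list of (atom_a, atom_b, order) tuples over heavy atoms only.
    """
    adjacency = defaultdict(set)
    for a, b, _ in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    def in_ring(a, b):
        # The bond a-b is in a ring if b can be reached from a
        # without travelling along the bond itself.
        stack, seen = [a], {a}
        while stack:
            current = stack.pop()
            for neighbour in adjacency[current]:
                if current == a and neighbour == b:
                    continue  # skip the bond being tested
                if neighbour == b:
                    return True
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        return False

    count = 0
    for a, b, order in bonds:
        terminal = len(adjacency[a]) == 1 or len(adjacency[b]) == 1
        if order == 1 and not terminal and not in_ring(a, b):
            count += 1
    return count

# Butane drawn as a chain of four carbons: only the central bond qualifies.
butane = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
print(count_rotatable_bonds(butane))
```

The rule needs nothing more than the molecular graph, in contrast to models such as PAINS, where the expensive part is building the list of queries in the first place.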
A significant list of calculated properties is available for both the Mobile Molecular DataSheet (MMDS) and MolPrime+ apps, which are described in this article. Because MolPrime+ is a simpler app that is designed to work with one molecule at a time, it provides single molecule property visualisation on demand. MMDS is designed to work with collections of molecules, so in addition to single molecule property visualisation, it can also calculate properties for a whole collection of molecules as a single operation.
Both MMDS and MolPrime+ share the same property display panel for single molecules, which is invoked in a similar way in each app. MolPrime+ also has a preview screen as a precursor, which shows most of the same content in a more condensed form.
From the main screen, pick one of the molecular structures, e.g. caffeine:
Bring up the menu bank, either by tapping on it a second time, or by touch-and-hold:
As shown above, command buttons are shown to the left, and on the right are partially obscured properties. These can be viewed by swiping to the left:
Swiping to reveal more of the properties collapses the menu bank, and uses the full width of the screen.
When the buttons and properties are first shown, the app will start calculating the properties in a background thread. The first few properties (e.g. molecular weight) appear instantly, since they take almost no time to compute. Other properties will appear after a short delay.
The fullscreen view can be brought up either by pressing the properties action button (if the menu bank has not yet been collapsed), or the fullscreen button at the bottom right. The fullscreen view is identical to the properties panel for MMDS, which is described below.
From the main screen, select a molecule and touch-and-hold to bring up the menu bank, then select the properties button:
This will bring up a fullscreen panel that shows calculated properties for the structure:
This panel is the same for both MMDS and MolPrime+. Note that it is also possible to launch the panel from the Interoperativity sub-bank within the sketcher:
As with the preview panel of MolPrime+, the property calculations are performed in a background thread. When the panel is first shown, the fast-to-calculate properties are shown immediately. The panel is updated with more content each time new properties become available.
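The pattern of calculating in a background thread and updating the panel as each result arrives can be sketched as follows. This is an assumption-laden illustration, not the apps' code: `calculate_in_background`, the calculator list and the `panel` dictionary are hypothetical names, and the slow calculation is simulated with a short sleep.

```python
import queue
import threading
import time

def calculate_in_background(molecule, calculators, on_update):
    """Run calculators in order on a worker thread, reporting each result as it is ready."""
    results = queue.Queue()

    def worker():
        for name, func in calculators:
            results.put((name, func(molecule)))
        results.put(None)  # sentinel: nothing more to come

    threading.Thread(target=worker, daemon=True).start()

    # A real user interface would poll from its event loop instead of blocking here.
    while True:
        item = results.get()
        if item is None:
            break
        on_update(*item)

def slow_heavy_atom_count(mol):
    time.sleep(0.05)  # stand-in for an expensive calculation
    return sum(n for element, n in mol.items() if element != "H")

panel = {}

def show(name, value):
    panel[name] = value  # stand-in for redrawing the property panel

calculate_in_background(
    {"C": 8, "H": 10, "N": 4, "O": 2},  # caffeine as element counts
    [("atom count", lambda mol: sum(mol.values())),
     ("heavy atoms", slow_heavy_atom_count)],
    show,
)
```

Listing the calculators from fastest to slowest means the cheap values are delivered first, matching the behaviour described above.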
Most of the graphical properties shown below are omitted when there is no information to display, e.g. if the molecule has no stereocentres, no tautomers, etc., the corresponding sections will not appear.
A scalar property can easily be represented as a number or a short formatted string. These are displayed in the first block of the property viewing panel.
Sites for atom- and bond-centred stereochemistry are determined by examining the molecular graph. Main group atoms with nominally tetrahedral geometry that have 4 different substituents are potentially chiral centres, and alkene-like double bonds for which the cis/trans forms represent different structures are considered to be bond stereocentres. The chiral parity is determined by examining the coordinates and bond geometry. For the most part molecules are 2D sketches, which need to use the up/down (wedge/hash) notation to show geometry. While MMDS and MolPrime+ are mainly intended for 2D sketches, it is also possible to encode 3D coordinates, and in this case the geometry is used as-specified. For double bond stereochemistry, either 2D or 3D coordinates are sufficient to fully specify the state.
In cases where the stereochemistry is specified and can be determined, a label is computed (R/S for chiral atoms, E/Z for restricted rotation stereobonds). If a chiral centre does not have enough clues to infer a single enantiomer, or the site is explicitly marked as racemic using the wavy bond type, it is marked as unknown. Double bond stereochemistry is almost always implied by the structure, unless it is explicitly marked as unknown, or drawn with a very odd geometry.
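To show why 2D coordinates are enough for double bonds, here is a minimal sketch that assigns E or Z from the positions of the two higher-priority substituents. It assumes the priority ranking has already been done elsewhere, and the function name `double_bond_label` and the tolerance value are inventions for this example rather than anything taken from the apps.

```python
def double_bond_label(p1, p2, s1, s2, tolerance=1e-6):
    """Return 'E', 'Z' or 'unknown' for a double bond drawn in 2D.

    p1, p2: (x, y) of the two double-bonded atoms.
    s1, s2: (x, y) of the higher-priority substituent on p1 and p2 respectively.
    """
    axis_x, axis_y = p2[0] - p1[0], p2[1] - p1[1]
    # The sign of the 2D cross product says which side of the bond axis a substituent lies on.
    side1 = axis_x * (s1[1] - p1[1]) - axis_y * (s1[0] - p1[0])
    side2 = axis_x * (s2[1] - p2[1]) - axis_y * (s2[0] - p2[0])
    if abs(side1) < tolerance or abs(side2) < tolerance:
        return "unknown"  # substituent drawn in line with the bond: geometry is ambiguous
    return "Z" if side1 * side2 > 0 else "E"

# Substituents on the same side of a horizontal double bond.
print(double_bond_label((0, 0), (1, 0), (-0.5, 0.87), (1.5, 0.87)))
```

A substituent drawn collinear with the bond gives a zero cross product, which is one concrete form of the "very odd geometry" case that has to be reported as unknown.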
Note that the chirality determination does not currently distinguish between linked stereocentres, or take action regarding meso planes of symmetry. Other stereochemistry that is common with inorganic structures, such as square planar or octahedral centres, is not marked. These may be implemented at a later date.
Many organic molecules of biological interest exist in aqueous solution as more than one distinct molecular structure, due to the ability of certain functional groups to transfer hydrogen atoms to another part of the molecule, and adjust the single/double bond pattern to satisfy valence rules. While the exact mechanisms that occur in the reality of a chemical ensemble are due to a balance of kinetics and thermodynamics that is too nuanced to be perfectly predicted by a small set of rules, it is possible to obtain a very useful approximation. The apps implement a slightly modified version of a literature recipe:
Frank Oellien; Jörg Cramer; Carsten Beyer; Wolf-Dietrich Ihlenfeldt; Paul M. Selzer: "The Impact of Tautomer Forms on Pharmacophore-Based Virtual Screening",
Once all of the tautomer transformation rules have been applied, the set is reduced so that only unique compounds are retained. Shown below is the collection of states displayed for guanine, which is a popular example used to demonstrate a diabolical case of numerous plausible tautomeric forms:
Because there are many tautomers, the collection can be scrolled from left to right by swiping. Each of the molecules is highlighted with a green backdrop to show which of the atoms were affected by at least one tautomer shift which, in the case of guanine, is all of them.
As well as showing all of the unique molecules induced by tautomeric transformations, the individual molecular representations are also adjusted to show the loss of distinct stereochemistry. Consider the chiral centre in the acetylacetone derivative below:
Because the chiral centre interconverts with an sp2 planar tautomer, the chirality is not preserved, and the original use of the wedge bond to indicate the (R) enantiomer is revealed to be misleading, at least in aqueous solution.
There are many ways to draw a molecule incorrectly, and for the most part it is not for the software algorithm to decide what a chemist is allowed to describe, but for first row p-block elements, there are hard rules that are generally considered inviolable. Essentially, the valence counts of C, N, O and F must add up to 8, while S, P, Cl, Br and I are slightly more flexible (due to d-electrons) and are granted the option of having an extended shell. Any molecule that has violations of these rules is worth drawing attention to, because it is more likely to be a mistake than anything else. There may be a legitimate reason, but judicious use of the zero-order bond and control over implicit hydrogens makes it possible to represent almost any real world molecule with a plausible and compliant valence state for its light main group constituents.
As shown above, the highlighted nitrogen atom has 4 substituents, which is invalid: the valence adds up to 9. In order to be correct, the nitrogen centre should have a positive charge.
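The arithmetic behind this check can be sketched as follows, under the assumption that the shell count is the number of outer-shell electrons of the neutral atom, minus the formal charge, plus the sum of bond orders (including bonds to implicit hydrogens). The names `OUTER_ELECTRONS`, `shell_count` and `violates_octet` are hypothetical, and the heavier elements with extended shells are simply left unchecked in this sketch.

```python
# Outer-shell electrons of the neutral atoms subject to the strict rule.
OUTER_ELECTRONS = {"C": 4, "N": 5, "O": 6, "F": 7}

def shell_count(element, bond_order_sum, charge=0):
    """Outer-shell electrons, adjusted for charge, plus one shared electron per bond order."""
    return OUTER_ELECTRONS[element] - charge + bond_order_sum

def violates_octet(element, bond_order_sum, charge=0):
    """True if a C, N, O or F atom does not add up to 8; other elements are not checked here."""
    if element not in OUTER_ELECTRONS:
        return False
    return shell_count(element, bond_order_sum, charge) != 8

print(shell_count("N", 4))            # neutral nitrogen with four single bonds
print(shell_count("N", 4, charge=1))  # the same centre with a positive charge
```

For the neutral nitrogen with four bonds the count is 5 + 4 = 9, and adding the positive charge brings it back to 8, which is the correction described above.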
For any molecule that is being postulated as a possible bioactive molecule, the PAINS filters are likely to be of interest:
Jonathan B. Baell; Georgina A. Holloway: "New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays",
In short, the filters are a collection of substructure fragment queries that have been identified by cherry picking from molecules that were found to hit a large number of targets in high throughput screens, for the wrong reasons. If a molecule contains one of the PAINS fragments, then it is highly likely that there is something wrong with it, e.g. it causes a false alarm by triggering the detection event without binding, or it binds to everything and is therefore useless for therapeutic purposes, or it is reactive, or a covalent binder, etc. Matching one of these PAINS filters should be considered as a strong warning, and at the very least any kind of positive activity measurement should be considered with an extra helping of scepticism.
PAINS filter matches are shown graphically, with each hit showing the substructure overlap:
These filters are relatively slow to calculate, since there are a large number of queries to consider, each of which requires the app to fire up a considerable amount of computational machinery. For this reason there is usually a delay of several seconds before any PAINS matches are shown.
Collections of Molecules
The Mobile Molecular DataSheet (MMDS) app uses the datasheet as its basic unit, which is a tabular collection of molecules and other data. Browsing calculated properties for one molecule at a time is useful, especially since it is convenient to show a variety of properties that are best interpreted graphically, but there are many workflows that require a new column to be created. In this case a value is calculated for each property and each molecule.
To calculate properties, open the menu for a regular user-defined datasheet (i.e. not the scratch sheet or structure templates), and select the property calculation action button (which uses the same icon as for individual molecules):
This will bring up the preparation dialog, which inquires as to which properties should be calculated. The default selection involves calculating all properties except for the very slowest, which are only determined if requested:
The calculation can be interrupted without consequences. Also note that by default, properties will only be calculated if they are new, which is convenient for when new molecules are added to a collection: the missing properties can be updated without having to force the entire datasheet to be recalculated. By activating the corresponding checkbox, it is possible to force the recalculation, which may be useful if some of the structures have been changed.
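The "only calculate what is new" behaviour, together with the option to force recalculation, can be sketched like this. The row layout, the function name `calculate_columns` and the `force` flag are assumptions for illustration and do not describe how MMDS stores its datasheets.

```python
def calculate_columns(rows, calculators, force=False):
    """Fill in property columns, skipping cells that already hold a value unless force is set.

    rows: list of dicts, each with a 'molecule' entry plus any property columns.
    calculators: dict mapping column name to a function of the molecule.
    Returns the number of cells that were (re)calculated.
    """
    computed = 0
    for row in rows:
        for name, func in calculators.items():
            if force or row.get(name) is None:
                row[name] = func(row["molecule"])
                computed += 1
    return computed

def heavy_atoms(mol):
    # Molecules are represented here as element-count dictionaries.
    return sum(n for element, n in mol.items() if element != "H")

sheet = [{"molecule": {"C": 8, "H": 10, "N": 4, "O": 2}},
         {"molecule": {"C": 2, "H": 6, "O": 1}}]
calculators = {"HeavyAtoms": heavy_atoms}

first_pass = calculate_columns(sheet, calculators)   # fills both rows
sheet.append({"molecule": {"C": 6, "H": 6}})
second_pass = calculate_columns(sheet, calculators)  # only the new row is missing a value
forced_pass = calculate_columns(sheet, calculators, force=True)  # everything is redone
```

Because each cell is written independently, stopping part way through leaves the completed cells intact, and a later run simply picks up the remaining gaps.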
Only scalar versions of property calculations are determined. These can be viewed in detail mode:
While the results from the more exotic calculations lack the visual representations, these can be easily called up one molecule at a time, should the short-hand form indicate that there are properties of interest. Perhaps more importantly, once these properties are calculated, it is possible to share the data in a variety of ways, which makes it possible to use MMDS as a calculation engine that can feed data to other software packages.
MMDS and MolPrime+ both provide a selection of useful property calculations, some of which are quite advanced. All properties are calculated on the device itself. Some of these properties are numbers or text descriptions, while others are visualised by molecular overlays. Properties can be calculated for one molecule at a time, or for whole datasheets, where each property is stored in a separate column.