Dissecting the Hype With Cheminformatics

A recent paper in Nature Biotechnology reported the use of AI in the discovery of inhibitors of DDR1, a kinase that has been implicated in fibrosis. Many who have worked in kinase drug discovery may have noticed that the most prominent compound (Compound 1) from this paper bears a striking resemblance to a certain marketed drug. Let's assume the compound appears familiar, but we can't specifically place it. How can we use cheminformatics to find drugs similar to the compound highlighted in this paper? Let's take a look.

In case you're interested in running the code, there is a Jupyter notebook on GitHub. If you don't like code, skip to the bottom of this post for the punch line. If you want a really easy way to run the code, try this link to a Binderized version. This will allow you to run the notebook without downloading anything. Thanks to Peter Rose for showing me how to do this.

First, we will import the Python libraries that we'll need.  We'll also set a couple of flags to make the chemical structures prettier.
from chembl_webresource_client.new_client import new_client
from rdkit import Chem
from rdkit.Chem.Draw import MolsToGridImage
from rdkit.Chem.Draw import IPythonConsole
import pandas as pd
from rdkit.Chem import rdFMCS
from tqdm import tqdm
from rdkit.Chem import PandasTools
from rdkit.Chem import AllChem
from rdkit.Chem import rdDepictor
from rdkit.Chem.Fingerprints import FingerprintMols
from rdkit import DataStructs
IPythonConsole.ipython_useSVG = True

We will start with Compound 1 from the paper.
compound_1_smiles = "Cc1ccc2c(Nc3cccc(c3)C(F)(F)F)noc2c1C#Cc1cnc2cccnn12"
compound_1_mol = Chem.MolFromSmiles(compound_1_smiles)

A quick check to ensure that we have the correct SMILES; evaluating the molecule in a notebook cell draws the 2D structure.
compound_1_mol

Now let's use the newly released ChEMBL API to grab the SMILES for all of the small molecule drugs. Once we have the SMILES and ChEMBL IDs for the drugs, we'll put this into a Pandas dataframe.
molecule = new_client.molecule
approved_drugs = molecule.filter(max_phase=4)
small_molecule_drugs = [x for x in approved_drugs if x['molecule_type'] == 'Small molecule']
struct_list = [(x['molecule_chembl_id'],x['molecule_structures']) for x in small_molecule_drugs if x]
smiles_list = [(a,b['canonical_smiles']) for (a,b) in struct_list if b]
smiles_df = pd.DataFrame(smiles_list)
smiles_df.columns = ['ChEMBL_ID','SMILES']

Let's add a molecule column to the dataframe to make it easier to view the chemical structures.
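With PandasTools this is a single call. A minimal sketch, shown here on a one-row stand-in dataframe so the snippet runs on its own (in the notebook the same call is applied to smiles_df from the previous step):

```python
import pandas as pd
from rdkit.Chem import PandasTools

# One-row stand-in for smiles_df (aspirin); in the notebook this is
# the full table of drugs downloaded from ChEMBL.
demo_df = pd.DataFrame([("CHEMBL25", "CC(=O)Oc1ccccc1C(=O)O")],
                       columns=["ChEMBL_ID", "SMILES"])
# Parse each SMILES into an RDKit molecule stored in a new "Mol" column;
# in a Jupyter notebook the Mol column renders as 2D structures.
PandasTools.AddMoleculeColumnToFrame(demo_df, smilesCol="SMILES", molCol="Mol")
# Drop any rows whose SMILES failed to parse (their Mol would be None).
demo_df = demo_df.dropna(subset=["Mol"])
```

On the real dataframe, the equivalent call is PandasTools.AddMoleculeColumnToFrame(smiles_df, smilesCol='SMILES', molCol='Mol').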

Now we'll add a fingerprint column to our Pandas table so that we can do a similarity search.
smiles_df['fp'] = [FingerprintMols.FingerprintMol(x) for x in smiles_df.Mol]

Next, we can generate a fingerprint for Compound 1 and use that to do a similarity search.
compound_1_fp = FingerprintMols.FingerprintMol(compound_1_mol)
smiles_df['fp_sim'] = [DataStructs.TanimotoSimilarity(compound_1_fp,x) for x in smiles_df.fp]

Let's look at the 5 most similar compounds.
top5_sim_df = smiles_df.sort_values('fp_sim', ascending=False).head()
MolsToGridImage(top5_sim_df.Mol.to_list(),molsPerRow=5,legends=["%.2f" % x for x in top5_sim_df.fp_sim])

Hmmm, the first compound above looks a lot like Compound 1.  Just to be certain that we've found what we need, why don't we try an alternate method of calculating similarity.  In this case, we'll calculate the number of atoms in the maximum common subgraph (MCS) for Compound 1 and each of the small molecule drugs.  MCS calculations are time-consuming so this isn't the sort of thing we want to do with a large database.  However, in this case we only have a few thousand drugs, so the calculation isn't prohibitive.  On my MacBook Pro, this takes about a minute and a half.

We'll start by defining a function that calculates the number of atoms in the MCS for two molecules.
def mcs_size(mol1,mol2):
    mcs = rdFMCS.FindMCS([mol1,mol2])
    return mcs.numAtoms
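As a quick sanity check, we can try the function on a toy pair: benzene and toluene share the six-membered aromatic ring, so the MCS should contain six atoms. (The imports and function are restated so the snippet runs on its own.)

```python
from rdkit import Chem
from rdkit.Chem import rdFMCS

def mcs_size(mol1, mol2):
    # Number of atoms in the maximum common substructure of the pair.
    mcs = rdFMCS.FindMCS([mol1, mol2])
    return mcs.numAtoms

benzene = Chem.MolFromSmiles("c1ccccc1")
toluene = Chem.MolFromSmiles("Cc1ccccc1")
print(mcs_size(benzene, toluene))  # prints 6, the shared benzene ring
```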

Next, we will run this over the small molecule drugs in our dataframe.
mcs_list = []
for mol in tqdm(smiles_df.Mol):
    mcs_list.append(mcs_size(compound_1_mol, mol))

We can add the number of atoms in the MCS to our dataframe.
smiles_df['mcs'] = mcs_list

Now we sort the dataframe by the number of atoms in the MCS and take a look at the 5 compounds with the largest MCS.
top5_mcs_df = smiles_df.sort_values('mcs', ascending=False).head()
MolsToGridImage(top5_mcs_df.Mol.to_list(),molsPerRow=5,legends=["%d" % x for x in top5_mcs_df.mcs])

While the two most similar molecules are the same as those we identified using the similarity search, we can see that the MCS search uncovers a few different molecules. Let's learn a bit more about the drug that is most similar to Compound 1. We can start by getting its ChEMBL identifier.
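With the sorted dataframe in hand, the identifier is just the first entry in the ChEMBL_ID column. A minimal sketch, using a one-row stand-in so the snippet runs on its own (in the notebook top5_mcs_df comes from the sort above, and the ID below is the one the search actually returns):

```python
import pandas as pd

# Stand-in for the sorted MCS results; in the notebook top5_mcs_df
# already holds the five drugs with the largest MCS.
top5_mcs_df = pd.DataFrame({"ChEMBL_ID": ["CHEMBL1171837"]})
# The drug most similar to Compound 1 is the first row.
best_chembl_id = top5_mcs_df.ChEMBL_ID.to_list()[0]
print(best_chembl_id)  # prints CHEMBL1171837
```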

We can use the ChEMBL API to get the names associated with this molecule.
molecule = new_client.molecule
m1 = molecule.get('CHEMBL1171837')
pd.DataFrame([(x['molecule_synonym'],x['syn_type']) for x in m1['molecule_synonyms']],columns=['molecule_synonym','syn_type'])

This molecule is Ponatinib. A quick Google search shows us that this is a marketed drug originally developed as an inhibitor of BCR-ABL. Ponatinib is also a promiscuous inhibitor of a number of other kinases including DDR1. In fact, extensive SAR around the activity of Ponatinib analogs against DDR1 was reported in a 2013 paper in J. Med. Chem.

Let's generate a side-by-side visualization of Compound 1 and Ponatinib with the MCS highlighted.
ponatinib_mol = top5_mcs_df.Mol.to_list()[0]
compound_1_mcs = rdFMCS.FindMCS([compound_1_mol,ponatinib_mol])
mcs_query = Chem.MolFromSmarts(compound_1_mcs.smartsString)
for m in [compound_1_mol,ponatinib_mol]: AllChem.GenerateDepictionMatching2DStructure(m,mcs_query)
compound_1_match = compound_1_mol.GetSubstructMatch(mcs_query)
ponatinib_match = ponatinib_mol.GetSubstructMatch(mcs_query)
MolsToGridImage([compound_1_mol,ponatinib_mol],highlightAtomLists=[compound_1_match,ponatinib_match],subImgSize=(400, 400))

Given the similarity of Compound 1 to a marketed drug with 9nM biochemical activity against DDR1 and good pharmacokinetics (PK), the activity and PK profiles of Compound 1 are not particularly surprising.


  1. To understand the GENTRL paper you may need to study this paper first: https://arxiv.org/abs/1605.05396

    and go through the history of GANs: https://www.theverge.com/2018/12/17/18144356/ai-image-generation-fake-faces-people-nvidia-generative-adversarial-networks-gans

    The fact that this approach can generate valid molecules that are different from the template but still work is pretty dramatic.

  2. Dear Unknown,

    I've read the paper you mention and am familiar with GANs. In fact, I cowrote a book on "Deep Learning for the Life Sciences" (see Chapter 9).

    My point is that I don't personally view a simple isosteric replacement of an amide carbonyl by an isoxazole to produce an equipotent compound as dramatic.

  3. If I understand what you mean, it seems AI needs to provide radically different compounds to merit its use over good old human intelligence. What if we use this result to provide a different insight: DDR1 has evolved to accept only a very limited number of specific molecules, or else it would be easy to interrupt its biological function, causing problems for the organism. Failure to find anything else, or anything that is significantly different, supports this view. Although it is far from actual proof that there is nothing else significant to be found, accepting this possibility would signal researchers to look in other directions, such as small peptides or targeting other pathway proteins. Not accepting this view could lead to a dead end in which researchers spend resources looking for a small molecule that might not exist. I hope you find my comment of value to you.

  4. Dear Noel,

    I believe that molecules generated by an AI should be judged by the same standards as molecules generated by the brain of a medicinal chemist. Let's assume that a medicinal chemist submitted a paper where the starting point was Ponatinib and the endpoint was Compound 1, and the biochemical activity, cellular activity, and PK profiles of the two compounds were roughly equivalent. This paper would not be published. In fact, it probably wouldn't even be sent for review.

  5. Pat, thank you very much for your brilliant job. It's absolutely clear that all molecules modeled with AI should be checked for similarity with known compounds. Honestly, such articles create information noise, so it's more and more difficult to find a pearl in a heap. On the other side, the article in Nature has some scientific value: a new methodology for fast in silico search, two non-active compounds (negative results are always useful). If the isoxazole pattern was really introduced with deep learning, it should be considered an achievement. In all fairness, most SAR work spins around known structures with just varying of substituents. I didn't find new ideas in the Gao paper; all leads (7rh, 7 rj) are absolutely identical to Ponatinib (and it's also not mentioned there), so the value of the article is near zero (although if we use medchem standards the article is a good one).

  6. Ponatinib is shown in Figure 1 of the Gao paper. The authors also mention it several times in the text. The objective of Gao work was to establish selectivity for Bcr-Abl, c-Kit, and DDR2. Gao and coworkers found that they could modulate selectivity by replacing the imidazo[1,2‐b]pyridazine in Ponatinib with a pyrazolopyrimidine.

    1. Pat, I must disagree with you. I don't know why you defend the Chinese colleagues. Their article is classic patent circumvention. Most of the compounds from the original paper (10.1021/jm100395q) are DDR1 inhibitors with nanomolar activity (for Ponatinib see 10.1016/j.jmb.2014.04.014). Gao and coworkers studied positional isomers of Ponatinib (7ra has the exact same molecular formula), but they used nilotinib and dasatinib as reference compounds! And I didn't find in the article any mention of a rationale for such a choice. I do not blame them; patent circumvention is an important part of my own work.

    2. Oh, sorry for my English:
      ...all leads (7rh, 7 rj) are absolutely identical to Ponatinib (and it's also not mentioned there) ... --> (this fact was also not mentioned there).

    3. My intent was not to defend anyone, or to comment on the quality of the paper by Gao et al. I simply wanted to point out that Gao and coworkers did mention Ponatinib and show its structure in Figure 1.

  7. Pat - thanks not only for this well expressed comment on the Nat Biotech paper but also for this efficient bit of code for pulling data out of ChEMBL. This may be a native Linux or Mac vs. PC issue, but I ran into difficulties running the MolsToGridImage statements on my setup with Anaconda Python (running 64-bit Python 3 with Anaconda under Windows 10). The problem can be fixed by converting the first input to a list like this:

    MolsToGridImage(top5_sim_df.Mol.to_list()[0:5],molsPerRow=5, legends=["%.2f" % x for x in top5_sim_df.fp_sim])

    rather than

    MolsToGridImage(top5_sim_df.Mol,molsPerRow=5,legends=["%.2f" % x for x in top5_sim_df.fp_sim])
  8. Thanks Byron, I think this is due to a bug in an older version of the RDKit. Either way, I made your suggested change to the code. I also added a Binderized version for people who don't want to download the code and set up an environment.

  9. A colleague just brought this to my attention. Very happy to see that the colleagues at Relay took interest in this work and can do a similarity search. This paper is not about the molecules.
    Instead of judging the molecule, I recommend developing a similar generative model, generating a few compounds (there was more than one with low-digit nanomolar activity, and many very cool molecules, just more difficult to synthesize), synthesizing and testing. It will require a few failures to get it to work. And for some strange reason, two years after we developed these models nobody did this exercise and published. Alex Z

  10. Dear Alex,

    Thanks for your response. While the paper may not have been about the molecule, the surrounding hype (“AI Designs a Drug in 21 Days”) certainly was. I’m a fan of generative models, but I believe their results need to be put in the appropriate context. I would like to suggest three simple guidelines around results from a generative model.
    1. The chemical structures of training set molecules should be made available in electronic form.
    2. The most similar training set molecule to each reported molecule should be presented in a table.
    3. Molecules generated by an AI should be judged by the same criteria as molecules generated by human imagination.

    1. Thanks, Pat, it is understandable. But the hype did not hurt and more people entered this field. IMHO any cool story in science should be promoted because most of the hype in society is happening outside science. Plus, we never claimed to have a drug. Only drug-like. I was a bit surprised to see it take off like that since it is not even in the top 5 cool things we did in AI to date, but it is certainly a motivation to publish more of the internal stuff.
      1. It is relatively easy to assemble the training sets. We followed the best practices. And we even maintain this - https://arxiv.org/abs/1811.12823 . It will be expanded in the next update and we will publish something cool on top of it.
      2. We also made the 30K structures available, and it is possible to see the other molecules designed by GENTRL; we could not synthesize and test all of them.

      But points taken.

    2. Alex, I have to disagree with you here. I believe that the hype is hurting the field in two important ways.
      1. It creates unrealistic expectations among the general public and even some more naive members of the scientific community.
      2. It causes our scientific colleagues to look askance at all applications of AI in Drug Discovery, and overshadows a number of legitimate advances.
      Btw, you didn’t provide any evidence that you fulfilled any of the three criteria I outlined above.

  11. Alex, I have to disagree here.

    There is nothing in this paper except the molecules. All you show in this paper is potency, selectivity, and some ADMET of the synthesized compounds. You suggest that the paper should be considered based on the technology and say that GANs are hard to train to generate good hypotheses. But you did not compare your GANs to basic bioisosteric replacement that has existed forever. Nor did you compare the GANs to other tools, like Drug-Guru, that have been around for more than a decade.

    In the paper, you did not define your success criteria or any success criteria and did not perform baseline experiments. Furthermore, you did not adhere to the basic ML practice of measuring the similarity of your results to the training data.

    You generated a set of 30k hypotheses and filtered them based on their similarity to the very training data you used to build your GAN model.

    Frankly, if you knew how to perform a similarity search against these known DDR1 binders as well as Pat and (I'm sure) the rest of Relay, this conversation would have been moot. Speaking of similarity searches, SOM is just a glorified nearest neighbor. So, it is not surprising that you select the molecules out of your 30k that are most similar to the known data. To top that, you filter the remainder of the hypotheses based on pharmacophore similarity to known DDR1 binders which have been co-crystallized -- no word about the similarity between these compounds and the filtered ones. Finally, these molecules were chosen by WuXi's medicinal chemists, and we all already believed that they can make analogs of known kinase inhibitors.

    In my opinion, this paper should not have been published because of a lack of novel discoveries and certainly should not have been published because of low ML standards and lack of rigor.

    1. Geoff, I am not sure if your choice of nickname justifies a response so I will be brief.
      We did compare GENTRL to the other generative models (supplementary). WuXi medicinal chemists did not look at the molecules before synthesis. There are also many additional structures that were not synthesized. Regarding the ML rigor, come to NIPS and see the mathematical extension of this work.

  12. Alex, you are deflecting rather than addressing the clear arguments I made (and I see that you also ignored Pat's 3-point criteria above).
    Let me concretely address the diversion that you are trying to make here.

    > We did compare GENTRL to the other generative models (supplementary).
    True, but irrelevant. First, you compared your method to other reinforcement algorithms on a TOY example (!!!), not on the real problem at hand. Second, these reinforcement learning algorithms are not baseline models. The claim that you are trying to make is that you can do better than a medicinal chemist, not better than some other AI mumbo-jumbo algorithm. I would think that basic bioisosteric replacement is de facto the baseline you should try to beat, and you have failed to demonstrate that.

    > WuXi medicinal chemists did not look at the molecules before synthesis. There are also many additional structures that were not synthesized.
    Interesting, so how did the chemists assess the synthetic feasibility without ever looking at the compounds? There is no discussion in your manuscript regarding the selection criteria.

    > Regarding the ML rigor, come to NIPS and see the mathematical extension of this work.
    Math is not an indication of scientific rigor. You don't need complicated math to define your success criteria, nor do you need complicated math to perform baseline experiments. Furthermore, you certainly do not need fancy math to measure the similarity of your results to the training data and assess data leakage.

    You chose to ignore all the relevant critics that I, and others on this blog, have raised, and decided to comment about my name.

    I'd be interested in getting your response about (i) success criteria, (ii) proper separation of training data and predictions, and (iii) performance of basic baseline methods on the actual problem at hand.


