Scaffold Hopping? It's Complicated

As seems to be the case these days, this post was motivated by a comment I saw in the blogosphere.  In one of the myriad discussions on applications of AI in drug discovery, someone wrote: 

"I have yet to encounter a machine learning algorithm which predicts a true scaffold hop (say from Viagra to Cialis). From that standpoint, a tool like ROCS which looks at abstract but general features like shape and electrostatics is better than a lot of ML."

The comment got me thinking about something that has bugged me for a long time.  
  • What exactly constitutes a scaffold hop? 
  • Should we consider Viagra to Cialis a scaffold hop? (hint, I don't think so, stick with me and I’ll explain) 
What is a scaffold hop? Let’s start by taking a step back and looking at some of the classic work of Hans-Joachim Böhm and Martin Stahl.  In their 2004 paper, Böehm and Stahl highlighted three different three-dimensional approaches to scaffold hopping.  



The scaffold replacement approach provided the basis for a number of early computational efforts such as the CAVEAT program developed by Georges Lauri and Paul Bartlett.  In CAVEAT and related approaches, the idea was to replace an existing core, or scaffold, with a new core that had similar vectors connecting the core to the substituents.   Another approach to scaffold hopping is shape similarity, where molecules with similar shapes and arrangements of chemical features are considered to potentially have similar biological activity.  The canonical example of this approach is the ROCS program, originally developed by Andrew Grant and Anthony Nicholls.  In a third approach, pharmacophore similarity, the objective is to find molecules with similar orientations of key pharmacophoric elements such as hydrogen bonding groups and aromatic rings.  One of the earliest examples of this approach is the ALLADIN program developed by Yvonne Martin, John van Drie, and Dave Weininger.   One key, often unstated, component of all of the above approaches is that the objective is to find new molecules that make the same interactions as an existing molecule.

Ok, back to Viagra and Cialis.  If we accept the premise that scaffold hops involve molecules making similar interactions, we should only consider Viagra to Cialis to be a scaffold hop if the two molecules are making similar interactions in the same binding site.   Fortunately, there are crystal structures of Viagra (1udt) and Cialis (1udu) bound to Phosphodiesterase 5 in the PDB, so we can see for ourselves.   Let's take a look, it's pretty easy if you have PyMol. We can grab the crystal structures, align the protein structures, and look at how much the ligands overlap.  If you'd like to see this (you know you want to), you can simply paste the following text at the PyMol command prompt.  Go ahead, I'll wait.

fetch 1udt, async=0
fetch 1udu, async=0
select chain_b, chain B
remove chain_b
align 1udt, 1udu
select 1udt_lig, %1udt and resi 1000
select 1udu_lig, %1udu and resi 1003
select ligands, 1udt_lig or 1udu_lig
show sticks, ligands
zoom ligands
deselect

Note: I'm fanatical about reproducibility and I try my best to ensure that everything in this blog can be reproduced with Open Source software.  PyMol isn't exactly Open Source these days, but it's the closest thing we have. 

The script above fetches the two pdb files, aligns them, sets some display options and zooms in on the ligands.  If all went as expected, you should see something like this.  Viagra is the compound shown as green sticks, while Cialis is shown in cyan.


Update: Thanks to Dave Koes and the wonder that is 3Dmol.js, you don't need PyMol to view the 3D structures. You can simply rotate the image below.  In this case, Viagra is in yellow and Cialis is in blue.  


Rotate the structures around a bit and look at them.  You'll notice that, although both molecules bind in the same site, they make very different interactions.  In addition to simply staring at the structures, we can be a bit more quantitative.  We can use the awesome ProDy Python package to compare the interactions made by each of the ligands.  Installing ProDy is as simple as

pip install biopython prody

Now that we have ProDy installed, we can run the short python script below to compare the residues within 3.5 angstroms of each of the ligands.  Note that the code below requires Python 3.6.  

from prody import *


def compare_contacts(prot_A, resid_A, prot_B, resid_B, dist_cutoff):
    m_A = parsePDB(prot_A)
    m_B = parsePDB(prot_B)
    contacts_A = set(m_A.select(f"protein and chain A within {dist_cutoff} of resname {resid_A}").getResnums())
    contacts_B = set(m_B.select(f"protein and chain A within {dist_cutoff} of resname {resid_B}").getResnums())
    contacts_A_and_B = contacts_A.intersection(contacts_B)
    contacts_A_not_B = contacts_A.difference(contacts_B)
    contacts_B_not_A = contacts_B.difference(contacts_A)
    return contacts_A_and_B, contacts_A_not_B, contacts_B_not_A


def test_compare_contacts():
    contacts_common, contacts_viagra, contacts_cialis = compare_contacts("1udt", "VIA", "1udu", "CIA", 3.5)
    print(f"Viagra Only = {contacts_viagra}")
    print(f"Cialis Only = {contacts_cialis}")
    print(f"Common = {contacts_common}")


if __name__ == "__main__":
    test_compare_contacts()

If we run the script above we get the following output.

Viagra Only = {664, 779}
Cialis Only = {663, 786, 783}
Common = {612, 804, 816, 817, 820}

Update: I had a couple labels switched in the code above.  Thanks, Manish for pointing out my error.  

Note that there are only 5 residues that are within 3.5 angstroms of both Viagra and Cialis.  There are 2 residues that are close to only Viagra and 3 residues that are close to only Cialis.

So back to our original question, is Viagra to Cialis a scaffold hop?  I'd say no.  For the computational chemists in the audience, take your favorite molecular superposition program and try to superimpose Viagra and Cialis. The result won't be anywhere close to the crystallographic alignment (nor should it be).  Try to define a pharmacophore that will enable you to find Cialis from Viagra, I bet you can't.  Put Cialis in with an appropriately matched set of decoys and see if you can find it.  Again, I bet you can't.

Perhaps this is simply semantics, perhaps given a binding site one could move from Viagra to Cialis.  My point here is that there are a lot of subtleties to be considered.  Just because two compounds inhibit the same target doesn't imply that they do so by the same mechanism.  In some cases, two compounds that inhibit the same target may bind to completely different sites.  Try this comparison of p38 inhibitors in PyMol.

fetch 1bmk, async=0
fetch 3obj, async=0
align 1bmk, 3obj
select 1bmk_lig, %1bmk and resi 800
select 3obj_lig, %3obj and resi 361
select ligands, 1bmk_lig or 3obj_lig
show sticks, ligands
zoom ligands
deselect

Update: Here are the same p38 inhibitors as displayed by 3Dmol.js.





As the title states, it's complicated.

This brings us to the final topic of this particular rant, scaffold-based cross-validation.  This is a method that has become popular lately, particularly among the DeepChem crowd.  Initially, this seems like a logical choice.  If I have a dataset containing two different chemotypes, I can train a predictive model on chemotype A and assess its ability to predict chemotype B.  Sounds great, but what if, like the cases above, the ligands are not making the same interactions.  Is this a valid comparison?  It's complicated.











Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Comments

Popular posts from this blog

We Need Better Benchmarks for Machine Learning in Drug Discovery

AI in Drug Discovery 2023 - A Highly Opinionated Literature Review (Part I)

Getting Real with Molecular Property Prediction