Building on the Fragments From the Diamond/XChem SARS-CoV-2 Main Protease (MPro) Fragment Screen (Part I)

Like many in the community, I've been spending time looking at the results of a fragment screen of the SARS-CoV-2 main protease (MPro) by the group at the Diamond Light Source.  I thought it might be useful to share some of the techniques and the code that I use when working with the results of fragment screens.  One of the first things I like to do is to virtually expand the fragment by adding a set of substituents and looking for ways that the fragment can be expanded to make new interactions and increase potency.

Once I've added the substituents, I'll generate 3D conformers for the resulting expanded fragments.  The conformers are constrained so that the coordinates of the fragment are fixed.  Since we've fixed the coordinates of the core fragment, we can trivially place these expanded fragments in the original binding site and evaluate their interactions.  This approach is similar to the one we described in a 2004 paper in J. Med. Chem. In this post, I'll describe one approach to computationally expanding a fragment hit.  In a subsequent post, I'll detail the method for generating the 3D conformations and evaluating these conformations in the binding site.

I recently found this preprint and the accompanying GitHub repo from Pavel Polishchuk describing a software tool called CReM (Chemically Reasonable Mutations). The CReM software is like a Swiss army knife for molecular modification and provides several useful capabilities for growing, mutating, and linking molecules. In this post, we'll take a look at a few ways that CReM can be used to modify one of the fragments from the Diamond COVID-19 structures. All of the code and data accompanying this post is available in a Jupyter notebook on GitHub.

CReM takes a simple but effective approach to modifying molecules. The workflow begins by creating a database of matched molecular pairs based on a set of input molecules. The CReM distribution contains a couple of databases compiled from the ChEMBL database. Each matched molecular pair contains a context that specifies an atom environment around which the matched molecular pair was found. When replacing or attaching functional groups, CReM matches the context around the attachment point and uses this context to identify relevant functional groups. The user can specify a radius for the context.  Larger values for this radius specify a more specific context (fewer replacements) while small values specify a less specific environment (more replacements). As mentioned above, CReM provides the ability to mutate, grow, and link molecules. In this post, we'll focus on the "grow" capability, since this provides an excellent way to explore around fragment hits.

We'll start out with this fragment, labeled x0161, in the Diamond set.

To begin, we will create an RDKit molecule from the fragment and generate 2D coordinates.

frag_smi = "COC(=O)C=1C=CC(=CC1)S(=O)(=O)N"
frag_mol = Chem.MolFromSmiles(frag_smi)

Next, we have to specify the location of the database that CReM will use.

crem_db_file = '/Users/pwalters/crem/replacements_sc2.5.db'

I found that some of the substituents that CReM uses contained undesirable (reactive, potentially toxic) functional groups.  In order to remove these, I defined a REOS (Rapid Elimination Of Swill) class that uses some of the code and ideas I laid out in my post on Filtering Chemical Libraries.  The REOS class is initialized with a set of filtering rules.

reos = REOS("bms_filters.csv")

Now that our setup is complete, we can  add substituents to our fragment.

grow_list = list(grow_mol(frag_mol, db_name=crem_db_file,max_atoms=8,return_mol=True))
mol_list = [RemoveHs(x[1]) for x in grow_list]
_ = [AllChem.GenerateDepictionMatching2DStructure(m,frag_mol) for m in mol_list]

Note that I performed two operations on the resulting molecules to improve the aesthetics and make it easier to view the molecules.

  • Remove explicit hydrogens
  • Align the molecules to the fragment so that they're all oriented the same way 
Let's grab a random sample of the molecules we generated and look at them. 


There are several interesting ideas here, but this also points to the fact that we probably need to add a couple of additional functional group filters. In particular, we need to add a filter for long alkyl chains. We may also be able to get at this by adding a filter for rotatable bonds. 

One of the other cool features in CReM is the ability to exclude certain atoms from expansion or to only expand from a predefined set of atoms.  For instance, let's assume that the sulfonamide NH2 in the fragment is making a hydrogen bond, and we don't want to substitute that part of the molecule. The grow_mol function allows us to specify the atom numbers for the atoms that should not be substituted. In order to use this, we need to specify the atom numbers for the atoms that should not be substituted. To make this easier to use, we'll define a function to return atom numbers matching a SMARTS pattern.  

def get_matching_atoms(mol_in,smarts):
    pat = Chem.MolFromSmarts(smarts)
    return [x[0] for x in mol_in.GetSubstructMatches(pat)]

We can now use this function to get the atom number for the sulfonamide nitrogen.

sulfonamide_N = get_matching_atoms(frag_mol,"[NH2]")

This time we'll add substituents to the molecule, but avoid the sulfonamide nitrogen.

grow_list = list(grow_mol(frag_mol, db_name=crem_db_file,max_atoms=8,protected_ids=sulfonamide_N,return_mol=True))
mol_list = [x for x in mol_list if reos.eval_mol(x)]
mol_list = [RemoveHs(x[1]) for x in grow_list]
_ = [AllChem.GenerateDepictionMatching2DStructure(m,core) for m in mol_list]


Note that in this case, the sulfonamide is not substituted. 

Finally, we can also specify only the parts of the molecule that will be substituted.  Let's say that we only want to substitute the aromatic carbons on the fragment. 

aromatic_atoms = get_matching_atoms(frag_mol,"[cH]")

grow_list = list(grow_mol(frag_mol, db_name=crem_db_file,max_atoms=5,replace_ids=aromatic_atoms,return_mol=True))
mol_list = [x for x in mol_list if reos.eval_mol(x)]
mol_list = [RemoveHs(x[1]) for x in grow_list]
_ = [AllChem.GenerateDepictionMatching2DStructure(m,core) for m in mol_list]


This is just a quick overview of how an Open Source package can be used to generate ideas from hits coming out of a fragment screen.  Hopefully someone will find this useful.  In the next post I'll detail some of the tools that can be used to assess these expanded fragments. 


  1. Thanks! This post was really educational. Please keep up with the posts, also sharing them on Twitter.

  2. Thank you for this post
    Had a project on hold waiting for this post :)
    Just a question. I don't understand how to specify parts of the molecule to be substituted.
    Why ortho position of the sulfonamide is never substituted? And can we specify position to be replaced using atom numbers?

    1. The default behavior for CReM is to use an atom environment radius of 3. This means that all all of the atom environments 1,2 and 3 atoms out from the position of substitution must match. If you reduce the radius to 1, you generate 3900 molecules that are substituted ortho to the sulfonamide. Try something like this.
      grow_list = list(grow_mol(frag_mol, db_name=crem_db_file,max_atoms=8,return_mol=True,radius=1))

      You can pass a list of atom numbers for the positions you want to substitute.

      grow_list = list(grow_mol(frag_mol, db_name=crem_db_file,max_atoms=5,replace_ids=[5,6,8,0],return_mol=True))

    2. This seems like a bug. I believe that there should be relevant replacements for ortho-hydrogen relative to the sulfonamide group at context radius 3 in that database of fragments. Actually, I cannot generate any ortho-derivatives at any context radius. That's weird.
      I'll look on this. Thank you!
      You may use github to report such issues

    3. Nope, that is not a bug, that's a feature :)
      For exactly this example carbon atoms in ortho-position to SO2NH2 have ids 6 and 8. So, the following command will return ortho-derivatives
      list(grow_mol(frag_mol, "replacements_sc2.5.db", replace_ids = [6,8], max_atoms = 3, radius = 3))


      But if we will specify id of only one atom of two equivalent atoms it may not work. If we specify replace_ids = [8], it will not work. But if set replace_ids = [6] it will work as above. This was caused by RDKit MMP fragmenter which returns only distinct fragments removing equivalent ones. This is mentioned in the docs. So, to be on a safe side specify always all ids, even for equivalent atoms.

      Not this time, but I believe some bugs in implementation exist. If you will find inconsistencies or unexpected behavior please contact me by email ( or post a code example in github issues.

  3. Hi Pat, Thanks for sharing such a beautiful analysis.

    I would like to draw your attention to another class of peptide based protease inhibitor z-VRPR-fmk (J.Mol.Biol.2012[419(1-2):4–21.). It appears to me that these class of inhibitors share great deal of similarity in terms of active site occupancy and pharmacophore with compound 13b, mainprotease inhibitor (Ref: X-ray structure, PDB ID: 6Y2G, Compound 13b, Ref: L. Zhang et al., Science, 10.1126/science.abb3405 (2020)). It might be worth exploring these inhibitors as for mainprotease activity. This is just a thought.

  4. Hi, Pat!
    This is Pavel. Thank you for so nice representation of the tool. :) I found your blog by the sharply increased number of visitors of the repository. The tool is still under development and there are some features which I want to implement to make it more convenient.
    In my group we use the same strategy trying to decorate X-ray ligands of Mpro and evaluate them with docking. So, I'm waiting for the second part of your study!

    I'm curious, which criteria for selection of compounds used for fragmentation would you suggest?
    I considered different criteria and it seems that Lilly filters + synthetic complexity + PAINS (optional) look reasonable but decrease search space. Thus, currently we use only synthetic complexity filter and make post-filtration of generated compounds.

    1. Thanks for such a great tool Pavel! I've found it incredibly useful. I started using the BMS filters because CReM was outputting aldehydes, epoxides, etc. I haven't put a synthetic complexity filter in because I'm primarily trying to generate ideas.

  5. I'm primarily trying to generate ideas


Post a Comment

Popular posts from this blog

The Solubility Forecast Index

AI in Drug Discovery 2020 - A Highly Opinionated Literature Review

Useful RDKit Utilities