A Simple Tool for Exploring Structural Alerts

When working in drug design, we often need filters to identify molecules containing functional groups that may be toxic or reactive, or that could interfere with an assay.  A few years ago, I collected the functional group filters available in the ChEMBL database and wrote some Python code that makes it easy to apply these filters to an arbitrary set of molecules.  This functionality is part of the pip-installable useful_rdkit_utils package, which is available on PyPI and GitHub.  If we have a Pandas dataframe with a SMILES column, we can do something like this.

import useful_rdkit_utils as uru

reos = uru.REOS("BMS")  # optionally specify the rule set to use
df[['rule_set','reos']] = df.SMILES.apply(reos.process_smiles).tolist()

This adds two columns, rule_set, and reos, to the dataframe with the name of the rule_set and the name of the rule matched by each molecule.  If the molecule doesn't match any rules, both columns contain 'ok'.   This is nice, but I'm not intimately familiar with each of these rule sets.  Sometimes, I'd like to look at chemical structures and see what was matched.  To make my life, and hopefully yours, easier, I've written a simple interactive viewer for functional group filters.  This tool takes advantage of the lasso_highlight_image capability recently released by the Datamol team. 
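Coming back to the rule_set and reos columns for a moment, here is a minimal sketch of how they might be used once they exist: keep the molecules marked 'ok' and count how often each rule was hit. The dataframe is built by hand so the example runs without RDKit, and the rule names ("nitro", "acyl_halide") are made up for illustration. The real names come from whichever rule set you select.

```python
import pandas as pd

# Hypothetical results, shaped like the two columns described above.
# The rule names are placeholders, not actual BMS rule names.
df = pd.DataFrame(
    {
        "SMILES": [
            "CCO",
            "c1ccccc1[N+](=O)[O-]",
            "CC(=O)Cl",
            "Cc1ccccc1[N+](=O)[O-]",
        ],
        "rule_set": ["ok", "BMS", "BMS", "BMS"],
        "reos": ["ok", "nitro", "acyl_halide", "nitro"],
    }
)

# Molecules that matched no rule carry 'ok' in the reos column
passed = df[df.reos == "ok"]

# Count how many molecules hit each rule, most frequent first
rule_counts = df[df.reos != "ok"].reos.value_counts()

print(f"{len(passed)} of {len(df)} molecules passed")
print(rule_counts)
```

A quick look at the counts is often enough to tell which alerts dominate a dataset before looking at any structures.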

To use this tool, we need to get the SMARTS that were used by the filtering rules. We can do this by adding one line to the code above.  The new code is below. Note that we also added a "smarts" column to the dataframe. 

import useful_rdkit_utils as uru

reos = uru.REOS('BMS')
reos.set_output_smarts(True) # the new code
df[['rule_set','reos','smarts']] = df.SMILES.apply(reos.process_smiles).tolist()
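With a SMARTS string in hand, finding the atoms to highlight is a standard RDKit substructure search. The sketch below is not the code from the viewer; it only illustrates the idea. The helper name matched_atoms is hypothetical, and the nitro SMARTS is an illustrative pattern rather than one taken from the BMS rules.

```python
from rdkit import Chem


def matched_atoms(smiles, smarts):
    """Return a tuple of atom-index tuples, one per match of smarts in smiles."""
    mol = Chem.MolFromSmiles(smiles)
    pattern = Chem.MolFromSmarts(smarts)
    # Both parsers return None on invalid input
    if mol is None or pattern is None:
        return ()
    return mol.GetSubstructMatches(pattern)


# Nitrobenzene searched with an illustrative nitro-group pattern
matches = matched_atoms("c1ccccc1[N+](=O)[O-]", "[N+](=O)[O-]")
print(matches)
```

The atom indices returned here are what a highlighting routine needs in order to mark the matched functional group on the drawn structure.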

Now that we have the SMARTS, we can create an interactive tool using ipywidgets.  The tool, shown in the movie below, has a menu and a slider.  The menu is arranged according to functional group filter frequency, with the most frequently matching filter shown at the top.  The value in parentheses is the number of molecules matching that filter.  Changing the menu selection will change the highlighted chemical structure below the menu.  The slider enables us to move through and view individual molecules matching the rule shown in the menu. The slider can be operated by clicking and dragging or by clicking on the slider and using the arrow keys to navigate.  
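The menu ordering described above can be sketched with nothing but the standard library. This is an assumption about how one might build such a menu, not the notebook's actual code; the function names and the example records are hypothetical.

```python
from collections import Counter


def build_menu_options(rule_names, skip="ok"):
    """Return (label, rule) pairs ordered by descending match count."""
    counts = Counter(name for name in rule_names if name != skip)
    # most_common() sorts by count, highest first
    return [(f"{rule} ({n})", rule) for rule, n in counts.most_common()]


def molecules_for_rule(records, rule):
    """Return the SMILES matching one rule; a slider would index into this list."""
    return [smiles for smiles, name in records if name == rule]


# Hypothetical (SMILES, rule name) pairs standing in for the dataframe rows
records = [
    ("CCO", "ok"),
    ("c1ccccc1[N+](=O)[O-]", "nitro"),
    ("CC(=O)Cl", "acyl_halide"),
    ("Cc1ccccc1[N+](=O)[O-]", "nitro"),
]

options = build_menu_options(name for _, name in records)
print(options)
print(molecules_for_rule(records, "nitro"))
```

A list of (label, value) pairs like this is a form that the ipywidgets Dropdown accepts for its options, and the length of the list returned by molecules_for_rule gives the upper bound for the slider.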




As I finished this post, the Datamol team released an update with another great addition to the lasso_highlight_image method.  In the new version, we can pass a list of SMILES or RDKit molecules and display a grid of structures with substructures highlighted. The movie below shows an example of how we can display multiple example molecules matching each structural alert. 


The Jupyter notebook demonstrating this functionality can be found in the notebooks directory in the useful_rdkit_utils package.  I hope others find it useful.  Please let me know if you run into issues or have suggestions for improvements.  Please note that this notebook requires the latest versions of datamol and useful_rdkit_utils; make sure you update.
pip install -U useful_rdkit_utils datamol

I'd like to thank Hadrien Mary from the Datamol team for a quick bug fix and feature enhancements.  I'll be posting more uses for lasso_highlight_image and other datamol capabilities soon. 




