A Simple Tool for Exploring Structural Alerts
When working in drug design, we often need filters to identify molecules containing functional groups that may be toxic or reactive, or that could interfere with an assay. A few years ago, I collected the functional group filters available in the ChEMBL database and wrote some Python code that makes it easy to apply these filters to an arbitrary set of molecules. This functionality is part of the pip-installable useful_rdkit_utils package, which is available on PyPI and GitHub. If we have a Pandas dataframe with a SMILES column, we can do something like this.
import useful_rdkit_utils as uru
reos = uru.REOS("BMS")  # optionally specify the rule set to use
df[['rule_set','reos']] = df.SMILES.apply(reos.process_smiles).tolist()
This adds two columns, rule_set and reos, to the dataframe. For each molecule, they hold the name of the rule set and the name of the rule that the molecule matched. If the molecule doesn't match any rules, both columns contain 'ok'. This is nice, but I'm not intimately familiar with each of these rule sets. Sometimes, I'd like to look at the chemical structures and see what was matched. To make my life, and hopefully yours, easier, I've written a simple interactive viewer for functional group filters. This tool takes advantage of the lasso_highlight_image capability recently released by the Datamol team.
To use this tool, we need the SMARTS patterns used by the filtering rules. We can get them by adding one line to the code above. The new code is below. Note that the assignment now also includes a "smarts" column in the dataframe.
import useful_rdkit_utils as uru
reos = uru.REOS('BMS')
reos.set_output_smarts(True) # the new code
df[['rule_set','reos','smarts']] = df.SMILES.apply(reos.process_smiles).tolist()
Now that we have the SMARTS, we can create an interactive tool using ipywidgets. The tool, shown in the movie below, has a menu and a slider. The menu is arranged according to functional group filter frequency, with the most frequently matching filter shown at the top. The value in parentheses is the number of molecules matching that filter. Changing the menu selection will change the highlighted chemical structure below the menu. The slider enables us to move through and view individual molecules matching the rule shown in the menu. The slider can be operated by clicking and dragging or by clicking on the slider and using the arrow keys to navigate.
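For readers who want to see how the pieces described above might fit together, here is a rough sketch. It is not the code behind the tool itself. The function names menu_options and build_viewer are made up for illustration, the column names SMILES, reos, and smarts are assumed to match the dataframe built earlier, and plain RDKit atom highlighting stands in for Datamol's lasso_highlight_image to keep the example short.

```python
from collections import Counter


def menu_options(rule_names):
    """Return (label, rule) pairs sorted by match count, most frequent first."""
    # Molecules that matched no rule are marked "ok" and are left out of the menu.
    counts = Counter(name for name in rule_names if name != "ok")
    return [(f"{rule} ({n})", rule) for rule, n in counts.most_common()]


def build_viewer(df):
    """Wire a dropdown and a slider to a structure display (notebook only)."""
    # Imported here so that menu_options can be used without these packages.
    import ipywidgets as widgets
    from IPython.display import display
    from rdkit import Chem
    from rdkit.Chem import Draw

    hits = df[df["reos"] != "ok"]
    menu = widgets.Dropdown(options=menu_options(hits["reos"]), description="Rule")
    slider = widgets.IntSlider(min=0, max=0, value=0, description="Molecule")

    def rows_for(rule):
        return hits[hits["reos"] == rule]

    def last_index(rule):
        # Keep the slider range valid even when nothing matches.
        return max(len(rows_for(rule)) - 1, 0)

    def on_rule_change(change):
        # Reset the slider whenever a different rule is selected.
        slider.value = 0
        slider.max = last_index(change["new"])

    def show(rule, idx):
        rows = rows_for(rule)
        if idx >= len(rows):
            return  # slider briefly out of range while the rule is changing
        row = rows.iloc[idx]
        mol = Chem.MolFromSmiles(row["SMILES"])
        patt = Chem.MolFromSmarts(row["smarts"])
        if mol is None or patt is None:
            return
        # Highlight every atom that takes part in a match of the rule's SMARTS.
        atoms = [a for match in mol.GetSubstructMatches(patt) for a in match]
        display(Draw.MolToImage(mol, highlightAtoms=atoms))

    menu.observe(on_rule_change, names="value")
    slider.max = last_index(menu.value)
    out = widgets.interactive_output(show, {"rule": menu, "idx": slider})
    display(widgets.VBox([menu, slider, out]))
```

In a Jupyter notebook, calling build_viewer(df) on the dataframe with the smarts column should display the menu, the slider, and the highlighted structure. The menu labels come from menu_options, which accepts any iterable of rule names, including the reos column.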