Showing posts from January, 2022

The Solubility Forecast Index

Introduction

Recently, I've seen a number of deep learning models designed to predict the aqueous solubility of drug-like molecules. Despite the advantages brought about by techniques like graph neural networks, I have yet to see a commercial or open-source method that outperforms the venerable Solubility Forecast Index (SFI). I've written about the challenges associated with predicting aqueous solubility before, so I won't revisit that discussion. Suffice it to say, this is a difficult problem. The SFI, published in 2010 by Alan Hill and Robert Young at GSK, provides a simple, elegant equation for estimating aqueous solubility.

SFI = cLogD(pH 7.4) + #Ar

Here cLogD(pH 7.4) is the calculated log of the distribution coefficient, which accounts for all neutral and ionic species of a molecule, between pH 7.4 buffer and an organic phase, and #Ar is the number of aromatic rings. This seems pretty simple and should be easy to calculate. The number of aromatic rings can be trivially calcul
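As a rough illustration of the equation, here is a minimal Python sketch. It is not the method from the original post: the function names `solubility_forecast_index` and `count_aromatic_rings` are my own, and the cLogD value at pH 7.4 is assumed to come from some external calculator, since the RDKit does not provide one. The ring count uses the RDKit's `CalcNumAromaticRings`, imported inside the helper so the SFI function itself has no dependencies.

```python
def solubility_forecast_index(clogd_ph74: float, num_aromatic_rings: int) -> float:
    """Return SFI = cLogD(pH 7.4) + number of aromatic rings.

    clogd_ph74 is assumed to be supplied by an external cLogD
    calculator; this sketch does not compute it.
    """
    if num_aromatic_rings < 0:
        raise ValueError("number of aromatic rings cannot be negative")
    return clogd_ph74 + num_aromatic_rings


def count_aromatic_rings(smiles: str) -> int:
    """Count aromatic rings in a SMILES string with the RDKit."""
    # Imported here so the SFI function above works without the RDKit.
    from rdkit import Chem
    from rdkit.Chem import rdMolDescriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    return rdMolDescriptors.CalcNumAromaticRings(mol)
```

For a hypothetical molecule with a cLogD of 2.5 at pH 7.4 and three aromatic rings, `solubility_forecast_index(2.5, 3)` gives 5.5. In practice the ring count could come from `count_aromatic_rings`, leaving cLogD as the only input that needs a separate tool.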

Useful RDKit Utilities

There's a lot of useful functionality in the RDKit. My problem is remembering where all of the most useful bits are, and how to use them. To make my life, and perhaps yours, a little easier, I put together a Python package called "useful_rdkit_utils". Some of what's in there is simply a repackaging of existing functionality to make it easier to use (at least for me). In other cases, there are functions I borrowed from elsewhere, and there are a few new ideas as well. One interesting component in the library is a REOS class that encapsulates the functionality in the rd_filters package I released a few years ago. I made the package easy to install. All you have to do is "pip install useful_rdkit_utils". The GitHub repo also has Jupyter notebooks that demonstrate some of the functions in the package. I'm planning to continue adding to the package, and I'm very open to pull requests with corrections and additions. This is m