The Solubility Forecast Index
Introduction

Recently, I've seen a number of deep learning models designed to predict the aqueous solubility of drug-like molecules. Despite the advantages brought about by techniques like graph neural networks, I have yet to see a commercial or open-source method that outperforms the venerable Solubility Forecast Index (SFI). I've written about the challenges associated with predicting aqueous solubility before, so I won't revisit that discussion here. Suffice it to say, this is a difficult problem.

The SFI, published in 2010 by Alan Hill and Robert Young at GSK, provides a simple, elegant equation for estimating aqueous solubility.

SFI = cLogD_pH7.4 + #Ar

where cLogD_pH7.4 is the calculated log of the distribution coefficient, which accounts for all neutral and ionic species of a molecule partitioning between pH 7.4 buffer and an organic phase, and #Ar is the number of aromatic rings. This seems pretty simple and should be easy to calculate. The number of aromatic rings can be trivially calcul