Getting Real with Molecular Property Prediction
Introduction If you believe everything you read in the popular press, this AI business is easy. Just ask ChatGPT, and the perfect solution magically appears. Unfortunately, that's not the reality. In this post, I'll walk through a predictive modeling example and demonstrate that there are still a lot of subtleties to consider. In addition, I want to show that data is critical to building good machine learning (ML) models. If you don't have the appropriate data, a simple empirical approach may be better than an ML model. A recent paper from Cheng Fang and coworkers at Biogen presents prospective evaluations of machine learning models on several ADME endpoints. As part of this evaluation, the Biogen team released a large dataset of measured in vitro assay values for several thousand commercially available compounds. One component of this dataset is 2,173 solubility values measured at pH 6.8 using chemiluminescent nitrogen detection (CLND), a technique currently consid...