My Response to Peter Kenny's Comments on "AI in Drug Discovery - A Practical View From the Trenches"
As I've said before, my goal is not to use this blog as a soapbox. I prefer to talk about code, but I thought I should respond to Peter Kenny's comments on my post, "AI in Drug Discovery - A Practical View From the Trenches". I wanted to just leave this as a comment on Peter's blog. Alas, what I wrote is too long for a comment, so here goes.
Thanks for the comments, Pete. I need to elaborate on a few areas where I may have been unclear.
In defining ML as “a relatively well-defined subfield of AI” I was simply attempting to establish the scope of the discussion. I wasn’t implying that every technique used to model relationships between chemical structure and physical or biological properties is ML or AI.
I should have expanded a bit on the statement that ML is “assigning labels based on data”, a description that I borrowed from Cassie Kozyrkov at Google. I never meant to imply that I was only talking about classification problems. The way I think about it, a numeric value can be considered a label that we are learning. Machine learning is certainly not limited to classification. All of the commonly used ML methods in cheminformatics (random forest, gradient boosting, SVM, and neural nets) support both classification and regression.
You make the assertion that ML may be better for classification than regression, but don't explain why.
"I also have a suspicion that some of the ML approaches touted for drug design may be better suited for dealing with responses that are categorical (e.g. pIC50 > 6 ) rather than continuous (e.g. pIC50 = 6.6)"
In my experience, the choice of regression vs classification is often dictated by the data rather than the method. If you have a dataset with 3-fold error and one log of dynamic range, you probably shouldn’t be doing regression. If you have a dataset that spans a reasonable dynamic range and isn’t, as you point out, bunched up at the ends of the distribution, you may be able to build a regression model.
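To put rough numbers on the 3-fold error example, one can convert the fold error to log units and compare it with the spread of the measured values. This is only a sketch: the function names and the pIC50 values are hypothetical, and the ratio is a rule of thumb rather than an established cutoff.

```python
import math

def fold_error_to_log_units(fold_error):
    """Convert a multiplicative (fold) error into log10 units."""
    return math.log10(fold_error)

def dynamic_range(values):
    """Spread of the measured values, in the same (log) units."""
    return max(values) - min(values)

def range_to_error_ratio(values, fold_error):
    """How many 'error widths' fit inside the observed range."""
    return dynamic_range(values) / fold_error_to_log_units(fold_error)

# Hypothetical pIC50 values spanning one log unit
pic50 = [5.5, 5.8, 6.0, 6.1, 6.3, 6.5]
ratio = range_to_error_ratio(pic50, fold_error=3.0)

print(f"3-fold error = {fold_error_to_log_units(3.0):.2f} log units")
print(f"range / error = {ratio:.1f}")
```

A 3-fold error is about 0.48 log units, so one log of dynamic range covers only about two error widths. That leaves very little room to rank compounds on a continuous scale.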
Your argument about the number of parameters is interesting.
"One of my concerns with cheminformatic ML is that it is not always clear how many parameters have been used to build the models (I’m guessing that, sometimes, even the modelers don’t know) and one does need to account for numbers of parameters if claiming that one model has outperformed another."
I think this one is a bit trickier than it appears. In classical QSAR, many people use a calculated LogP. Is this one parameter? Scores of fragment contributions and dozens of fudge factors went into the LogP calculation; how do we account for these? Then again, the LogP parameters aren't adjustable in the QSAR model. I need to ponder the parameter question and how it applies to ML models, which use things like regularization and early stopping to prevent overfitting.
I’m not sure I understand your arguments regarding chemical space. You conclude with the statement
“It is typically difficult to perceive structural relationships between compounds using models based on generic molecular descriptors”.
One of the most popular descriptors used with machine learning today is the ECFP fingerprint (or its open source cousin the Morgan fingerprint). These are the same descriptors typically used to calculate molecular similarity and establish notions of chemical space. It is relatively easy, and often good practice, to construct a representation such as a self-organizing map to understand the relationships between chemical structures and ML predictions to see if they are consistent with your current understanding of the SAR for a drug discovery project.
I don’t see your assertion that global models are simply collections of local models as heretical. In many cases, we’re using ensemble methods like random forest where the model is a collection of predictors. I also agree that the selection of training and test sets is tricky business. Even splitting a compound set and putting the compounds synthesized earlier in a drug discovery program into the training set and those synthesized later into the test set (time-split cross-validation) does not completely remove bias. As I pointed out in my scaffold hopping post, I’m not a huge fan of dividing training and test sets based on the scaffold. In many cases, different scaffolds may make different interactions in the same binding site. If this is the case, should scaffold A enable you to make a prediction about scaffold B? Again, these factors are going to impact any predictive model, regardless of how it’s constructed.
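For readers who haven't set up a time split, the idea fits in a few lines. This is a minimal sketch with hypothetical compound records and a hypothetical `time_split` helper. It only orders compounds by date and cuts the list, so it does nothing about the residual bias mentioned above.

```python
from datetime import date

def time_split(records, train_fraction=0.8):
    """Sort records by date; the earliest go to training, the latest to test."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical compound records (IDs, dates, and activities are made up)
compounds = [
    {"id": "cpd-3", "date": date(2018, 6, 1), "pIC50": 6.9},
    {"id": "cpd-1", "date": date(2018, 1, 15), "pIC50": 5.4},
    {"id": "cpd-5", "date": date(2019, 2, 10), "pIC50": 7.8},
    {"id": "cpd-2", "date": date(2018, 3, 20), "pIC50": 6.1},
    {"id": "cpd-4", "date": date(2018, 11, 5), "pIC50": 7.2},
]

train, test = time_split(compounds, train_fraction=0.8)
print([r["id"] for r in train])
print([r["id"] for r in test])
```

Every training compound predates every test compound, which mimics how a model would actually be used on a project.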
Validation is a lot harder than it looks. Our datasets tend to contain a great deal of hidden bias. There is a great paper from the folks at Atomwise that goes into detail on this and provides some suggestions on how to measure this bias and to construct training and test sets that limit the bias.
You are correct that determining the applicability domain of ML models is still an open question. This is true of any QSAR model, regardless of how it was constructed. There is a lot more to this than simply asking if a molecule being predicted is similar to a molecule in the training set.
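For reference, the simple similarity check that I'm saying is not sufficient looks something like the sketch below. Fingerprints are represented as plain Python sets of on-bit indices. The bit values are hypothetical stand-ins for what a Morgan fingerprint would give you, and the helper names are my own.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def nearest_training_similarity(query, training_fps):
    """Highest Tanimoto similarity between a query and any training fingerprint."""
    return max(tanimoto(query, fp) for fp in training_fps)

# Hypothetical fingerprints as sets of on bits
training_fps = [{1, 4, 9, 12}, {1, 4, 7, 20}, {3, 9, 12, 33}]
query_close = {1, 4, 9, 13}   # shares most bits with the first training molecule
query_far = {50, 51, 52}      # shares no bits with the training set

print(nearest_training_similarity(query_close, training_fps))
print(nearest_training_similarity(query_far, training_fps))
```

A low nearest-neighbor similarity is a useful warning sign, but a high one doesn't guarantee a good prediction, which is why the applicability domain question remains open.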
It’s true that early in a drug discovery project we may not have enough data to build an ML model. As I mentioned in my post, one technique which may enable us to apply ML in situations where we have limited data is transfer learning. In transfer learning, we are fine-tuning a model that was trained on a larger set of related data. This technique is widely used in image analysis but has yet to be fully validated in our field. My feeling is that transfer learning will become more relevant when we can generate molecular descriptors that capture the fundamental biophysics of a binding event.
I’m also not advocating ML models as a panacea. As computational chemists, we have a number of tools and techniques at our disposal. Like any good golfer, we should choose the right club at the right time.
I have to disagree with the statement that starts your penultimate paragraph.
“While I do not think that ML models are likely to have significant impact for prediction of activity against primary targets in drug discovery projects, they do have more potential for prediction of physicochemical properties and off-target activity (for which measured data are likely to be available for a wider range of chemotypes than is the case for the primary project targets).”
Lead optimization projects where we are optimizing potency against a primary target are often places where ML models can make a significant impact. Once we’re into a lead-opt effort, we typically have a large amount of high-quality data, and can often identify sets of molecules with a consistent binding mode. In many cases, we are interpolating rather than extrapolating. These are situations where an ML model can shine. In addition, we are never simply optimizing activity against a primary target. We are simultaneously optimizing multiple parameters. In a lead optimization program, an ML model can help you to predict whether the change you are making to optimize a PK liability will enable you to maintain the primary target activity. This said, your ML model will be limited by the dynamic range of the observed data. The ML model won't predict a single-digit nM compound if it has only seen µM compounds.
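The dynamic range limit is easy to demonstrate for any model that predicts by averaging training labels, as nearest-neighbor methods and random forests do. The sketch below uses a toy k-nearest-neighbor regressor on a hypothetical one-dimensional descriptor. It is an illustration, not a model anyone would use on a project, and other model types (e.g. neural nets) are not strictly bounded in this way.

```python
def knn_predict(x, train_x, train_y, k=3):
    """Predict by averaging the labels of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical training set: pIC50 between 5.0 and 6.0 (10 uM to 1 uM)
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [5.0, 5.2, 5.5, 5.8, 6.0]

# Query far outside the descriptor range seen in training
pred = knn_predict(10.0, train_x, train_y)
print(f"predicted pIC50 = {pred:.2f}")
```

No matter how far the query sits from the training data, the prediction is an average of observed labels, so it can never exceed the most potent training compound.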
In contrast, there are a couple of confounding factors that make it more difficult to use ML to predict things like off-target activity. In some (perhaps most) cases, the molecules known to bind to an off-target may look nothing like the molecules you’re working on. This can make it difficult to determine whether your molecules fall within the applicability domain of the model. In addition, the molecules that are active against the off-target may bind to a number of different sites in a number of different ways.
At the end of the day, ML is one of many techniques that can enable us to make better decisions on drug discovery projects. Like any other computational tool used in drug discovery, it shouldn’t be treated as an oracle. We need to use these tools to augment, rather than replace, our understanding of the SAR.