Generative Molecular Design - We Need to Raise the Bar
While it's great that we're now seeing papers describing the experimental validation of generative algorithms for molecular design, we need to consider the significance of these findings and put them into the appropriate context.
Over the last five years, we've seen an explosion in the number of papers describing methods for generative molecular design. The 2018 paper by Gómez-Bombarelli, which launched the field, has already been cited more than 2,100 times. For those unfamiliar with the area, generative molecular design algorithms learn the distributions and associations of chemical functionality from a training set, then sample these distributions to generate new molecules. This molecule generation task can be coupled with one or more scoring functions to generate molecules meeting a specific objective, such as a predicted binding affinity. These methods can be considered similar in spirit to techniques for generating photorealistic images, art, or text that have been widely reported in the popular press. Several excellent reviews on generative molecular design are available for those interested in the field.
While these methods provided an alternate means of generating ideas for new molecules, the results of initial studies in the literature could have been more compelling. Most early papers focused on optimizing simple objectives such as a calculated LogP and/or some computed measure of drug-likeness. As the field progressed, published studies moved toward slightly more realistic objectives typically defined by QSAR models that predicted the binding affinity for a particular biological target. More recently, the field has taken the next step, and several groups have published papers reporting the synthesis and biological testing of molecules designed using generative algorithms. While this is a necessary next step, many reported results would not be considered significant in a drug discovery context.
For example, several papers have reported the application of generative algorithms to the design of kinase inhibitors. Protein kinases are enzymes that catalyze the transfer of a phosphate group from ATP to serine, threonine, tyrosine, or histidine residues. This transfer typically brings about a conformational change that leads to a signal transfer in a cell. Kinases are important targets in several therapeutic areas and have been of particular interest in oncology. As of 2021, there were over 70 FDA-approved drugs that are kinase inhibitors. These drugs are used to treat a variety of conditions, including cancer, autoimmune disorders, and heart disease. Some examples of FDA-approved kinase inhibitors include imatinib (Gleevec), dasatinib (Sprycel), and lapatinib (Tykerb).
The fact that data on thousands of published kinase inhibitors is in databases like ChEMBL makes kinase inhibitor design an obvious target for generative algorithms. These molecules' chemical structures and the large sets of associated biological and structural data can be used for training and tuning generative algorithms. So far, this all sounds easy, but there's a catch. Kinases are responsible for a wide array of functions in a cell, and inhibiting the wrong kinase can have disastrous consequences. For example, inhibition of the platelet-derived growth factor receptor (PDGFR) kinases can interfere with blood clotting and cause bleeding in patients. Other side effects of kinase inhibitors may include gastrointestinal problems such as nausea and diarrhea, skin rashes, and fatigue. Some kinase inhibitors can also cause damage to the heart and liver and may increase the risk of infections. It's important to note that the side effects vary depending on the specific kinase inhibitor and the condition being treated.
In kinase inhibitor discovery, teams aim to design a molecule that binds to the kinase of interest and avoids kinases whose inhibition could produce undesired side effects. This concept, selectivity, which is at the core of kinase inhibitor design, has been largely ignored in the current generative molecular design literature. A non-selective inhibitor that binds many kinases is likely to have a side effect profile that precludes its therapeutic utility. There's another aspect to this that makes it even trickier. Most inhibitors bind the kinase in the orthosteric site where ATP typically binds. By binding more tightly to the kinase than ATP, the inhibitors block the protein's catalytic activity and inhibit its function. Since the catalytic sites in kinases evolved to bind ATP, they are very similar, and designing a selective orthosteric kinase inhibitor is highly challenging.
Is it possible to modify the structure of a promiscuous inhibitor that binds many kinases and transform it into a selective molecule? Yes, but there are only a handful of cases where a team has been able to modify a promiscuous kinase inhibitor to have the desired selectivity profile. Most drug discovery teams prefer to start with a hit that inhibits a small number of kinases and modify the inhibitor structure to impart additional selectivity. Before embarking on an optimization campaign, one typically runs a kinase panel assay to understand a molecule's selectivity profile. These panel assays will typically assess the binding of a molecule to more than 300 representative kinases. The results of these assays, which are often visualized on a tree view of the kinome, give a team an estimate of the selectivity optimization challenges it faces. By combining the results of a kinase panel assay with experimental or predicted protein structures, a team can sometimes formulate strategies to avoid kinases whose inhibition would produce undesired side effects.
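To make the idea of a selectivity profile a bit more concrete, here is a minimal sketch of how panel results might be summarized. It assumes the panel reports percent inhibition per kinase at a single compound concentration and uses a simple hit-fraction score. The function names, the 50% threshold, and the five-kinase panel are all made up for illustration; a real panel would cover hundreds of kinases, and teams may well prefer other summary metrics.

```python
from typing import Dict, List


def selectivity_score(percent_inhibition: Dict[str, float], threshold: float = 50.0) -> float:
    """Fraction of panel kinases inhibited at or above `threshold` percent.

    Lower is more selective: 0.0 means no kinase was hit, 1.0 means every kinase was hit.
    The 50% default is an assumption for this sketch, not a standard.
    """
    if not percent_inhibition:
        raise ValueError("panel is empty")
    hits = sum(1 for value in percent_inhibition.values() if value >= threshold)
    return hits / len(percent_inhibition)


def off_targets(percent_inhibition: Dict[str, float], target: str, threshold: float = 50.0) -> List[str]:
    """Kinases other than `target` hit at or above `threshold`, strongest first."""
    hit = [(name, value) for name, value in percent_inhibition.items()
           if name != target and value >= threshold]
    return [name for name, _ in sorted(hit, key=lambda pair: pair[1], reverse=True)]


# Hypothetical percent-inhibition values, invented for illustration only
panel = {"ABL1": 98.0, "SRC": 91.0, "KIT": 72.0, "EGFR": 12.0, "CDK2": 5.0}

print(selectivity_score(panel))    # 3 of the 5 kinases are at or above 50%
print(off_targets(panel, "ABL1"))  # kinases a team would need to design away from
```

A single number like this hides which kinases are hit, which is why the off-target list (or a kinome tree view) matters as much as the score itself.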
Selectivity is also a factor when examining the results of cell assays. Some recent papers have shown that molecules designed by generative algorithms can inhibit the growth of cancer cells. While this sounds like a validation of the generative approach, we need to look closely at what this result means and what it doesn't. A proliferation assay tells you that a molecule is reducing the growth of cancer cells; it doesn't tell you why that cell growth is being reduced. Some might ask why this matters. Reducing tumor growth is good, right? Not necessarily. A promiscuous inhibitor that binds many kinases may kill cancer cells, but it will probably also kill healthy cells and give rise to unwanted side effects. There are several ways to assess the promiscuity of kinase inhibitors in cells. In some cases, groups have compared the cellular IC50 (compound concentration required to inhibit 50% of cell growth) of a molecule in a cancer cell line to the IC50 in a non-cancer cell line. While this can provide a small degree of confidence, it's far from definitive. Another approach is to compare the cellular IC50 in a cancer cell line, A, dependent on the target of interest, with the IC50 in a cancer cell line, B, not dependent on the target of interest. If the compound is selective, we should see inhibition in A but not in B. Drug discovery teams will typically compare several cell lines that are or are not dependent on the target of interest to understand the selectivity of a molecule. In addition, teams will try to confirm that the inhibition is "on mechanism" by running additional cell assays to look for a reduction in the downstream substrates of the protein of interest.
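The comparison between target-dependent and target-independent cell lines can be sketched as a simple fold-selectivity calculation. This is only an illustration: the function name, the cell line labels, and the IC50 values (in µM) are invented, and using the geometric mean to combine several cell lines is an assumption made here because IC50 values tend to be compared on a log scale.

```python
from statistics import geometric_mean  # available in Python 3.8+
from typing import Dict


def fold_selectivity(ic50_dependent: Dict[str, float], ic50_independent: Dict[str, float]) -> float:
    """Geometric-mean IC50 in target-independent lines divided by that in target-dependent lines.

    Values well above 1 suggest the compound is more potent where the target matters.
    A value near 1 suggests the growth inhibition may be unrelated to the target.
    """
    dependent = geometric_mean(list(ic50_dependent.values()))
    independent = geometric_mean(list(ic50_independent.values()))
    return independent / dependent


# Hypothetical cellular IC50 values in µM, invented for illustration only
dependent_lines = {"A1": 0.01, "A2": 0.1}
independent_lines = {"B1": 1.0, "B2": 10.0}

ratio = fold_selectivity(dependent_lines, independent_lines)
print(f"fold selectivity: {ratio:.0f}x")
```

Even a large ratio is only supporting evidence; as noted above, confirming that the effect is "on mechanism" still requires assays that look at the downstream substrates.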
After that short diversion, let's return to generative molecular design. We apply generative models to molecular design in drug discovery for two reasons: to obtain starting points for subsequent optimization and to optimize those starting points to meet a particular set of objectives. If we’re going to assess molecules designed to be kinase inhibitors, we can’t simply look at their binding to a single kinase target. Due to the similarity of the ATP sites in kinases, an initial hit in a kinase screen will almost always bind to multiple kinases. To determine that hit’s suitability for optimization, we need to examine its selectivity profile. This is standard practice in any kinase drug discovery project and should also be applied to studies involving generative design for kinase targets.
When reading or reviewing papers describing the application of generative algorithms to molecular design, I’ve been considering a few criteria that I thought might be worth sharing.
1. Code and training data available in a public repository. I’ve written extensively about this topic, so I’ll be brief. For science to advance, others need to be able to build upon and extend your work. While a well-written methods section in a paper is critical, a reference implementation is essential for reproducibility.
2. Contributions of human experts (e.g., medicinal chemists) indicated. The roles of chemists and algorithms in the design of new molecules should be specified. One framework for describing and assessing the contributions of machines and human experts to automated chemical design is described in a 2022 paper by Goldman.
3. A preliminary assessment of novelty. Many papers providing experimental validation of generative algorithms feature familiar kinase scaffolds like aminopyrimidines and aminobenzimidazoles. It’s important to put these discoveries in context and demonstrate that generative algorithms can produce novel molecules. One relatively simple approach to demonstrating novelty is to report the most similar training set molecules to those identified as hits. In addition, a report of the most similar molecules in the ChEMBL database can provide some notion of novelty and potential off-target liabilities.
4. Scientific significance. When assessing the output of generative algorithms, we must carefully consider the result's scientific impact. Authors should be realistic about their claims and indicate the work's limitations. As a reviewer, I ask myself whether a paper describing the same discovery by a team of medicinal chemists would be considered significant. For example, a kinase-focused medicinal chemistry paper without selectivity data would be unlikely to receive positive reviews. Similarly, a paper showing cellular data without mechanistic evidence would probably be rejected.
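The novelty check suggested in criterion 3 can be sketched in a few lines. The example below is a simplified illustration: it represents each fingerprint as a set of "on" bit indices and ranks reference molecules by Tanimoto similarity to a hit. The bit sets and molecule names are invented; in practice the fingerprints would come from a cheminformatics toolkit (for example, Morgan fingerprints from RDKit), and the reference set would be the training set or ChEMBL.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared bits divided by the bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbors(hit: Fingerprint, reference: Dict[str, Fingerprint],
                      k: int = 3) -> List[Tuple[str, float]]:
    """The k reference molecules most similar to `hit`, most similar first."""
    scored = [(name, tanimoto(hit, fp)) for name, fp in reference.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical fingerprints, invented for illustration only
hit_fp = frozenset({1, 2, 3, 4})
training_set = {
    "train_1": frozenset({1, 2, 3, 4, 5}),
    "train_2": frozenset({1, 2, 9}),
    "train_3": frozenset({7, 8}),
}

for name, similarity in nearest_neighbors(hit_fp, training_set, k=2):
    print(f"{name}: {similarity:.2f}")
```

Reporting the nearest neighbors alongside each hit lets readers judge for themselves whether a "novel" molecule is a close analog of something the model was trained on.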
Some may argue I'm too harsh and the field needs to walk before it can run. I agree, but we need to admit that we're walking. At this point, it's not appropriate to equate generative molecular design to computer programs that play games like chess and Go at the level of grandmasters. In molecular design, generative algorithms aren't even playing checkers; they're playing tic-tac-toe.