AI in Drug Discovery - A Practical View From the Trenches

It has never been my intent to use this blog as a personal soapbox, but I feel the need to respond to a recent article on AI in drug discovery.  
A recent viewpoint by Allan Jordan in ACS Medicinal Chemistry Letters suggests that we are nearing the zenith of the hype curve for Artificial Intelligence (AI) in drug discovery and that this hype will be followed by an inevitable period of disillusionment. Jordan goes on to discuss the hype created around computer-aided drug design and draws parallels to current work to incorporate AI technologies in drug discovery. While the author does make some reasonable points, he fails to highlight specific problems or to define what he means by AI. This is understandable. While the term AI is used frequently, most available definitions are still unclear. Wikipedia defines AI as “intelligence demonstrated by machines”, not a particularly helpful phrase. We wouldn’t consider a person who can walk around a room without bumping into thing…

Self-Organizing Maps - 90s Fad or Useful Tool? (Part 1)

In this post, I will explain how self-organizing maps (SOMs) work. In this first part, I'll explain the technological underpinnings of the technique. If you're impatient and just want to get to the implementation, skip to part 2.

A few years ago I was having a discussion with a computational chemistry colleague and the topic of self-organizing maps (SOMs) came up. My colleague remarked, "Weren't SOMs one of those 90s fads, kind of like Furbys?" While there were a lot of publications on SOMs in the early 1990s, I would argue that SOMs continue to be a useful and somewhat underappreciated technique.

What Problem Are We Trying to Solve?

In many situations in drug discovery, we want to be able to arrange a set of molecules in some sort of logical order.  This can be useful in a number of cases.
Clustering.  Sometimes we want to be able to put a set of molecules into groups and select representatives from each group.  This may be the case when we only h…

Self-Organizing Maps - The Code (Part 2)

In this post, we will look at examples of how two different open source Python libraries can be used to generate self-organizing maps.  The MiniSom library is great for building SOMs for smaller sets with fewer than 10K molecules.   The Somoclu library can use either a GPU or multiple CPU cores to generate a SOM, so it's well suited to larger libraries.  While Somoclu is a lot faster than MiniSom, installation on non-Linux platforms can require a bit of extra work.

I've provided example use cases for both libraries as Jupyter notebooks.  Hopefully, this will make it easier for readers to experiment with these methods.
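
For readers who want a feel for the core training loop before opening the notebooks, here is a minimal, dependency-free sketch of a SOM. It is not the MiniSom or Somoclu API. The function names (train_som, best_matching_unit), the exponential decay schedule, and the Gaussian neighborhood are all assumptions chosen for illustration, and the input vectors stand in for whatever numeric representation of a molecule you happen to use.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda cell: sum((wi - xi) ** 2 for wi, xi in zip(weights[cell], x)),
    )


def train_som(data, rows, cols, n_iter=500, lr0=0.5, sigma0=None, seed=0):
    """Train a tiny rectangular SOM (illustrative sketch only).

    data: list of equal-length feature vectors (lists of floats).
    Returns a dict mapping (row, col) -> weight vector.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Start every cell with random weights in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for t in range(n_iter):
        # Learning rate and neighborhood radius both shrink over time
        # (assumed schedule: simple exponential decay).
        decay = math.exp(-t / n_iter)
        lr = lr0 * decay
        sigma = sigma0 * decay
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            # Cells close to the winner on the grid are pulled harder.
            grid_dist_sq = (r - br) ** 2 + (c - bc) ** 2
            influence = math.exp(-grid_dist_sq / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * influence * (x[i] - w[i])
    return weights


if __name__ == "__main__":
    # Toy data: two loose groups of 2-D points.
    toy = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
    som = train_som(toy, rows=2, cols=3)
    for point in toy:
        print(point, "->", best_matching_unit(som, point))
```

Each input is assigned to the cell returned by best_matching_unit, and the grid coordinates of that cell are what get plotted. A pure-Python loop like this is far too slow for real compound sets, which is where the two libraries discussed here come in.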

MiniSom

The MiniSom library is great for generating SOMs for smaller datasets consisting of thousands to tens of thousands of molecules. I found the MiniSom library easy to install on a Mac or a Linux platform. The MiniSom example notebook can be found here on GitHub.

Here's some benchmarking data using MiniSom.  In the plot below we compare the time require…

My Science/Programming Journey

Note: This post is purely self-indulgent and probably won't be interesting to anyone who is not me.  You have been warned.  

A few recent tweets on programming languages got me thinking about my scientific/programming journey.  A long time ago in a galaxy far away ...

Phase I Varian/Analytichem 1984-1990

When I graduated from college in 1984, I got my first full-time job in science. I was hired as the head of manufacturing chemistry at a small company called Analytichem International in Harbor City, California. Prior to this, while I was an undergrad at UCSB, I worked part-time for a company called Petrarch Systems synthesizing siloxane polymers for, among other things, medical devices and gas chromatography columns. This experience doing siloxane chemistry was the reason I got the job at Analytichem, where they were doing similar chemistry on surfaces. I started work at Analytichem with the intent of having a career as a polymer chemist. I'd had very little experience with…

Assigning Bond Orders to PDB Ligands - The Easy Way

In this post, I'll walk through how we can combine a couple of Open Source software tools to easily and reliably assign bond orders to ligands from protein-ligand complexes from the PDB. As usual, the associated code is on GitHub.

One of the many things that frustrate me about the PDB file format is the absence of bond order information (please don't talk to me about double CONECT records). Since the bond order information is missing, we typically have to assign the bond orders, either manually or algorithmically. Anyone who has tried to implement bond order perception from PDB files will tell you that it's a difficult problem. For a detailed explanation of what's necessary, take a look at this 2001 talk from Roger Sayle. The problem is confounded by the fact that the geometry of many of the ligands in the PDB is less than ideal. For a detailed explanation of the many issues with PDB ligand geometries, take a look at Greg Warren's work on creating the IRIDIU…

Some Notes From the 2018 RDKit UGM

Last week I had the pleasure of attending the RDKit User Group meeting in Cambridge, UK.  This was my first RDKit UGM, and it was great.  I had the opportunity to catch up with a lot of people I hadn’t seen for a while and learned about a lot of exciting Open Source Cheminformatics. In this post, I’ve tried to summarize some of what took place and to present some links to relevant software and literature.  This won’t be a complete recitation of everything that took place, but hopefully, it will provide an overview for those who’d like to dig deeper.  I’ll link the slide decks as they become available.  Please let me know if I’ve missed or misinterpreted anything.

Slides from the meeting are available on GitHub.

Wednesday, September 19th

Greg Landrum, KNIME/T5 Informatics, Welcome and Intro (slides)
Greg provided a bit of history of the RDKit as well as an intro to some of the newer features.
C++ code has been modernized to C++14, greatly simplifying thi…