Posts

Showing posts from November, 2019

Interactive Plots with Chemical Structures

Image
It’s often useful to be able to associate chemical structures with a set of points on a scatter plot. While it’s easy to do this with commercial software like Spotfire or Vortex, I haven’t found an easy way to integrate an interactive plot like this into a Python script.  In this post, I’ll cover how I was able to generate an interactive scatter plot with about a page of Python code, most of which was boilerplate.  I was able to pull this off by integrating Dash, a Python library for interactive dashboards from the nice folks who brought you the Plotly plotting library, with the RDKit.  For a quick view of what the application does, check out the movie below.   For a higher quality version, try this YouTube link.  Actually, just grab the code from GitHub and run it, the application looks a lot better in real life than it does in the video. 

At this point, this is more of a Saturday afternoon hack than a complete application.  I did this as proof of concept to prove to myself that it …

Visualizing Chemical Space

Image
In many cases in Cheminformatics, we need to be able to create a graphical representation of the chemical space covered by a set of molecules.  In this space, similar molecules will be close together and molecules that are different will be far apart.  As an example, we may have an existing set of screening compounds and we want to see how a new set of compounds we plan to purchase will complement the chemical diversity of the existing collection.  Another common use case is examining the similarity of molecules in a training set to those in a test set.  One of the most common methods of evaluating chemical space coverage is by performing principal component analysis, better known as PCA.  In PCA, sets of correlated variables in a higher dimensional space are combined to produce a set of variables in a lower-dimensional space.  For instance, given 2048 bit fingerprints as a high dimensional representation of a set of molecules, many cheminformaticians will perform PCA to reduce this …