What Do Molecules That Look Like This Tend To Do?

In this post, we'll look at how to use an open-source Python library to search the ChEMBL database and investigate the biology associated with compounds similar to a screening hit.  The approach we'll discuss is easy to set up and doesn't require any database installation or configuration.  As usual, the code associated with this post is available on GitHub.

Introduction

A question that invariably comes up when examining screening hits is "what do molecules that look like this tend to do?"  This question can come up in a couple of contexts.  You've run a target-based screen, found a compound that's active in a functional assay, and you'd like to identify other targets that the compound could hit.  This might provide a pointer to selectivity assays that could or should be run.  You've run a phenotypic screen, and you'd like some hints on targets that could be responsible for the observed activity.  One approach to answering these quest

A Collection of Things I Frequently Forget How To Do With Seaborn Scatterplots

This post is just a placeholder so that people can find the notebook I created to show a few tricks with Seaborn scatterplots.  The notebook, which can be found here, points to this gist.  Note that GitHub can be kind of flaky about displaying Jupyter notebooks.

Examining the Data From the ChEMBL SARS-CoV-2 Drug Repurposing Screens

One interesting dataset in the ChEMBL 27 release is a compilation of several drug repurposing screens for SARS-CoV-2.  Given recent comments around the lack of consistency in these screens, I was eager to take a look at the data.  I thought it might also be interesting to share some of the techniques I used to explore the screening results.  It's my hope that readers will find some of the ways I manipulate data useful for their analyses.  As usual, all of the code associated with this post is in a Jupyter notebook on GitHub.

1. Getting the data from ChEMBL

As a first step, we need to construct an appropriate SQL query to extract the data we want to examine.  I'm using MySQL to access the ChEMBL data, not because it's the world's greatest database, but because I've been using it for a long time, and I have all of the necessary commands memorized.  I'm not going to use a fancy Object Relational Mapper (ORM); this is a one-off analysis, so plain old SQL is j

Wicked Fast Cheminformatics with NVIDIA RAPIDS

Graphics Processing Units (GPUs) have revolutionized scientific computing.  Scientists have been using GPUs to achieve significant speed-ups in fields ranging from molecular dynamics to machine learning.  Unfortunately, programming GPUs is a rather painful process that requires considerable expertise.  Fortunately for those of us who'd prefer to forgo the travails of CUDA programming, NVIDIA has released the RAPIDS library, which makes it easy to perform a wide array of data science operations on a GPU.  In this post, I'll present a few examples of how we can use RAPIDS to speed up tasks that we commonly perform in Cheminformatics.  As usual, a Jupyter notebook containing all of the code associated with this post is available on GitHub.

2020-06-23: I made a couple of changes to the code that slightly changed the runtimes and the trustworthiness values for t-SNE.  The conclusions are the same: RAPIDS ROCKS!

Installation

I've been following RAPIDS since its ini