Examining the Data From the ChEMBL SARS-CoV-2 Drug Repurposing Screens
One interesting dataset in the ChEMBL 27 release is a compilation of several drug repurposing screens for SARS-CoV-2. Given recent comments around the lack of consistency in these screens, I was eager to take a look at the data. I thought it might also be interesting to share some of the techniques I used to explore the screening results. It's my hope that readers will find some of the ways I manipulate data useful for their analyses. As usual, all of the code associated with this post is in a Jupyter notebook on GitHub.

1. Getting the data from ChEMBL

As a first step, we need to construct an appropriate SQL query to extract the data we want to examine. I'm using MySQL to access the ChEMBL data, not because it's the world's greatest database, but because I've been using it for a long time, and I have all of the necessary commands memorized. I'm not going to use a fancy Object Relational Mapper (ORM); this is a one-off analysis, so plain old SQL is j