Similarity Search and Some Cool Pandas Tricks

In this post, we're going to take a look at molecular similarity searches. Molecular similarity is central to a lot of what we do in Cheminformatics. It's important for identifying analogs and understanding SAR. Molecular similarity is also at the core of many clustering methods that we use to understand datasets or design screening libraries. In this example, we'll be using the chemfp package by Andrew Dalke. Chemfp has both free and paid tiers. With the free tier, you can perform similarity searches on smaller datasets, like the one we're using here. For larger datasets, you need to purchase the paid version. Chemfp is a great package. If you're using it for production drug discovery, you should buy a license. In addition to performing searches with chemfp, we'll also go over a few Pandas tricks that will enable us to rapidly process the output from chemfp. Here's a link to the tutorial notebook on Google Colab and on GitHub. I'd like
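To make the idea of a similarity search concrete, here's a minimal sketch in plain Python and Pandas. It does not use chemfp and is not the code from the notebook. The fingerprints are made-up sets of "on" bit positions, and the names `tanimoto`, `fingerprints`, and `top_hits`, as well as the 0.5 cutoff, are assumptions chosen for illustration.

```python
import pandas as pd


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient between two sets of 'on' bit positions."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: each molecule is represented by its set of on-bits.
fingerprints = {
    "mol_a": frozenset({1, 4, 7, 9}),
    "mol_b": frozenset({1, 4, 7, 12}),
    "mol_c": frozenset({2, 3, 5}),
    "mol_d": frozenset({1, 4, 7, 9, 12}),
}

query_id = "mol_a"
query_fp = fingerprints[query_id]

# Score every other molecule against the query and collect the results in a dataframe.
hits = pd.DataFrame(
    [(name, tanimoto(query_fp, fp)) for name, fp in fingerprints.items() if name != query_id],
    columns=["target", "tanimoto"],
)

# Keep hits above an arbitrary similarity cutoff, most similar first.
top_hits = (
    hits.query("tanimoto >= 0.5")
    .sort_values("tanimoto", ascending=False)
    .reset_index(drop=True)
)
print(top_hits)
```

A dedicated package like chemfp does the same kind of comparison far more efficiently on real fingerprints. The Pandas part of the sketch is just the filtering and sorting of the resulting table.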

Building a multiclass classification model

A pointer to the fastpages site.

Practical Cheminformatics - The Directory

In no particular order, here's a hopefully useful, topical organization of the posts I've written over the past few years.

Resources and Reviews
  A Highly Opinionated List of Open Source Cheminformatics Resources
  AI in Drug Discovery 2020 - A Highly Opinionated Literature Review

Clustering
  Viewing Clustered Chemical Structures in a Jupyter Notebook
  Clustering 2.1 Million Compounds for $5 With a Little Help From Amazon & Facebook
  Self-Organizing Maps - 90s Fad or Useful Tool? (Part 1)
  Self-Organizing Maps - The Code (Part 2)

Molecule Generation
  Automatic Analog Generation With Common R-group Replacements

Predictive Models
  Predicting Aqueous Solubility - It's Harder Than It Looks
  Assessing Interpretable Models

High-Performance Computing
  Fast Parallel Cheminformatics Workflows With Dask
  Wicked Fast Cheminformatics with NVIDIA RAPIDS

Databases
  What Do Molecules That Look LIke This Tend To Do?
  Adding Chemical Structures to a Recent COVID-19 Drug Repurposing Dataset

Filtering

Viewing Clustered Chemical Structures in a Jupyter Notebook

In Cheminformatics, we frequently run into cases where we want to look at leader/follower relationships between chemical structures. For instance, if we've clustered a set of molecules, we might want to start by looking at a table with one example structure for each cluster. We'd then like to be able to select one or more "interesting" clusters and drill down to the cluster members. While this is a frequent workflow, I'm not aware of commercially or freely available tools that do a great job of supporting the exploration of leader/follower relationships with chemical structures. In this post, we'll look at one way of connecting a couple of open source libraries to view cluster representatives and cluster members. As usual, the code and data associated with this post are available on GitHub. Rather than trying to explain this more fully, let's consider an example. In this example, we'll look at a set of 1,495 drugs from the ChEMBL database. If
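The leader/follower idea can be sketched with nothing more than Pandas. This is not the code from the post, and it leaves out the structure viewing entirely. The dataframe, its column names (`name`, `cluster`), and the helper names below are hypothetical, and the "leader" is simply taken to be the first row of each cluster, which is an assumption rather than a property of any particular clustering method.

```python
import pandas as pd

# Hypothetical clustering output: one row per molecule with its cluster id.
df = pd.DataFrame(
    {
        "name": ["mol_1", "mol_2", "mol_3", "mol_4", "mol_5", "mol_6"],
        "cluster": [0, 0, 1, 1, 1, 2],
    }
)

# Number of members in each cluster.
cluster_size = df.groupby("cluster").size().reset_index(name="size")

# One representative per cluster: here, just the first row seen for that cluster.
leaders = df.drop_duplicates("cluster").merge(cluster_size, on="cluster")
print(leaders)

# Drill down: show every member of the clusters we've decided are "interesting".
selected_clusters = [1]
members = df[df["cluster"].isin(selected_clusters)]
print(members)
```

The `leaders` table is the summary view with one row per cluster, and `members` is the drill-down view for whichever clusters were selected.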

Automatic Analog Generation With Common R-group Replacements

Another pointer to the fastpages site.

Assessing Interpretable Models

This is just a pointer to the fastpages blog post.

Fast Parallel Cheminformatics Workflows With Dask

This is just a pointer to my new fastpages blog site.