Similarity Search and Some Cool Pandas Tricks

In this post, we're going to take a look at molecular similarity searches. Molecular similarity is central to a lot of what we do in cheminformatics. It's important for identifying analogs and understanding structure-activity relationships (SAR). It's also at the core of many of the clustering methods we use to understand datasets or design screening libraries.

In this example, we'll be using the chemfp package by Andrew Dalke.  Chemfp has both free and paid tiers.  With the free tier, you can perform similarity searches on smaller datasets, like the one we're using here.  For larger datasets, you need to purchase the paid version.  Chemfp is a great package. If you're using it for production drug discovery, you should buy a license.  
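To make the idea of a similarity search concrete, here's a minimal pure-Python sketch. It is not the chemfp API. It assumes fingerprints are represented as sets of on-bit indices and that similarity is the Tanimoto coefficient, a common choice for fingerprint comparison. The molecule names, the bit sets, and the `tanimoto` and `search` helpers are all made up for illustration.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints share nothing; treat their similarity as 0.
    return len(a & b) / union if union else 0.0


def search(query: set, targets: dict, threshold: float = 0.0, k: int = 3) -> list:
    """Return up to k (name, score) hits with score >= threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in targets.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:k]


# Hypothetical toy fingerprints (real ones typically have hundreds or thousands of bits).
targets = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 5},
    "mol_c": {7, 8},
}
query = {1, 2, 3}

hits = search(query, targets, threshold=0.4)
```

A dedicated package does the same job with packed bit vectors and optimized search, which is what makes it practical on real datasets. The brute-force loop above is only meant to show what a threshold search computes.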

In addition to performing searches with chemfp, we'll also go over a few Pandas tricks that will enable us to rapidly process the output from chemfp. 
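As a rough illustration of the kind of post-processing Pandas makes easy, here's a small sketch. It assumes the search hits have already been collected into a long-format DataFrame with one row per (query, target) pair. The column names, the scores, and the 0.6 cutoff are hypothetical, and this isn't necessarily one of the tricks covered in the notebook.

```python
import pandas as pd

# Hypothetical search output: one row per (query, target) hit.
hits = pd.DataFrame(
    {
        "query_id": ["q1", "q1", "q1", "q2", "q2"],
        "target_id": ["t1", "t2", "t3", "t1", "t4"],
        "score": [0.82, 0.45, 0.91, 0.30, 0.67],
    }
)

# Best hit per query: pick the row with the highest score in each group.
best = hits.loc[hits.groupby("query_id")["score"].idxmax()].reset_index(drop=True)

# Number of hits per query at or above a similarity cutoff (0.6 is arbitrary).
n_close = hits[hits["score"] >= 0.6].groupby("query_id").size()
```

Keeping the hits in long format means a single `groupby` answers most per-query questions without writing explicit loops.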

The tutorial notebook is available on Google Colab and on GitHub.

I'd like to thank Paul Charifson for inspiring this post. 

