Similarity Search and Some Cool Pandas Tricks

In this post, we'll take a look at molecular similarity searches. Molecular similarity is central to a lot of what we do in cheminformatics. It's important for identifying analogs and understanding structure-activity relationships (SAR). It's also at the core of many clustering methods that we use to understand datasets or design screening libraries.
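
To make "similarity" concrete, here's a minimal sketch of the Tanimoto coefficient, a measure commonly used to compare fingerprints. It's plain Python, not chemfp, and it assumes each fingerprint is represented as a set of "on" bit positions. The molecule names and bit sets are made up for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Hypothetical toy fingerprints; real ones would come from a toolkit.
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}

# Score every library molecule against the query, then rank best-first.
scores = {name: tanimoto(query, fp) for name, fp in library.items()}
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name}\t{score:.2f}")
```

A similarity search is essentially this loop run efficiently over a large library, usually with a score threshold or a cap on the number of hits returned.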

In this example, we'll be using the chemfp package by Andrew Dalke.  Chemfp has both free and paid tiers.  With the free tier, you can perform similarity searches on smaller datasets, like the one we're using here.  For larger datasets, you need to purchase the paid version.  Chemfp is a great package. If you're using it for production drug discovery, you should buy a license.  

In addition to performing searches with chemfp, we'll also go over a few Pandas tricks that will enable us to rapidly process the output from chemfp. 
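
As a rough idea of the kind of post-processing involved, here's a small sketch that assumes the search hits have already been collected into a long-format DataFrame with one row per query/target pair. It isn't taken from the notebook. The column names (`query_id`, `target_id`, `score`), the values, and the 0.4 cutoff are all hypothetical.

```python
import pandas as pd

# Hypothetical search output: one row per (query, target) hit.
hits = pd.DataFrame(
    {
        "query_id": ["q1", "q1", "q1", "q2", "q2"],
        "target_id": ["t1", "t2", "t3", "t1", "t4"],
        "score": [0.91, 0.62, 0.35, 0.48, 0.77],
    }
)

# Keep only hits at or above an arbitrary similarity cutoff.
strong = hits[hits["score"] >= 0.4]

# For each query, pull the row holding its highest-scoring target.
best = strong.loc[strong.groupby("query_id")["score"].idxmax()].reset_index(drop=True)

# Count how many hits each query has above the cutoff.
counts = strong.groupby("query_id").size().reset_index(name="n_hits")

# Combine both views into one summary table per query.
summary = best.merge(counts, on="query_id")
print(summary)
```

Using `groupby` with `idxmax` avoids writing an explicit loop over queries, which starts to matter once the hit table gets large.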

Here's a link to the tutorial notebook on Google Colab and on GitHub.

I'd like to thank Paul Charifson for inspiring this post. 