Similarity Search and Some Cool Pandas Tricks

September 12, 2021

In this post, we'll look at molecular similarity searches. Molecular similarity is central to much of what we do in cheminformatics. It's important for identifying analogs and understanding structure-activity relationships (SAR). Molecular similarity is also at the core of many clustering methods that we use to understand datasets or design screening libraries.
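To make "similarity" concrete, here is a minimal sketch of the Tanimoto coefficient, a measure commonly used to compare molecular fingerprints. The post itself doesn't name a metric, so treating Tanimoto as the measure of interest is an assumption, and the fingerprints below are made-up sets of on-bit positions rather than anything computed from real molecules.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    # shared bits divided by the number of bits set in either fingerprint
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: the integers are invented bit positions.
query = {1, 4, 7, 9}
analog = {1, 4, 7, 12}
unrelated = {2, 3, 20}

print(tanimoto(query, analog))     # 3 shared bits out of 5 distinct bits -> 0.6
print(tanimoto(query, unrelated))  # no shared bits -> 0.0
```

A score of 1.0 means the two fingerprints are identical, and 0.0 means they share no bits. Analog identification usually comes down to ranking molecules by a score like this one.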

In this example, we'll be using the chemfp package by Andrew Dalke. Chemfp has both free and paid tiers. With the free tier, you can perform similarity searches on smaller datasets, like the one we're using here. For larger datasets, you need to purchase the paid version. Chemfp is a great package. If you're using it for production drug discovery, you should buy a license.
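For readers who want a feel for what a similarity search does before opening the notebook, here is a toy, pure-Python sketch of a thresholded k-nearest-neighbor search. It is not chemfp's API: the function names (`tanimoto_bits`, `knn_search`), the molecule IDs, and the 8-bit fingerprints are all hypothetical, and a real tool would be far faster on large datasets.

```python
import heapq


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto_bits(a: int, b: int) -> float:
    """Tanimoto similarity for fingerprints stored as integer bit strings."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 0.0


def knn_search(query: int, library: dict, k: int = 2, threshold: float = 0.0):
    """Return up to k (id, score) pairs with score >= threshold, best first."""
    scored = ((tanimoto_bits(query, fp), mol_id) for mol_id, fp in library.items())
    kept = [(score, mol_id) for score, mol_id in scored if score >= threshold]
    return [(mol_id, score) for score, mol_id in heapq.nlargest(k, kept)]


# Hypothetical 8-bit fingerprints keyed by made-up molecule IDs.
library = {
    "mol_a": 0b10110100,
    "mol_b": 0b10110000,
    "mol_c": 0b00001011,
}
query_fp = 0b10110110

results = knn_search(query_fp, library, k=2, threshold=0.5)
for mol_id, score in results:
    print(mol_id, round(score, 2))
```

Here `mol_a` and `mol_b` pass the 0.5 threshold, while `mol_c` shares only one bit with the query and is dropped. The list of (id, score) pairs is the kind of output that is convenient to load into a DataFrame for further processing.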

In addition to performing searches with chemfp, we'll also go over a few Pandas tricks that will enable us to rapidly process the output from chemfp.

Here's a link to the tutorial notebook on Google Colab and on GitHub.

I'd like to thank Paul Charifson for inspiring this post.

Practical Cheminformatics
