Similarity Search and Some Cool Pandas Tricks
In this post, we're going to take a look at molecular similarity searches. Molecular similarity is central to a lot of what we do in cheminformatics. It's important for identifying analogs and understanding structure-activity relationships (SAR). It's also at the core of many clustering methods that we use to understand datasets or design screening libraries.
In this example, we'll be using the chemfp package by Andrew Dalke. Chemfp has both free and paid tiers. With the free tier, you can perform similarity searches on smaller datasets, like the one we're using here. For larger datasets, you need to purchase the paid version. Chemfp is a great package. If you're using it for production drug discovery, you should buy a license.
In addition to performing searches with chemfp, we'll also go over a few Pandas tricks that will enable us to rapidly process the output from chemfp.
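Before we get to chemfp, it may help to see what a similarity search boils down to. The sketch below is plain Python and does not use chemfp at all. It assumes the Tanimoto coefficient as the similarity measure (a common choice for fingerprint comparisons, though this post hasn't specified one yet), and the fingerprints are made-up sets of "on" bit positions rather than anything computed from real molecules. The names `tanimoto`, `query`, `library`, and `hits` are just for illustration.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient for two fingerprints stored as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # assumed convention: two empty fingerprints score 0
    shared = len(fp_a & fp_b)
    # shared bits divided by the number of bits set in either fingerprint
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical toy fingerprints (on-bit positions), not derived from real molecules
query = {1, 4, 7, 9}
library = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3, 5},
}

# Score every library entry against the query and sort, most similar first
hits = sorted(
    ((name, tanimoto(query, fp)) for name, fp in library.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in hits:
    print(f"{name}\t{score:.2f}")
```

A real search does the same thing with much longer fingerprints and far more molecules, which is where a dedicated, optimized package earns its keep.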