Exploratory Data Analysis With mols2grid and Bemis-Murcko Frameworks

One of the most common tasks in cheminformatics is exploratory data analysis (EDA). Given a new dataset, we often need to rapidly explore the chemistry in a set containing hundreds, or even thousands, of molecules. One useful technique for EDA is the Bemis-Murcko framework. This technique, originally published by Guy Bemis and Mark Murcko, provides a simple but elegant means of grouping molecules. Bemis-Murcko frameworks (also known as scaffolds) are created by successively removing monovalent atoms until only ring atoms and linker atoms remain. There are a few nuances concerning the treatment of exocyclic double bonds and the preservation of aromaticity, but the method itself is very easy to understand. There are two versions of the Bemis-Murcko framework, which are sometimes confused. In the first version, illustrated in the top row of the figure below, monovalent atoms are removed until only ring atoms and linker atoms remain. In the second version, a generic framework