Some Notes From the 2018 RDKit UGM

Slides from the meeting are available on GitHub
Wednesday, September 19th
Greg Landrum, KNIME/T5 Informatics, Welcome and Intro (slides)
Greg provided a bit of history of the RDKit as well as an intro to some of the newer features.
- C++ code has been modernized to C++14, greatly simplifying things like iteration
- New sensible defaults for many functions
- Coordinate generation code from Schrödinger for prettier depictions
- Code to depict fingerprint bits
- A JSON-based format for interchange between programs
- New 3D descriptors
- SVG rendering with chemical metadata
Pat showed a very nice open source implementation of fingerprint similarity searching on a GPU.
- Reported being able to search 17M compounds in 0.05 seconds
- The current implementation should be scalable to 4 billion compounds
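For readers who haven't looked at what a fingerprint similarity search actually computes, here is a minimal CPU-only sketch of the idea. It is not the GPU implementation from the talk; the function names, the toy 8-bit fingerprints, and the use of plain Python integers as bit vectors are all assumptions made for illustration.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient between two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        # Two empty fingerprints: define the similarity as 0 to avoid dividing by zero.
        return 0.0
    return bin(a & b).count("1") / union


def search(query: int, library: dict, top_n: int = 3) -> list:
    """Return the top_n (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy library of 8-bit fingerprints; real ones are much longer.
library = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110000,
    "mol_c": 0b01001101,
}
hits = search(0b10110010, library, top_n=2)
print(hits)
```

The work per comparison is just a bitwise AND, a bitwise OR, and two bit counts, and every comparison is independent of the others, which is the kind of workload that maps naturally onto a GPU.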
Sereina described some enhancements to the Experimental Torsion Knowledge Distance Geometry (ETKDG) conformer generation method
- Work is underway for better handling of aliphatic rings
- Additional optimizations have been added for macrocycle conformation generation
- The method performed well in an evaluation study
Alpha discussed ways of dealing with uncertainty in molecular deep learning
- Referenced Gisbert Schneider’s work on active learning
- Pointed out that graph convolutions only learn fingerprints for training set molecules
Paulo showed some nice examples of how the RDKit can be seamlessly integrated with Cresset’s Flare toolkit, which is accessible from Python
Tim Dudgeon, Informatics Matters, Lightning Talk
Tim showed a couple of things. The first was Squonk, which appears to be a Jupyter (IPython) Notebook on steroids and is definitely worth a look
Susan Leung, GSoC RDKit MolVS Integration Project (slides)
Susan discussed her work on a Google Summer of Code project to integrate some of the features from the MolVS molecule validation and standardization toolkit into the RDKit
- A number of features for molecule standardization, validation, separation of salts and charge neutralization will be integrated into the RDKit
Boran discussed another Google Summer of Code project designed to unify the many different fingerprint implementations and interfaces in the RDKit under a single consistent framework. This work should also simplify the addition of new fingerprint types.
Nicholas Firth, Evariste Technologies, Multiparameter optimization using RDKit and scipy: what's the chance of success? (slides)
Nicholas described some of the approaches taken toward quantitative drug design at Evariste Technologies
- MOARF - Integrated workflow for multi-objective optimization
- Referenced an interesting preprint on model evaluation
Marina provided an overview of computational tools being designed to optimize the yield of biosynthetic pathways
- Building genome-scale models of metabolic pathways
- Reactions currently encoded as ChemAxon SMIRKS, some interest in translating to Reaction SMARTS
- Pathways are traced based on Tanimoto similarity to the target compound
Brian demonstrated how he could hack SMILES strings as a tool for de-novo design of novel molecules.
Pat Walters, Relay Therapeutics, A Few (Hopefully) Interesting Open Source Projects Built On The RDKit (slides)
A bunch of stuff that you probably already read about in this blog.
Thursday, September 20th
Joshua Meyers and Matthew Sellwood, BenevolentAI, Rediscovering R-Group Descriptors with RDKit
Joshua and Matthew described the implementation of R-group descriptors and their subsequent use in identifying bioisosteres
- The group agreed that there is a need for an open bioisostere database. Given the availability of software for identifying matched molecular pairs and data available in ChEMBL, it should be possible to collaboratively generate such a database.
- Some recent work by the Bajorath group in identifying congeneric series may provide a good starting point.
Daria provided a very nice overview of the many ways that the KNIME platform integrates with the RDKit
Noel O'Boyle, NextMove Software, A de facto standard or a free-for-all? A benchmark for reading SMILES (slides)
Noel described efforts to establish benchmark sets for the parsing and interpretation of SMILES strings
Thomas described his work in combining multiple machine learning methods to build models which have performed well in a number of recent blind challenges.
Benjamin Tehan + Rob Smith, Heptares, RDKit in the modern Biotech (slides)
Ben and Rob described the many ways that the RDKit is being used at Heptares
- Fragment similarity using Open3D Align
- Bioisostere searching
Lewis Mervin + Natalia Aniceto, University of Cambridge, In silico protein target prediction with reliability-density neighborhood applicability domain analysis (slides)
Lewis and Natalia described several iterations of the PIDGIN program for target prediction using Random Forests
Marwin Segler, BenevolentAI, Computer-aided synthesis planning
Marwin described his work on search algorithms for synthesis planning
Jacob Spiegel, University of Pittsburgh, Genetic algorithm for de novo computer-aided drug design
Jacob described some recent developments in the AutoGrow program that uses genetic algorithms for de-novo drug design
Andrea Morger, Charité - Universitätsmedizin Berlin, Machine learning and conformal prediction to support in silico toxicology
Andrea described work on using conformal prediction to establish the domain of applicability for predictive models
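If you haven't run into conformal prediction before, the core calculation is small enough to sketch. This is a generic class-conditional (Mondrian) version, not the code from Andrea's talk; the function names, the toy nonconformity scores, and the 0.25 significance level are assumptions for illustration. A nonconformity score here could be something like one minus the model's predicted probability for a class.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of examples (calibration set plus the test compound itself)
    that are at least as nonconforming as the test compound."""
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_label, test_scores_by_label, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return sorted(
        label
        for label, score in test_scores_by_label.items()
        if conformal_p_value(calibration_by_label[label], score) > significance
    )


# Hypothetical nonconformity scores from a held-out calibration set, per class.
calibration = {
    "toxic": [0.1, 0.2, 0.3, 0.6],
    "nontoxic": [0.05, 0.15, 0.4, 0.5],
}
# Hypothetical scores for one new compound under each candidate label.
new_compound = {"toxic": 0.25, "nontoxic": 0.75}

labels = prediction_set(calibration, new_compound, significance=0.25)
print(labels)
```

A prediction set with exactly one label is a confident call. An empty set, or a set containing both labels, flags a compound for which the model can't commit to a single class at that significance level, which is how the method ties into the domain of applicability.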
Lukas Pravda, PDB Europe, The use of RDKit in the Protein Data Bank in Europe to handle small molecules chemistry and display protein-ligand interactions (slides)
Lukas described a number of interesting Python utilities that the European PDB is developing to integrate the processing of proteins and small molecules. Many of these tools integrate with the RDKit
Roger Sayle, NextMove Software, Deceptively Simple: How some cheminformatics problems can be more complicated than they appear (slides)
The undisputed king of optimization shows us that few things are as simple as we think they are
- Calculating molecular weight
- Counting lines in a text file
- Determining percentages
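To give a flavor of how something like counting lines can be less simple than it looks, here is a small sketch. It is my own illustration and not taken from Roger's slides; the function names are made up. It relies on the fact that `wc -l` counts newline characters, so a file whose last line has no trailing newline, or one that uses bare carriage returns, gets a different answer than a line-splitting approach.

```python
def count_newlines(data: bytes) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    return data.count(b"\n")


def count_lines(data: bytes) -> int:
    """Count lines, including a final line that lacks a trailing newline.
    bytes.splitlines() treats \\n, \\r\\n and bare \\r as line endings."""
    return len(data.splitlines())


unix_file = b"CCO\nc1ccccc1\nCC(=O)O\n"      # every line newline-terminated
no_final_newline = b"CCO\nc1ccccc1\nCC(=O)O"  # last line not terminated
bare_cr_file = b"CCO\rc1ccccc1\rCC(=O)O\r"    # carriage-return line endings

for name, data in [
    ("unix", unix_file),
    ("no final newline", no_final_newline),
    ("bare CR", bare_cr_file),
]:
    print(name, count_newlines(data), count_lines(data))
```

All three inputs hold the same three SMILES, but only the first gives the same answer under both definitions.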
A Few Other Interesting Projects
EyeMol: The most interactive molecular dataset manager
Fast Clustering of Large Datasets
An Open Source molecule editor
LiteMol JavaScript PDB viewer

This work is licensed under a Creative Commons Attribution 4.0 International License.