Some Notes From the 2018 RDKit UGM

Last week I had the pleasure of attending the RDKit User Group Meeting (UGM) in Cambridge, UK. This was my first RDKit UGM, and it was great. I had the opportunity to catch up with a lot of people I hadn’t seen for a while and learned about a lot of exciting open source cheminformatics work. In this post, I’ve tried to summarize some of what took place and to present some links to relevant software and literature. This won’t be a complete recitation of everything that took place, but hopefully, it will provide an overview for those who’d like to dig deeper. I’ll link the slide decks as they become available. Please let me know if I’ve missed or misinterpreted anything.

Slides from the meeting are available on GitHub: https://github.com/rdkit/UGM_2018

Wednesday, September 19th

Greg Landrum, KNIME/T5 Informatics, Welcome and Intro (slides)
Greg provided a bit of history of the RDKit as well as an intro to some of the newer features.

C++ code has been modernized to C++14, greatly simplifying things like iteration
New sensible defaults for many functions
Coordinate generation code from Schrödinger for prettier depictions
Code to depict fingerprint bits
A JSON-based format for interchange between programs
New 3D descriptors
SVG rendering with chemical metadata

Pat Lorton, Schrödinger, Similarity searching a billion compounds in real time
Pat showed a very nice open source implementation of fingerprint similarity searching on a GPU.

Reported being able to search 17M compounds in 0.05 seconds
The current implementation should be scalable to 4 billion compounds
https://github.com/schrodinger/gpusimilarity
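
To make the underlying operation concrete, here is a minimal CPU sketch of brute-force Tanimoto similarity searching. It is not the gpusimilarity implementation; it only illustrates the comparison a GPU version would run in parallel. The fingerprints are made-up 8-bit values stored as Python integers, and the function names (popcount, tanimoto, top_hits) are my own choices for illustration.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as integer bit masks."""
    union = popcount(a | b)
    if union == 0:
        return 0.0
    return popcount(a & b) / union


def top_hits(query, database, k=3):
    """Return the k (score, index) pairs most similar to the query."""
    scored = [(tanimoto(query, fp), i) for i, fp in enumerate(database)]
    # Highest score first; ties are broken by database position
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Toy 8-bit fingerprints, invented for this example
database = [0b10110010, 0b10110000, 0b01001101, 0b11111111]
query = 0b10110010
hits = top_hits(query, database, k=2)
```

Every database entry is scored independently of the others, which is why this kind of search maps so naturally onto a GPU.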

Sereina Riniker, ETH, News from 3D (new features of the conformer generator)
Sereina described some enhancements to the Experimental Torsion Knowledge Distance Geometry (ETKDG) conformer generation method

Work is underway for better handling of aliphatic rings
Additional optimizations have been added for macrocycle conformation generation
https://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00654
The method performed well in an evaluation study https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00221

Alpha Lee, Cambridge University, Uncertainty quantification with Bayesian molecular deep learning (slides)
Alpha discussed ways of dealing with uncertainty in molecular deep learning

Referenced Gisbert Schneider’s work on active learning https://www.sciencedirect.com/science/article/pii/S1359644614004735?via%3Dihub
Pointed out that graph convolutions only learn fingerprints for training set molecules

Paolo Tosco, Cresset, Lightning Talk
Paolo showed some nice examples of how the RDKit can be seamlessly integrated with Cresset’s Flare toolkit, which is accessible from Python

https://www.cresset-group.com/tag/flare/

Tim Dudgeon, Informatics Matters, Lightning Talk
Tim showed a couple of things. The first was Squonk, which appears to be a Jupyter (IPython) Notebook on steroids and is definitely worth a look.

Tim also described a graph database implementation of a fragment network similar to the one published by Astex

Susan Leung, GSoC RDKit MolVS Integration Project (slides)
Susan discussed her work on a Google Summer of Code project to integrate some of the features from the MolVS molecule validation and standardization toolkit into the RDKit

A number of features for molecule standardization, validation, separation of salts and charge neutralization will be integrated into the RDKit
https://molvs.readthedocs.io/en/latest/

Boran Adas, RDKit's new fingerprint generators (slides)
Boran discussed another Google Summer of Code project designed to unify the many different fingerprint implementations and interfaces in the RDKit under a single consistent framework. This work should also simplify the addition of new fingerprint types.

Nicholas Firth, Evariste Technologies, Multiparameter optimization using RDKit and scipy: what's the chance of success? (slides)
Nicholas described some of the approaches taken toward quantitative drug design at Evariste Technologies

MOARF - Integrated workflow for multi-objective optimization https://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00073
https://pubs.acs.org/doi/10.1021/acs.jcim.8b00376
Referenced an interesting preprint on model evaluation https://arxiv.org/abs/1807.08926

Marina Fedorova, Novo Nordisk, Computational tools for metabolic pathways prediction (slides)
Marina provided an overview of computational tools being designed to optimize the yield of biosynthetic pathways

Building genome-scale models of metabolic pathways
Reactions currently encoded as ChemAxon SMIRKS, some interest in translating to Reaction SMARTS
Pathways are traced based on Tanimoto similarity to the target compound
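
As a rough illustration of similarity-guided pathway tracing, here is a best-first search that always expands the intermediate most similar to the target. This is my own sketch of the general idea, not the method from the talk. The compound names, feature sets, and reaction table are all hypothetical, and a real system would compute similarity on chemical fingerprints and generate products by applying reaction transforms.

```python
import heapq


def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def trace_pathway(start, target, reactions, features):
    """Best-first search that expands the compound most similar to the target."""
    # heapq is a min-heap, so similarities are stored negated
    frontier = [(-tanimoto(features[start], features[target]), start, [start])]
    visited = set()
    while frontier:
        _, compound, path = heapq.heappop(frontier)
        if compound == target:
            return path
        if compound in visited:
            continue
        visited.add(compound)
        for product in reactions.get(compound, []):
            if product not in visited:
                score = tanimoto(features[product], features[target])
                heapq.heappush(frontier, (-score, product, path + [product]))
    return None  # no route from start to target


# Hypothetical compounds described by made-up feature sets
features = {"A": {1, 2}, "B": {1, 2, 3}, "C": {7, 8}, "D": {1, 2, 3, 4}}
# Hypothetical reaction table: substrate -> reachable products
reactions = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
path = trace_pathway("A", "D", reactions, features)
```

In this toy case the search goes through B rather than C, because B shares more features with the target D.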

Brian Cole, DE Shaw Research, SMILES-driven de novo design engine
Brian demonstrated how he could hack SMILES strings as a tool for de novo molecule design.

Pat Walters, Relay Therapeutics, A Few (Hopefully) Interesting Open Source Projects Built On The RDKit (slides)
A bunch of stuff that you probably already read about in this blog.

Thursday, September 20th

Joshua Meyers and Matthew Sellwood, BenevolentAI, Rediscovering R-Group Descriptors with RDKit
https://pubs.acs.org/doi/10.1021/ci025589v
Joshua and Matthew described the implementation of R-group descriptors and their subsequent use in identifying bioisosteres

The group agreed that there is a need for an open bioisostere database. Given the availability of software for identifying matched molecular pairs and data available in ChEMBL, it should be possible to collaboratively generate such a database.
Some recent work by the Bajorath group in identifying congeneric series may provide a good starting point. https://www.future-science.com/doi/abs/10.4155/fsoa-2017-0135

Daria Goldmann, KNIME, It's not just for Python! Interacting with chemical data using KNIME and the RDKit (slides)
Daria provided a very nice overview of the many ways that the KNIME platform integrates with the RDKit

https://www.knime.com/rdkit

Noel O'Boyle, NextMove Software, A de facto standard or a free-for-all? A benchmark for reading SMILES (slides)
Noel described efforts to establish benchmark sets for the parsing and interpretation of SMILES strings

Thomas Evangelidis, CEITEC, Ligand scaffold optimization guided by artificial intelligence
Thomas described his work in combining multiple machine learning methods to build models which have performed well in a number of recent blind challenges.

https://drugdesigndata.org/about/grand-challenge-3

Benjamin Tehan + Rob Smith, Heptares, RDKit in the modern Biotech (slides)
Ben and Rob described the many ways that the RDKit is being used at Heptares

Fragment similarity using Open3D Align
Bioisostere searching https://pubs.acs.org/doi/10.1021/ci0503964

Lewis Mervin + Natalia Aniceto, University of Cambridge, In silico protein target prediction with reliability-density neighborhood applicability domain analysis (slides)

Lewis and Natalia described several iterations of the PIDGIN program for target prediction using Random Forests

Marwin Segler, BenevolentAI, Computer-aided synthesis planning

Marwin described his work on search algorithms for synthesis planning

Jacob Spiegel, University of Pittsburgh, Genetic algorithm for de novo computer-aided drug design

Jacob described some recent developments in the AutoGrow program, which uses genetic algorithms for de-novo drug design

Andrea Morger, Charité - Universitätsmedizin Berlin, Machine learning and conformal prediction to support in silico toxicology

Andrea described work on using conformal prediction to establish the domain of applicability for predictive models
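
For readers new to conformal prediction, here is a minimal sketch of the class-conditional (Mondrian) inductive form. It is a generic illustration and not Andrea's code. The nonconformity scores are made up, and in practice they might be something like one minus the model's predicted probability for the class in question. The function names and the 0.25 significance level are my own choices.

```python
def conformal_p_value(calibration_scores, test_score):
    """Share of calibration scores at least as nonconforming as the test score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    # The +1 terms count the test compound itself
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_label, test_scores, significance=0.25):
    """Labels that cannot be rejected at the given significance level."""
    return sorted(
        label
        for label, score in test_scores.items()
        if conformal_p_value(calibration_by_label[label], score) > significance
    )


# Made-up nonconformity scores from a held-out calibration set, per class
calibration_by_label = {
    "toxic": [0.1, 0.2, 0.3, 0.4],
    "nontoxic": [0.1, 0.15, 0.2, 0.5],
}
# Made-up nonconformity scores for one new compound under each candidate label
test_scores = {"toxic": 0.35, "nontoxic": 0.6}
labels = prediction_set(calibration_by_label, test_scores)
```

A compound that gets an empty prediction set looks unlike the calibration data for every class, which is one common way of flagging that it falls outside the model's domain of applicability.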

Lukas Pravda, PDB Europe, The use of RDKit in the Protein Data Bank in Europe to handle small molecules chemistry and display protein-ligand interactions (slides)

Lukas described a number of interesting Python utilities that the European PDB is developing to integrate the processing of proteins and small molecules. Many of these tools integrate with the RDKit.

https://gitlab.ebi.ac.uk/pdbe/ccdutils

Roger Sayle, NextMove Software, Deceptively Simple: How some cheminformatics problems can be more complicated than they appear (slides)

The undisputed king of optimization shows us that few things are as simple as we think they are

Calculating molecular weight
Counting lines in a text file
Determining percentages

A Few Other Interesting Projects

EyeMol: The most interactive molecular dataset manager
https://pikairos.eu/eyemol/

Fast Clustering of Large Datasets
https://github.com/iwatobipen?tab=repositories
https://github.com/fujimizu/bayon

An Open Source molecule editor
https://github.com/EBjerrum/rdeditor

LiteMol JavaScript PDB viewer
http://webchemdev.ncbr.muni.cz/Litemol/Viewer/

This work is licensed under a Creative Commons Attribution 4.0 International License.
