Some Notes From the 2018 RDKit UGM

Last week I had the pleasure of attending the RDKit User Group Meeting in Cambridge, UK.  This was my first RDKit UGM, and it was great.  I had the opportunity to catch up with a lot of people I hadn’t seen for a while and learned about a lot of exciting open source cheminformatics work.  In this post, I’ve tried to summarize some of what took place and to provide links to relevant software and literature.  This isn’t a complete recitation of everything that happened, but hopefully it will provide an overview for those who’d like to dig deeper.  I’ll link the slide decks as they become available.  Please let me know if I’ve missed or misinterpreted anything.


Slides from the meeting are available on GitHub: https://github.com/rdkit/UGM_2018

Wednesday, September 19th

Greg Landrum, KNIME/T5 Informatics, Welcome and Intro (slides)
Greg provided a bit of history of the RDKit as well as an intro to some of the newer features.
  • C++ code has been modernized to C++14, greatly simplifying things like iteration
  • New sensible defaults for many functions
  • Coordinate generation code from Schrödinger for prettier depictions
  • Code to depict fingerprint bits
  • A JSON-based format for interchange between programs
  • New 3D descriptors
  • SVG rendering with chemical metadata
Pat Lorton, Schrödinger, Similarity searching a billion compounds in real time 
Pat showed a very nice open source implementation of fingerprint similarity searching on a GPU.
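For readers new to the topic, fingerprint similarity searching usually means ranking the molecules in a library by their Tanimoto similarity to a query fingerprint. The sketch below is a plain-Python, CPU-only illustration of that idea and is not Pat's GPU implementation; the toy fingerprints, the molecule names, and the `tanimoto` and `top_k` helpers are all made up for the example.

```python
import heapq


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints (bit sets)."""
    common = bin(a & b).count("1")  # bits set in both fingerprints
    union = bin(a | b).count("1")   # bits set in either fingerprint
    return common / union if union else 0.0


def top_k(query: int, library: dict, k: int = 2) -> list:
    """Return the k (similarity, name) pairs most similar to the query."""
    scored = ((tanimoto(query, fp), name) for name, fp in library.items())
    return heapq.nlargest(k, scored)


# Hypothetical 8-bit fingerprints; real ones are usually much longer.
library = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110011,
    "mol_c": 0b01001100,
}
query = 0b10110010
hits = top_k(query, library)
```

Every comparison here is independent of the others, which is why this kind of search maps well onto parallel hardware.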
Sereina Riniker, ETH, News from 3D (new features of the conformer generator)
Sereina described some enhancements to the Experimental Torsion Knowledge Distance Geometry (ETKDG) conformer generation method
Alpha Lee, Cambridge University, Uncertainty quantification with Bayesian molecular deep learning (slides)
Alpha discussed ways of dealing with uncertainty in molecular deep learning
Paolo Tosco, Cresset, Lightning Talk
Paolo showed some nice examples of how the RDKit can be seamlessly integrated with Cresset’s Flare toolkit, which is accessible from Python
Tim Dudgeon, Informatics Matters, Lightning Talk
Tim showed a couple of things. The first was Squonk, which appears to be a Jupyter (IPython) Notebook on steroids and is definitely worth a look
Tim also described a graph database implementation of a fragment network similar to the one published by Astex
Susan Leung, GSoC RDKit MolVS Integration Project (slides)
Susan discussed her work on a Google Summer of Code project to integrate some of the features from the MolVS molecule validation and standardization toolkit into the RDKit
Boran Adas, RDKit's new fingerprint generators (slides)
Boran discussed another Google Summer of Code project designed to unify the many different fingerprint implementations and interfaces in the RDKit under a single consistent framework.  This work should also simplify the addition of new fingerprint types.

Nicholas Firth, Evariste Technologies, Multiparameter optimization using RDKit and scipy: what's the chance of success? (slides)
Nicholas described some of the approaches taken toward quantitative drug design at Evariste Technologies
Marina Fedorova, Novo Nordisk, Computational tools for metabolic pathways prediction (slides)
Marina provided an overview of computational tools being designed to optimize the yield of biosynthetic pathways
  • Building genome-scale models of metabolic pathways
  • Reactions currently encoded as ChemAxon SMIRKS, some interest in translating to Reaction SMARTS
  • Pathways are traced based on Tanimoto similarity to the target compound
Brian Cole, DE Shaw Research, SMILES-driven de novo design engine 
Brian demonstrated how he could hack SMILES strings as a tool for de novo molecule design

Pat Walters, Relay Therapeutics,  A Few (Hopefully) Interesting Open Source Projects Built On The RDKit (slides)
A bunch of stuff that you have probably already read about on this blog.

Thursday, September 20th

Joshua Meyers and Matthew Sellwood, BenevolentAI, Rediscovering R-Group Descriptors with RDKit
https://pubs.acs.org/doi/10.1021/ci025589v
Joshua and Matthew described the implementation of R-group descriptors and their subsequent use in identifying bioisosteres
  • The group agreed that there is a need for an open bioisostere database.  Given the availability of software for identifying matched molecular pairs and data available in ChEMBL, it should be possible to collaboratively generate such a database.  
  • Some recent work by the Bajorath group in identifying congeneric series may provide a good starting point. https://www.future-science.com/doi/abs/10.4155/fsoa-2017-0135
Daria Goldmann, KNIME, It's not just for Python! Interacting with chemical data using KNIME and the RDKit (slides)
Daria provided a very nice overview of the many ways that the KNIME platform integrates with the RDKit
Noel O'Boyle, NextMove Software, A de facto standard or a free-for-all? A benchmark for reading SMILES (slides)
Noel described efforts to establish benchmark sets for the parsing and interpretation of SMILES strings
Thomas Evangelidis, CEITEC, Ligand scaffold optimization guided by artificial intelligence
Thomas described his work in combining multiple machine learning methods to build models that have performed well in a number of recent blind challenges
Benjamin Tehan + Rob Smith, Heptares, RDKit in the modern Biotech (slides)
Ben and Rob described the many ways that the RDKit is being used at Heptares
Lewis Mervin + Natalia Aniceto, University of Cambridge, In silico protein target prediction with reliability-density neighborhood applicability domain analysis (slides)
Lewis and Natalia described several iterations of the PIDGIN program for target prediction using Random Forests
Marwin Segler, BenevolentAI,  Computer-aided synthesis planning
Marwin described his work on search algorithms for synthesis planning 
Jacob Spiegel, University of Pittsburgh,  Genetic algorithm for de novo computer-aided drug design
Jacob described some recent developments in the AutoGrow program, which uses genetic algorithms for de novo drug design
Andrea Morger, Charité - Universitätsmedizin Berlin, Machine learning and conformal prediction to support in silico toxicology
Andrea described work on using conformal prediction to establish the domain of applicability for predictive models
Lukas Pravda, PDB Europe, The use of RDKit in the Protein Data Bank in Europe to handle small molecules chemistry and display protein-ligand interactions (slides)
Lukas described a number of interesting Python utilities that the European PDB is developing to integrate the processing of proteins and small molecules. Many of these tools integrate with the RDKit
Roger Sayle, NextMove Software, Deceptively Simple: How some cheminformatics problems can be more complicated than they appear (slides)
The undisputed king of optimization showed us that few things are as simple as we think they are
  • Calculating molecular weight
  • Counting lines in a text file
  • Determining percentages
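
As one illustration of why counting lines is less obvious than it sounds (a toy example, not taken from Roger's slides): counting newline characters, which is what `wc -l` reports, comes up one short when the last line of a file has no trailing newline. The helper names and the sample SMILES records below are hypothetical.

```python
def count_newlines(text: str) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    return text.count("\n")


def count_lines(text: str) -> int:
    """Count lines, including a final line with no trailing newline."""
    return len(text.splitlines())


# The same two records, with and without a newline after the last one.
with_newline = "CCO ethanol\nCCN ethylamine\n"
without_newline = "CCO ethanol\nCCN ethylamine"

newline_counts = (count_newlines(with_newline), count_newlines(without_newline))
line_counts = (count_lines(with_newline), count_lines(without_newline))
```

Both strings hold two records, but only `count_lines` gives the same answer for each. Note that `str.splitlines` also treats carriage returns and a few other characters as line boundaries, which is yet another way for two "simple" line counters to disagree.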

A Few Other Interesting Projects

EyeMol: The most interactive molecular dataset manager
https://pikairos.eu/eyemol/

Fast Clustering of Large Datasets
https://github.com/iwatobipen?tab=repositories
https://github.com/fujimizu/bayon

An Open Source molecule editor
https://github.com/EBjerrum/rdeditor

LiteMol JavaScript PDB viewer
http://webchemdev.ncbr.muni.cz/Litemol/Viewer/


This work is licensed under a Creative Commons Attribution 4.0 International License.
