Some Notes From the 2018 RDKit UGM
Last week I had the pleasure of attending the RDKit User Group meeting in Cambridge, UK. This was my first RDKit UGM, and it was great. I had the opportunity to catch up with a lot of people I hadn’t seen for a while and learned about a lot of exciting open source cheminformatics work. In this post, I’ve tried to summarize some of what took place and to present some links to relevant software and literature. This won’t be a complete recitation of everything that took place, but hopefully, it will provide an overview for those who’d like to dig deeper. I’ll link the slide decks as they become available. Please let me know if I’ve missed or misinterpreted anything.
Slides from the meeting are available in GitHub https://github.com/rdkit/UGM_2018
Wednesday, September 19th
Greg Landrum, KNIME/T5 Informatics, Welcome and Intro (slides)
Greg provided a bit of history of the RDKit as well as an intro to some of the newer features.
- C++ code has been modernized to C++14, greatly simplifying things like iteration
- New sensible defaults for many functions
- Coordinate generation code from Schrödinger for prettier depictions
- Code to depict fingerprint bits
- A JSON-based format for interchange between programs
- New 3D descriptors
- SVG rendering with chemical metadata
Pat showed a very nice open source implementation of fingerprint similarity searching on a GPU.
- Reported being able to search 17M compounds in 0.05 seconds
- The current implementation should be scalable to 4 billion compounds
- https://github.com/schrodinger/gpusimilarity
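For readers who haven't looked at what such a search actually computes, here is a minimal CPU-only sketch in plain Python. It is not the gpusimilarity code: fingerprints are stored as Python integers used as bitsets, and the function names and the toy 8-bit "database" are made up for illustration.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitsets."""
    shared = bin(fp_a & fp_b).count("1")   # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")    # bits set in either fingerprint
    return shared / total if total else 0.0


def search(query: int, database: dict, top_n: int = 3) -> list:
    """Return the top_n (name, similarity) pairs, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy 8-bit fingerprints (hypothetical data, not real molecules)
database = {
    "mol_a": 0b10110010,
    "mol_b": 0b10110000,
    "mol_c": 0b01001101,
}
hits = search(0b10110010, database, top_n=2)
print(hits)  # [('mol_a', 1.0), ('mol_b', 0.75)]
```

Every database entry is scored independently of the others, which is what makes this kind of search a natural fit for massively parallel hardware.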
Sereina described some enhancements to the Experimental Torsion Knowledge Distance Geometry (ETKDG) conformer generation method
- Work is underway for better handling of aliphatic rings
- Additional optimizations have been added for macrocycle conformation generation
- https://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00654
- The method performed well in an evaluation study https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00221
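As a point of reference, this is roughly how ETKDG conformer generation is invoked from Python. It's a minimal sketch, assuming a reasonably recent RDKit install. The molecule, the number of conformers, and the random seed are arbitrary choices of mine, and nothing here specifically exercises the ring or macrocycle enhancements described in the talk.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# 1-butanol, with explicit hydrogens added before embedding
mol = Chem.AddHs(Chem.MolFromSmiles("CCCCO"))

params = AllChem.ETKDG()   # ETKDG embedding parameters
params.randomSeed = 42     # arbitrary seed, only for reproducibility

# Ask for 10 conformers; the call returns the IDs of those generated
conf_ids = list(AllChem.EmbedMultipleConfs(mol, 10, params))
print(len(conf_ids), "conformers generated")
```

Adding hydrogens before embedding generally gives more sensible geometries, which is why the sketch calls `Chem.AddHs` first.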
Alpha discussed ways of dealing with uncertainty in molecular deep learning
- Referenced Gisbert Schneider’s work on active learning https://www.sciencedirect.com/science/article/pii/S1359644614004735?via%3Dihub
- Pointed out that graph convolutions only learn fingerprints for training set molecules
Paulo showed some nice examples of how the RDKit can be seamlessly integrated with Cresset’s Flare toolkit, which is accessible from Python
Tim Dudgeon, Informatics Matters, Lightning Talk
Tim showed a couple of things. The first was Squonk, which appears to be a Jupyter (IPython) Notebook on steroids and is definitely worth a look.
- https://squonk.it/
- https://github.com/InformaticsMatters/squonk
- https://github.com/InformaticsMatters/pipelines
Susan Leung, GSoC RDKit MolVS Integration Project (slides)
Susan discussed her work on a Google Summer of Code project to integrate some of the features from the MolVS molecule validation and standardization toolkit into the RDKit
- A number of features for molecule standardization, validation, separation of salts and charge neutralization will be integrated into the RDKit
- https://molvs.readthedocs.io/en/latest/
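To make salt separation and charge neutralization concrete, here is a small sketch using the `rdMolStandardize` module that ships with later RDKit releases. I'm assuming an RDKit version that includes that module, and sodium acetate is just an example input I picked.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# Sodium acetate: an organic anion plus a counterion
mol = Chem.MolFromSmiles("CC(=O)[O-].[Na+]")

# Keep only the largest fragment, which drops the sodium counterion
parent = rdMolStandardize.LargestFragmentChooser().choose(mol)

# Neutralize the remaining charge where possible
neutral = rdMolStandardize.Uncharger().uncharge(parent)

parent_smiles = Chem.MolToSmiles(parent)
neutral_smiles = Chem.MolToSmiles(neutral)
print(parent_smiles, "->", neutral_smiles)  # CC(=O)[O-] -> CC(=O)O
```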
Boran discussed another Google Summer of Code project designed to unify the many different fingerprint implementations and interfaces in the RDKit under a single consistent framework. This work should also simplify the addition of new fingerprint types.
Nicholas Firth, Evariste Technologies, Multiparameter optimization using RDKit and scipy: what's the chance of success? (slides)
Nicholas described some of the approaches taken toward quantitative drug design at Evariste Technologies
- MOARF - Integrated workflow for multi-objective optimization https://pubs.acs.org/doi/abs/10.1021/acs.jcim.5b00073
- https://pubs.acs.org/doi/10.1021/acs.jcim.8b00376
- Referenced an interesting preprint on model evaluation https://arxiv.org/abs/1807.08926
Marina provided an overview of computational tools being designed to optimize the yield of biosynthetic pathways
- Building genome-scale models of metabolic pathways
- Reactions currently encoded as ChemAxon SMIRKS, some interest in translating to Reaction SMARTS
- Pathways are traced based on Tanimoto similarity to the target compound
Brian demonstrated how he could hack SMILES strings as a tool for de-novo design of novel molecules.
Pat Walters, Relay Therapeutics, A Few (Hopefully) Interesting Open Source Projects Built On The RDKit (slides)
A bunch of stuff that you probably already read about in this blog.
Thursday, September 20th
Joshua Meyers and Matthew Sellwood, BenevolentAI, Rediscovering R-Group Descriptors with RDKit
https://pubs.acs.org/doi/10.1021/ci025589v
Joshua and Matthew described the implementation of R-group descriptors and their subsequent use in identifying bioisosteres
- The group agreed that there is a need for an open bioisostere database. Given the availability of software for identifying matched molecular pairs and data available in ChEMBL, it should be possible to collaboratively generate such a database.
- Some recent work by the Bajorath group in identifying congeneric series may provide a good starting point. https://www.future-science.com/doi/abs/10.4155/fsoa-2017-0135
Daria provided a very nice overview of the many ways that the KNIME platform integrates with the RDKit
Noel O'Boyle, NextMove Software, A de facto standard or a free-for-all? A benchmark for reading SMILES (slides)
Noel described efforts to establish benchmark sets for the parsing and interpretation of SMILES strings
- https://nextmovesoftware.com/blog/2018/06/06/can-we-agree-on-the-structure-represented-by-a-smiles-string/
- https://github.com/nextmovesoftware/smilesreading
- https://www.nextmovesoftware.com/products/SMILESBenchmark_ICCS_May2018.pdf
Thomas described his work in combining multiple machine learning methods to build models which have performed well in a number of recent blind challenges.
Benjamin Tehan + Rob Smith, Heptares, RDKit in the modern Biotech (slides)
Ben and Rob described the many ways that the RDKit is being used at Heptares
- Fragment similarity using Open3D Align
- Bioisostere searching https://pubs.acs.org/doi/10.1021/ci0503964
Lewis Mervin + Natalia Aniceto, University of Cambridge, In silico protein target prediction with reliability-density neighborhood applicability domain analysis (slides)
Lewis and Natalia described several iterations of the PIDGIN program for target prediction using Random Forests
Marwin Segler, BenevolentAI, Computer-aided synthesis planning
Marwin described his work on search algorithms for synthesis planning
- https://www.nature.com/articles/nature25978
- https://www.nature.com/articles/d41586-018-03977-w
- https://onlinelibrary.wiley.com/doi/10.1002/anie.201506101
Jacob Spiegel, University of Pittsburgh, Genetic algorithm for de novo computer-aided drug design
Jacob described some recent developments in the AutoGrow program that uses genetic algorithms for de-novo drug design
Andrea Morger, Charité - Universitätsmedizin Berlin, Machine learning and conformal prediction to support in silico toxicology
Andrea described work on using conformal prediction to establish the domain of applicability for predictive models
- https://pubs.acs.org/doi/abs/10.1021/ci5001168
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5821614/
- https://pubs.rsc.org/en/content/articlelanding/2017/tx/c6tx00252h
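For anyone new to conformal prediction, the core idea fits in a few lines. This is a generic inductive conformal classifier sketched in plain Python, not the specific procedure from Andrea's talk or the papers above. The calibration scores, class probabilities, and significance level are all invented for illustration.

```python
def p_value(calibration_scores, new_score):
    """Fraction of calibration examples at least as nonconforming as the new one."""
    at_least = sum(1 for s in calibration_scores if s >= new_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(class_probs, calibration_scores, significance=0.2):
    """Labels whose p-value exceeds the chosen significance level."""
    labels = set()
    for label, prob in class_probs.items():
        nonconformity = 1.0 - prob  # low model probability = strange example
        if p_value(calibration_scores, nonconformity) > significance:
            labels.add(label)
    return labels


# Hypothetical nonconformity scores (1 - predicted probability of the true
# class) collected on a held-out calibration set.
calibration = [0.05, 0.10, 0.15, 0.20, 0.30, 0.42, 0.57, 0.70, 0.90]

# Hypothetical model outputs for two new compounds
confident = prediction_set({"toxic": 0.92, "nontoxic": 0.08}, calibration)
uncertain = prediction_set({"toxic": 0.55, "nontoxic": 0.45}, calibration)
print(confident)  # {'toxic'}
print(uncertain)  # both labels: the model can't rule either one out
```

A prediction set containing both labels, or none, is a signal that the compound sits outside the region where the model can be trusted, and that is how the approach ties into the domain of applicability.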
Lukas Pravda, PDB Europe, The use of RDKit in the Protein Data Bank in Europe to handle small molecules chemistry and display protein-ligand interactions (slides)
Lukas described a number of interesting Python utilities that the European PDB is developing to integrate the processing of proteins and small molecules. Many of these tools integrate with the RDKit.
Roger Sayle, NextMove Software, Deceptively Simple: How some cheminformatics problems can be more complicated than they appear (slides)
The undisputed king of optimization shows us that few things are as simple as we think they are
- Calculating molecular weight
- Counting lines in a text file
- Determining percentages
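Roger's own examples are in the slides. As one illustration of how a task like counting lines can hide an edge case (my example, not necessarily his), consider a file whose last record has no trailing newline. The file contents and function names below are made up.

```python
import os
import tempfile


def count_newlines(path):
    """Count newline bytes, which is what `wc -l` reports."""
    with open(path, "rb") as handle:
        return handle.read().count(b"\n")


def count_lines(path):
    """Count lines, including a final line that lacks a trailing newline."""
    with open(path, "rb") as handle:
        data = handle.read()
    if not data:
        return 0
    return data.count(b"\n") + (0 if data.endswith(b"\n") else 1)


# Two SMILES records; the second has no trailing newline.
with tempfile.NamedTemporaryFile(delete=False, suffix=".smi") as tmp:
    tmp.write(b"CCO ethanol\nc1ccccc1 benzene")
    path = tmp.name

n_newlines = count_newlines(path)
n_lines = count_lines(path)
os.remove(path)

print(n_newlines, n_lines)  # 1 2
```

The two answers differ by one, which is enough to silently drop the last molecule from a dataset.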
A Few Other Interesting Projects
EyeMol: The most interactive molecular dataset manager
https://pikairos.eu/eyemol/
Fast Clustering of Large Datasets
https://github.com/iwatobipen?tab=repositories
https://github.com/fujimizu/bayon
An Open Source molecule editor
https://github.com/EBjerrum/rdeditor
LiteMol JavaScript PDB viewer
http://webchemdev.ncbr.muni.cz/Litemol/Viewer/
This work is licensed under a Creative Commons Attribution 4.0 International License.