Posts

Showing posts from January, 2020

How to (Not) Get a Job in Science

Over the years, I’ve probably looked at thousands of CVs. In that time, I’ve come across a few themes that I thought might be worth sharing. Perhaps others will find this useful, or maybe I just need to get a few things off my chest. It’s my hope that someone will benefit from, or at least be amused by, this little rant. I’m writing this with the caveat that these are my personal opinions and biases. Following my advice could have disastrous consequences with other organizations.

Show some self-awareness
I have to admit that I find it off-putting when someone claims to be an “expert” in a dozen different areas. In fact, I don’t like the word “expert”. Tell me what you’ve done, and I’ll figure out whether you’re an expert. If you’ve just finished your Ph.D., you’re not an expert in anything yet. Be realistic about your experience. Don’t claim to have “7 years of experience in drug discovery” if you’ve just finished a 5-year Ph.D. program and a 2-year postdoc. People are going to read…

Visualizing Decision Trees

A 2016 paper by Wicker and Cooper, describing a molecular descriptor designed to capture molecular flexibility, popped up on Twitter this week.  This paper reminded me of the power of a simple decision tree.  Decision trees can often provide an efficient way of looking at the relationship between molecular descriptors and experimental data.   They can also provide a means of understanding the relationship between sets of experiments, particularly with pharmacokinetic data.

In this spirit, I thought I'd put together a quick post showing how to build and visualize a decision tree.  This post will also show off a couple of useful Python libraries that I've recently integrated into my workflow. 
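As a rough illustration of what a decision tree does at each node, here is a minimal, standard-library-only sketch that finds the single best split on one descriptor using Gini impurity. This is not the code from this post or from the paper: the function names (`gini`, `best_split`) and the toy values are made up for illustration and say nothing about real molecules.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best single split on one descriptor.

    Candidate thresholds are the midpoints between consecutive distinct values.
    If no split improves on the unsplit node, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: an invented descriptor and an invented binary label.
descriptor = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
label = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_split(descriptor, label)
print(f"split at descriptor <= {threshold} (weighted Gini = {impurity})")
```

A full tree simply repeats this search on each of the two resulting subsets, over all available descriptors, until a stopping rule such as a maximum depth is reached. In practice a library implementation would do this for you; the sketch is only meant to show why the resulting rules are so easy to read.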

In their paper, Wicker and Cooper use a set of 40,541 commercially available molecules from the ZINC database to establish a relationship between molecular flexibility and the ability of a molecule to crystallize. The dataset is divided into two subsets:
“observed to crystallize” - mole…