How (Not) to Get a Job in Science - Part 2 - The Interview

My last post ended up getting a lot more views than the typical code-heavy Cheminformatics stuff that I write, so I thought it might be useful to write a follow-up sharing some of my views on the interview process. Fear not, faithful readers, I haven’t sold out and become a pundit. Posts with code will resume shortly. As with the previous post, these are my views; other hiring managers and organizations may see the world differently. My last post laid out some of my thoughts on writing a CV and getting a hiring manager to notice your application. In this post, we’ll cover the next steps: the phone screen and the interview.

The Phone Screen
The interview process usually begins with a phone screen. In some cases, your preliminary phone screen may be with someone from a company’s HR group; in other cases, it may be with the hiring manager. For me, the phone screen serves a number of purposes. More than anything, I want to understand how you, the candidate, are going t…

How to (Not) Get a Job in Science

Over the years, I’ve probably looked at thousands of CVs. In that time, I’ve come across a few themes that I thought might be worth sharing. Perhaps others will find this useful; maybe I just need to get a few things off my chest. It’s my hope that someone will benefit from, or at least be amused by, this little rant. I’m writing this with the caveat that these are my personal opinions and biases. Following my advice could have disastrous consequences with other organizations.

Show some self-awareness
I have to admit that I find it off-putting when someone claims to be an “expert” in a dozen different areas. In fact, I don’t like the word “expert”. Tell me what you’ve done, and I’ll figure out whether you’re an expert. If you’ve just finished your Ph.D., you’re not an expert in anything, yet. Be realistic about your experience. Don’t claim to have “7 years of experience in drug discovery” if you’ve just finished a 5-year Ph.D. program and a 2-year postdoc. People are going to read…

Visualizing Decision Trees

A 2016 paper by Wicker and Cooper, describing a molecular descriptor designed to capture molecular flexibility, popped up on Twitter this week.  This paper reminded me of the power of a simple decision tree.  Decision trees can often provide an efficient way of looking at the relationship between molecular descriptors and experimental data.   They can also provide a means of understanding the relationship between sets of experiments, particularly with pharmacokinetic data.

In this spirit, I thought I'd put together a quick post showing how to build and visualize a decision tree.  This post will also show off a couple of useful Python libraries that I've recently integrated into my workflow. 
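
As a rough illustration of what a decision tree does at each node, here is a minimal sketch of a single split chosen by Gini impurity. It is not the code from the post and uses only the Python standard library. The `flexibility` and `crystallizes` values are made-up toy data, not the Wicker and Cooper dataset, and the helper names `gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one descriptor that minimizes weighted Gini impurity.

    Returns (threshold, weighted_gini) for the rule 'value <= threshold'.
    The threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # cannot split between identical descriptor values
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data (assumed for illustration): a flexibility-like descriptor and a
# binary label, 1 = observed to crystallize, 0 = not observed to crystallize.
flexibility = [0.5, 1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 6.0]
crystallizes = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, impurity = best_split(flexibility, crystallizes)
print(f"split at flexibility <= {threshold}: weighted Gini = {impurity:.3f}")
```

A full decision tree applies this search recursively to each resulting subset, across all descriptors, until a stopping criterion such as a maximum depth is reached.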

In their paper, Wicker and Cooper use a set of 40,541 commercially available molecules, from the ZINC database, to establish a relationship between molecular flexibility and the ability of a molecule to crystallize.  The dataset is divided into two subsets.
“observed to crystallize” - mole…

Interactive Plots with Chemical Structures

It’s often useful to be able to associate chemical structures with a set of points on a scatter plot. While it’s easy to do this with commercial software like Spotfire or Vortex, I haven’t found an easy way to integrate an interactive plot like this into a Python script. In this post, I’ll cover how I was able to generate an interactive scatter plot with about a page of Python code, most of which was boilerplate. I was able to pull this off by integrating Dash, a Python library for interactive dashboards from the nice folks who brought you the Plotly plotting library, with the RDKit. For a quick view of what the application does, check out the movie below. For a higher-quality version, try this YouTube link. Actually, just grab the code from GitHub and run it; the application looks a lot better in real life than it does in the video.

At this point, this is more of a Saturday afternoon hack than a complete application. I did this as a proof of concept to prove to myself that it …

Visualizing Chemical Space

In many cases in Cheminformatics, we need to be able to create a graphical representation of the chemical space covered by a set of molecules. In this space, similar molecules will be close together and molecules that are different will be far apart. As an example, we may have an existing set of screening compounds and we want to see how a new set of compounds we plan to purchase will complement the chemical diversity of the existing collection. Another common use case is examining the similarity of molecules in a training set to those in a test set. One of the most common methods of evaluating chemical space coverage is principal component analysis, better known as PCA. In PCA, sets of correlated variables in a higher-dimensional space are combined to produce a set of variables in a lower-dimensional space. For instance, given 2048-bit fingerprints as a high-dimensional representation of a set of molecules, many cheminformaticians will perform PCA to reduce this …
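
The PCA step described above can be sketched with NumPy alone. This is an illustrative sketch, not the code from the post: the fingerprints are random bit vectors standing in for real 2048-bit fingerprints, the choice of two components is an assumption made for plotting, and `pca_project` is a hypothetical helper name.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components using SVD.

    Returns the projected coordinates and the fraction of total variance
    explained by each retained component.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA operates on mean-centered data
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    coords = X_centered @ Vt[:n_components].T
    explained = (S ** 2) / np.sum(S ** 2)
    return coords, explained[:n_components]


# Stand-in data (assumed): 50 random 2048-bit vectors instead of real fingerprints.
rng = np.random.default_rng(0)
fingerprints = rng.integers(0, 2, size=(50, 2048))

coords, explained = pca_project(fingerprints)
print(coords.shape)  # one (PC1, PC2) pair per molecule
```

Each row of `coords` could then be drawn as a point on a scatter plot, and `explained` gives a quick check on how much of the variance the two retained components capture.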