Some Thoughts on Evaluating Predictive Models
In this post, I'd like to offer a few suggestions for those writing papers that report the performance of predictive models. This isn't meant to be a definitive checklist, just a few techniques that make it easier for the reader to assess the performance of a method or model.

As is often the case, this post was motivated by a number of papers in the recent literature. I'll use one of these papers to demonstrate a few things that I believe should be included, as well as a few that I believe should be avoided. My intent is not to malign the authors or the work; I simply want to illustrate a few points that I consider important. As usual, all of the code I used to create this analysis is on GitHub. For this example, I'll use a recent paper on ChemRxiv that compares the performance of two methods for carrying out free energy calculations.

1. Avoid putting multiple datasets on the same plot, especially if the combined performance is not re