Tips for Tracking & Analyzing Training Runs

Nick Payton
Experiment Management, Hyperparameter Optimization, Modeling Best Practices

SigOpt partnered with MLconf on a webinar about practical best practices for metrics, model training, and hyperparameter optimization. During this discussion, our Head of Engineering, Jim Blomo, shared how he approaches each of these topics. In this post, we build on his thoughts with a few practical recommendations for tracking and analyzing training runs during your machine learning process.

Make tracking easy and effortless

It is one thing to agree that tracking is an essential component of rigorous machine learning. It is quite another to dedicate the time required to do it thoroughly, especially when you are tracking manually, the tooling you use is tough to implement, or you need different tools depending on the libraries you are using. All of these are barriers to tracking, and they usually lead modelers to find tracking too “expensive” in time and resources, so they iterate without it. SigOpt Runs is designed to be easy to use, automatic, and library agnostic to erode these exact barriers.

Learn how Jim thinks about the importance of tracking in the workflow.

Track a full set of modeling attributes

It is important to log dataset versions, metrics, parameters, model architectures, machines, code, and a wide variety of other metadata through the training process. Any tooling you use should make it easy to reference each of these attributes without necessarily accessing the underlying object. For example, you should always be able to log a reference to a dataset version without accessing the underlying dataset itself so your data remains private and secure in your own system. Systems should also be flexible enough to allow you to introduce metadata that may be specific to a given modeling problem or even a specific run as you iterate. 
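
As a rough illustration of logging a reference rather than the object itself, here is a minimal, library-agnostic sketch in Python. It is not the SigOpt API: the `RunRecord` class, the `dataset_reference` helper, and every field name and value are hypothetical, chosen only for this example. The dataset is described by a name, a version, and a SHA-256 fingerprint, so the record can be shared without exposing the data.

```python
import hashlib
import json
import platform
from dataclasses import dataclass, field, asdict


def dataset_reference(name: str, version: str, content: bytes) -> dict:
    """Describe a dataset by name, version, and fingerprint, not by its contents."""
    return {
        "name": name,
        "version": version,
        "sha256": hashlib.sha256(content).hexdigest(),
    }


@dataclass
class RunRecord:
    """Hypothetical container for the attributes of one training run."""
    name: str
    dataset: dict
    parameters: dict
    model: str
    metrics: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)
    machine: str = field(default_factory=platform.node)

    def log_metric(self, key: str, value: float) -> None:
        self.metrics[key] = value

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Stand-in bytes for a dataset that never leaves your own system.
raw = b"feature,label\n0.1,0\n0.9,1\n"

run = RunRecord(
    name="baseline",
    dataset=dataset_reference("churn", "v3", raw),
    parameters={"learning_rate": 0.01, "batch_size": 32},
    model="logistic_regression",
)
run.log_metric("accuracy", 0.91)  # illustrative value, not a real result
run.metadata["note"] = "first run after relabeling"  # run-specific metadata
print(run.to_json())
```

The free-form `metadata` dictionary is one simple way to leave room for attributes that only matter for a particular problem or run.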

Hear how Jim thinks about dataset versioning, among other modeling attributes.

Introduce checkpoints

Of particular importance for deep learning is understanding model convergence in the training process. It can be tricky to converge your model and even trickier to do so efficiently. Introducing checkpoints allows you to see your model train incrementally, so you get a much better sense of intermediate model performance throughout any given run. This additional information is useful for understanding convergence in greater depth and for getting to a converged model more efficiently.
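
To make the idea concrete, here is a small sketch that records checkpoints during a toy training loop. It is an assumption-laden stand-in rather than how SigOpt logs checkpoints: the loss is a simple quadratic, and the function names, the checkpoint interval, and the convergence tolerance are all hypothetical choices for illustration.

```python
def train_with_checkpoints(learning_rate, steps=100, checkpoint_every=10):
    """Minimize the toy loss (w - 3)^2 with plain gradient descent.

    Returns the final weight and a list of checkpoints, each a dict holding
    the step number and the loss measured at that step.
    """
    w = 0.0
    checkpoints = []
    for step in range(1, steps + 1):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient
        if step % checkpoint_every == 0:
            checkpoints.append({"step": step, "loss": (w - 3.0) ** 2})
    return w, checkpoints


def first_converged_step(checkpoints, tolerance=1e-6):
    """Return the first checkpointed step whose loss is below tolerance, or None."""
    for checkpoint in checkpoints:
        if checkpoint["loss"] < tolerance:
            return checkpoint["step"]
    return None


# Compare how two learning rates behave at intermediate points in training.
for lr in (0.01, 0.1):
    _, history = train_with_checkpoints(lr)
    print(f"lr={lr}: converged at step {first_converged_step(history)}")
    for checkpoint in history[:3]:
        print(f"  step {checkpoint['step']:>3}  loss {checkpoint['loss']:.6f}")
```

With only a final metric, a run that converged early and a run that barely converged at the end can look the same. The checkpoint history is what tells you whether the training budget could have been shorter.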

See how Jim logs checkpoints using SigOpt.

Visualize your runs in detail and in comparison to one another

Logging runs is much more useful if you have a way to easily visualize them. Plenty of libraries let you create custom, one-off plots of your runs as you proceed through a machine learning process, but building those plots is time consuming, and it can be tough to see all of your runs in comparison to one another in an organized way. As you execute runs with the SigOpt API, our system populates a dashboard with configurable plots, analysis, and comparisons so you don’t have to spend the time putting it together yourself.
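
For a sense of what a side-by-side comparison involves when you assemble it by hand, here is a minimal sketch that ranks logged runs by one metric and prints them as a plain-text table. It does not use the SigOpt Dashboard or API; the `comparison_table` function, the run dictionaries, and all the values in them are hypothetical and purely illustrative.

```python
def comparison_table(runs, metric, higher_is_better=True):
    """Return a plain-text table of runs ranked by a single metric."""
    ranked = sorted(
        runs, key=lambda r: r["metrics"][metric], reverse=higher_is_better
    )
    # Use the union of parameter names so runs with different settings line up.
    parameter_names = sorted({name for r in runs for name in r["parameters"]})
    header = ["run", *parameter_names, metric]
    rows = [
        [
            r["name"],
            *(str(r["parameters"].get(p, "-")) for p in parameter_names),
            f"{r['metrics'][metric]:.3f}",
        ]
        for r in ranked
    ]
    table = [header, *rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )


# Illustrative runs; the metric values are made up for the example.
runs = [
    {"name": "run-a", "parameters": {"learning_rate": 0.01, "batch_size": 32},
     "metrics": {"accuracy": 0.88}},
    {"name": "run-b", "parameters": {"learning_rate": 0.1, "batch_size": 32},
     "metrics": {"accuracy": 0.93}},
    {"name": "run-c", "parameters": {"learning_rate": 0.1, "batch_size": 64},
     "metrics": {"accuracy": 0.90}},
]
print(comparison_table(runs, "accuracy"))
```

Even this small table needs decisions about ranking, missing parameters, and column alignment, which is the kind of assembly work a populated dashboard is meant to take off your plate.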

See how to explore training comparisons using the SigOpt Dashboard.

These are a few of the insights our team relied upon to build Runs to track training, Experiments to automate hyperparameter optimization, and our Dashboard to visualize these jobs throughout the modeling process. 

If you’re interested in seeing the broader webinar context in which we gathered and discussed these results, watch the recording. If you want to try out the product, join our beta program for free access, execute a run to track your training, and launch an experiment to automate hyperparameter optimization.

Nick Payton Head of Marketing & Partnerships