New Alpha Feature: Monitor Training Convergence

Ben Hsu, Michael McCourt, and Nicki Vance

Today we are releasing a new SigOpt feature into alpha testing: Training Monitor. This feature is designed for neural network developers. By storing information throughout a neural network training run, we enable users to:

  • monitor progress through the web and API,
  • visualize the metric curve as the neural network converges, and
  • stop training early when convergence is detected.

Neural networks have proven effective in a wide range of applications, including natural language processing, image processing, and biology. The proliferation of data and easier access to high-performance compute resources have made neural networks even more broadly applicable, which in turn has created even more interest in their use.

An important component of all of these neural network success stories is the appropriate choice of hyperparameters. A poorly chosen dropout proportion, learning rate, or number of nodes per layer can doom a neural network to mediocrity. SigOpt users already know this: they have been training neural networks for years using our core optimization engine, as well as advanced algorithmic features such as Multimetric, to efficiently build high-performing deep learning models.

In a standard setting, the only communication with SigOpt occurs at Suggestion Create and Observation Create: the user receives suggested parameter assignments with which to train, then reports how they performed. With Training Monitor, a Training Run is created to store data associated with a neural network training.

To report data, we have created new Checkpoint objects, which report the progress of the neural network training at intervals. Checkpointing can occur at any desired interval: every epoch, every 4 epochs, every 11 minutes, every 37 batches, or something else.
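To make the idea concrete, here is a minimal sketch of a training loop that reports a checkpoint every few epochs. It runs entirely in memory and does not use the SigOpt client: the `Checkpoint` and `TrainingRun` classes, the `train_with_checkpoints` helper, and the toy `fake_epoch` metric are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Checkpoint:
    """One progress report: the epoch it was taken at and the metric value."""
    epoch: int
    value: float


@dataclass
class TrainingRun:
    """Hypothetical in-memory stand-in for a Training Run (not the SigOpt API)."""
    assignments: Dict[str, float]
    checkpoints: List[Checkpoint] = field(default_factory=list)

    def report(self, epoch: int, value: float) -> Checkpoint:
        """Store a new checkpoint and return it."""
        checkpoint = Checkpoint(epoch=epoch, value=value)
        self.checkpoints.append(checkpoint)
        return checkpoint


def train_with_checkpoints(
    assignments: Dict[str, float],
    train_one_epoch: Callable[[int, Dict[str, float]], float],
    max_epochs: int = 20,
    checkpoint_every: int = 4,
) -> Tuple[TrainingRun, float]:
    """Train for max_epochs, reporting a checkpoint every checkpoint_every epochs."""
    run = TrainingRun(assignments=assignments)
    value = float("nan")
    for epoch in range(1, max_epochs + 1):
        value = train_one_epoch(epoch, assignments)
        if epoch % checkpoint_every == 0:
            run.report(epoch, value)
    # The final value plays the role of the observation reported at the end.
    return run, value


def fake_epoch(epoch: int, assignments: Dict[str, float]) -> float:
    """Toy metric that rises toward 1.0; a real loop would train and validate here."""
    return 1.0 - 1.0 / (1.0 + 10.0 * epoch * assignments["learning_rate"])


run, final_value = train_with_checkpoints({"learning_rate": 0.1}, fake_epoch)
```

Here the interval is counted in epochs, but the same structure would apply to a check based on batches or elapsed time.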

The graphic below depicts the standard workflow.

The next graphic shows the new workflow enabled through the Training Monitor.

Training Runs and Checkpoints provide new opportunities in SigOpt by better supporting the iterative behavior of neural network training. This is clear in a few specific capabilities:

  • The SigOpt website provides new visualizations both for individual training runs and across high-performing training runs.
  • Users can define their own sense of training convergence, which SigOpt will monitor and report back during the training.
  • SigOpt’s optimization engine can internalize all the checkpoints to better understand the training progress and convergence behavior.
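As an illustration of what a user-defined sense of convergence might look like, the sketch below treats a run as converged once the most recent checkpoints improve on the earlier best by less than a small threshold. This is only one possible rule, not necessarily the one SigOpt applies; the function names, the `lookback` and `min_delta` parameters, and the assumption that the metric is being maximized are all ours.

```python
from typing import Optional, Sequence


def has_converged(values: Sequence[float], lookback: int = 3, min_delta: float = 0.01) -> bool:
    """Return True if the last `lookback` checkpoint values improve on the
    best earlier value by less than `min_delta` (metric assumed maximized)."""
    if len(values) <= lookback:
        # Not enough history to compare recent checkpoints against earlier ones.
        return False
    best_before = max(values[:-lookback])
    best_recent = max(values[-lookback:])
    return best_recent - best_before < min_delta


def first_converged_checkpoint(
    values: Sequence[float], lookback: int = 3, min_delta: float = 0.01
) -> Optional[int]:
    """Return the 1-based checkpoint count at which the rule first fires, or None."""
    for n in range(1, len(values) + 1):
        if has_converged(values[:n], lookback, min_delta):
            return n
    return None


# Hypothetical validation accuracies, one per checkpoint.
plateauing = [0.50, 0.70, 0.80, 0.802, 0.804, 0.805, 0.805]
improving = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

stop_at = first_converged_checkpoint(plateauing)
never = first_converged_checkpoint(improving)
```

A training loop could call `has_converged` after each checkpoint and stop early when it returns True. Larger `lookback` values make the rule more patient, and smaller `min_delta` values make it stricter.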

Below we show examples of the suggestion modal and analysis page of a Training Monitor experiment. This suggestion has completed only 10 of a maximum possible 20 checkpoints and does not yet appear to have converged.

The new graphic in the analysis page compares the top 5 performing training runs.

We applied the Training Monitor feature to the Stanford Cars image classification problem, which is viewable through this shared experiment.

Training Monitor experiments are still in alpha testing. If you have comments or questions, or are interested in using Training Monitor for your neural network experiments, please reach out to our customer success team.

Ben Hsu Product Manager
Michael McCourt Research Engineer
Nicki Vance Software Engineer