Store Visual Artifacts in SigOpt to See the Bigger Modeling Picture

Tobias Andreasen and Barrett Williams
Advanced Optimization Techniques, Company news, Modeling Best Practices, Training & Tuning

When developing a model, it’s essential to track as many metrics as you find useful and ultimately pick one or two to optimize. That said, successful data scientists also rely on visualization to get a better sense of how their model is performing, troubleshoot issues, and evolve their modeling project.

Just as you might track F1 score over multiple optimization runs, you might also benefit from a confusion matrix or a Receiver Operating Characteristic (ROC) curve. Sure, it’s helpful to calculate the area under that curve, but it can also be helpful to store, track, and retrieve multiple ROC curves, for example to compare the tradeoff between true positive rate and false positive rate for multiple classes in the same plot.

Storing Artifacts in SigOpt

SigOpt has always populated a Dashboard with out-of-the-box visualizations of training runs, metrics, checkpoints, and hyperparameter optimization jobs to enable a more robust, reproducible model development process. With a recent product update, we now make it easier for modelers to bring their own artifacts to our platform to store, analyze, and track them along with our out-of-the-box visualizations.

This feature allows you to log images as PIL images, matplotlib figures, or even NumPy ndarrays, straight from your Python code or Jupyter notebook. Whether you are uploading images from a classification task to understand model performance more deeply or tracking custom plots that explain model behavior, this feature is designed to let modelers use SigOpt as a central hub for their modeling process.
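As an illustration of the ndarray route, here is a minimal sketch that turns a small matrix (for example, a confusion matrix) into an 8-bit grayscale image. It assumes the upload call is `sigopt.log_image(image, name=...)`, as SigOpt’s client documentation describes; verify the exact name and signature for your client version. To keep the sketch runnable without a SigOpt account, the logging function is passed in as a parameter, and the helper names are hypothetical.

```python
import numpy as np


def to_grayscale_image(values, scale=8):
    """Min-max normalize a 2-D array to 0..255 and enlarge each cell into a scale x scale block."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    span = (hi - lo) if hi > lo else 1.0  # a constant array would otherwise divide by zero
    gray = np.round(255 * (values - lo) / span).astype(np.uint8)
    # Repeat every cell along both axes so small matrices remain visible as images.
    return gray.repeat(scale, axis=0).repeat(scale, axis=1)


def log_matrix_as_image(values, name, log_image):
    """Render `values` as an ndarray image and hand it to `log_image`.

    ASSUMPTION: inside a SigOpt run, `log_image` would be `sigopt.log_image`;
    any callable accepting (image, name=...) works for local testing.
    """
    image = to_grayscale_image(values)
    log_image(image, name=name)
    return image


# Hypothetical 2x2 confusion matrix, "logged" with a stand-in that just prints.
img = log_matrix_as_image([[50, 2], [5, 43]], "confusion_matrix",
                          lambda image, name: print(name, image.shape, image.dtype))
```

Passing the logger in, rather than importing `sigopt` inside the helper, keeps the rendering logic testable on its own.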

How It Works

When optimizing machine learning models, the best strategy is to optimize some measure of the model’s out-of-sample performance. You’ll often compute out-of-sample performance using composite metrics such as accuracy, F1 score, log loss, the area under the Receiver Operating Characteristic curve, and others. These are all great ways to summarize performance across all of our samples and use that information to guide the optimization process.

However, these composite metrics condense a lot of information from each of our test samples into a single number. Furthermore, each of these metrics is ultimately derived from the probability the model assigns to some event: for a given sample, the probability that it belongs to Class A or to Class B. It is therefore important, when modeling in general and optimizing in particular, to remember that a prediction made with higher probability reflects more confidence in the underlying model.

Unfortunately, many composite metrics hide these probabilities, as the following simple, hypothetical example shows:

Figure: Two-class accuracy with flat probability distribution (left); two-class accuracy with smaller certainty spread (right)

Here, two different models (model A on the left and model B on the right) make predictions on the same set of unseen data points, and each histogram shows the model’s confidence in predicting the correct class. The two models have exactly the same accuracy, even though model B never assigns more than 80% confidence to a single prediction. We would therefore expect model A to perform considerably better than model B in general once put into production.
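The effect is easy to reproduce in plain Python. The probabilities below are made-up, hypothetical values, not the data behind the figure: each number is the probability a model assigned to the correct class for one test sample, and in this two-class setting a sample counts as correct when that probability exceeds 0.5.

```python
from collections import Counter


def accuracy(p_correct):
    """Fraction of samples whose true class received probability > 0.5 (two-class case)."""
    return sum(p > 0.5 for p in p_correct) / len(p_correct)


def confidence_histogram(p_correct, n_bins=10):
    """Sample counts per bin; for n_bins=10: [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]."""
    counts = Counter(min(int(p * n_bins), n_bins - 1) for p in p_correct)
    return [counts.get(b, 0) for b in range(n_bins)]


# Hypothetical probabilities each model assigned to the correct class, one per test sample.
model_a = [0.97, 0.96, 0.92, 0.99, 0.88, 0.91, 0.33, 0.12]
model_b = [0.62, 0.58, 0.71, 0.66, 0.55, 0.69, 0.45, 0.41]

print(accuracy(model_a), accuracy(model_b))  # 0.75 0.75
print(confidence_histogram(model_a))  # [0, 1, 0, 1, 0, 0, 0, 0, 1, 5]
print(confidence_histogram(model_b))  # [0, 0, 0, 0, 2, 2, 3, 1, 0, 0]
```

Both toy models score 0.75 accuracy, yet none of model B’s probabilities reaches the 0.8 bin while most of model A’s sit in the top bin; only the histogram makes that difference visible.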

An optimizer that sees only a single composite metric such as accuracy would miss this distinction, but we as humans are extremely good at discovering it when presented with the right information. This is why SigOpt has upgraded our platform to allow you to upload artifacts such as images, plots, or dataframes to any SigOpt Run, so that you can go back and analyze your best SigOpt Runs even more deeply. In the example below, we uploaded the histogram described above for one of our SigOpt Runs:

Figure: Visualization of uploaded image artifacts

Here we see that most of the mass of the distributions sits near the edges, close to 0 and 1, which means that our model makes its correct predictions with a lot of confidence. In addition, we can store both the confusion matrix and its complementary ROC curve:

Figure: Receiver Operating Characteristic plot; confusion matrix plot artifact

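Together, the two plots give you, the data scientist, a fuller picture of each individual model: the confusion matrix shows which classes the model confuses with one another, and the ROC curve shows how its true positive rate trades off against its false positive rate as the decision threshold changes.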
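If you want to produce these two artifacts yourself, the underlying numbers take only a few lines to compute. The following is a minimal NumPy sketch for the binary case with hypothetical helper names; in practice, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_curve` provide the same results with more options. The resulting matrix or (FPR, TPR) points could then be plotted and logged to a Run as image artifacts.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count matrix with true classes as rows and predicted classes as columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def binary_roc_curve(y_true, scores):
    """(FPR, TPR) at each distinct score threshold, from strictest to loosest.

    Assumes labels are 0/1 and both classes are present.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # highest score first, stable for ties
    y_sorted = y_true[order]
    tps = np.cumsum(y_sorted == 1)  # true positives if we cut after each position
    fps = np.cumsum(y_sorted == 0)  # false positives at the same cuts
    # Only cut between distinct scores: keep the last position of each tied run.
    cut = np.r_[np.diff(scores[order]) != 0, True]
    tpr = np.r_[0.0, tps[cut] / tps[-1]]
    fpr = np.r_[0.0, fps[cut] / fps[-1]]
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


# Hypothetical labels and scores for four samples.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]
print(confusion_matrix(y_true, y_pred))
fpr, tpr = binary_roc_curve(y_true, scores)
print(auc(fpr, tpr))  # 0.75
```

For several classes, calling `binary_roc_curve` once per class k with labels `(y == k)` and that class’s predicted probabilities gives the one-vs-rest curves that can be overlaid in a single plot.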

Getting Started

If you want to start storing your custom visualizations or other image artifacts together with your SigOpt Runs, you can read the documentation here to start creating your own unique artifacts. If you’re just getting started with SigOpt, you can sign up here for free access, or if you are doing academic or nonprofit research you plan to publish, join the Academic Program here.

Tobias Andreasen, Machine Learning Specialist
Barrett Williams, Product Marketing Lead