Experiment, Suggestion, Observation: How We Named Our HPO API Objects

Alexandra Johnson

Background

Hyperparameter optimization, hyperparameter tuning, model tuning, model optimization, and occasionally, model selection, all refer to roughly the same thing: we have a model, the model has hyperparameters, and we want to find “the best” hyperparameters to maximize the performance of our model.

Our goal is to make understanding the value of hyperparameter optimization (HPO) as easy as integrating a few simple API calls. That goal is hindered by the fact that there seem to be five different names for every concept in the field. Worse, two frameworks may use the same term but assign it different meanings!

My team and I have been building hyperparameter optimization APIs for over four years, and we’ve seen first-hand the value of a consistent and interpretable naming scheme. Our customers consistently give us positive feedback on the design of our API[1], pointing out it is easy to use and set up. We believe that the time and care we put into naming our API fields plays a large role here. As part of our mission to make hyperparameter optimization accessible to modelers everywhere, we want to share our naming choices with other developers of HPO APIs.

Happy Optimizing!

Terminology

Metric

A Metric is a numeric measure of some trait you care about.

A Metric can be something like “accuracy”. The goal of Hyperparameter Optimization is generally to maximize that Metric.

Sometimes you care about more than one Metric. For example, you may want to balance accuracy and model complexity. We refer to Hyperparameter Optimization with multiple Metrics as Multi-Metric Optimization[6].

Experiment

An Experiment encapsulates the process of optimizing one or more Metrics.

For example, you may run an Experiment to find a machine learning model that maximizes accuracy on your dataset.

The name Experiment comes from the field of Design of Experiments, which was laid out in R. A. Fisher’s book The Design of Experiments[5]. The term is used both formally and informally across many hyperparameter optimization projects, including Hyperband[2], Spearmint[3], and BayesOpt[4].

Parameters

The Parameters of an Experiment represent inputs that you believe affect the Metric.

For example, a Parameter might be “learning rate”, a continuous value between 0 and 1.

Parameters may have a type, such as Double, Integer, or Categorical. They may also have bounds, such as a minimum or maximum, or a list of enumerated categories.
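
As a rough illustration of types and bounds, a Parameter could be modeled as shown below. This is a minimal sketch and not the SigOpt API: the class name `Parameter`, its fields (`kind`, `lower`, `upper`, `categories`), the `contains` helper, and the “optimizer” example are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union

ParamValue = Union[float, int, str]


@dataclass(frozen=True)
class Parameter:
    """Hypothetical description of one input to an Experiment."""
    name: str
    kind: str  # "double", "int", or "categorical"
    lower: Optional[float] = None  # minimum bound, for numeric kinds
    upper: Optional[float] = None  # maximum bound, for numeric kinds
    categories: Sequence[str] = ()  # enumerated choices, for categorical kinds

    def contains(self, value: ParamValue) -> bool:
        """Return True if `value` is a legal setting for this Parameter."""
        if self.kind == "categorical":
            return value in self.categories
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if self.kind == "int" and not isinstance(value, int):
            return False
        if self.lower is not None and value < self.lower:
            return False
        if self.upper is not None and value > self.upper:
            return False
        return True


# "learning rate": a continuous value between 0 and 1.
learning_rate = Parameter("learning_rate", "double", lower=0.0, upper=1.0)
# An assumed categorical Parameter with two enumerated categories.
optimizer = Parameter("optimizer", "categorical", categories=("sgd", "adam"))
```

Keeping the bounds on the Parameter itself means any proposed input can be checked in one place before a model is trained on it.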

“Parameters” is another term that comes from Design of Experiments[5], and it is commonly used in statistics. Using “parameters” rather than “hyperparameters” may reflect the history of optimization frameworks that were originally designed for use cases other than machine learning.

Additionally, the term “hyperparameter” is long and may be overloaded when the optimization method itself has hyperparameters, as is the case with many Bayesian optimization methods. In the context of HPO APIs, “parameter” seems to be a good shortening of “hyperparameter”. This term is used in BayesOpt[4] and Spearmint[3].

Assignments

While Parameters represent a range of possible inputs, Assignments represent one specific input.

For example, the Assignments might state that “learning rate” should be equal to 0.1.

We found that it was confusing to use Parameters to refer to both concepts, so we broke out the terms with distinct names.
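
The distinction can be sketched in a few lines. This is an illustrative example under stated assumptions, not the SigOpt API: the dictionary layout, the `is_valid` helper, and the “momentum” Parameter are hypothetical.

```python
from typing import Dict, Tuple

# Parameters: the range of possible inputs (name -> (minimum, maximum)).
parameters: Dict[str, Tuple[float, float]] = {
    "learning_rate": (0.0, 1.0),
    "momentum": (0.0, 0.99),  # assumed second Parameter, for illustration
}

# Assignments: one specific input, with exactly one value per Parameter.
assignments: Dict[str, float] = {"learning_rate": 0.1, "momentum": 0.9}


def is_valid(assignments: Dict[str, float],
             parameters: Dict[str, Tuple[float, float]]) -> bool:
    """Check that the Assignments cover every Parameter and respect its bounds."""
    if assignments.keys() != parameters.keys():
        return False
    return all(
        low <= assignments[name] <= high
        for name, (low, high) in parameters.items()
    )
```

Here `parameters` describes what is allowed, while `assignments` is a single point inside that space.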

Value

A Value is one result of evaluating your Metric.

For example, you may evaluate the “accuracy” Metric of your model, and find that it is 0.95.

A Value may have associated noise, which we measure with the Standard Deviation. This noise may come from any number of sources, including cross-validation or other sources of uncertainty.
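
One way noise might arise is from cross-validation, where each fold yields a slightly different score. The sketch below is an assumption about how such a Value could be computed, not a description of the SigOpt API: the `Value` class, its `value_stddev` field, and the `value_from_folds` helper are hypothetical, and using the sample standard deviation of the fold scores as the noise estimate is one choice among several.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Value:
    """Hypothetical result of evaluating one Metric."""
    name: str
    value: float
    value_stddev: Optional[float] = None  # None when no noise estimate exists


def value_from_folds(metric_name: str, fold_scores: Sequence[float]) -> Value:
    """Summarize per-fold scores as a mean and a sample standard deviation."""
    mean = statistics.mean(fold_scores)
    # The sample standard deviation needs at least two data points.
    stddev = statistics.stdev(fold_scores) if len(fold_scores) > 1 else None
    return Value(metric_name, mean, stddev)


# Three assumed fold scores for the "accuracy" Metric.
accuracy = value_from_folds("accuracy", [0.94, 0.95, 0.96])
```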

Observation

An Observation is the logical encapsulation of Assignments and the resulting Value of each Metric for those Assignments.

One Experiment will have many Observations. In a multi-metric Experiment, each Observation will have one Value for every Metric that is part of the Experiment.

Observation is a commonly used term in scientific literature, and appears many times in “Design of Experiments”. Additionally, the term is used in BayesOpt[4], Spearmint[3], and Hyperband[2].
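
The relationship between Metrics, Assignments, and Values can be sketched as follows. This is a minimal illustration rather than the SigOpt API: the `Observation` class, its fields, and the `make_observation` helper are hypothetical, and “complexity” is an assumed name for a second Metric.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Observation:
    """Hypothetical pairing of Assignments with the Values measured for them."""
    assignments: Dict[str, float]
    values: Dict[str, float]  # Metric name -> Value
    suggestion_id: Optional[str] = None


def make_observation(metrics: Sequence[str],
                     assignments: Dict[str, float],
                     values: Dict[str, float],
                     suggestion_id: Optional[str] = None) -> Observation:
    """Build an Observation, requiring exactly one Value per Metric."""
    if set(values) != set(metrics):
        raise ValueError(
            f"expected one Value for each of {sorted(metrics)}, "
            f"got {sorted(values)}"
        )
    return Observation(dict(assignments), dict(values), suggestion_id)


# A multi-metric Experiment with two assumed Metrics.
metrics = ["accuracy", "complexity"]
observation = make_observation(
    metrics,
    assignments={"learning_rate": 0.1},
    values={"accuracy": 0.95, "complexity": 12.0},
)
```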

Suggestion

A Suggestion is the logical encapsulation of a set of Assignments to be evaluated by the user.

Our API breaks Suggestions and Observations into two distinct objects. Suggestion and Observation objects are created, read, updated, and destroyed with their own unique sets of endpoints. This API design gives users total control over how they would like to handle each object, which is especially important for users running in distributed systems settings and writing complex system failure handling logic.

A Suggestion is “open” if no Observation has been reported with its id, and “closed” once at least one Observation has been reported.
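
The open/closed rule can be expressed in a few lines. This is a sketch of the rule as stated above, not the SigOpt implementation: the `Suggestion` class, the `suggestion_state` helper, and the id strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical set of Assignments waiting to be evaluated."""
    id: str
    assignments: Dict[str, float]


def suggestion_state(suggestion: Suggestion,
                     reported_suggestion_ids: Iterable[str]) -> str:
    """Return "closed" if any Observation was reported with this id, else "open"."""
    return "closed" if suggestion.id in set(reported_suggestion_ids) else "open"


first = Suggestion("s1", {"learning_rate": 0.1})
second = Suggestion("s2", {"learning_rate": 0.3})

# Suggestion ids attached to the Observations reported so far.
reported = ["s1"]
```

Because the state is derived from the reported Observations, a worker that crashes before reporting simply leaves its Suggestion open, which is the kind of case failure-handling logic needs to detect.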

Optimization Loop

This is a three-step process that the user repeats throughout the Experiment:

  • Create a Suggestion
  • Evaluate your model on those suggested Assignments
  • Report an Observation

For ease of communication, we generally refer to the “Optimization Loop” when talking about the iterative optimization process. An Experiment that evaluates many Suggestions in parallel is said to have multiple Optimization Loops.
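
A single Optimization Loop can be sketched as shown below. This is an illustration under stated assumptions, not the SigOpt client: `create_suggestion` uses uniform random sampling as a stand-in for a real optimizer, `evaluate_model` is a toy function standing in for training a model, and every name here is hypothetical.

```python
import random
from typing import Dict, List


def create_suggestion(rng: random.Random) -> Dict[str, float]:
    """Stand-in optimizer: sample "learning_rate" uniformly between 0 and 1."""
    return {"learning_rate": rng.uniform(0.0, 1.0)}


def evaluate_model(assignments: Dict[str, float]) -> float:
    """Toy Metric standing in for model training; it peaks at 0.1."""
    return 1.0 - (assignments["learning_rate"] - 0.1) ** 2


def run_experiment(budget: int, seed: int = 0) -> List[dict]:
    """Repeat the three-step loop `budget` times and collect Observations."""
    rng = random.Random(seed)
    observations = []
    for _ in range(budget):
        assignments = create_suggestion(rng)   # 1. Create a Suggestion
        value = evaluate_model(assignments)    # 2. Evaluate the model
        observations.append(                   # 3. Report an Observation
            {"assignments": assignments, "value": value}
        )
    return observations


observations = run_experiment(budget=20)
best = max(observations, key=lambda o: o["value"])
```

A real optimizer would use the Observations reported so far to choose the next Suggestion; random sampling ignores them, which keeps the sketch short.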

Conclusion

We believe naming is important, but at SigOpt it doesn’t stop with our API objects. We try to weave terms like “assignments”, “observed value”, and “suggest” into our API documentation in a more casual context, and we’ve observed this in other libraries as well. By using specific terms in our API consistently, we help our users build a grounded vocabulary that’s shared across our API documentation, blog posts, research papers, and presentations.

If you are developing your own API, or are working with such tools, we hope that this post helps you understand how we’ve approached the problem of creating a common language. Our goal is to create a common ground, which we hope will create a positive experience for users modeling away on HPO tools.

If you’re interested in working on these problems with us, we’re hiring! Check out our careers page for open positions.

References

[1] SigOpt

[2] Hyperband

[3] Spearmint

[4] BayesOpt

[5] Fisher, R. A. (1937). The Design of Experiments. London, England: Oliver and Boyd.

[6] Multimetric Experiments – SigOpt Docs

Alexandra Johnson Software Engineer