Our research team at SigOpt has been very fortunate to be able to collaborate with outstanding researchers around the world, including through our academic and internship programs. In our Highlight blog posts, we take the opportunity to feature work by these collaborators. Our goal is to give you a short summary; you’ll have to read their work to learn more. This post introduces the article Automating Bayesian Optimization with Bayesian Optimization by Gustavo Malkomes and Roman Garnett, appearing in the upcoming NeurIPS 2018 proceedings. Gustavo is a former intern on our research team and is currently pursuing his Ph.D. with Roman at Washington University in St. Louis.
“The development of some system of a priori distributions suitable for different classes of the function \(f\) is probably the most important problem in the application of [the] Bayesian approximation to the global optimization [problem].”
Močkus is suggesting, and we agree, that model selection is crucial. Bayesian optimization has two key components: a model of the objective function and an acquisition function that guides the optimization. Historically, acquisition function design has received an overwhelming share of the attention in Bayesian optimization research. The methodology Gustavo and team propose in this article instead focuses on the model, dynamically building better models of the objective function throughout the optimization.
Model selection is particularly difficult in typical Bayesian optimization scenarios, because only a small amount of training data may be available. As a result, it can be hard to single out one best model among many plausible options. For robustness against model misspecification, it may be useful to combine several plausible models of the objective and use all of them while acquiring data. The following example illustrates this idea.
Figure 1: Importance of model selection in Bayesian optimization. top left: One model represents the belief about the objective. top right: Custom mixture of models selected by our automated Bayesian optimization represents the belief about the objective. bottom: The acquisition function value (expected improvement) computed using the respective beliefs about the objective. Automated Bayesian optimization will place the next observation at the optimum (highest function value).
In the figure above, we show two instances of Bayesian optimization in which the goal is to maximize the red objective function, which is unknown to both methods. Both instances use expected improvement as the acquisition function. They differ in how the beliefs about the objective are constructed: using a single model (left) or combining several models with the automated Bayesian optimization (ABO) approach (right). We can see that the single model does not capture the nuances of the true function.
In contrast, ABO captures the increasing linear trend of the true function and produces a credible interval that captures the function’s behavior well. Consequently, ABO will find the optimum (where the red function is maximized) in the next iteration.
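To make the comparison in Figure 1 more concrete, here is a minimal sketch of expected improvement (for maximization) under a single Gaussian belief and under a weighted mixture of model beliefs. It assumes each model’s prediction at a candidate point is Gaussian with mean `mu` and standard deviation `sigma`, and that the mixture weights (for example, posterior model probabilities) are given and sum to one. The function names are illustrative and are not taken from the paper’s code.

```python
import math


def expected_improvement(mu, sigma, best):
    """EI for maximization under a Gaussian belief N(mu, sigma^2).

    `best` is the highest objective value observed so far.
    """
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def mixture_expected_improvement(mus, sigmas, weights, best):
    """EI under a mixture of Gaussian beliefs (one per model).

    Expectation is linear, so the EI of the mixture equals the weighted sum
    of each component's EI. Weights are assumed non-negative, summing to one.
    """
    return sum(w * expected_improvement(m, s, best)
               for m, s, w in zip(mus, sigmas, weights))


# Usage: a confident but mediocre prediction vs. an uncertain one.
best_so_far = 1.0
print(expected_improvement(0.5, 0.1, best_so_far))  # tiny: almost no chance to improve
print(expected_improvement(0.0, 2.0, best_so_far))  # larger: uncertainty creates upside
print(mixture_expected_improvement([0.5, 1.5], [0.1, 0.3], [0.7, 0.3], best_so_far))
```

Because EI is an expectation, averaging the per-model EIs gives the exact EI of the mixture, so there is no need to collapse the mixture into a single Gaussian first.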
To further understand our automated Bayesian optimization approach, we need to briefly discuss our earlier NeurIPS 2016 work, in which Gustavo and team introduced a novel model search algorithm for classic supervised learning tasks. The method is based on “Bayesian optimization in model space”, where we treat model evidence as a function to be maximized. The result is an automated model-construction algorithm that can quickly design bespoke models explaining a given data set as well as possible.
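The quantity maximized in that model space is the model evidence: the marginal likelihood of the observed data under a model. For a zero-mean Gaussian process with Gaussian observation noise, it has a closed form. The sketch below computes it in pure Python for two hand-picked kernels, with hyperparameters held fixed for simplicity (a real search would also have to account for them). It shows only the objective being maximized, not the model-space search from the 2016 work, and the kernel choices and noise level are our assumptions.

```python
import math


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def linear_kernel(a, b, variance=1.0):
    """Linear covariance: a GP prior over lines through the origin."""
    return variance * a * b


def cholesky(K):
    """Lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def log_evidence(xs, ys, kernel, noise=0.01):
    """log p(y | X, kernel) for a zero-mean GP with Gaussian noise."""
    n = len(xs)
    K = [[kernel(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    # Forward substitution solves L alpha = y, so y^T K^{-1} y = alpha^T alpha.
    alpha = []
    for i in range(n):
        alpha.append((ys[i] - sum(L[i][k] * alpha[k] for k in range(i))) / L[i][i])
    data_fit = -0.5 * sum(a * a for a in alpha)
    complexity = -sum(math.log(L[i][i]) for i in range(n))  # equals -0.5 log|K|
    return data_fit + complexity - 0.5 * n * math.log(2.0 * math.pi)


# Usage: compare two candidate models on data with a clear linear trend.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]
for name, k in [("linear", linear_kernel), ("rbf", rbf_kernel)]:
    print(name, log_evidence(xs, ys, k))
```

On this data the linear kernel attains much higher evidence than the RBF kernel, whose unit prior variance makes observations as large as 8 improbable. Exponentiating and normalizing such evidences (times model priors) gives posterior model probabilities, one natural choice of mixture weights.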
In our NeurIPS 2018 work, we adapt this earlier model search (itself powered by Bayesian optimization) to perform Bayesian optimization and model selection simultaneously. We automatically construct custom mixtures of models of the objective function to power the optimization process more effectively. Our solution thus uses Bayesian optimization to automate Bayesian optimization!
The details of this strategy and recommendations for effective operation are discussed in the paper. At a high level, the automated Bayesian optimization algorithm dynamically builds a collection of models to explain the objective function. The challenge is to design an efficient model search procedure for an active learning setting, as is the case in Bayesian optimization. As the optimization proceeds, we gather more data, so earlier computations of model evidence can become outdated. Our solution was to attach different levels of confidence to model evidence computations that took place at different time steps, which allows us to softly “remember” (and forget) previous models without requiring excessive recomputation.
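The post does not spell out the exact mechanism, so the following is only a hypothetical sketch of the general idea of attaching confidence to evidence computed at different time steps. Each cached log evidence is treated as a noisy estimate whose variance grows with the number of steps since it was computed, and only entries that have become too uncertain are flagged for recomputation. The class name, the linear variance growth, and the `max_var` threshold are our assumptions, not details from the paper.

```python
class EvidenceCache:
    """Hypothetical cache of log-evidence values whose confidence decays with age.

    Each entry stores a model's log evidence and the step at which it was
    computed; older entries are treated as noisier estimates of the model's
    current evidence (assumed linear growth of variance, for illustration).
    """

    def __init__(self, base_var=1e-6, var_per_step=0.5):
        self.base_var = base_var          # uncertainty of a freshly computed value
        self.var_per_step = var_per_step  # assumed extra variance per elapsed step
        self.entries = {}                 # model name -> (log_evidence, step)

    def store(self, model, log_ev, step):
        self.entries[model] = (log_ev, step)

    def belief(self, model, now):
        """Return (value, variance): the cached value with age-inflated variance."""
        log_ev, step = self.entries[model]
        return log_ev, self.base_var + self.var_per_step * (now - step)

    def stale(self, now, max_var):
        """Models whose cached evidence is too uncertain and should be recomputed."""
        return sorted(m for m in self.entries if self.belief(m, now)[1] > max_var)


# Usage: one old entry and one recent entry, checked at step 10.
cache = EvidenceCache()
cache.store("SE", -12.3, step=2)
cache.store("SE+LIN", -10.1, step=9)
print(cache.stale(now=10, max_var=2.0))  # only the old "SE" entry needs recomputing
```

Keeping stale values with inflated uncertainty, rather than discarding them, lets a search still use them as soft information while spending recomputation only where the uncertainty has grown too large.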