Black-box optimization is usually framed as if there will always be another chance to guess again, typically in a multidimensional hyperparameter space. But no data scientist has infinite time to deliver a working model, and compute resources rarely feel sufficient for the task at hand. What if you had to optimize a model knowing that you have a fixed number of observations remaining?
In fact, making high-quality black-box optimization decisions depends on the remaining budget. If a data scientist really does have infinite time to deliver a working model, Bayesian optimization no longer matters: any simple optimization algorithm, such as random search, will eventually locate the optimal hyperparameters. Conversely, if a data scientist has a budget of one, black-box optimization becomes a fool’s errand; no optimization method, no matter how intelligent, will be able to reliably locate the optimal hyperparameters.
Of course, these are both degenerate edge cases. In the former, anything simple will work well, and in the latter, nothing could possibly work well. What about the in-between? Most modeling problems our customers face lie solidly in the middle of these two extremes. Successful modeling teams typically have a fair amount of compute power available to them, but they also need to deliver on a deadline.
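To make the role of budget concrete, here is a minimal sketch of random search under different observation budgets. The `objective` function is a hypothetical stand-in for a validation loss over two hyperparameters, not a real model; the point is only that the best value found can never get worse as the budget grows, and that a budget of one leaves the result to chance.

```python
import random


def objective(x, y):
    # Hypothetical "validation loss" with its minimum of 0 at (0.3, 0.7).
    return (x - 0.3) ** 2 + (y - 0.7) ** 2


def random_search(budget, seed=0):
    # Draw `budget` uniform samples from the unit square and keep the best loss.
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(budget):
        x, y = rng.random(), rng.random()
        best = min(best, objective(x, y))
    return best


if __name__ == "__main__":
    # With a fixed seed, each larger budget extends the same sample sequence,
    # so the best loss found is non-increasing in the budget.
    for budget in (1, 10, 100, 1000):
        print(budget, random_search(budget))
```

In this toy setting a large enough budget makes even uninformed sampling look good, while a tiny budget makes any method look bad. The interesting regime is the middle, where how each observation is chosen starts to matter.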
As it turns out, the added constraint of an observation limit, or even a limit in dollar amount or time, greatly increases the complexity of Bayesian optimization. Eric Lee, Research Engineer at SigOpt, wrote his doctoral dissertation on budget-constrained Bayesian optimization to address exactly this problem. But what is the performance benefit of a tailored optimization strategy over a more general one? That depends.
To take a “non-myopic” approach to optimization, one that plans for the remaining budget rather than only the next observation, it makes sense to modify your acquisition function. For example, consider Figure 1. In sequential decision-making, given a one-dimensional search space, you might guess a point in the middle, after which your search strategy becomes inherently asymmetric. That’s OK. But we can do better. If we have a budget of two, then we know that we can, in fact, make a symmetric decision instead of an asymmetric one. This allows us to make Bayesian optimization decisions according to a non-myopic acquisition function (see the right panel in Figure 1).
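The asymmetry described above can be illustrated with a toy sketch. It does not use the acquisition function from the dissertation; as a labeled assumption, it substitutes a simple geometric proxy, the covering radius (the largest distance from any location in [0, 1] to its nearest observation), and compares a greedy one-step-at-a-time design with one that plans both observations jointly. The function names and the candidate grid are hypothetical.

```python
from itertools import combinations


def covering_radius(points, grid_size=1001):
    # Largest distance from any grid location in [0, 1] to its nearest point.
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(min(abs(g - p) for p in points) for g in grid)


def greedy_design(budget, candidates):
    # Myopic: pick each point to minimize the covering radius right now,
    # ignoring how many observations remain.
    chosen = []
    for _ in range(budget):
        best = min(candidates, key=lambda c: covering_radius(chosen + [c]))
        chosen.append(best)
    return chosen


def planned_design(budget, candidates):
    # Budget-aware: choose all `budget` points jointly by brute force.
    best = min(combinations(candidates, budget),
               key=lambda design: covering_radius(list(design)))
    return list(best)


if __name__ == "__main__":
    candidates = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
    greedy = greedy_design(2, candidates)
    planned = planned_design(2, candidates)
    print("greedy :", greedy, covering_radius(greedy))
    print("planned:", planned, covering_radius(planned))
```

Under this proxy, the greedy design starts in the middle and its second point can only cover one side, so the covering radius stays at 0.5. Planning for a budget of two places the points symmetrically at 0.25 and 0.75 and halves the covering radius. A real non-myopic acquisition function reasons about the model's posterior rather than distances, but the planning intuition is the same.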
You can learn how different acquisition functions might help you optimize certain model types by reading Eric’s dissertation here. In his research, Eric evaluates a non-myopic acquisition function and shows that, for the same budget, it delivers better hyperparameters than out-of-the-box Bayesian optimization (BO) solutions on the following four common model types:
- K-nearest neighbor
- Multi-layer perceptrons (though not “deep” nets)
- Random forests
- Gradient-boosted trees
Note that custom acquisition functions are, for the moment, still too computationally intensive to be of practical use on deep learning models, where expected improvement (EI) still reigns supreme. Many teams are actively researching these topics, including Amazon and Facebook (FAIR) in industry and MIT in academia. Stay tuned for ways to further enhance your optimization strategy, or if your research is related and you can take advantage of SigOpt, sign up for our academic program here. If you’re interested in reading about prior research on Monte Carlo methods, you can find it here.