This is the second post in a three-part series covering practical approaches to hyperparameter optimization. In the first post, we discussed the strengths and weaknesses of different methods. Today we focus on Bayesian optimization for hyperparameter tuning, a more efficient approach that can be tricky to implement from scratch.
We’ll begin by introducing a few of the features that set Bayesian optimization apart from other methods. Then we’ll survey several notable optimization packages and their varied approaches to implementing Bayesian optimization.
What makes Bayesian optimization different:
Unlike grid search, Bayesian optimization is not exhaustive, so it is typically more efficient at discovering an improved outcome through a balance of exploratory and exploitative search. And unlike random search, Bayesian optimization leverages past results to decide where to sample next.
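To make the contrast concrete, here is a toy, pure-Python sketch of a sequential search that uses past results: with some probability it samples uniformly (exploration), and otherwise it perturbs the best point found so far (exploitation). This is only an epsilon-greedy caricature of the idea, not a real Bayesian optimizer (there is no probabilistic surrogate model or acquisition function), and all names and parameters here are illustrative.

```python
import random

def toy_sequential_optimize(objective, bounds, n_iters=200,
                            explore_prob=0.3, seed=0):
    """Toy sketch of the explore/exploit balance in sequential search.

    With probability `explore_prob` it samples uniformly at random
    ("exploring"); otherwise it perturbs the best point found so far
    ("exploiting"), using past results to decide where to sample next.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    width = hi - lo
    best_x = rng.uniform(lo, hi)
    best_y = objective(best_x)
    for i in range(n_iters):
        if rng.random() < explore_prob:
            # Explore: sample the whole space broadly.
            x = rng.uniform(lo, hi)
        else:
            # Exploit: sample near the current best, shrinking the
            # neighborhood as the search progresses.
            scale = 0.5 * width * (1.0 - i / n_iters)
            x = min(hi, max(lo, best_x + rng.gauss(0.0, scale)))
        y = objective(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

# Maximize a simple unimodal objective on [0, 1]; the true peak is at 0.7.
best_x, best_y = toy_sequential_optimize(lambda x: -(x - 0.7) ** 2,
                                         (0.0, 1.0))
```

A real Bayesian optimizer replaces the hard-coded explore/exploit coin flip with a surrogate model of the objective and an acquisition function that weighs predicted value against uncertainty, which is exactly what the packages below implement.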
Bayesian optimization starts by sampling the parameter space broadly (“exploring”), then zooms in on increasingly successful regions as it finds better and better values (“exploiting”), all while continuing to sample broadly some portion of the time, balancing the two so it does not over-exploit. This means that Bayesian optimization can usually discover optima across multimodal functions, with greater resolution of results close to their respective peaks. And with a dynamically chosen balance between exploration and exploitation, it won’t waste your compute resources or wall clock time optimizing around a patently inferior parameter set. Here are some open source optimizers that have gained attention in the data science community:
A feature comparison of a number of Bayesian or similar optimization packages:
| Project | Base Algorithm | Parallelism | Advanced Features | GitHub Stars | Last >50 LoC Update (Master) |
|---|---|---|---|---|---|
| BayesOpt | Bayesian | Partial | Conditionals using third-party Optunity library | | |
| Optuna | Bayesian + Multiple | Yes | Visualization, early termination, multimetric | 2360 | 6/11/20 |
| BoTorch and Ax | Bayesian | Yes | Thresholds, multimetric | 1590/ | |
| HpBandSter | Bayesian | Yes | Hyperband as an option | 378 | 11/11/18 |
| SMAC | Bayesian | Yes (shared memory model) | Performance-optimized Random Forest, modularity | 497 | 5/24/20 |
| GPyOpt | Bayesian | Yes | Modular acquisition functions | 600 | 3/19/20 |
Note that this comparison is by no means exhaustive and considers only a selection of the packages that are publicly available and popular today. If you would like your open source optimization package considered for addition to this list, please reach out to [email protected]. All GitHub star counts are current as of 6/12/2020.
Open source optimization packages are certainly useful for solving specific academic problems, or in cases where the research itself involves modifying or augmenting the optimizer. But as several of these examples demonstrate, progress on any given open source package can stall, or key features can go missing, when interest wanes or priorities change. SigOpt, by contrast, is continually building new features and tackling novel problem spaces for customers as we improve our product to solve real-world enterprise problems. As we add functionality, we also regression-test to ensure that our changes always improve upon the many areas in which we’ve shown strong results in the past.
One of the most challenging aspects of open source optimization packages is that many were designed by experts focused on solving a specific problem, anything from molecular simulation, to image classification, to reinforcement learning. Many of these packages don’t adapt well to problems beyond those that motivated their creation. In the next post, we’ll present some of the challenges that larger enterprises face as they bring their optimization solution up to scale.