Researchers Xavier Bouthillier of Mila and Gaël Varoquaux of Inria surveyed the use of model experimentation methods across NeurIPS 2019 and ICLR 2020, two of the most prestigious international academic ML conferences. Given our work pioneering solutions for hyperparameter optimization and model experimentation in machine learning, we found this survey particularly relevant. The results reveal interesting takeaways about the current state of adoption of hyperparameter optimization at these conferences, and they offer a great snapshot of what leading researchers in industry and academia are doing.
The survey consisted of 10 questions. We’ll walk through specific questions and takeaways, then dive into the data itself to understand the tradeoffs researchers make in the method, complexity, and depth of their experimentation and optimization.
Question: Did you optimize your hyperparameters?
Takeaway: The use of hyperparameter optimization is widespread
More than 86% of NeurIPS and 96% of ICLR papers are empirical rather than theoretical. And of these empirical papers, 80% of NeurIPS and 88% of ICLR applied hyperparameter optimization to their models. This shows the widespread recognition of the value of hyperparameter optimization for work where the best results matter, which parallels the use of hyperparameter optimization we see within our enterprise customers. Just as enterprises use hyperparameter optimization as a critical step to confidently advance the best models into production, researchers use it to confidently submit the best results to their peers at machine learning conferences.
Question: If yes, you did optimize, how did you tune them?
Takeaway: Hyperparameter optimization is still mostly limited to manual and naive methods like grid and random search
Some researchers favor manual tuning of their models to gain more intuition. This can be helpful in the early hypothesis-testing experiments of model research and development, but as models grow more complex it is often hard for researchers to balance many competing metrics simultaneously while trying to optimize a high-dimensional, non-convex space in their heads. Diving into the data, 54% (336) of the papers used at least partial manual tuning, and 39% (241) used exclusively manual tuning.
Grid search and random search are popular methods for automating model optimization because they are simple to understand, use, and implement. While Bergstra et al. have shown that random search can be much more efficient than grid search, both are typically much less efficient than adaptive methods like Bayesian Optimization. In our own empirical studies, an ensemble of Bayesian Optimization methods is typically one or more orders of magnitude faster than grid and random methods.
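One intuition behind the random-versus-grid result can be sketched in a few lines of Python. This is a toy illustration and not code from the survey or from SigOpt: the function names (`grid_points`, `random_points`, `distinct_values_per_dim`) and the unit-square search space are assumptions made for the example. With the same budget of 9 trials, a 3x3 grid only ever tries 3 distinct values of each hyperparameter, while random search tries a fresh value of every hyperparameter on every trial.

```python
import itertools
import random


def grid_points(values_per_dim, n_dims):
    """Full grid over [0, 1]^n_dims; values_per_dim must be at least 2."""
    axis = [i / (values_per_dim - 1) for i in range(values_per_dim)]
    return list(itertools.product(axis, repeat=n_dims))


def random_points(n_points, n_dims, seed=0):
    """Uniform random samples from [0, 1]^n_dims (seeded for repeatability)."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(n_dims)) for _ in range(n_points)]


def distinct_values_per_dim(points):
    """Count how many different values were tried for each hyperparameter."""
    n_dims = len(points[0])
    return [len({p[d] for p in points}) for d in range(n_dims)]


grid = grid_points(values_per_dim=3, n_dims=2)       # 3 x 3 = 9 trials
rand = random_points(n_points=len(grid), n_dims=2)   # same budget of 9 trials

print("grid  :", distinct_values_per_dim(grid))
print("random:", distinct_values_per_dim(rand))
```

If only one of the two hyperparameters really matters, the grid has effectively spent 9 trials to learn about 3 settings of it. Neither method uses earlier results to choose the next trial, which is the gap adaptive methods aim to close.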
Only a small portion of researchers used “other” methods to optimize. This category could include Bayesian Optimization, population/genetic methods, random search variants like Hyperband, or other derivative-free methods. Many researchers don’t invest time in these methods because the tooling is often orthogonal to the research itself, and it can be too time consuming and expensive to set up to justify the compute and time saved during tuning. This is exactly why we started the free SigOpt Academic Program, so that researchers can get access to the best optimization and experimentation tools with only a few lines of code wrapped around their model.
Question: How many hyperparameters did you optimize?
Takeaway: Many researchers limit the number of parameters they tune to a handful, although those who use more advanced methods tune more.
From this question we see that a vast majority of researchers limit themselves to 5 or fewer hyperparameters, often far fewer than the total number of hyper-, architecture, feature, and embedding parameters that exist within complex modeling pipelines. When using a method like grid search, whose cost scales exponentially with the number of parameters, this makes a lot of sense: you can’t brute force an exponential problem. Diving into the data, researchers who used methods like random search or “other” were more than twice as likely to be tuning 5+ hyperparameters, showing that better tools often allow you to tackle bigger and harder problems.
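To make the exponential scaling concrete, here is a small arithmetic sketch. The choice of 5 candidate values per hyperparameter is an assumption for illustration, not a number from the survey, and `grid_size` is a hypothetical helper name.

```python
def grid_size(values_per_hyperparameter, n_hyperparameters):
    """Number of trials a full grid search needs: k ** d."""
    return values_per_hyperparameter ** n_hyperparameters


# Assumed for illustration: 5 candidate values for every hyperparameter.
for n in (1, 2, 5, 10):
    print(f"{n:>2} hyperparameters -> {grid_size(5, n):>9,} trials")
```

Going from 5 to 10 hyperparameters takes the grid from 3,125 trials to 9,765,625, which is why a full grid stops being practical well before a modeling pipeline runs out of things to tune.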
Question: How many trails/experiments in total [sic] during the optimization? (How many different set [sic] of hyperparameters were evaluated)
Takeaway: More sophisticated optimization methods led to more total trials.
Better, more efficient optimization leads to more total trials/experiments. This is often referred to as the Jevons paradox: more efficiency (better optimization methods) leads to more complex usage (more parameters), which ends up countering the efficiency of the method. Modelers who used “other” or random search were more than twice as likely to perform 500+ trials. Getting better results from more complex models can lead to more overall compute. However, depending on the results, this tradeoff can be worth the extra cycles, as it often is for the algorithmic traders, with a combined $600B+ in assets under management, that SigOpt works with for model experimentation and optimization.
It is worth noting that a single researcher who responded to the survey used only manual tuning, but still did 500+ trials/experiments. This brings a whole new perspective to the concept of “grad student descent.” If that researcher is reading this, please look into the free SigOpt Academic Program. Research is hard enough without doing 500+ manual hyperparameter iterations.
Final Takeaway: Sophisticated optimization methods are still gaining traction even at the most sophisticated academic conferences. Better tools allow for more tuning, more complex modeling, and often better results.
Try SigOpt out today, especially if you have a conference or journal you’re targeting. It’s free for academic use, and it enables you to go beyond grid search and random search to more efficiently and effectively craft a high-performing model for your research or business application.
Xavier Bouthillier, Gaël Varoquaux. Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020. [Research Report] Inria Saclay Île-de-France. 2020. ⟨hal-02447823⟩