What can the brain teach us about machine intelligence? Perhaps no organization is in a better position to answer this question than Numenta, founded in 2005 by Jeff Hawkins to explore this very topic. While Jeff explained this work in his book A Thousand Brains, Subutai Ahmad, VP of Research at Numenta, has been applying these concepts to deep learning tasks with SigOpt.
This post summarizes a case study on one of these tasks – designing a novel version of ResNet 50 for image classification. If you want to learn more, we recommend you read the case study or attend Subutai’s talk at the free and virtual SigOpt Summit on November 16th.
Subutai’s goal was to develop a sparse version of ResNet that maintained sufficiently high accuracy when trained on ImageNet. The brain can be considered sparse in that it has many connections, but only utilizes a few of them for any single input. This sparsity, among other attributes, allows it to be highly efficient and robust at the same time. Subutai wanted to evaluate whether the same could apply to neural networks.
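To make the idea of sparsity concrete, here is a minimal sketch of one common way to sparsify a set of weights: zeroing the entries with the smallest magnitude. This is an illustrative assumption, not a description of how Numenta built its sparse ResNet, which this post does not cover. The function names and the toy weight list are hypothetical.

```python
import random


def sparsify_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of a flat weight list."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_zero = int(round(sparsity * len(weights)))
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_zero])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def measured_sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Toy "layer": 1,000 random weights standing in for a real weight tensor.
random.seed(0)
dense = [random.gauss(0.0, 1.0) for _ in range(1000)]
sparse = sparsify_by_magnitude(dense, sparsity=0.75)
print(measured_sparsity(sparse))
```

In a real network the same masking idea would be applied per layer to weight tensors, and the open question the research addresses is whether accuracy holds up once most weights are removed.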
Subutai developed a version of ResNet that was 75% sparse and still achieved 77.1% top-1 accuracy, a promising step for this research. But it was not an easy process. Developing a novel architecture requires far more intensive and iterative experimentation than working with an established one. Subutai and his Numenta team overcame these challenges by relying on SigOpt. Subutai comments:
“The SigOpt Intelligent Experimentation Platform is easy to implement as a system of record for all of your experiments – across model type, task, or package. But what sets it apart is its capacity to guide experimentation so you can uncover insights on model behavior and develop configurations of models that fit your specific needs.”
These challenges are not unique to Numenta’s case, so exploring them in more detail could help any modeler address roadblocks they confront in their deep learning model training and hyperparameter optimization processes.
Model Exploration and Hyperparameter Selection
Numenta was not taking a pre-trained ResNet 50 and doing one-shot learning before putting it in production. They were exploring the architecture and evolving it into a unique, purpose-designed version of ResNet to meet the particular needs of their experimental process. Most modelers are in a similar situation. They may use a pre-trained model as a starting point, but need to explore it, iterate on it, adjust it, understand it and evolve it with their own data sets and objective metrics during model training. Similarly, once Numenta moved beyond vanilla ResNet, it quickly became essential to select a new configuration of hyperparameters for the model to continue to perform.
This type of exploration and hyperparameter optimization is possible without dedicated tooling, but is much more productive with it. For this job, Numenta relied on SigOpt, which tightly integrates tracking metadata from training runs with managing hyperparameter optimization, and requires just a few lines of code. By relying on SigOpt for this workflow, Numenta was able to track all training runs in the SigOpt web dashboard to enable collaboration, apply SigOpt’s advanced experimentation features to explore their modeling problem, and manage hyperparameter optimization so they could find and select the best configuration of hyperparameters in fewer training runs. These tools enabled Numenta to iterate much faster on their modeling problem, understand the ResNet architecture in more depth and gain the insights they needed to design a sparse ResNet that still achieved high accuracy.
An example of Numenta’s multitask optimization experiment results in the SigOpt dashboard
Tool Selection and Implementation
It may be clear that Numenta could benefit from implementing the SigOpt Intelligent Experimentation Platform in their workflow, but how did they choose SigOpt? Numenta explored a variety of options before selecting SigOpt, and had high standards and a set of requirements that are not easy for a single solution to meet. Numenta is a high-performing modeling team that can implement most open source tools in their workflow relatively easily. This means that they can often be flexible in the tools they use for any given project, spinning up what they need as they need it.
They originally took this approach for experimentation, but they quickly realized that implementing and maintaining open source tools for these workflow problems was taking too much of their team’s time – especially in the case of hyperparameter optimization. For instance, they used a popular library for managing distributed scheduling of hyperparameter tuning jobs, but repeatedly ran into infrastructure bugs, which meant experiments took longer and the team had to spend a lot of time debugging.
These problems were so persistent that Numenta considered building their own solution on top of an open source Bayesian optimization package. But when they scoped it, the time required to build and maintain this system was significant enough that they chose instead to run SigOpt through a series of tests to validate whether it would meet their requirements. The time spent evaluating SigOpt paid off, because they ended up with a solution they could rely on for their entire experimentation process. Overall, it saved their team significant time in the modeling process and shortened the path to outcomes like the sparse version of ResNet 50.
Management of Long Training Runs and Advanced Experimentation
More specific to this particular modeling problem, Numenta also faced a situation familiar to many deep learning engineers – lengthy training runs. Both ImageNet and ResNet 50 are large, so training runs take a non-trivial amount of time. Any way to speed this up benefits the team by giving them faster insights with fewer computing resources.
Numenta applied a variety of SigOpt features to reduce the wall-clock time of model training. They ran hyperparameter optimization jobs in parallel to take advantage of the compute they had available, which SigOpt managed automatically with just a line of code. They used SigOpt’s proprietary optimizer instead of bringing their own to the platform, because it is purpose-built to be sample-efficient: it applies Bayesian optimization and other global optimization techniques to reduce the number of training runs required to optimize hyperparameters. Finally, they used advanced experimentation features in SigOpt such as multitask optimization, which allowed them to train on a fraction of the data in early training runs (to efficiently explore the space) and train on the full dataset in later runs (to exploit promising parts of the space). Together, these techniques significantly reduced the wall-clock time of Numenta’s experiments, so the team learned about their model faster and generated deeper insights in the process.
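The explore-cheaply, exploit-at-full-cost idea can be sketched in a few lines. The example below is a deliberately simplified two-stage random search, not SigOpt’s optimizer or API. The `train_and_evaluate` function is a hypothetical stand-in for a real training run, and the learning-rate range, run counts and data fraction are assumptions chosen only for illustration.

```python
import random


def train_and_evaluate(learning_rate, data_fraction):
    """Hypothetical stand-in for a training run; returns a toy accuracy score.

    The score peaks at learning_rate = 0.1. Training on a smaller fraction
    of the data is modeled as a noisier estimate of that score.
    """
    true_score = 1.0 - abs(learning_rate - 0.1) * 5.0
    noise = random.gauss(0.0, 0.05 * (1.0 - data_fraction))
    return true_score + noise


def two_stage_search(n_cheap=20, n_full=4, cheap_fraction=0.1, seed=0):
    """Explore with cheap partial-data runs, then exploit with full-data runs."""
    random.seed(seed)
    # Stage 1 (explore): many cheap runs on a fraction of the data.
    candidates = [random.uniform(0.001, 0.2) for _ in range(n_cheap)]
    cheap = [(train_and_evaluate(lr, cheap_fraction), lr) for lr in candidates]
    # Stage 2 (exploit): rerun only the most promising settings on all data.
    finalists = [lr for _, lr in sorted(cheap, reverse=True)[:n_full]]
    full = [(train_and_evaluate(lr, 1.0), lr) for lr in finalists]
    best_score, best_lr = max(full)
    return best_lr, best_score, len(cheap), len(full)


best_lr, best_score, cheap_runs, full_runs = two_stage_search()
print(f"best learning rate: {best_lr:.4f} "
      f"({cheap_runs} cheap runs, {full_runs} full runs)")
```

Only 4 of the 24 runs in this sketch pay the full training cost, which is where the wall-clock savings come from. A Bayesian optimizer would go further by choosing each new candidate based on earlier results rather than sampling at random.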
Although Numenta’s application of brain-based principles to build a novel deep learning architecture may be unique, their workflow challenges are not. If you want to start addressing similar problems in your workflow, use SigOpt free today by signing up at https://sigopt.com/signup. If you want to learn more about this case and others before trying SigOpt, register for our free and virtual user conference at https://sigopt.com/summit where Subutai and others are giving talks that cut across computer vision, natural language processing, time series forecasting, recommendation systems, physical simulations and more.