SigOpt hosted our first user conference, the SigOpt AI & HPC Summit, on Tuesday, November 16, 2021. It was virtual and free to attend, and you can access content from the event at sigopt.com/summit. More than anything else, we were excited to host this Summit to showcase the great work of some of SigOpt's customers. Today, we share some lessons on design, exploration, and optimization from Subutai Ahmad, the VP of Research at Numenta.
Design: Do smaller networks give higher accuracy?
Numenta's core design goal was to achieve both high accuracy and high sparsity. Subutai ran multiple experiments, each with a different variation of the architecture, to see whether smaller dense networks could work as well as larger sparse ones.
As the table above shows, accuracy started to drop when they reduced the size of the overall dense network. They were even able to surpass the dense network's accuracy by increasing the size of the sparse network. This is counterintuitive, but it turns out that if you increase the number of neurons in the overall network, you can actually decrease the number of non-zero weights that are required. Some really interesting mathematical properties come into play here. As a result, they were able to create a network that is about 96% sparse.
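To make that arithmetic concrete, here is a quick back-of-the-envelope sketch in Python. The layer sizes and non-zero budget below are purely illustrative, not Numenta's actual figures; the point is that a larger layer with a fixed budget of non-zero weights ends up far sparser than a small dense one.

```python
# Back-of-the-envelope sparsity arithmetic; the numbers are illustrative
# and not Numenta's actual layer sizes.
def sparsity(total_weights, nonzero_weights):
    """Fraction of weights that are zero."""
    return 1.0 - nonzero_weights / total_weights

# A small dense layer: 1,000 x 1,000 weights, all non-zero.
print(sparsity(1_000 * 1_000, 1_000_000))   # 0.0

# Double the neurons per side (4x the weights) but cap the non-zero
# budget at 250,000: the layer is now ~94% sparse, yet it has more
# representational capacity than the small dense layer.
print(sparsity(2_000 * 2_000, 250_000))     # 0.9375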
Explore: Explore the parameter space to uncover patterns
After the design phase, they wanted to better understand the hyperparameter space. To do this, they conducted a series of explorations across different parameters and ranges. The Parallel Coordinates chart (seen above) is another snapshot from the SigOpt dashboard. From it, they uncovered some really interesting patterns: trials cluster in certain regions of the hyperparameter space, and some of those clusters perform far better than others.
These insights informed much of their subsequent training. As the Best Metrics plot (seen above) shows, the trials that met both of Numenta's criteria formed a tiny solution set within this large 10-dimensional hyperparameter space: of roughly a thousand trials, only four met both criteria. Through exploration, Numenta was able to locate this small pocket of good configurations.
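SigOpt renders the Parallel Coordinates chart natively in its dashboard, but if you wanted to reproduce a chart like this offline from exported trial data, a few lines suffice. The sketch below assumes a hypothetical trials.csv export, and the column names are assumptions for illustration.

```python
import pandas as pd
import plotly.express as px

# Hypothetical export of trial data; column names are assumptions.
df = pd.read_csv("trials.csv")

# One vertical axis per hyperparameter plus the metric, one line per trial.
fig = px.parallel_coordinates(
    df,
    dimensions=["learning_rate", "weight_density", "hidden_size", "accuracy"],
    color="accuracy",
)
fig.show()
```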
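Continuing with the hypothetical trial export from the sketch above, isolating that feasible subset is a simple filter. The threshold values here are illustrative stand-ins for Numenta's actual criteria.

```python
# Sketch of isolating the tiny feasible subset; the 0.985 accuracy and
# 0.95 sparsity thresholds are illustrative, not Numenta's real criteria.
meets_both = df[(df["accuracy"] >= 0.985) & (df["sparsity"] >= 0.95)]
print(f"{len(meets_both)} of {len(df)} trials meet both criteria")
```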
Optimize: Scaling up sparse networks with high accuracy
Now we get to the problem of optimization at scale. A common strategy is to tune your hyperparameters on a small network and then use the parameters you find to train a large network. This is cost effective because you are mostly training small networks, but unfortunately it does not always work. In particular, Numenta found that this strategy fails for sparse networks: the parameters they got from tuning a small network did not reliably translate to a large one. To resolve this, Numenta used the SigOpt multitask optimization feature, which lets you set up a number of different tasks, associate a cost with each, and let the optimizer decide which tasks to run with the goal of reducing the overall cost of the search. Their strategy was to always train the large network, but with a varying number of training steps. Below is another screenshot from the SigOpt dashboard.
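As a rough sketch of how such an experiment might be defined with SigOpt's Python client, the snippet below sets up three tasks with relative costs. The parameter names, ranges, observation budget, and the 15-epoch cost are assumptions for illustration; the talk only gives the 30- and 60-epoch costs.

```python
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")

experiment = conn.experiments().create(
    name="Sparse network tuning (multitask sketch)",
    # Parameter names and ranges are illustrative, not Numenta's.
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=0.0001, max=0.1)),
        dict(name="weight_density", type="double", bounds=dict(min=0.01, max=0.2)),
    ],
    metrics=[dict(name="accuracy", objective="maximize")],
    # Each task carries a relative cost; the full-cost task is 1.0.
    # The talk gives 0.25 for 30 epochs and 1.0 for 60 epochs; the
    # 15-epoch cost here is an assumed placeholder.
    tasks=[
        dict(name="epochs_15", cost=0.1),
        dict(name="epochs_30", cost=0.25),
        dict(name="epochs_60", cost=1.0),
    ],
    observation_budget=300,
)
```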
Subutai set up a three-task system: train the network for 15 epochs, for 30 epochs at a cost of 0.25, or for 60 epochs at a cost of 1.0, the full cost of training. By balancing between these tasks, they were able to achieve state-of-the-art accuracy in a cost-effective manner. Numenta also found that the best hyperparameters for sparse networks were quite different from those for dense networks, so this was definitely a worthwhile thing to do. This chart (see below) shows how this works in SigOpt.
The gray dots show runs at the lower-cost tasks, and the blue circles show trials run at the higher cost; the blue circles clearly achieve higher accuracy. You can also see that most runs were done at the lower-cost tasks, and once the optimizer starts to home in on promising parameter regimes, SigOpt automatically invokes the higher-cost task. So again, SigOpt allowed Numenta to achieve state-of-the-art accuracy at scale in a very cost-effective manner.
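Putting it together, a minimal optimization loop might look like the following, continuing the experiment sketch above. Here train_and_eval is a hypothetical stand-in for the actual training code, not anything Numenta published.

```python
# Map each task to its training duration.
epochs_for_task = {"epochs_15": 15, "epochs_30": 30, "epochs_60": 60}

while experiment.progress.observation_count < experiment.observation_budget:
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # SigOpt chooses the task: cheap runs dominate early, and full-cost
    # runs are invoked as the search homes in on promising regions.
    accuracy = train_and_eval(  # hypothetical training function
        suggestion.assignments,
        epochs=epochs_for_task[suggestion.task.name],
    )
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(name="accuracy", value=accuracy)],
    )
    experiment = conn.experiments(experiment.id).fetch()
```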
Conclusion
To learn more about how Numenta used the SigOpt Intelligent Experimentation platform to achieve sparsity at scale, I encourage you to watch the talk. To see if SigOpt can drive similar benefits for you and your team, sign up to use it for free.