Research on neural networks has become an international and fast-moving academic pursuit in the past several years. Michele Alberti, from the DIVA group at the University of Fribourg in Switzerland, collaborated with researchers in Sweden, Germany, and the UK to demonstrate that spectral initialization of the weights in convolutional neural nets can increase accuracy by an average of 2.2%.
Convolutional neural networks have been applied to image recognition, speech recognition, and even speech synthesis. Typically, these models consist of a large number of weights, organized into sequential layers such as convolutions, ReLUs, and max-pooling operations. Generally accepted best practice has been for a data scientist to randomly initialize all weights in the network, whether the model is a detector or an estimator. However, Alberti’s research demonstrates that initializing the weights with a structured spectral basis yields better outcomes once the model is fully trained.
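The paper’s exact spectral initialization may differ from this; as an illustrative sketch only, assume the convolutional filters are seeded from an orthonormal 2D DCT basis (a common choice of spectral basis) instead of random noise. The function names here are hypothetical, not from DeepDIVA.

```python
import numpy as np

def dct_basis(k):
    """Orthonormal 1D DCT-II basis as a k x k matrix (rows are basis vectors)."""
    n = np.arange(k)
    basis = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / k)
    basis[0] *= 1.0 / np.sqrt(2.0)
    return basis * np.sqrt(2.0 / k)

def spectral_conv_init(out_ch, in_ch, k):
    """Fill a conv weight tensor of shape (out_ch, in_ch, k, k) with 2D DCT
    filters, cycling through the k*k separable basis filters, instead of
    drawing the weights at random."""
    b = dct_basis(k)
    # 2D separable basis: outer products of all pairs of 1D DCT rows.
    filters = np.einsum('ui,vj->uvij', b, b).reshape(k * k, k, k)
    w = np.empty((out_ch, in_ch, k, k))
    idx = 0
    for o in range(out_ch):
        for i in range(in_ch):
            w[o, i] = filters[idx % (k * k)]
            idx += 1
    return w
```

In a PyTorch model, such a tensor could be copied into a `nn.Conv2d` layer’s `weight.data` before training; the weights remain trainable afterwards.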
Choosing hyperparameters is an important task for Alberti’s team, since their research compares and contrasts different initialization methods. As Alberti explains, “we cannot afford to guess them and take the risk that an observed phenomenon (in one initialization method or the other) is a byproduct of a poor or sub-optimal hyper-parameter choice. That’s why we rely on SigOpt to find the best set of hyper-parameters for each configuration we compare. Moreover, once this set is found, we run every experiment 20 times to ensure that we capture the real distribution and not a lucky-seed run.”
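The 20-run protocol Alberti describes can be sketched as follows. The training function below is a hypothetical stand-in (in practice it would be a full DeepDIVA training run); the point is reporting a distribution over seeds rather than a single number.

```python
import random
import statistics

def run_experiment(hyperparams, seed):
    """Stand-in for one full training run with a fixed seed.
    A real version would train the model and return test accuracy."""
    rng = random.Random(seed)
    return hyperparams["base_acc"] + rng.gauss(0.0, 0.5)

def evaluate_config(hyperparams, n_runs=20):
    """Run the same tuned configuration n_runs times with different seeds
    and report mean and standard deviation, so a single lucky seed cannot
    dominate the comparison between initialization methods."""
    accs = [run_experiment(hyperparams, seed) for seed in range(n_runs)]
    return statistics.mean(accs), statistics.stdev(accs)

mean_acc, std_acc = evaluate_config({"base_acc": 90.0})
```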
Meanwhile, other optimization techniques, such as grid or random search, have major drawbacks in terms of both speed and reliability of results. While random search has been shown empirically to find good results faster than grid search, neither offers guarantees of optimality. Grid search, in particular, is impractical in a deep-learning scenario because of the demanding computational and time resources needed to conduct it thoroughly in such a high-dimensional search space.
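To see the cost concretely, here is a hypothetical (made-up) CNN search space: even a modest grid multiplies into far more full training runs than a fixed random-search budget, and adding one more hyperparameter multiplies it again.

```python
import random
from itertools import product

# Hypothetical search space; the values are illustrative, not from the paper.
space = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3, 1e-2],
    "momentum": [0.0, 0.5, 0.9, 0.99],
    "weight_decay": [0.0, 1e-5, 1e-4],
    "batch_size": [32, 64, 128],
}

# Grid search must evaluate the full Cartesian product: 5 * 4 * 3 * 3 = 180 runs.
grid = list(product(*space.values()))

# Random search samples a fixed budget of configurations instead.
rng = random.Random(0)
random_trials = [tuple(rng.choice(v) for v in space.values()) for _ in range(20)]
```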
From initialization strategies to neural architecture search
Convolutional neural networks are still a rapidly evolving field, and many researchers around the world continually work to refine and reinvent every part of a CNN. Most notably, a lot of effort is devoted to architecture search and other ways to optimize overall network efficiency, many of which can be parametrized and explored with SigOpt.
To enable further iteration on traditional CNN models, Alberti and his team also introduce a novel architectural component that can compute any matrix transformation and can be inserted into a larger network. One could therefore envisage a neural architecture search that includes this component alongside more traditional building blocks like ReLUs, softmaxes, and dropout layers. We believe this is a very interesting and promising direction for further investigation following up on Alberti’s work.
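A minimal sketch of what such a component could look like, under the assumption (ours, not the paper’s) that it is a linear layer whose weight starts as an arbitrary fixed matrix but stays trainable. The class and names are hypothetical.

```python
import numpy as np

class MatrixTransform:
    """Hypothetical component: a linear map whose weight W is initialized to
    an arbitrary matrix (identity, DCT, PCA projection, ...) and then treated
    as an ordinary trainable parameter."""
    def __init__(self, W):
        self.W = np.asarray(W, dtype=float)  # would be an nn.Parameter in PyTorch

    def __call__(self, x):
        # Apply the matrix transformation to each row of x.
        return x @ self.W.T

def relu(x):
    return np.maximum(x, 0.0)

# Composing the component with a traditional nonlinearity, as an architecture
# search might:
layer = MatrixTransform(np.eye(4))        # starts as the identity transform
x = np.array([[1.0, -2.0, 3.0, -4.0]])
out = relu(layer(x))                      # -> [[1., 0., 3., 0.]]
```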
Alberti and his team’s preliminary results show that spectral initialization is not equally useful across all datasets. In particular, the nature of the dataset (pathology, radiology, and historical images seem to benefit most) and the way information is distributed within it are the factors that most affect whether significant gains appear. Since the evidence shows that medical images benefit the most from this method, the team hopes to enable fellow researchers who train and classify these types of datasets to achieve better results.
Some background on the DIVA group, and its platform, DeepDIVA
The Document Image and Voice Analysis Group (DIVA) at the University of Fribourg experienced a familiar challenge for any scientific practitioner: reproducibility. They found this problem was particularly challenging in machine learning and deep learning. So Michele Alberti, Vinaychandran Pondenkandath, Marcel Wursch, Rolf Ingold, and Marcus Liwicki set out to design a combination of best-in-class infrastructure and frameworks capable of facilitating a more reproducible approach to machine learning and deep learning models. They call this solution DeepDIVA.
DeepDIVA consists of four parts:
- High-performance, industry-leading deep learning models written in PyTorch
- Visualization and analysis provided by TensorFlow’s TensorBoard utility
- Model versioning from GitHub
- Hyperparameter optimization provided by SigOpt
“SigOpt is the most advanced and complete solution for experiment management and hyperparameter optimization we have encountered thus far,” researcher Michele Alberti explains. “It enables us to produce robust and reproducible research with reliable results. We built it into our modeling framework, DeepDIVA, as the standard method for experimentation and optimization.”