How to Win a Kaggle Competition with Hyper Parameter Optimization

Tobias Andreasen
Artificial Intelligence, CNN, Convolutional Neural Networks

In this blog post we highlight some of the key takeaways from David Austin’s presentation on how to supercharge a 1st place Kaggle solution to higher performance.

David Austin is a Senior Principal Artificial Intelligence Engineer at Intel working on industrial applications within the Internet of Things space. In his spare time he spends, in his own words, way too much time participating in Kaggle competitions, and he has held the title of Grandmaster since 2018.


In the presentation David Austin walks through the Iceberg Classifier Challenge, where participants are asked to classify radar images as either icebergs or ships to improve safety at sea. At the time, it was the computer vision challenge with the most participants ever on Kaggle.

Some of the main challenges for this dataset are the limited number of training samples – only around 1600 are available – and the fact that even humans cannot reliably distinguish between the two classes from the radar images alone.



However, even with those challenges, David and his partner were able to secure the top spot on the leaderboard and earn the title of Grandmaster.

Previous Solution

As David pointed out during the presentation a lot has happened on the AI-front since 2018. The following shows a diagram of their winning solution:

Winning Solution Architecture

Back then they used an ensemble of almost 200 different convolutional neural networks (CNNs), and with access to neither an experiment-management tool nor an efficient optimization scheme, they ended up spending almost 100 hours manually training, randomly tuning, testing, and keeping track of the different CNN blocks.

David also points out that no one would ever put a system like that into production; in a competition, however, the end goal is the architecture with the highest performance score, not something that runs smoothly in production.

Model Optimization

Towards the end, David shows how he goes from a baseline accuracy of 87.19% to 91.25% using SigOpt and six lines of extra code – an improvement equivalent to a 400-place jump up the leaderboard.

David uses a single EfficientNet-B0 model, instead of almost 200 CNN models, as the baseline, and after only 4.5 hours of work he achieves the equivalent of jumping 400 spots up the leaderboard by optimizing some of the hyperparameters with SigOpt.

All of this is done in a completely automated fashion: instead of keeping track of log files and parameter configurations for each model by hand, this is handled automatically by the SigOpt intelligent experimentation platform, which yields large savings in both compute and human time.
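The workflow David describes boils down to a suggest-evaluate-observe loop: the optimizer suggests a hyperparameter configuration, the model is trained and scored, and the result is reported back so the next suggestion improves. The sketch below is a minimal stand-in for that loop – it is not SigOpt's actual client API, and the parameter names, bounds, and toy objective are illustrative assumptions; a real run would train the EfficientNet-B0 model and return its validation accuracy.

```python
import random

def toy_objective(lr, dropout):
    # Toy surrogate for validation accuracy (assumption for illustration);
    # in practice this would train the model and return its validation score.
    return 1.0 - 1e5 * (lr - 0.001) ** 2 - (dropout - 0.3) ** 2

def optimize(budget=50, seed=0):
    # Random search as a simple stand-in for SigOpt's Bayesian optimizer.
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(budget):
        # "Suggest": draw a candidate configuration from the search space.
        params = {
            "lr": rng.uniform(1e-4, 1e-2),
            "dropout": rng.uniform(0.0, 0.5),
        }
        # "Evaluate": score the configuration.
        score = toy_objective(**params)
        # "Observe": record the result and keep the best so far.
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = optimize()
print(best, round(score, 3))
```

The same loop structure applies regardless of the optimizer: swapping the random draw for an intelligent suggestion engine is what turns 100 hours of manual tuning into a few automated ones.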

Next steps

If you want to learn more about Kaggle and the Iceberg Classifier Challenge, check out the competition website or watch David’s presentation from the SigOpt Summit. To see if SigOpt can drive equivalent results for you and your team, sign up to use it for free.

Tobias Andreasen, Machine Learning Specialist