The world’s most important scientific discoveries depend on the ability to simulate real-world scenarios using computational resources. A primary goal is to simulate the real world as closely as possible. Example use cases include climate forecasting, aircraft engine design, car crash simulations, risk evaluations, options trading, molecular dynamics, genomics, astronomy, and high energy physics.
These applications all need massive computational resources. Some research sites use upwards of 1,000 connected servers (called nodes) that all work toward a single goal: completing a simulation in as little time as possible. The speed of these computers is mind-boggling. Some of the largest supercomputers in the world are approaching 1 exaFLOPS, which is 10¹⁸ floating-point operations per second. In other words, these supercomputers can complete a billion billion operations per second. The number of calculations per second is estimated to be on par with the human brain.
To advance the world toward exascale, application engineers, researchers, and developers use High Performance Computing (HPC) methodologies to better tune their workloads and applications. Developers often write and optimize their own code, which requires a set of tools to compose, compile, optimize, and analyze that code so it takes advantage of the massive computational power available. A key aspect of the developer’s job is to write more efficient applications that shorten time to completion while also improving the performance of models. To do that, application engineers must find the optimal set of hyperparameters for the simulation, since these ultimately control its performance.
Historically, finding these parameters has been a very manual process. As a result, research teams spend hours, days, or even weeks searching for the best set of hyperparameters. To help with this, SigOpt has developed a platform for Intelligent Experimentation, which lets users design, explore, and optimize HPC simulations, AI training, and inference workloads. In this post, we investigate how SigOpt can be used to improve developer productivity and boost the performance of HPC workloads.
Intel offers a comprehensive portfolio of products to help customers improve the performance of HPC and Artificial Intelligence (AI) workloads. Intel uses the SigOpt Intelligent Experimentation platform to design, explore, and optimize HPC workloads. SigOpt makes it possible to design workloads efficiently by storing, tracking, and visualizing relevant information for developers. SigOpt also provides tools to explore workload behavior and optimize workload performance.
Recently, the HPC group showed how SigOpt was able to significantly increase the team’s productivity and improve upon existing baselines. Intel used SigOpt to optimize the Weather Research and Forecasting (WRF) workload, a mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting needs. This workload was improved by almost 30% over the baseline with 50 SigOpt Runs by optimizing some of the Intel MPI environment parameters. Intel MPI® (Message Passing Interface) is a library that optimizes and simplifies how programs use multiple computing nodes. The key was finding the right configuration for TCP/Ethernet message and buffer sizes and for completion polling. The second workload was OpenFOAM®, which saw an improvement of about 14%. OpenFOAM® is the leading free open-source software for Computational Fluid Dynamics (CFD).
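To make the idea of tuning environment parameters concrete, here is a minimal sketch of how a single configuration might be timed against a baseline. It is an illustration under stated assumptions, not the setup Intel used: the variable names in SEARCH_SPACE, their ranges, and the stand-in command are all hypothetical placeholders, and a real run would launch the actual MPI job instead.

```python
import os
import subprocess
import sys
import time

# Hypothetical search space: these names and ranges are placeholders for
# illustration, not the Intel MPI settings tuned in the study.
SEARCH_SPACE = {
    "EXAMPLE_MPI_BUFFER_BYTES": (4096, 1048576),
    "EXAMPLE_MPI_POLL_SPIN_COUNT": (1, 1000),
}


def timed_run(command, env_overrides):
    """Run `command` with extra environment variables; return wall-clock seconds."""
    env = dict(os.environ)
    env.update({name: str(value) for name, value in env_overrides.items()})
    start = time.perf_counter()
    subprocess.run(command, env=env, check=True)
    return time.perf_counter() - start


def improvement_percent(baseline_seconds, tuned_seconds):
    """Percentage reduction in runtime relative to the baseline."""
    return 100.0 * (baseline_seconds - tuned_seconds) / baseline_seconds


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere; a real study would
    # launch the simulation through its usual MPI launcher instead.
    stand_in = [sys.executable, "-c", "pass"]
    baseline = timed_run(stand_in, {})
    tuned = timed_run(stand_in, {"EXAMPLE_MPI_BUFFER_BYTES": 65536})
    print(f"baseline={baseline:.3f}s tuned={tuned:.3f}s")
```

Each timed run becomes one observation for the optimizer, so a budget of 50 runs means 50 configurations like the one above are evaluated.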
“In AI model training, it requires a skill set of application engineers to select what the environment values should be. It’s possible to tune it manually but it’s quite difficult and takes a significant amount of time. SigOpt uses Intel-Optimized Bayesian optimization to actually explore the search space and achieve better results faster.”
–Vikram Saletore, Deep Learning & HPC Performance Architect, Intel
Why It Matters
Like building AI workloads, building HPC workloads is a scientific process that requires experimentation to get right. Every decision a researcher makes about data, metrics, models, or hardware can be thought of as part of one large experiment. Experimentation is used to better understand the workload, gain insights into how to improve performance, and design a more standardized process.
First, the boost in performance comes from finding the set of parameters that optimizes the workload. Second, the productivity gains come from being able to design, explore, and optimize the workload with minimal changes to what you are doing today. Third, the standardized approach gives teams what they need to collaborate better.
How It Works
The SigOpt Intelligent Experimentation platform is designed to be entirely agnostic to modeling framework, task, library, or problem. SigOpt offers a hosted platform that integrates easily into any workload, whether in the cloud or on-premises, through an easy-to-use API.
As the team iterates through different model configurations and designs the experiment, all relevant information such as metrics, parameters, metadata, and artifacts is automatically stored on the SigOpt dashboard. SigOpt is privacy-first, so the workload’s underlying data does not need to go onto the SigOpt platform.
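As a rough picture of what tracking a run involves, the sketch below keeps parameters, metrics, and metadata together in one record and picks the best run by a chosen metric. It is a local stand-in for illustration only: RunRecord, best_run, and to_json_lines are hypothetical names, not part of the SigOpt API, and the values are made up.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """One tracked run: what was tried, what was measured, and any notes."""
    name: str
    parameters: dict
    metrics: dict
    metadata: dict = field(default_factory=dict)


def best_run(runs, metric, minimize=True):
    """Return the run with the lowest (or highest) value of `metric`."""
    chooser = min if minimize else max
    return chooser(runs, key=lambda run: run.metrics[metric])


def to_json_lines(runs):
    """Serialize runs as one JSON object per line for later comparison."""
    return "\n".join(json.dumps(asdict(run), sort_keys=True) for run in runs)


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    runs = [
        RunRecord("baseline", {"buffer_kib": 64}, {"runtime_seconds": 120.0}),
        RunRecord("trial-1", {"buffer_kib": 256}, {"runtime_seconds": 95.0},
                  {"nodes": 4}),
    ]
    print(best_run(runs, "runtime_seconds").name)
    print(to_json_lines(runs))
```

Note that only the configuration and the resulting measurements are recorded here; the simulation inputs themselves never appear in the record.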
After designing the experiment, the user changes only a single line of code to start using SigOpt’s proprietary optimization algorithms to optimize the workload. This means that the team can focus on designing the workload rather than worrying about finding the best set of hyperparameters. As a result, users can streamline their workflow, compare simulations, and scale their model development processes to increase performance and boost productivity.
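The general shape of such an optimization loop can be sketched as a suggest-and-observe cycle. The example below is an assumption-laden illustration, not SigOpt's implementation: it uses plain random search as a stand-in for SigOpt's proprietary Bayesian optimization, the class and function names are hypothetical, and the objective is a synthetic formula rather than a real simulation runtime.

```python
import random


class RandomSearchOptimizer:
    """Stand-in optimizer with a suggest/observe interface (random search)."""

    def __init__(self, bounds, seed=0):
        self.bounds = bounds
        self.rng = random.Random(seed)
        self.history = []

    def suggest(self):
        """Propose one configuration drawn uniformly from the bounds."""
        return {name: self.rng.randint(low, high)
                for name, (low, high) in self.bounds.items()}

    def observe(self, params, value):
        """Record the measured value for a configuration."""
        self.history.append((params, value))

    def best(self):
        """Return the (params, value) pair with the lowest value so far."""
        return min(self.history, key=lambda item: item[1])


def toy_runtime_seconds(params):
    """Synthetic objective standing in for a measured simulation runtime."""
    return (100.0
            + 0.1 * abs(params["buffer_kib"] - 256)
            + 0.05 * abs(params["spin_count"] - 200))


def optimize(optimizer, objective, budget):
    """Run `budget` suggest/observe iterations and return the best result."""
    for _ in range(budget):
        params = optimizer.suggest()  # the call that changes if the optimizer is swapped
        optimizer.observe(params, objective(params))
    return optimizer.best()


if __name__ == "__main__":
    bounds = {"buffer_kib": (4, 1024), "spin_count": (1, 1000)}
    best_params, best_value = optimize(
        RandomSearchOptimizer(bounds), toy_runtime_seconds, budget=50)
    print(best_params, round(best_value, 2))
```

Because the workload only interacts with the optimizer through `suggest` and `observe`, replacing the source of suggestions leaves the rest of the loop untouched, which is the sense in which adopting a smarter optimizer can be a small code change.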
If you want to apply SigOpt to your next HPC workload, you can access it for free at sigopt.com/signup.