Dealing with Model Performance Drift in the Pandemic

Nick Payton
Advanced Optimization Techniques, Deep Learning, Experiment Management, Hyperparameter Optimization, Machine Learning

Did the Pandemic Break Your Models?

If the pandemic broke your models, you are not alone. Models that were trained on pre-pandemic datasets and fine-tuned with pre-pandemic intuition may no longer make relevant predictions. This issue spans a variety of industries, including retail, transportation, algorithmic trading, insurance, and recommendations. It is a much more drastic version of a common problem, often referred to as model drift.

Now, there are a few tried and true solutions to this model drift problem. One is to reevaluate data strategies, including acquiring new training data that better reflects the new reality. This data strategy is typically resource intensive and takes significant time to establish. A second is retraining and retuning models, a strategy that can typically be implemented immediately and continuously. Our customers are approaching this problem with both strategies in mind, implementing retuning immediately to keep as many models online as possible while they execute a longer-term data strategy shift. 

The focus of this post is the benefit of retraining and retuning models in the short term. Retraining and retuning models to avoid model drift is a critical component of any industrialized modeling process. But three aspects of the current model drift problem make it uniquely challenging:

  • Models are Broken, not Simply Damaged: Drift used to be incremental, but in many cases it is now significant and abrupt, so it requires both retraining on new data and retuning hyperparameters (instead of just retraining).
  • Most Models are Impacted: Drift used to hit models at different times, but it is now happening to most models at once. This requires a new level of response: the ability to retrain and retune all models, not just a few. Both the scale of the impact and the scale of the necessary response are large. 
  • Managing Drift is a Top Priority: Managing drift was a nice-to-have that rarely rose to the level of a top priority for most teams. Now the problem is so severe and widespread that it is pushing models offline and eliminating their impact on the business altogether. 

If any of this sounds familiar, then how should you navigate this problem?

Recommendations to Get Models Back on Track

If you have a systematic approach to retraining and retuning models, you are in a good position to overcome this challenge. But how you approach this process matters as much as what you do. At SigOpt, we work with some of the top machine learning teams in the world across media, algorithmic trading, enterprise technology, industrials, and financial services. Through these collaborations, we have seen them focus on the following priorities to ensure they are maximizing the post-pandemic potential of their models:

  • Analyze Model Behavior: Properly diagnosing the drift problem often requires a deeper dive into model behavior. Doing this well means executing a variety of training runs while tracking a variety of metrics, and instrumenting your code so you can log a combination of model attributes in the process. SigOpt has built a solution called Experiment Management that makes logging these attributes as simple as adding a few short lines of code to call our API and signing in to our dashboard to analyze the results of your runs.
  • Track (Many) Metrics: These teams track metrics across all runs in a centralized place – typically the SigOpt dashboard – that can be shared across modelers, teams, and management to gauge offline performance and predict online performance of these models in response to new data. They also track dozens of metrics, apply metrics as constraints, and optimize for multiple metrics throughout the retraining and retuning process. This many-metrics approach gives them a balanced, broad perspective on their models, so they have a good sense of out-of-sample performance before putting them back in production. Teams without this capability today should choose a lightweight API for this purpose that requires only a few lines of code to implement, like our Metric Strategy.
  • Use Sample-Efficient HPO Methods: Hyperparameter optimization (HPO) is essential to quickly rebuilding models that perform well in the post-pandemic world. But the most popular methods for HPO are manual, exhaustive, or naive. These typically require thousands of training runs, significant investment of expert time, and considerable access to compute. Conversely, Bayesian optimization is a sample-efficient method for HPO that balances exploration and exploitation as it performs a search. This method allows you to get the most out of the limited amount of data you have that reflects the new reality. Rather than the thousands of training runs that grid or random search require, Bayesian optimization can get you better results in hundreds of runs, and, in the process, get your models into production 8x to 10x faster. SigOpt offers all of these methods in our automated hyperparameter optimization solution, but most of our customers take advantage of our proprietary ensemble of Bayesian and global optimization algorithms to streamline the process. 
  • Maximize Your Compute with Parallel Jobs: At the same time that these teams are investing in sample-efficient HPO methods, they are distributing these training and tuning jobs in parallel to ensure full utilization of their compute. This combination of algorithmic efficiency and job parallelization will give you the fastest wall-clock time to retrain and retune your models – and get them back in production. It also increases the frequency with which you will be able to retrain and retune so it is more of a continuous process. It can be challenging to run Bayesian optimization in parallel (most packages just default to random search), but solutions like SigOpt’s have been designed to solve this problem so you don’t have to waste time rebuilding it yourself. 
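To make the run-tracking and metric-constraint ideas above concrete, here is a minimal sketch in plain Python. This is not SigOpt's API: the `RunTracker` class and its methods are hypothetical stand-ins for a real tracking solution, showing how logging many metrics per run lets you select the best run subject to a constraint (here, a latency ceiling):

```python
from dataclasses import dataclass, field

@dataclass
class Run:
    params: dict                                  # hyperparameters used for this run
    metrics: dict = field(default_factory=dict)   # all metrics logged for this run

class RunTracker:
    """Hypothetical stand-in for a centralized run-tracking dashboard."""
    def __init__(self):
        self.runs = []

    def log(self, params, **metrics):
        # Record one training run with as many metrics as you like.
        self.runs.append(Run(params, metrics))

    def best(self, objective, constraint=None):
        # Maximize one metric, subject to an optional constraint on the others.
        feasible = [r for r in self.runs
                    if constraint is None or constraint(r.metrics)]
        return max(feasible, key=lambda r: r.metrics[objective], default=None)

tracker = RunTracker()
tracker.log({"lr": 0.1},  accuracy=0.78, latency_ms=40)
tracker.log({"lr": 0.01}, accuracy=0.84, latency_ms=120)
tracker.log({"lr": 0.03}, accuracy=0.82, latency_ms=55)

# Best accuracy among runs that satisfy a latency constraint.
best = tracker.best("accuracy", constraint=lambda m: m["latency_ms"] < 100)
# best.params -> {"lr": 0.03}
```

The run with the highest raw accuracy is rejected by the latency constraint, which is exactly the kind of trade-off a many-metrics view surfaces before a model goes back into production.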
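For readers unfamiliar with where Bayesian optimization's sample efficiency comes from, here is a toy sketch using a small Gaussian process surrogate and an upper-confidence-bound acquisition rule. Everything here (the one-dimensional `validation_score` function, the kernel length scale, the grid of candidates) is a hypothetical stand-in for an expensive training run, not SigOpt's implementation:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    # Squared-exponential kernel between two 1-D arrays of points.
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-d2 / (2 * length_scale ** 2))

def validation_score(x):
    # Hypothetical stand-in for "train a model with hyperparameter x and
    # return its validation score" -- in practice this is the expensive
    # step whose call count sample efficiency is meant to minimize.
    return np.sin(3 * x) + 0.5 * np.cos(5 * x)

def bayesian_optimize(n_iterations=20, noise=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    X = list(rng.uniform(0, 1, 3))        # a few initial random runs
    y = [validation_score(x) for x in X]
    grid = np.linspace(0, 1, 200)         # candidate hyperparameter values
    for _ in range(n_iterations):
        Xa, ya = np.array(X), np.array(y)
        K = rbf_kernel(Xa, Xa) + noise * np.eye(len(X))
        K_star = rbf_kernel(grid, Xa)
        K_inv = np.linalg.inv(K)
        mu = K_star @ K_inv @ ya                                   # posterior mean
        var = 1.0 - np.sum((K_star @ K_inv) * K_star, axis=1)      # posterior variance
        # Upper confidence bound: favor points with high predicted score
        # (exploitation) or high uncertainty (exploration).
        ucb = mu + 2.0 * np.sqrt(np.clip(var, 1e-12, None))
        x_next = grid[np.argmax(ucb)]
        X.append(x_next)
        y.append(validation_score(x_next))
    return max(y)

best_score = bayesian_optimize()
```

A production-grade optimizer adds noise handling, many dimensions, smarter acquisition functions, and parallel suggestions, but this explore-exploit loop is the core of why a surrogate-guided search needs far fewer training runs than grid or random search.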
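The parallelization point can be illustrated with Python's standard `concurrent.futures` module. The `train_and_score` function below is a hypothetical stand-in for a real training job; the pattern simply shows several configurations being evaluated concurrently rather than one at a time:

```python
from concurrent.futures import ThreadPoolExecutor

def train_and_score(lr):
    # Hypothetical stand-in for a full training run; a real job would
    # train a model with this learning rate and return a validation score.
    return 1.0 - (lr - 0.03) ** 2

candidates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]

# Evaluate several configurations at once to keep workers fully utilized.
# Real training jobs are CPU/GPU-bound, so in practice you would use
# separate processes, machines, or a cluster scheduler; threads keep
# this sketch simple and self-contained.
with ThreadPoolExecutor(max_workers=3) as pool:
    scores = list(pool.map(train_and_score, candidates))

best_lr = candidates[scores.index(max(scores))]
```

With a sequential optimizer the wall-clock time is the sum of all run times; with parallel evaluation it approaches the longest single run, which is what makes frequent, near-continuous retuning practical.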

We are Here to Help

We have spent the last five years building the most complete solution to meet these needs so you don’t have to figure it out on your own. And we have collaborated with these AI leaders along the way to ensure it is responsive to the needs of any modeling team, regardless of how demanding those needs are. 

But don’t take our word for it. Try our full offering, or join our beta program for recently launched Experiment Management functionality. Or if you’d rather just follow along as we continue to evolve our product, sign up for blog updates.

Nick Payton, Head of Marketing & Partnerships