Using Model Tuning to Beat Vegas

Scott Clark

Here at SigOpt we think a lot about model tuning and building optimization strategies; one of our goals is to help users get the most out of their Machine Learning (ML) models as quickly as possible. When our last hackathon rolled around I was inspired by some recent articles about using machine learning to make sports bets. For my hackathon project, I teamed up with our amazing intern George Ke and set out to use a simple algorithm and open data to build a model that could predict the best basketball bets to make. We used SigOpt to tune the features and hyperparameters of this model to make it as profitable as possible, hoping to find a winning combination that could beat the house. Is it possible to use optimized machine learning models to beat Vegas? The short answer is yes; read on to find out how.

Broadly speaking, there are three main challenges before deploying a machine learning model. First, you must Extract the data from somewhere, Transform it into a usable state, and then Load it somewhere you can quickly access it (ETL). This stage often requires a lot of creativity and good old-fashioned hacking. Next, you must apply your domain expertise about the problem to build the features and pick the model that will best solve it. Once you have your data and model, you must train and tune the model to get it to the best possible state. This last stage is the focus of this post.

Tuning a model with more than a handful of parameters using traditional methods like grid search and random search is often completely intractable: the number of configurations a grid must cover grows exponentially with the number of parameters (the curse of dimensionality), and each configuration can be expensive to evaluate. Model tuning is also non-intuitive and orthogonal to the domain expertise required for the rest of the ML process, so doing it by hand is often prohibitively inefficient. Optimization tools like SigOpt now make it possible for experts in any field to tune their models properly and get the most out of them quickly and easily. This final stage of model building is sometimes skipped in practice, but, as we see below, it can mean the difference between making money and losing money with your model.
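To make the growth concrete, here is a small sketch that counts the evaluations a full grid needs when each of d parameters is tried at k values, and converts that count into sequential run time. The 15-minutes-per-evaluation default matches what one evaluation of our model cost; `grid_search_cost` is a hypothetical helper for illustration, not part of any library.

```python
from typing import Tuple


def grid_search_cost(points_per_axis: int, n_params: int,
                     minutes_per_eval: float = 15.0) -> Tuple[int, float]:
    """Evaluations a full grid needs, and the sequential run time in days."""
    # Every parameter is tried at every grid value, so the count is k ** d.
    evaluations = points_per_axis ** n_params
    days = evaluations * minutes_per_eval / (60 * 24)
    return evaluations, days


# Five values per parameter, for 1 through 7 tunable parameters.
for d in range(1, 8):
    evals, days = grid_search_cost(5, d)
    print(f"{d} parameters: {evals:>6} evaluations, {days:7.1f} days")
```

With 7 parameters at 5 values each, this gives 78,125 evaluations, or roughly 814 days of back-to-back runs, which is why a full grid was never an option for our model.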

The Bet

For our experiment we used one of the simplest bets you can make in Vegas: the Over/Under line. This is a bet that the total number of points scored by both teams in a game will be higher, or lower, than some number that Vegas picks. For example, if Vegas sets the line for a game at 200.5, the two teams score 210 points combined, and we bet “over,” then we win $100 for every $110 we bet[1]; if we had bet “under” (or had bet “over” and the total had come in below 200.5), we would lose our $110 bet. For every game we simulated the same $110 bet, winning $100 only when we chose correctly. We picked NBA games for the experiment both because open statistics are widely available[2] and because over 1,000 games are played each year, giving us many data points with which to train our model.
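The payoff rule above fits in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, and treating an exact hit on the line as a push (stake refunded) is our assumption about sportsbook rules; it never arises with half-point lines such as 200.5.

```python
def settle_over_under(line: float, total_points: int, bet_over: bool,
                      stake: float = 110.0, payout: float = 100.0) -> float:
    """Profit (positive) or loss (negative) from a single Over/Under bet."""
    if total_points == line:
        # Assumption: an exact hit is a push and the stake is refunded.
        return 0.0
    went_over = total_points > line
    return payout if went_over == bet_over else -stake


def breakeven_win_rate(stake: float = 110.0, payout: float = 100.0) -> float:
    """Win rate p at which p * payout - (1 - p) * stake = 0."""
    return stake / (stake + payout)


print(settle_over_under(200.5, 210, bet_over=True))   # 100.0
print(settle_over_under(200.5, 210, bet_over=False))  # -110.0
print(round(breakeven_win_rate(), 4))                 # 0.5238
```

The last line shows the cost of the $110-to-win-$100 pricing: a bettor has to be right about 52.4% of the time just to break even.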

The Model

We picked a random forest regression model as our algorithm because it is easy to use and has interesting parameters to tune (hyperparameters)[3]. We chose 23 different team-based statistics to build the model's features[4], and we did not modify this feature set beyond our initial picks, in order to show how model tuning alone, independent of feature selection, would fare against Vegas. For each of the 23 statistics we created a slow and a fast moving average for both the home and the away team. The look-back windows and decay of these moving averages are tunable feature parameters, which we used SigOpt to optimize[5]. Each average was calculated both over a team's recent games overall and over its recent games of the same type (home games for the home team, away games for the away team). This gave us 184 features for every game (23 statistics × 2 averages × 2 teams × 2 game subsets) and a total of 7 tunable parameters.
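A minimal sketch of how one statistic for one team becomes four moving-average features. The weighting here is an illustrative assumption, not necessarily the exact scheme we used: a game played k games ago gets weight exp(-decay * k), so decay = 0 weights every game in the window equally. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def decayed_average(history: Sequence[float], window: int, decay: float) -> float:
    """Weighted mean of the last `window` values in `history` (oldest first)."""
    recent = list(history)[-window:]
    if not recent:
        raise ValueError("need at least one past game")
    # k = len(recent) - 1 for the oldest game kept, k = 0 for the most recent one.
    weights = [math.exp(-decay * k) for k in range(len(recent) - 1, -1, -1)]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


def team_features(all_games: Sequence[float], same_type_games: Sequence[float],
                  fast: int, slow: int, decay: float) -> List[float]:
    """Fast and slow averages over all recent games and over same-type games."""
    return [decayed_average(games, window, decay)
            for games in (all_games, same_type_games)
            for window in (fast, slow)]


# Hypothetical points-per-minute history for one team, oldest game first.
print(team_features([2.0, 2.2, 2.4, 2.6], [2.0, 2.4], fast=2, slow=4, decay=0.0))
```

Repeating this for 23 statistics and both teams yields 23 × 2 × 4 = 184 features per game, while `fast`, `slow`, and `decay` are exactly the kind of feature parameters a tuner can adjust.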

The output of our model is a predicted total number of points, given the historical statistics of the two teams playing in a game. If the model predicts a lower total than the Vegas Over/Under line, we bet under; if it predicts a higher total, we bet over. We also let SigOpt tune how “certain” the model needs to be before we bet: we only simulate a bet when the difference between our prediction and the Over/Under line is greater than a tunable threshold.
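The betting rule just described can be sketched as a single function; `choose_bet` is a hypothetical name, and `threshold` stands in for the tuned certainty parameter.

```python
from typing import Optional


def choose_bet(predicted_total: float, line: float, threshold: float) -> Optional[str]:
    """Return "over", "under", or None (skip the game) for one game."""
    edge = predicted_total - line
    # Only bet when the model disagrees with Vegas by more than the tuned threshold.
    if abs(edge) <= threshold:
        return None
    return "over" if edge > 0 else "under"


print(choose_bet(207.0, 200.5, threshold=3.0))  # over
print(choose_bet(198.0, 200.5, threshold=3.0))  # None
```

Raising the threshold trades the number of bets for confidence in each one, which is why it is worth tuning alongside the model's own hyperparameters.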

Tuning the Model

We used the ’00-’14 NBA seasons to train our model (training data) and random subsets of the ’14-’15 season to evaluate it during the tuning phase (test data). For every set of tunable parameters, we calculated the average winnings (and the variance of the winnings) that we would have achieved over many random subsets of the test data. Each evaluation took 15 minutes on a high-CPU Linux machine. Grid search and random search, the traditional approaches to model tuning, would be impractical for this problem because the number of evaluations both methods require grows so quickly with the number of parameters[6]. In practice, the number of evaluations SigOpt needs grows linearly with the number of parameters, and despite requiring fewer evaluations, SigOpt also tends to find better results than grid and random search. Figure 1 shows how profitability increases with evaluations as SigOpt tunes the model.
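The evaluation metric can be sketched as follows, assuming we already have the simulated profit for every game in the test season (+100 for a winning bet, -110 for a losing bet, 0 for a game the threshold told us to skip). The subset size, number of subsets, and seed are illustrative placeholders rather than the values we used, and `subset_winnings` is a hypothetical helper.

```python
import random
import statistics
from typing import Sequence, Tuple


def subset_winnings(game_profits: Sequence[float], subset_size: int,
                    n_subsets: int, seed: int = 0) -> Tuple[float, float]:
    """Mean and variance of total winnings over random subsets of games."""
    if n_subsets < 2:
        raise ValueError("need at least two subsets to estimate a variance")
    rng = random.Random(seed)  # fixed seed so repeated evaluations are comparable
    games = list(game_profits)
    totals = [sum(rng.sample(games, subset_size)) for _ in range(n_subsets)]
    return statistics.mean(totals), statistics.variance(totals)


# Hypothetical per-game results: a win, a loss, and a skipped game, repeated.
print(subset_winnings([100.0, -110.0, 0.0] * 30, subset_size=30, n_subsets=50))
```

The mean corresponds to the average winnings plotted in Figure 1; the variance indicates how noisy that estimate is for a given parameter setting.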

Figure 1: Over the course of 100 different train and test evaluations, SigOpt was able to tune our model from losing more than $500 to winning more than $1,000, on average. This value was computed on random subsets of the ’14-’15 test season, which was not used for training.

Evaluating on Future Data

Once we had used SigOpt to fine-tune the model, we wanted to see how it performs on a holdout dataset it had never seen before. This simulates using the model to place bets when only historical information is available. Since the model was trained and tuned on the ’00-’15 seasons, we used the first games of the ’15-’16 season (being played now) to evaluate our tuned model. After simulating 131 bets over a month, we found that the SigOpt-tuned model would have made $1,550 in profit, while an untuned version of the same model racked up $1,020 in losses on the same holdout dataset[7]. Not only does model tuning with SigOpt make a huge difference, but a simple, well-tuned model can beat the house.
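As a sanity check, the profit over a fixed number of bets pins down the win/loss record, assuming every bet risks $110 to win $100 and no bet pushes. The sketch below solves profit = 100 * wins - 110 * losses; `implied_record` is a hypothetical helper.

```python
def implied_record(n_bets: int, profit: int, stake: int = 110, payout: int = 100):
    """Wins and losses that yield `profit` over `n_bets` bets at the given odds."""
    # profit = payout * wins - stake * (n_bets - wins)
    #   =>  wins = (profit + stake * n_bets) / (payout + stake)
    wins, remainder = divmod(profit + stake * n_bets, payout + stake)
    if remainder or not 0 <= wins <= n_bets:
        raise ValueError("no whole-number record gives this profit")
    return wins, n_bets - wins


wins, losses = implied_record(131, 1550)
print(wins, losses, round(wins / (wins + losses), 3))  # 76 55 0.58
```

Under those assumptions, $1,550 over 131 bets corresponds to 76 wins and 55 losses, a win rate of about 58%, well above the roughly 52.4% needed to break even at these odds.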

Figure 2: The blue line shows the cumulative winnings of the SigOpt-tuned model after each day. The grey dashed line shows the cumulative winnings of the untuned model, and the dashed red line marks breakeven.


We were able to use the power of SigOpt optimization to take a relatively simple model and make it beat Vegas. Can you use a more complicated model to get better results? Can you think of more features to add? Does including individual player stats increase accuracy?



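1. Betting $110 to win $100 is part of the edge that Vegas keeps. It prevents a bettor from breaking even by picking “over” and “under” at random: random picks win about half the time, but at these odds a bettor must win 110 / (110 + 100) ≈ 52.4% of bets just to break even.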
3. We tuned the hyperparameters n_estimators, min_samples_leaf, and min_samples_split.
4. We chose 23 team-level features: points per minute, offensive rebounds per minute, defensive rebounds per minute, steals per minute, blocks per minute, assists per minute, points in the paint per minute, second-chance points per minute, fast-break points per minute, lead changes per minute, times tied per minute, largest lead per game, point differential per game, field goals attempted per minute, field goals made per minute, free throws attempted per minute, free throws made per minute, three-point field goals attempted per minute, three-point field goals made per minute, first-quarter points per game, second-quarter points per game, third-quarter points per game, and fourth-quarter points per game.
5. The feature parameters included the number of games to look back for the slow and fast moving averages, an exponential decay parameter controlling how much the most recent games count towards those averages (with a value of 0 indicating linear decay), and the threshold for the difference between our prediction and the Over/Under line required to make a bet.
6. Even a coarse grid with 5 values per parameter would require 5^7 = 78,125 evaluations; at 15 minutes per evaluation, that would take over 800 days to run sequentially. A grid that coarse would also almost certainly perform poorly compared to the Bayesian approach that SigOpt takes; for examples, see this blog post.
Scott Clark, Ph.D., Co-Founder & Chief Executive Officer
