Live Now! New XGBoost Integration

Eddie Mattia
Company news, Gradient Boosting, Hyperparameter Optimization, Intelligent Experimentation, SigOpt Company News

SigOpt’s XGBoost Integration is now live! This enhanced SigOpt API is dedicated to making hyperparameter optimization more streamlined, easier to use, and better suited to XGBoost users’ needs. The new XGBoost-aware API helps you:

  • Reuse your XGBoost configuration in SigOpt
  • Track your XGBoost metrics and metadata
  • Store your XGBoost experiments with SigOpt
  • Tune your XGBoost models within your range
  • Optimize XGBoost using Bayesian Optimization

Let’s walk through what the XGBoost Integration offers and give some quick tips on getting started.

Automatically track XGBoost  

With sigopt.xgboost.run, you can use the same arguments you pass to the xgboost.train Python API. You can use this pseudocode:

arguments = {
    "dtrain": dtrain,
    "params": {
        "learning_rate": 0.35,
        "objective": "binary:logistic"
    },
    "evals": [(dtest, "Test")],
}
booster = xgboost.train(**arguments)
xgb_run = sigopt.xgboost.run(**arguments)

The advantage of passing these arguments to sigopt.xgboost.run like in the above code snippet is that for no additional development overhead you will get access to the logging functionality of SigOpt as part of a model training Run. By default, the integration will log all parameters and a set of pre-configured metrics that are inferred based on what is defined in the dtrain matrix and evals data splits (see XGBoost documentation for more detail on these dataset parameters). Additionally, the default settings log metadata relevant to XGBoost models such as dataset feature importance.

Examples of automatic tracking from app.sigopt.com

In addition to the autologging shown above, it is important to note that you can always access these results at app.sigopt.com and do not need to store them yourself. You can also log whatever additional metadata, metrics, or parameters you want with the run that is returned from sigopt.xgboost.run. As you will see in the next section, this also puts you one word away from finding the best XGBoost model for your task with an intelligent hyperparameter tuning routine.
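For example, extra values can be attached alongside the auto-tracked ones once the run is in hand. The snippet below is a minimal sketch, not a confirmed part of the integration: the metadata keys, the `business_kpi` metric name, and the `log_metadata`/`log_metric` method names on the returned run are illustrative assumptions.

```python
# Hypothetical sketch of extra values to attach to a completed run.
# The keys and values below are illustrative assumptions.
extra_metadata = {
    "dataset_version": "2022-03-01",  # assumed: your own data-versioning tag
    "git_commit": "abc1234",          # assumed: commit that produced the model
}

# Assuming xgb_run came from sigopt.xgboost.run(**arguments), one might log
# these alongside the auto-tracked values (method names are assumptions):
# for key, value in extra_metadata.items():
#     xgb_run.run.log_metadata(key, value)
# xgb_run.run.log_metric("business_kpi", 0.92)
```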

Automatically tune XGBoost  

To extend the auto-logging functionality beyond the single Run level, sigopt.xgboost.experiment provides an API very similar to sigopt.xgboost.run for automatically running hyperparameter tuning.

arguments = {
    "dtrain": dtrain,
    "params": {
        "learning_rate": 0.35,
        "objective": "binary:logistic"
    },
    "evals": [(dtest, "Test")],
    "experiment_config": {"name": "Automatic XGBoost Tuning", "budget": 10}
}
experiment = sigopt.xgboost.experiment(**arguments)

In the above pseudocode, a SigOpt Experiment will be created and a hyperparameter tuning loop will be executed. If no parameter space is specified in the Experiment configuration, SigOpt will automatically instantiate an appropriate parameter space and metric space inferred from dataset properties. Notice that like the single Run case with sigopt.xgboost.run, we are still able to fix some hyperparameters (for all Runs in the Experiment) while letting SigOpt determine how to parameterize others. 
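If you would rather pin down the search space yourself instead of relying on the inferred one, the experiment_config can carry an explicit parameter list in the SigOpt Experiment format. The sketch below assumes that format; the specific parameters, bounds, budget, and metric name are illustrative assumptions, not recommendations:

```python
# Sketch of an explicit experiment_config; the parameters, bounds, and metric
# shown here are illustrative assumptions, not tuning recommendations.
experiment_config = {
    "name": "Tuned XGBoost",
    "budget": 20,  # number of training Runs SigOpt may spend
    "parameters": [
        {"name": "max_depth", "type": "int", "bounds": {"min": 2, "max": 8}},
        {"name": "eta", "type": "double", "bounds": {"min": 0.01, "max": 0.5}},
    ],
    "metrics": [
        {"name": "accuracy", "objective": "maximize", "strategy": "optimize"},
    ],
}

# Passed as arguments["experiment_config"] to sigopt.xgboost.experiment,
# a config like this would take the place of the auto-inferred search space.
```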

XGBoost-aware Bayesian Optimization 

Many SigOpt customers have benefited from using XGBoost, and have used SigOpt to track and tune these models. Our research team has taken the learnings gathered from these individual interactions, performed additional research, and built the resulting insights into the optimization engine that powers the sigopt.xgboost APIs. Aggregated across many dataset types and modeling circumstances, these insights yield a set of Experiment strategies that account for the fact that SigOpt is tuning an XGBoost model. These assumptions drastically shorten the hyperparameter optimization process, finding models that hit quality targets under more aggressive constraints: fewer Runs, lower compute cost, and/or less wall-clock time. This research is built into the SigOpt backend, so you don’t need to worry about it at the API level.

Future Work

In the future we will continue to make SigOpt’s XGBoost Integration easier to use, provide more tracking functionality, and add state-of-the-art Bayesian Optimization research functions behind the sigopt.xgboost.experiment API. Moreover, we are pursuing other integrations with packages like Scikit-learn, PyTorch, and more. Please reach out to [email protected] if you have feedback or requests for future integrations. 

Eddie Mattia, Machine Learning Specialist