Overview of XGBoost

Eddie Mattia
Gradient Boosting, Hyperparameter Optimization, Intelligent Experimentation

We recently released the XGBoost integration for SigOpt to make it even easier to leverage gradient boosting on the SigOpt platform. We decided it was only proper to also explain what XGBoost is and why it’s useful.

What is XGBoost?

XGBoost is a machine learning framework for gradient boosting, first released by Tianqi Chen in 2014. It contains powerful out-of-the-box optimizations for execution in distributed compute environments, making its algorithms highly scalable. Moreover, the performance of XGBoost models often compares favorably with neural networks and other common machine learning model types. These factors make XGBoost a popular choice of modeling framework in a wide variety of settings, from data science competitions and benchmarking to the main models powering sophisticated production machine learning pipelines.

Machine Learning Framework Usage

Image Source: Kaggle – State of Machine Learning and Data Science 2021  

What is Gradient Boosting?  

Gradient boosting methods iteratively build an ensemble of weak learners, which are often chosen to be decision trees. Unlike a random forest, which trains its trees independently, the boosting process fits each new weak learner to compensate for the errors made by the previous collection of weak learners.
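To make the iteration concrete, here is a minimal sketch of gradient boosting for squared-error loss, using one-split decision stumps as the weak learners. It is pure NumPy and purely illustrative (the data, function names, and settings are all invented for this example), not how XGBoost is implemented internally:

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a one-split 'stump' to the current residuals (hypothetical helper)."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = np.sum((residual - pred) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def boost(x, y, n_trees=20, learning_rate=0.3):
    pred = np.full_like(y, y.mean())  # start from a constant prediction
    for _ in range(n_trees):
        residual = y - pred           # errors of the current ensemble
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)  # new learner corrects them
    return pred

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)
pred = boost(x, y)
print(np.mean((y - pred) ** 2))  # training error after boosting
```

Each stump is fit to the residuals of the ensemble so far, which is the sense in which boosting "compensates for the errors" of the previous learners.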

In the case of XGBoost, a variant of decision trees called Classification and Regression Trees (CARTs) serves as the weak learner. The algorithm adds CARTs until a maximum number of trees is reached or until the residual (error) in the model’s predictions is sufficiently small.

Like many machine learning modeling frameworks, training XGBoost models entails selecting hyperparameter values. One such hyperparameter is the maximum number of trees. To see a complete description of the available hyperparameters, please visit the XGBoost API documentation. The best settings for these values depend on the dataset you are modeling, which makes tracking and tuning hyperparameters a key step in reproducing and optimizing XGBoost models.

If you’d like to learn more about SigOpt’s XGBoost Integration, then read the official announcement.

Eddie Mattia, Machine Learning Specialist

Want more content from SigOpt? Sign up now.