As Data Science Director at PayPal, Venkatesh Ramanathan is familiar with scaling artificial intelligence (AI). PayPal is one of the largest digital payments companies in the world, processing an enormous volume of transactions for its global customer base. Venkatesh and his team apply AI to some of PayPal's trickiest challenges, generating cost savings, revenue gains, and, most importantly, a better customer experience in the process.
Fraud detection is one example of the kind of problem Venkatesh and his team address with AI. It is a particularly interesting problem for PayPal because the data can be structured as an extremely large graph, which makes fraud detection a natural use case for graph neural networks.
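To make the graph framing concrete, a payment network can be modeled as a graph whose nodes are accounts and whose edges are transactions. The snippet below is a minimal sketch of that structure; the account IDs and transaction records are invented for illustration:

```python
from collections import defaultdict

# Toy transaction records: (sender, receiver, amount). Invented for illustration.
transactions = [
    ("acct_a", "acct_b", 120.00),
    ("acct_b", "acct_c", 80.50),
    ("acct_a", "acct_c", 45.25),
    ("acct_d", "acct_a", 300.00),
]

# Build an adjacency list: each account maps to the accounts it paid.
graph = defaultdict(list)
for sender, receiver, amount in transactions:
    graph[sender].append((receiver, amount))

# A graph neural network propagates information along edges like these,
# so an account's learned representation reflects the behavior of its
# neighbors -- useful for spotting coordinated fraud patterns.
print(dict(graph))
```

At PayPal's scale this graph has billions of nodes and edges rather than four, which is exactly why training and optimizing the models on top of it becomes the hard part.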
Graph neural networks, and how to optimize them at scale, were the subject of Venkatesh's talk at the O'Reilly Scaling AI Superstream event. This post provides a short summary of his talk, but we encourage you to watch it in full.
Venkatesh has spent considerable time evaluating graph neural networks and the tradeoffs between different approaches to them. Read on to learn more about these considerations and how to think about graph neural networks in the context of fraud detection.
To enable graph neural networks at PayPal, Venkatesh has built a robust hardware and software stack to support his team and their efforts. Below is a figure he shared that walks through each component, spanning the hardware, data, software, and application layers. PayPal uses SigOpt as part of its standard software stack for tracking experiments and optimizing hyperparameters.
Figure: Technology stack required to scale graph neural network optimization
Much of the talk focused on the challenges of training graph neural networks. Given the volume of data and the complexity of the architecture, these models can take a long time to train, and finding the right hyperparameters is critical to reaching convergence but tricky in practice. Complicating things further, in this workflow Venkatesh optimizes a graph neural network to generate features that are then combined with other features in an XGBoost model. Both models need to be tuned to converge faster and deliver the best performance in production.
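The two-stage setup described above can be sketched as follows. The embedding values, feature names, and stub functions are invented for illustration; in the real workflow the first stage would be a trained graph neural network and the second an XGBoost classifier:

```python
# Sketch of the two-stage pipeline: GNN-derived node embeddings are
# concatenated with other (e.g. hand-engineered) features, and the
# combined vector feeds a downstream classifier (XGBoost in this workflow).
# All values below are invented for illustration.

def gnn_embedding(account_id):
    """Stand-in for a trained GNN: returns a fixed-size node embedding."""
    return [0.12, -0.40, 0.88]  # in practice, learned from the graph

def tabular_features(account_id):
    """Stand-in for other features (e.g. transaction counts, amounts)."""
    return [5.0, 142.75]

def combined_features(account_id):
    # The concatenated vector is what the XGBoost model would consume.
    return gnn_embedding(account_id) + tabular_features(account_id)

features = combined_features("acct_a")
print(features)  # 5-dimensional input vector for the downstream model
```

Because the downstream model consumes the upstream model's output, hyperparameter choices in the GNN ripple into XGBoost's performance, which is why both stages need to be optimized together.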
In this context, Venkatesh discussed the value of a solution like SigOpt. He first trained the graph neural network and XGBoost models without it to establish a baseline for time to convergence and accuracy, then applied SigOpt in the same workflow to see whether it could accelerate convergence or boost accuracy. In his experiments, he found that SigOpt drove 40% faster time to convergence along with significant accuracy gains.
Venkatesh also discussed how SigOpt's proprietary ensemble of optimization algorithms is designed to be sample efficient, reducing the number of iterations (or training runs) required to find an optimal hyperparameter configuration and reach convergence. Because SigOpt takes an intelligent approach to this search, it also surfaced a hyperparameter configuration that outperformed the baseline configuration Venkatesh had found when training these models previously. And by logging all metadata from the process back to the SigOpt dashboard, he had a clean system of record for experiments that could easily be reproduced or used as a starting point for future experimentation.
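The iterative pattern at work here, in which an optimizer suggests a configuration, the model is trained with it, and the resulting metric is reported back and logged, can be sketched with a stand-in optimizer. In this sketch, plain random search plays the role of SigOpt's proprietary, sample-efficient ensemble, and the objective is a toy function rather than a real training run:

```python
import random

random.seed(0)

# Toy suggest/observe loop. Random search stands in for SigOpt's
# (proprietary) optimizer ensemble; a sample-efficient optimizer would
# need far fewer runs to find a good configuration. The hyperparameter
# names and objective are invented for illustration.

def toy_objective(learning_rate, max_depth):
    # Pretend validation metric that peaks near lr=0.1, depth=6.
    return 1.0 - abs(learning_rate - 0.1) - 0.01 * abs(max_depth - 6)

def suggest():
    # An intelligent optimizer would condition on past observations here.
    return {
        "learning_rate": random.uniform(0.01, 0.5),
        "max_depth": random.randint(2, 12),
    }

log = []   # system of record: every configuration and its observed metric
best = None
for run in range(20):  # observation budget: number of training runs
    config = suggest()
    metric = toy_objective(**config)
    log.append({"run": run, "config": config, "metric": metric})
    if best is None or metric > best["metric"]:
        best = log[-1]

print(best["config"], round(best["metric"], 3))
```

The logged records are what make the workflow reproducible: any past run can be re-created from its stored configuration, or used to seed the next round of experimentation.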