How PayPal Uses Large Graph Neural Networks to Detect Fraud: Experiment Exchange Episode 4

Nick Payton
CNN, Deep Learning, Fraud Detection, Graph Neural Networks, Knowledge Graphs

How do you detect fraud when less than one percent of your network’s users are bad actors?

In this week’s episode of Experiment Exchange, “How PayPal Uses Large Graph Neural Networks to Detect Bad Actors,” join Michael McCourt as he talks with Venkatesh Ramanathan, a Director of Data Science at PayPal. They discuss the evolution of AI/ML since Venkatesh’s early days at AOL, the importance of robustness in deployed machine learning systems, and some of the challenges of Venkatesh’s work today building GNNs and leading a team responsible for identifying and preventing fraudulent PayPal transactions. 

Below is a transcript from the interview. Subscribe to listen to the full podcast episode, or watch the video interview.

Q: Tell us a little bit about yourself and your history.

I have been in the industry for close to 25 years. I started out as an engineer with a computer science background. I spent about 15 years building large scale backend systems for one of the biggest internet service providers, AOL. I spent quite a bit of time there, and after pursuing my Ph.D. in computer science, I started getting interested in the machine learning space. So between AOL and now about 9 years at PayPal, I have been spending a lot of time on the AI/ML side of things, mainly doing applied research and focusing on a variety of problems, mainly around fraud. I basically try to build solutions to prevent bad guys from stealing money from us. That’s my goal, and I’m using AI/ML technologies to facilitate it.

Q: Was there any AI/ML going on at AOL while you were there? 

Oh, absolutely! We started mainly around the e-mail side of things, building models to address the anti-spam and anti-phishing domains. We employed naive Bayes classifiers for those use cases. The machine learning space has evolved since then: right now it’s all about deep learning, but prior to that we used traditional machine learning techniques, and Bayesian classifiers were widely used and very successful. In fact, I brought most of my Ph.D. work back into AOL to build some of the anti-spam and anti-phishing systems.

Q: Can you talk more about that evolution, about how the AI/ML community has evolved and changed over the years—and what you’re embracing today at PayPal? 

The scale of the data we’re dealing with today is much, much greater, which imposes a lot of challenges, including having the right infrastructure, frameworks, and tools like SigOpt’s hyperparameter optimization. You need scalable infrastructure from the ground up to facilitate building robust machine learning models. One of the things with deep learning is that if you have large amounts of data, you don’t need a very complex technique; by applying the appropriate toolkits, you can build a very robust model. That has fundamentally changed from those days to now.

Something else that changed is that we used to do a lot of human-based feature engineering, and right now it’s more and more machine-learning-based feature learning as opposed to human feature engineering. That’s a revolutionary change from those days to now.

Q: How has the automation of feature engineering changed the field, both in terms of robustness but also interpretability?

Interpretability is one of the things we lost, actually. When we started pushing for more and more model accuracy and model robustness, we lost the interpretability in the process.

So I’m really glad now that academia and researchers are focused on more than just model accuracy. Given that deep learning is providing all these robust models, you can bolt a post hoc, after-the-fact interpretability component onto them.

Personally, I don’t believe that’s the right approach. Interpretability has to be built from the ground up. There’s a lot of work which is being done in that area.

One of the things which I am passionate about is a graph-based approach to machine learning. A graph representation is in general much more interpretable than non-graph data, because you can, for example, walk the edges of the graph to figure out how far you are from somebody who is sending money to you.
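To make that edge-walking concrete, here is a minimal sketch (our toy illustration, not PayPal’s tooling; the account names and amounts are invented) using networkx:

```python
import networkx as nx

# Toy directed payment graph: an edge u -> v means "u sent money to v".
# Accounts and amounts are hypothetical, purely for illustration.
G = nx.DiGraph()
G.add_edge("alice", "bob", amount=120.0)
G.add_edge("bob", "carol", amount=95.0)
G.add_edge("carol", "dave", amount=90.0)
G.add_edge("alice", "eve", amount=40.0)

# "How far are you from somebody who is sending money to you?"
# Walking the edges answers that directly and interpretably.
path = nx.shortest_path(G, source="alice", target="dave")
print(path)           # ['alice', 'bob', 'carol', 'dave']
print(len(path) - 1)  # 3 hops between the two accounts
```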

Q: Could you speak more about Graph Neural Networks?

Essentially, Graph Neural Networks started with a question: how can we bring deep learning and deep neural network architectures to a graph-structured representation? The idea is that individual accounts are represented as nodes in a graph, and you can learn more about a particular node by looking at its neighbors. How can we aggregate information from the neighboring nodes to say something about you as an individual? Because who you are communicating with, who you are connected to, is one way to describe who you are in the system. Graph Neural Networks let us apply neural network based techniques to do exactly that: to use each node’s neighborhood to say something about that node.
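As a minimal sketch of that neighbor-aggregation idea (our illustration, not PayPal’s production model), here is a two-layer network built with the Deep Graph Library, which comes up again below; each layer updates a node’s representation from its neighbors’ features, and the output scores each account:

```python
import torch
import torch.nn as nn
import dgl
from dgl.nn import GraphConv

# Hypothetical toy graph: 5 accounts, edges = "sent money to".
src = torch.tensor([0, 1, 2, 0, 3])
dst = torch.tensor([1, 2, 3, 4, 4])
g = dgl.add_self_loop(dgl.graph((src, dst), num_nodes=5))

# Each account gets an 8-dimensional feature vector (random here).
feats = torch.randn(5, 8)

class TwoLayerGNN(nn.Module):
    """Two rounds of neighbor aggregation, then a per-node fraud score."""
    def __init__(self, in_feats, hidden, n_classes=2):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden)
        self.conv2 = GraphConv(hidden, n_classes)

    def forward(self, graph, x):
        h = torch.relu(self.conv1(graph, x))  # aggregate 1-hop neighbors
        return self.conv2(graph, h)           # fold in 2-hop information

model = TwoLayerGNN(8, 16)
logits = model(g, feats)  # one (legit, fraud) score pair per account
print(logits.shape)       # torch.Size([5, 2])
```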

Q: Are these Graph Neural Networks robust today?

They’re robust, they’re interpretable. But I would also say that some problems are only solved well if we leverage a graph representation. For example, take the massive explosion of cryptocurrency and blockchain infrastructure. Those kinds of applications are actually graphs to start with. If you tried to put that data in a relational structure like MySQL, you’re not going to get any intelligent insights from it. So some of the newer emerging technologies and applications are better solved if they’re represented as a graph to start with.

Q: Do you have any other examples of logical situations where Graph Neural Networks should be one of the key building blocks for inference and for decision-making? 

I work on a lot of anti-money laundering problems, looking at how bad guys move money through our network. The only way you can solve that is by representing those transactions in a graph, as a network. So that’s the classic example.
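One classic laundering pattern, for instance, is money that loops back to its origin through a chain of intermediary accounts. Here is a hedged sketch (toy data, not an actual PayPal detection rule) of how naturally that query falls out of a graph representation:

```python
import networkx as nx

# Hypothetical transactions: an edge u -> v means "u sent money to v".
transactions = [
    ("a", "b"), ("b", "c"), ("c", "a"),  # money returns to its origin
    ("c", "d"), ("d", "e"),              # ordinary one-way flow
]
G = nx.DiGraph(transactions)

# In a relational store this is an awkward self-join per hop;
# on the graph it is a one-liner.
for cycle in nx.simple_cycles(G):
    print("possible layering loop:", cycle)  # e.g. ['a', 'b', 'c']
```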

But one of the practical challenges is that scaling graphs is not as trivial as scaling non-graph systems. So there’s still a lot of work which needs to be done. I personally believe that the entire infrastructure, including hardware, needs to be optimized and specialized for graph processing problems, and that software frameworks like the Deep Graph Library need to be specialized for it as well. Unless and until we have that entire ecosystem set up, we won’t be able to solve practical problems. So we’re definitely not there yet.

Q: How does a tool like SigOpt play a useful role in the development of these networks? 

When we start building models, there are lots of parameters which need to be estimated and a lot of knobs which require manual tuning. SigOpt’s toolkit helps us automate that because, first of all, there are too many parameters for a human to explore manually; it would take days or months for somebody to build a model that robust. So in helping us choose the right parameters for building a specific model, SigOpt plays a key role in automating that.

The other thing is that SigOpt helps us find those optimal parameter settings in a much more intelligent way. Even if a human had unlimited time, we still wouldn’t be able to find those settings the way SigOpt does, because we’d only be using traditional methods like grid search. In the beginning, we’re often exploring all kinds of unneeded parameters which are not going to help us build a robust model. SigOpt actually narrows in on the specific set of parameters that are more optimal for the problem we’re trying to solve.
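For readers who haven’t seen this loop, the suggest-observe cycle looks roughly like the sketch below, written against SigOpt’s classic experiment API (the experiment name, parameters, bounds, and the train_and_evaluate function are placeholders; consult SigOpt’s documentation for the current interface):

```python
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")  # placeholder token

# Declare the knobs to tune instead of hand-searching them.
experiment = conn.experiments().create(
    name="GNN fraud model tuning",  # hypothetical experiment
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-4, max=1e-1)),
        dict(name="hidden_size", type="int", bounds=dict(min=8, max=256)),
    ],
    metrics=[dict(name="validation_auc", objective="maximize")],
    observation_budget=50,
)

for _ in range(experiment.observation_budget):
    # SigOpt proposes the next parameter settings to try...
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # ...you train with them (train_and_evaluate is your own code)...
    auc = train_and_evaluate(**suggestion.assignments)
    # ...and report the result so the next suggestion is smarter.
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        values=[dict(name="validation_auc", value=auc)],
    )
```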

Q: How important is sample efficiency to you in your development of this network? 

Another thing with scaling is that you need to have effective sampling strategies. The fraud problem is highly unbalanced: the number of bad guys in the system is less than 1% of the entire ecosystem. When you have a highly unbalanced problem, sampling is even more critical. You need to define a representative sample, otherwise you’re not going to build scalable models. With SigOpt, we have an effective sampling strategy which helps us narrow down the optimal set of parameters that are needed, instead of having to search endlessly.
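As one minimal example of a sampling strategy for this kind of imbalance (toy numbers, and only one of several common baselines), you can keep every fraud case and undersample the legitimate majority:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labels: ~1% fraud, mirroring the imbalance described above.
y = (rng.random(100_000) < 0.01).astype(int)

pos = np.flatnonzero(y == 1)  # keep every fraud example
neg = np.flatnonzero(y == 0)
neg_sample = rng.choice(neg, size=len(pos) * 10, replace=False)  # 10:1 ratio

train_idx = np.concatenate([pos, neg_sample])
rng.shuffle(train_idx)
print(len(pos), len(neg_sample), len(train_idx))
```

The negative-to-positive ratio here is itself a tunable knob, which is exactly the kind of parameter an optimizer like SigOpt can search alongside the model’s hyperparameters.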

Q: What should people be looking forward to from your group at PayPal in the coming year? What are you guys working on right now? 

Without going too far into the details of what we’re working on: the payment space is evolving rapidly. At PayPal, we’ve recently launched pay-with-crypto options. A lot of the tools, ecosystems, and protections in place for the traditional domain may not serve well in the crypto domain, and that opens the door for bad guys to exploit the system.

Besides that, I am a firm believer in building a large scale graph machine learning system to solve a variety of problems, not just for PayPal but also to advance state-of-the-art research.

Academia is mostly focused on small scale problems. My goal is to make those systems scalable and find more practical Graph Neural Network variants. For example, one thing I’m working on is a spatial-temporal graph convolutional neural network, which models a graph that changes over time as well as in space. Being able to model both time and space simultaneously will help us solve some problems which wouldn’t be possible in the traditional domain. So there are a lot of exciting things on the algorithmic side, but more importantly, I believe in working with vendors to make sure the hardware can scale to meet the demands of large Graph Neural Networks for 2022 and beyond. It’s a long road, but that’s something we are firmly pushing the industry toward.
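As a rough sketch of that spatial-temporal pattern (our illustration of the general idea, not the specific architecture under development at PayPal), one common recipe runs a graph convolution over each snapshot in space and a recurrent cell across snapshots in time, assuming a fixed node set:

```python
import torch
import torch.nn as nn
import dgl
from dgl.nn import GraphConv

class SpatioTemporalGCN(nn.Module):
    """Graph convolution per time step (space) + GRU across steps (time)."""
    def __init__(self, in_feats, hidden):
        super().__init__()
        self.spatial = GraphConv(in_feats, hidden)
        self.temporal = nn.GRUCell(hidden, hidden)

    def forward(self, graphs, feats):
        # graphs: list of T DGLGraphs (edges may change each step);
        # feats:  list of T node-feature tensors, one per snapshot.
        h = torch.zeros(feats[0].shape[0], self.temporal.hidden_size)
        for g, x in zip(graphs, feats):
            s = torch.relu(self.spatial(g, x))  # aggregate over space
            h = self.temporal(s, h)             # carry state over time
        return h                                # final per-node embedding

# Hypothetical 2-snapshot example with 4 fixed nodes.
g1 = dgl.add_self_loop(dgl.graph(([0, 1, 2], [1, 2, 3]), num_nodes=4))
g2 = dgl.add_self_loop(dgl.graph(([0, 2, 3], [2, 3, 1]), num_nodes=4))
model = SpatioTemporalGCN(in_feats=8, hidden=16)
out = model([g1, g2], [torch.randn(4, 8), torch.randn(4, 8)])
print(out.shape)  # torch.Size([4, 16])
```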

Q: If people are interested in joining your team, what should they be doing to prepare themselves to be a successful contributor as a data scientist on your team? 

Certainly I think we need a much bigger curriculum around the AI/ML space. Most of academia is focused on the research side of things; they’re always looking for the newest and greatest thing, but that may not be practical, right? The advice I would give to ML students is: build solutions which are practical in nature. If whatever you are building benefits you in your daily life, then it’s going to be beneficial for lots of other people too. So that’s something to think about: whatever you’re building, focus on simplicity, which is more useful than building something very complex that nobody can use.

But the beauty of being in the data science and math space is that, regardless of what your undergraduate major is, it gives you an opportunity to be in this field. This field is looking for interdisciplinary people, not one cookie-cutter background. So people with any kind of major can get into this field if they have the inclination and the intuition to look. This is a massively interdisciplinary field, and our group is filled with people from all kinds of backgrounds, which is what we need to build the next generation of data science and AI/ML solutions.

From SigOpt, Experiment Exchange is a podcast where we find out how the latest developments in AI are transforming our world. Host and SigOpt Head of Engineering Michael McCourt interviews researchers, industry leaders, and other technical experts about their work in AI, ML, and HPC — asking the hard questions we all want answers to. Subscribe through your favorite podcast platform including Spotify, Google Podcasts, Apple Podcasts, and more!

Nick Payton, Head of Marketing & Partnerships