SigOpt hosted our first user conference, the SigOpt AI & HPC Summit, on Tuesday, November 16, 2021. It was virtual and free to attend, and you can access content from the event at sigopt.com/summit. More than anything else, we were excited to host this Summit to showcase the great work of some of SigOpt's customers. Today, we share lessons on design and optimization from a panel of experts in the field of Graph Neural Networks (GNNs). Sasi Avancha (Intel Labs), Venkatesh Ramanathan (PayPal), and Da Zheng (Amazon Web Services) share best practices for sampling for GNNs.
Use Cases
One of the fundamental things to think about before diving into Graph Neural Networks is how the information is actually stored in a data center. You can no longer simply leverage relational databases like OracleDB; before you load your data into a database, you need to represent it using a graph structure. The first thing you need, then, is a foundational graph database to hold that structure. Once you have the data in that form, you need to identify the relevant use cases. One such use case, from PayPal's Director of Data Science, is money laundering, where bad actors transfer money between various accounts. This is a typical graph problem. From here, you need to know how the accounts are linked to each other. For example, there could be two different IP addresses, one from the Philippines and one from the United States, and we need to discover whether they are linked in some way. To solve this, we need the data structured as a graph.
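To make this concrete, here is a minimal sketch of representing accounts and their linkages as a heterogeneous graph, using DGL as the graph library. The node and edge type names (`account`, `transfers_to`, `shares_ip_with`) and the toy data are purely illustrative, not PayPal's actual schema:

```python
import dgl
import torch

# Hypothetical linkage data: accounts connected by transfers and shared IPs.
transfers_src = torch.tensor([0, 1, 2])
transfers_dst = torch.tensor([1, 2, 0])
graph = dgl.heterograph({
    ('account', 'transfers_to', 'account'): (transfers_src, transfers_dst),
    ('account', 'shares_ip_with', 'account'): (torch.tensor([0]), torch.tensor([3])),
})

# Attach per-account features (e.g., transaction statistics) as node data.
graph.nodes['account'].data['feat'] = torch.randn(graph.num_nodes('account'), 16)
print(graph)
```

Once the data is in this form, questions like "are these two accounts linked?" become graph queries or link-prediction tasks rather than expensive relational joins.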
Before applying a GNN, you need some intuition about the kind of problem you're solving. For example, the data is highly imbalanced in a money laundering use case: fewer than 1% of accounts are bad actors, while over 99% are good actors. Yet this small number of bad actors accounts for billions of dollars of loss. So when you do mini-batching, you cannot merely do random sampling, because random sampling may exclude all of the bad actors. Instead, you need to apply intelligent sampling techniques. What Venkatesh from PayPal has found in practice is that, if possible, you should try to fit the entire graph on a single machine with a large amount of memory and many high-horsepower CPUs. That way, the communication problems Sasi from Intel raises never become a bottleneck.
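One simple form such intelligent sampling can take is class-weighted sampling of the seed nodes for each mini-batch. The sketch below uses PyTorch's `WeightedRandomSampler` to oversample the rare class; the labels are synthetic, and this is one illustrative technique, not necessarily the one PayPal uses in production:

```python
import torch
from torch.utils.data import WeightedRandomSampler, DataLoader

# Hypothetical labels: 1 = bad actor (<1% of nodes), 0 = good actor.
num_nodes = 100_000
labels = torch.zeros(num_nodes, dtype=torch.long)
labels[torch.randperm(num_nodes)[:800]] = 1  # ~0.8% positives

# Weight each node inversely to its class frequency, so mini-batches of
# seed nodes contain a meaningful share of bad actors.
class_counts = torch.bincount(labels).float()
weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(weights, num_samples=num_nodes, replacement=True)

seed_loader = DataLoader(torch.arange(num_nodes), batch_size=1024, sampler=sampler)
seeds = next(iter(seed_loader))
print(labels[seeds].float().mean())  # roughly 0.5 instead of ~0.008
```

These balanced seed nodes can then feed any neighbor-sampling pipeline, so every mini-batch sees both classes.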
System Design
GNN’s and their associated datasets are examples of massive high performance compute problems. These workloads stress the entire stack from top to bottom, because you not only need to think about the entire workflow, but you also need to think about each component in the workflow. Identifying which components are potential bottlenecks is a critical step. Sasi Avancha from Intel shares his top 4 system design questions that need to be answered to optimize GNN applications.
- What kind of models are you attempting to train? Are you doing full-batch training or more traditional mini-batch training? That is the first important question to answer.
- After you've answered that, how are these graphs going to be partitioned? One method is the edge-cut algorithm, which partitions the graph along its edges so that it can be distributed across different instances (i.e., CPU sockets, GPUs). Another is the vertex-cut algorithm, which splits the graph along its vertices. Otherwise, you can avoid partitioning the graph and instead partition the model. (A minimal partitioning sketch follows this list.)
- After you've partitioned the graph or the model, you will want to implement two fundamental GNN primitives: aggregation and update (see the message-passing sketch after this list). If graphs are partitioned across remote systems, additional communication problems present themselves, and communication can become a bottleneck. To prevent this, you can implement communication-avoidance or communication-reduction algorithms.
- Finally, you need to optimize compute on the underlying hardware (i.e., CPU sockets, GPUs) so that the aggregation and update steps run efficiently.
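For the partitioning question, DGL ships tooling that implements the edge-cut approach via METIS. The sketch below is a minimal illustration on a random toy graph, assuming DGL's distributed module is available; it is not any panelist's production setup:

```python
import dgl

# A random toy graph stands in for a billion-edge production graph.
g = dgl.rand_graph(10_000, 100_000)

# METIS-based (edge-cut) partitioning: each node is assigned to one of
# `num_parts` partitions, and edges crossing partition boundaries are cut.
dgl.distributed.partition_graph(
    g, graph_name='toy_graph', num_parts=4,
    out_path='partitions', part_method='metis')
```

Each resulting partition can then live on its own instance, with cross-partition edges handled through communication.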
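To make the aggregation and update primitives concrete, here is a minimal GraphSAGE-style message-passing layer written with DGL's built-in message functions. This is a sketch of the general pattern, not code from any panelist:

```python
import torch
import torch.nn as nn
import dgl.function as fn

class MeanAggregateUpdate(nn.Module):
    """One message-passing layer: mean aggregation, then a learned update."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.update_fn = nn.Linear(2 * in_feats, out_feats)

    def forward(self, graph, h):
        with graph.local_scope():
            graph.ndata['h'] = h
            # Aggregation primitive: each node averages its neighbors' features.
            graph.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'agg'))
            # Update primitive: combine a node's own features with the aggregate.
            combined = torch.cat([h, graph.ndata['agg']], dim=1)
            return torch.relu(self.update_fn(combined))
```

When the graph spans machines, the aggregation step is precisely where remote neighbor features must be fetched, which is why it is the natural target for communication-avoidance and communication-reduction techniques.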
These are the high-level system design challenges when we work on design and optimization for GNNs.
Scaling Up
From a user perspective, when dealing with large-scale graphs, we need to decide between full-graph training and mini-batch training. Historically, sampling a mini-batch from a graph has been extremely difficult. Full-graph training works extremely well on small datasets. However, when we try to scale it to a very large graph (e.g., millions or billions of nodes), convergence slows down by roughly 1,000x.
So at the beginning, DGL (Deep Graph Library) chose mini-batch training. They started with the simplest mini-batch sampling method, introduced by GraphSAGE: node-wise neighbor sampling, in which neighbors are sampled independently within each neighborhood. They then construct multiple subgraphs and wrap them into a larger graph structure. Finally, they can run on these subgraphs the same way people run full-graph training.
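Here is a minimal sketch of that workflow in DGL, assuming DGL >= 0.8 (earlier releases expose the same idea through `MultiLayerNeighborSampler` and `NodeDataLoader`); the random graph and training node IDs are placeholders:

```python
import dgl
import torch

# A small random graph stands in for a real dataset.
graph = dgl.rand_graph(1000, 5000)
train_nids = torch.arange(100)

# GraphSAGE-style node-wise neighbor sampling, with per-layer fanouts
# for a 2-layer GNN.
sampler = dgl.dataloading.NeighborSampler([15, 10])
dataloader = dgl.dataloading.DataLoader(
    graph, train_nids, sampler, batch_size=32, shuffle=True)

for input_nodes, output_nodes, blocks in dataloader:
    # Each block is a bipartite subgraph (message flow graph) for one layer;
    # a model consumes these exactly as it would consume the full graph.
    print(len(input_nodes), len(output_nodes), blocks)
    break
```

The `blocks` are the wrapped subgraph structure the panel describes: a model's layers run over them one by one, just as they would over the full graph.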
DGL allows any sampling method to be used. Mini-batch sampling is an active research area for GNNs, so future work should yield much higher convergence speeds.
To learn more about GNN design and optimization, I encourage you to watch the talk, in which Sasi Avancha (Intel Labs), Venkatesh Ramanathan (PayPal), and Da Zheng (Amazon Web Services) share their experiences with GNNs. To see if SigOpt can drive similar results for you and your team, sign up to use it for free.