What is the Open Graph Benchmark (OGB)?

Eddie Mattia
Graph Neural Networks

According to the OGB website, the Open Graph Benchmark is a “collection of realistic, large-scale, and diverse benchmark datasets for ML on graphs.” Some of the key features of the OGB project are that it is community-driven, includes many well-constructed datasets, provides standardized data loading and evaluation tools, and maintains leaderboards that facilitate comparisons of modeling approaches. Leaderboard submissions for each dataset are tagged with important metadata such as: OGB version, modeling method, use of external data, links to the source code, hardware, and the validation and test performance of tuned hyperparameters over 10 random seeds.

OGB contains datasets for the three main types of ML tasks on graphs: node property prediction, link property prediction, and graph property prediction. In a future blog post, we will discuss the results of tuning models on the ogbn-products and ogbn-mag datasets. Both of these are node prediction tasks.

Node Task Datasets in OGB as of December 7, 2021

For ogbn-products, the task is to predict the product category of each node in the test set. The graph data (i.e., the input to the models) consists of nodes, node features, and the connectivity structure of the graph. Nodes represent products, node labels are product categories, and node features are generated by applying a bag-of-words extractor to product descriptions, followed by PCA for dimensionality reduction. Edges in the graph represent products that are co-purchased.
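The feature pipeline described above can be sketched with scikit-learn. The tiny product descriptions and the component count below are made-up stand-ins; the actual ogbn-products features are 100-dimensional:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA

# Made-up product descriptions standing in for the real co-purchasing data.
descriptions = [
    "usb cable fast charging",
    "wireless mouse ergonomic",
    "usb wireless adapter",
    "coffee mug ceramic",
]

# Bag-of-words extraction, then PCA for dimensionality reduction.
bow = CountVectorizer().fit_transform(descriptions).toarray()
features = PCA(n_components=2).fit_transform(bow)

print(features.shape)  # one low-dimensional feature vector per product node
```

These reduced vectors become the per-node input features that a GNN combines with the graph's connectivity structure.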

The ogbn-mag dataset is a heterograph, meaning it has multiple types of nodes and edges. It is a subset of the Microsoft Academic Graph. It has four node types (papers, authors, institutions, and fields of study) and four edge types: an author can be affiliated with an institution, an author can write a paper, a paper can cite another paper, and a paper can have a topic in a field of study. Nodes of type paper are assigned features using word2vec embeddings. The task is to predict the venue (conference or journal) of each paper.
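A heterograph like this is naturally represented as a dictionary of edge lists keyed by (source type, relation, destination type) triples, which is the layout OGB uses for ogbn-mag. The relation names below follow OGB's, but the tiny index arrays are made up for illustration:

```python
import numpy as np

# Toy heterograph: each key is a (source type, relation, destination type)
# triple, and each value is a 2 x num_edges array whose columns are
# (source index, destination index) pairs. All indices are made up.
edge_index_dict = {
    ("author", "affiliated_with", "institution"): np.array([[0, 1], [0, 0]]),
    ("author", "writes", "paper"): np.array([[0, 1, 1], [0, 0, 1]]),
    ("paper", "cites", "paper"): np.array([[1], [0]]),
    ("paper", "has_topic", "field_of_study"): np.array([[0, 1], [0, 1]]),
}

# Node counts per type, again made up for this toy example.
num_nodes_dict = {"author": 2, "institution": 1, "paper": 2, "field_of_study": 2}

for (src, rel, dst), edges in edge_index_dict.items():
    print(f"{src} --{rel}--> {dst}: {edges.shape[1]} edges")
```

Models for heterographs typically learn separate message-passing weights per edge type, which is why keeping the edge types explicit in the data structure matters.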

If any of this GNN terminology was confusing, go read our Overview on Graph Neural Networks. To learn more about optimizing GNN performance for OGB, I encourage you to watch our GNN panel with Amazon AI, PayPal, and Intel Labs at the SigOpt Summit. If you’d like to apply hyperparameter optimization to your Graph Neural Networks, sign up to use SigOpt for free.

Eddie Mattia Machine Learning Specialist