mlx_graphs.datasets.OGBDataset


class mlx_graphs.datasets.OGBDataset(name: Literal['ogbn-products', 'ogbn-proteins', 'ogbn-arxiv', 'ogbn-papers100M', 'ogbl-ppa', 'ogbl-collab', 'ogbl-ddi', 'ogbl-citation2', 'ogbl-vessel', 'ogbg-molhiv', 'ogbg-molpcba', 'ogbg-ppa', 'ogbg-code2'], split: Literal['train', 'val', 'test'] | None = None, base_dir: str | None = None)

Bases: Dataset

Datasets from the Open Graph Benchmark (OGB) collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs.

Datasets belong to three fundamental graph machine learning task categories: predicting the properties of nodes, links, and graphs. Node property prediction datasets consist of a single graph with three additional properties, train_mask, val_mask and test_mask, which specify the masks for the train, validation and test splits. Link property prediction datasets also consist of a single graph with three additional properties, train_edge_index, val_edge_index and test_edge_index, which specify the edges to be considered for training, validation and testing. Graph property prediction datasets consist of multiple graphs; the desired split can be selected via the split argument.

See the Open Graph Benchmark (OGB) website for further details and a list of the available datasets with their descriptions.
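
The three task categories correspond to the prefix of the dataset name (ogbn- for node, ogbl- for link and ogbg- for graph property prediction). The following is a minimal standalone sketch of which split-related properties to expect for a given name. The helper describe_ogb_name and its return format are hypothetical and are not part of mlx-graphs; the sketch only restates the description above.

```python
# Hypothetical helper, not part of mlx-graphs: maps an OGB dataset name to its
# task category and to the split-related properties described above.
TASK_BY_PREFIX = {
    "ogbn": ("node", ("train_mask", "val_mask", "test_mask")),
    "ogbl": ("link", ("train_edge_index", "val_edge_index", "test_edge_index")),
    "ogbg": ("graph", ()),  # splits are selected with the `split` argument instead
}


def describe_ogb_name(name: str) -> dict:
    """Return the task category and expected split properties for `name`."""
    prefix = name.split("-", 1)[0]
    if prefix not in TASK_BY_PREFIX:
        raise ValueError(f"Unrecognized OGB dataset name: {name!r}")
    task, split_properties = TASK_BY_PREFIX[prefix]
    return {
        "task": task,
        "split_properties": split_properties,
        # Per the parameter description, `split` only matters for graph-level datasets.
        "split_arg_has_effect": task == "graph",
    }


info = describe_ogb_name("ogbn-arxiv")
print(info["task"], info["split_properties"])
```

For node and link datasets the splits are stored on the single graph itself, so a lookup like this is mainly useful for deciding whether passing split makes sense at all.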

Parameters:
  • name (Literal['ogbn-products', 'ogbn-proteins', 'ogbn-arxiv', 'ogbn-papers100M', 'ogbl-ppa', 'ogbl-collab', 'ogbl-ddi', 'ogbl-citation2', 'ogbl-vessel', 'ogbg-molhiv', 'ogbg-molpcba', 'ogbg-ppa', 'ogbg-code2']) – Name of the dataset

  • split (Optional[Literal['train', 'val', 'test']]) – Split of the dataset to load. This parameter only has an effect on graph property prediction datasets. If None, the entire dataset is loaded. Defaults to None.

  • base_dir (Optional[str]) – Directory in which to store the dataset files. Defaults to the local directory .mlx_graphs_data/.

Note

The ogb package needs to be installed to use this dataset.

Note

The ogbn-mag, ogbl-wikikg2 and ogbl-biokg datasets, as well as the graphs belonging to the large-scale challenge category, are currently not available because they require heterogeneous graphs, which are not yet supported by mlx-graphs.
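
Both notes can be checked up front, before constructing the dataset. The following is a minimal sketch that uses only the standard library. The names ogb_is_installed, UNAVAILABLE and check_available are illustrative and are not part of the mlx-graphs API, and the check covers only the three datasets named explicitly in the note, not the large-scale challenge graphs.

```python
from importlib.util import find_spec

# Datasets the note lists as unavailable because they need heterogeneous graphs.
# Illustrative constant, not part of mlx-graphs.
UNAVAILABLE = frozenset({"ogbn-mag", "ogbl-wikikg2", "ogbl-biokg"})


def ogb_is_installed() -> bool:
    """Return True if the optional `ogb` package can be found for import."""
    return find_spec("ogb") is not None


def check_available(name: str) -> None:
    """Raise ValueError for datasets the note lists as unsupported."""
    if name in UNAVAILABLE:
        raise ValueError(
            f"{name!r} requires heterogeneous graphs, which are not yet supported"
        )


check_available("ogbg-molhiv")  # passes silently
if not ogb_is_installed():
    print("ogb is missing; install it first, for example with `pip install ogb`")
```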

Example:

from mlx_graphs.datasets.ogb_dataset import OGBDataset

# Load only the training split of a graph property prediction dataset.
ds = OGBDataset("ogbg-molhiv", split="train")
print(ds)
# ogbg-molhiv(num_graphs=32901)