mlx_graphs.datasets.OGBDataset
- class mlx_graphs.datasets.OGBDataset(name: Literal['ogbn-products', 'ogbn-proteins', 'ogbn-arxiv', 'ogbn-papers100M', 'ogbl-ppa', 'ogbl-collab', 'ogbl-ddi', 'ogbl-citation2', 'ogbl-vessel', 'ogbg-molhiv', 'ogbg-molpcba', 'ogbg-ppa', 'ogbg-code2'], split: Literal['train', 'val', 'test'] | None = None, base_dir: str | None = None)
Bases: Dataset
Datasets from the Open Graph Benchmark (OGB) collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs.
Datasets belong to three fundamental graph machine learning task categories: predicting the properties of nodes, links, and graphs. Node property prediction datasets consist of a single graph with three additional properties, train_mask, val_mask and test_mask, which specify the masks for the train, validation and test splits. Link property prediction datasets also consist of a single graph with three additional properties, train_edge_index, val_edge_index and test_edge_index, which specify the edges to be considered for training, validation and testing. Graph property prediction datasets consist of multiple graphs; the desired split can be specified via the split argument.
See the OGB documentation for further details and a list of the available datasets with their descriptions.
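The three task categories above follow the OGB naming convention, in which the name prefix (ogbn-, ogbl-, ogbg-) identifies node, link, and graph property prediction. As a minimal sketch, the helper below maps a dataset name to its category and to the split attributes described above. The function describe_ogb_name and the table SPLIT_ATTRIBUTES are hypothetical names introduced for illustration; they are not part of mlx_graphs.

```python
# Hypothetical helper, NOT part of mlx_graphs: it only encodes the
# naming convention and the split attributes described in the text.
SPLIT_ATTRIBUTES = {
    # prefix: (task category, attributes that carry the splits)
    "ogbn": ("node", ("train_mask", "val_mask", "test_mask")),
    "ogbl": ("link", ("train_edge_index", "val_edge_index", "test_edge_index")),
    # Graph-level datasets hold many graphs; the split is chosen with the
    # `split` argument instead of extra attributes.
    "ogbg": ("graph", ()),
}


def describe_ogb_name(name):
    """Return (task category, split attribute names) for an OGB dataset name."""
    prefix = name.split("-", 1)[0]
    if prefix not in SPLIT_ATTRIBUTES:
        raise ValueError(f"Unrecognized OGB dataset name: {name!r}")
    return SPLIT_ATTRIBUTES[prefix]


category, attributes = describe_ogb_name("ogbl-collab")
print(category, attributes)
```

A lookup like this can be handy when writing a generic training script that has to decide whether to read masks, edge indices, or pass a split argument.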
- Parameters:
name (Literal['ogbn-products', 'ogbn-proteins', 'ogbn-arxiv', 'ogbn-papers100M', 'ogbl-ppa', 'ogbl-collab', 'ogbl-ddi', 'ogbl-citation2', 'ogbl-vessel', 'ogbg-molhiv', 'ogbg-molpcba', 'ogbg-ppa', 'ogbg-code2']) – Name of the dataset.
split (Optional[Literal['train', 'val', 'test']]) – Split of the dataset to load. This parameter only has an effect on graph property prediction datasets. If None, the entire dataset is loaded. Defaults to None.
base_dir (Optional[str]) – Directory in which to store dataset files. Defaults to .mlx_graphs_data/ in the local directory.
Note
ogb needs to be installed to use this dataset
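Because of this dependency, it may be useful to check for the ogb package before constructing the dataset. The following is a minimal sketch using only the standard library; the helper name ogb_available is an assumption made for illustration and is not provided by mlx_graphs.

```python
import importlib.util


def ogb_available():
    """Return True if the `ogb` package can be found in this environment."""
    # find_spec returns None when no top-level module named "ogb" is installed.
    return importlib.util.find_spec("ogb") is not None


if not ogb_available():
    # The package is published on PyPI under the name "ogb".
    print("ogb is not installed; run `pip install ogb` before using OGBDataset")
```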
Note
The ogbn-mag, ogbl-wikikg2 and ogbl-biokg datasets, as well as the graphs belonging to the large-scale challenge category, are currently not available because they require heterogeneous graphs, which are not yet supported by mlx-graphs.
Example:
from mlx_graphs.datasets.ogb_dataset import OGBDataset

ds = OGBDataset("ogbg-molhiv", split="train")
>>> ogbg-molhiv(num_graphs=32901)