mlx_graphs.datasets.PlanetoidDataset

class mlx_graphs.datasets.PlanetoidDataset(name: Literal['cora', 'citeseer', 'pubmed'], split: Literal['public', 'full', 'geom-gcn'] = 'public', without_self_loops: bool = True, base_dir: str | None = None)

Bases: Dataset

The citation network datasets "Cora", "CiteSeer" and "PubMed" from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks.

The implementation of this dataset is similar to the one in PyG.

Parameters:

name (Literal['cora', 'citeseer', 'pubmed']) – The name of the dataset: "cora" (Cora), "citeseer" (CiteSeer) or "pubmed" (PubMed).
split (str, optional) – The type of dataset split ("public", "full" or "geom-gcn"). If set to "public", the split is the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets are used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "geom-gcn", the 10 public fixed splits from the “Geom-GCN: Geometric Graph Convolutional Networks” paper are given. Defaults to "public".
without_self_loops (bool) – Whether to remove self loops. Defaults to True.
base_dir (Optional[str]) – Directory in which to store the dataset files. Defaults to the local directory .mlx_graphs_data/.
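
The effect of without_self_loops and of the "full" split can be illustrated on a toy graph. The following is a minimal sketch in plain Python, not the library's implementation: the helper names remove_self_loops and full_split_train_mask are hypothetical, and the edge index and masks are stored as plain lists rather than MLX arrays.

```python
# Hypothetical helpers for illustration only; they are not part of mlx_graphs.

def remove_self_loops(edge_index):
    """Drop every edge whose source and target are the same node."""
    sources, targets = edge_index
    kept = [(s, t) for s, t in zip(sources, targets) if s != t]
    return [[s for s, _ in kept], [t for _, t in kept]]


def full_split_train_mask(val_mask, test_mask):
    """Mark as training nodes all nodes outside the validation and test sets."""
    return [not (v or t) for v, t in zip(val_mask, test_mask)]


# Toy graph with 5 nodes; the edge 2 -> 2 is a self loop.
edge_index = [[0, 1, 2, 3], [1, 0, 2, 4]]
val_mask = [False, True, False, False, False]
test_mask = [False, False, False, True, True]

clean_edge_index = remove_self_loops(edge_index)
train_mask = full_split_train_mask(val_mask, test_mask)

print(clean_edge_index)  # [[0, 1, 3], [1, 0, 4]]
print(train_mask)        # [True, False, True, False, False]
```

With the "full" split, the training mask is the complement of the union of the validation and test masks, so every node belongs to exactly one of the three sets.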

Example:

from mlx_graphs.datasets import PlanetoidDataset

dataset = PlanetoidDataset("cora")
>>> cora(num_graphs=1)

dataset[0]
>>> GraphData(
        edge_index(shape=(2, 10556), int32)
        node_features(shape=(2708, 1433), float32)
        node_labels(shape=(2708,), int32)
        train_mask(shape=(2708,), bool)
        val_mask(shape=(2708,), bool)
        test_mask(shape=(2708,), bool))
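
Because the splits are given as binary masks, a metric for one split is computed only over the nodes that its mask selects. The sketch below shows this for accuracy using plain Python lists; masked_accuracy is a hypothetical helper, and the labels and predictions are made-up toy values, not results on Cora.

```python
# Hypothetical helper for illustration only; it is not part of mlx_graphs.

def masked_accuracy(predictions, labels, mask):
    """Return the fraction of correct predictions among nodes where mask is True."""
    selected = [(p, y) for p, y, m in zip(predictions, labels, mask) if m]
    if not selected:
        raise ValueError("mask selects no nodes")
    correct = sum(1 for p, y in selected if p == y)
    return correct / len(selected)


# Toy values for a graph with 6 nodes.
node_labels = [0, 2, 1, 1, 0, 2]
predictions = [0, 2, 0, 1, 1, 2]
test_mask = [False, False, True, True, True, True]

test_accuracy = masked_accuracy(predictions, node_labels, test_mask)
print(test_accuracy)  # 0.5
```

The same function applies to train_mask and val_mask, since all three masks have one entry per node.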
