mlx_graphs.datasets.PlanetoidDataset
- class mlx_graphs.datasets.PlanetoidDataset(name: Literal['cora', 'citeseer', 'pubmed'], split: Literal['public', 'full', 'geom-gcn'] = 'public', without_self_loops: bool = True, base_dir: str | None = None)
Bases: Dataset
The citation network datasets "Cora", "CiteSeer" and "PubMed" from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. Nodes represent documents and edges represent citation links. Training, validation and test splits are given by binary masks. The implementation closely follows the one in PyG.
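To make the role of the binary masks concrete, here is a minimal sketch on a toy graph of six nodes. Plain Python lists stand in for the mask arrays, and `masked_indices` is a hypothetical helper written for this illustration; it is not part of mlx_graphs.

```python
# Toy illustration only: plain Python lists stand in for the mask arrays.
train_mask = [True, True, False, False, False, False]
val_mask = [False, False, True, True, False, False]
test_mask = [False, False, False, False, True, True]


def masked_indices(mask):
    """Return the indices of the nodes selected by a boolean mask."""
    return [i for i, selected in enumerate(mask) if selected]


train_nodes = masked_indices(train_mask)
val_nodes = masked_indices(val_mask)
test_nodes = masked_indices(test_mask)

# Flag any node that would appear in more than one split.
overlaps = [sum(flags) > 1 for flags in zip(train_mask, val_mask, test_mask)]
```

Each mask has one entry per node, so the same mask can index node features and node labels alike.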
- Parameters:
  name (Literal['cora', 'citeseer', 'pubmed']) – The name of the dataset ("Cora", "CiteSeer", "PubMed").
  split (str, optional) – The type of dataset split ("public", "full", "geom-gcn"). If set to "public", the split will be the public fixed split from the “Revisiting Semi-Supervised Learning with Graph Embeddings” paper. If set to "full", all nodes except those in the validation and test sets will be used for training (as in the “FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling” paper). If set to "geom-gcn", the 10 public fixed splits from the “Geom-GCN: Geometric Graph Convolutional Networks” paper are given. Defaults to "public".
  without_self_loops (bool) – Whether to remove self loops. Defaults to True.
  base_dir (Optional[str]) – Directory in which to store dataset files. Defaults to the local directory .mlx_graphs_data/.
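The "full" split and the without_self_loops option can be pictured with a small sketch. Both functions below are hypothetical helpers written for this illustration, operating on plain Python lists; they are not the library's implementation and only mirror the behaviour described above.

```python
def full_train_mask(val_mask, test_mask):
    """Mark every node that is in neither the validation nor the test set."""
    return [not (v or t) for v, t in zip(val_mask, test_mask)]


def remove_self_loops(edge_index):
    """Drop edges whose two endpoints are the same node.

    edge_index is a pair of equal-length lists, mirroring a (2, num_edges) layout.
    """
    rows, cols = edge_index
    kept = [(r, c) for r, c in zip(rows, cols) if r != c]
    return [[r for r, _ in kept], [c for _, c in kept]]


# Assumed toy data: five nodes, four edges, one of which is a self loop (1, 1).
val_mask = [False, False, True, False, False]
test_mask = [False, False, False, True, True]
train_mask = full_train_mask(val_mask, test_mask)

edge_index = [[0, 1, 2, 3], [1, 1, 3, 0]]
cleaned = remove_self_loops(edge_index)
```

In this toy case nodes 0 and 1 end up in the training set, and the edge from node 1 to itself is dropped while the other three edges are kept.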
Example:
    from mlx_graphs.datasets import PlanetoidDataset

    dataset = PlanetoidDataset("cora")
    >>> cora(num_graphs=1)

    dataset[0]
    >>> GraphData(
        edge_index(shape=(2, 10556), int32)
        node_features(shape=(2708, 1433), float32)
        node_labels(shape=(2708,), int32)
        train_mask(shape=(2708,), bool)
        val_mask(shape=(2708,), bool)
        test_mask(shape=(2708,), bool))