grb.dataset

grb.dataset.dataset

class grb.dataset.dataset.CogDLDataset(name, data_dir=None, mode='origin', verbose=True)

Bases: object

property COGDL_GRAPH_CLASSIFICATION_DATASETS

static build_adj(attr, edge_index, adj_type='csr')
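build_adj has no description here. The following is only a hedged illustration of the technique its name and arguments suggest. It assumes that `attr` holds per-edge weights and that `edge_index` is a 2 * E array of (source, target) pairs, and it shows how such edges could be assembled into a SciPy sparse adjacency matrix of the requested `adj_type`. The name `build_adj_sketch` and its extra `num_nodes` argument are hypothetical and are not part of GRB.

```python
import numpy as np
import scipy.sparse as sp


def build_adj_sketch(attr, edge_index, num_nodes, adj_type="csr"):
    """Hypothetical stand-in for build_adj (not GRB's implementation).

    attr: per-edge weights, shape (E,).
    edge_index: array of shape (2, E); row 0 = sources, row 1 = targets.
    num_nodes: extra argument assumed here so the N * N shape is explicit.
    """
    rows, cols = np.asarray(edge_index)
    # COO construction; duplicate (row, col) pairs are summed on conversion.
    adj = sp.coo_matrix((np.asarray(attr), (rows, cols)), shape=(num_nodes, num_nodes))
    if adj_type == "csr":
        return adj.tocsr()
    if adj_type == "coo":
        return adj
    raise ValueError(f"unsupported adj_type: {adj_type}")


# A 3-node path graph 0 - 1 - 2, with each undirected edge stored in both directions.
edge_index = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
attr = np.ones(edge_index.shape[1], dtype=np.float32)
adj = build_adj_sketch(attr, edge_index, num_nodes=3)
print(adj.toarray())
```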

static graph_splitting(num_graphs, train_ratio=0.8, val_ratio=0.1)
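graph_splitting is also undocumented. Its arguments suggest a ratio-based split of graph indices for graph classification. The sketch below shows one plausible reading and is not GRB's actual behavior. It randomly permutes the indices, takes `train_ratio` and `val_ratio` of them for training and validation, and uses the remainder for testing. The function name, the added `seed` argument, and the (train, val, test) return format are all assumptions.

```python
import numpy as np


def graph_splitting_sketch(num_graphs, train_ratio=0.8, val_ratio=0.1, seed=0):
    """Hypothetical ratio-based split of graph indices (seed is an added assumption)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_graphs)
    num_train = int(num_graphs * train_ratio)
    num_val = int(num_graphs * val_ratio)
    train_idx = perm[:num_train]
    val_idx = perm[num_train:num_train + num_val]
    # Whatever remains after train and validation becomes the test split.
    test_idx = perm[num_train + num_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = graph_splitting_sketch(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```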

class grb.dataset.dataset.CustomDataset(adj, features, labels, train_mask=None, val_mask=None, test_mask=None, name=None, data_dir=None, mode='full', feat_norm=None, save=False, verbose=True, seed=42)

Bases: object

Class that helps build a customized dataset for GRB evaluation.

Parameters

adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in the form of an N * N sparse matrix.
features (torch.FloatTensor) – Features in the form of an N * D torch float tensor.
labels (torch.LongTensor) – Labels in the form of an N * L tensor. L = 1 for multi-class classification; L > 1 for multi-label classification.
train_mask (torch.Tensor, optional) – Mask of train nodes in the form of an N * 1 torch bool tensor. Default: None. If None, the mask is generated by the default splitting scheme.
val_mask (torch.Tensor, optional) – Mask of validation nodes in the form of an N * 1 torch bool tensor. Default: None. If None, the mask is generated by the default splitting scheme.
test_mask (torch.Tensor, optional) – Mask of test nodes in the form of an N * 1 torch bool tensor. Default: None. If None, the mask is generated by the default splitting scheme.
name (str, optional) – Name of dataset. Default: None.
data_dir (str, optional) – Directory of dataset. If not provided, default is "./data/".
mode (str, optional) – Difficulty, determined according to the average degree of test nodes. Choose from ["easy", "medium", "hard", "full"]; "full" uses the entire test set. Default: "full".
feat_norm (str, optional) – Feature normalization that transforms all features to the range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: None.
save (bool, optional) – Whether to save data as files. Default: False.
verbose (bool, optional) – Whether to display logs. Default: True.
seed (int, optional) – Random seed. Default: 42.
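To make the expected shapes concrete, here is a sketch that builds toy inputs of the kinds listed above with NumPy and SciPy. It creates a symmetric N * N CSR adjacency matrix, N * D float features, and labels in both the multi-class (N * 1) and multi-label (N * L) layouts. The toy graph, sizes, and variable names are illustrative assumptions. The graph is too small to be a meaningful benchmark and is not guaranteed to work with GRB's default splitting.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(42)
num_nodes, num_feats, num_classes = 6, 4, 3

# Undirected toy path graph: store each edge in both directions so adj is symmetric.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
rows = np.concatenate([edges[:, 0], edges[:, 1]])
cols = np.concatenate([edges[:, 1], edges[:, 0]])
adj = sp.csr_matrix(
    (np.ones(len(rows), dtype=np.float32), (rows, cols)),
    shape=(num_nodes, num_nodes),
)

# N * D float features.
features = rng.standard_normal((num_nodes, num_feats)).astype(np.float32)

# Multi-class: one integer class id per node (L = 1).
labels_multiclass = rng.integers(0, num_classes, size=(num_nodes, 1))

# Multi-label: one 0/1 indicator per class (L = num_classes).
labels_multilabel = (rng.random((num_nodes, num_classes)) < 0.5).astype(np.int64)
```

Since `features` and `labels` are documented as torch tensors, the arrays would presumably be converted with `torch.from_numpy` before use. This assumes GRB and PyTorch are installed, for example `CustomDataset(adj=adj, features=torch.from_numpy(features), labels=torch.from_numpy(labels_multiclass), name="toy")`.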

adj

Adjacency matrix in the form of an N * N sparse matrix.

Type: scipy.sparse.csr.csr_matrix

features

Features in the form of an N * D torch float tensor.

Type: torch.FloatTensor

labels

Labels in the form of an N * L tensor. L = 1 for multi-class classification; L > 1 for multi-label classification.

Type: torch.LongTensor

num_nodes

Number of nodes N.

Type: int

num_edges

Number of edges.

Type: int

num_features

Dimension of features D.

Type: int

num_classes

Number of classes. For multi-label classification, this equals L.

Type: int

num_train

Number of train nodes.

Type: int

num_val

Number of validation nodes.

Type: int

num_test

Number of test nodes.

Type: int

mode

Mode of dataset. One of ["easy", "medium", "hard", "full"].

Type: str

index_train

Index of train nodes.

Type: np.ndarray

index_val

Index of validation nodes.

Type: np.ndarray

index_test

Index of test nodes.

Type: np.ndarray

train_mask

Mask of train nodes in the form of an N * 1 torch bool tensor.

Type: torch.Tensor

val_mask

Mask of validation nodes in the form of an N * 1 torch bool tensor.

Type: torch.Tensor

test_mask

Mask of test nodes in the form of an N * 1 torch bool tensor.

Type: torch.Tensor
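The index_* and *_mask attributes describe the same split in two forms: index arrays and boolean masks. The sketch below shows the standard conversion between them and how counts such as num_train follow from a mask. It uses flat length-N NumPy arrays for brevity. The helper names `index_to_mask` and `mask_to_index` are hypothetical, and reshaping to (N, 1) or converting with `torch.from_numpy` would be needed to match the documented tensor layout.

```python
import numpy as np


def index_to_mask(index, num_nodes):
    """Turn an array of node indices into a length-N boolean mask."""
    mask = np.zeros(num_nodes, dtype=bool)
    mask[index] = True
    return mask


def mask_to_index(mask):
    """Recover the node indices where the mask is True."""
    return np.flatnonzero(mask)


num_nodes = 10
index_train = np.array([0, 1, 2, 3, 4, 5])
index_val = np.array([6])
index_test = np.array([7, 8])

train_mask = index_to_mask(index_train, num_nodes)
val_mask = index_to_mask(index_val, num_nodes)
test_mask = index_to_mask(index_test, num_nodes)

# Counts in the style of num_train / num_val / num_test follow from the masks.
num_train, num_val, num_test = int(train_mask.sum()), int(val_mask.sum()), int(test_mask.sum())
print(num_train, num_val, num_test)  # 6 1 2
```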

class grb.dataset.dataset.Dataset(name, data_dir=None, mode='easy', feat_norm='arctan', verbose=True, custom=False)

Bases: object

Class that loads GRB datasets for evaluating adversarial robustness.

Parameters

name (str) – Name of dataset. Supported datasets: ["grb-cora", "grb-citeseer", "grb-aminer", "grb-reddit", "grb-flickr"].
data_dir (str, optional) – Directory for dataset. If not provided, default is "./data/".
mode (str, optional) – Difficulty, determined according to the average degree of test nodes. Choose from ["easy", "medium", "hard", "full"]; "full" uses the entire test set. Default: "easy".
feat_norm (str, optional) – Feature normalization that transforms all features to the range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: "arctan".
verbose (bool, optional) – Whether to display logs. Default: True.

adj

Adjacency matrix in the form of an N * N sparse matrix.

Type: scipy.sparse.csr.csr_matrix

features

Features in the form of an N * D torch float tensor.

Type: torch.FloatTensor

labels

Labels in the form of an N * L tensor. L = 1 for multi-class classification; L > 1 for multi-label classification.

Type: torch.LongTensor

num_nodes

Number of nodes N.

Type: int

num_edges

Number of edges.

Type: int

num_features

Dimension of features D.

Type: int

num_classes

Number of classes. For multi-label classification, this equals L.

Type: int

num_train

Number of train nodes.

Type: int

num_val

Number of validation nodes.

Type: int

num_test

Number of test nodes.

Type: int

mode

Mode of dataset. One of ["easy", "medium", "hard", "full"].

Type: str

index_train

Index of train nodes.

Type: np.ndarray

index_val

Index of validation nodes.

Type: np.ndarray

index_test

Index of test nodes.

Type: np.ndarray

train_mask

Mask of train nodes in the form of an N * 1 torch bool tensor.

Type: torch.Tensor

val_mask

Mask of validation nodes in the form of an N * 1 torch bool tensor.

Type: torch.Tensor

test_mask

Mask of test nodes in the form of an N * 1 torch bool tensor.

Type: torch.Tensor

Example

>>> import grb
>>> from grb.dataset import Dataset
>>> dataset = Dataset(name='grb-cora', mode='easy', feat_norm="arctan")

class grb.dataset.dataset.OGBDataset(name, data_dir=None, verbose=True)

Bases: object

property OGB_GRAPH_CLASSIFICATION_DATASETS

property OGB_NODE_CLASSIFICATION_DATASETS

grb.dataset.dataset.feat_normalize(features, norm=None, lim_min=-1.0, lim_max=1.0)

Feature normalization function.

Parameters

features (torch.FloatTensor) – Features in the form of an N * D torch float tensor.
norm (str, optional) – Type of normalization. Choose from ["linearize", "arctan", "tanh", "standarize"]. Default: None.
lim_min (float, optional) – Minimum limit of feature value. Default: -1.0.
lim_max (float, optional) – Maximum limit of feature value. Default: 1.0.

Returns

features – Normalized features in the form of an N * D torch float tensor.

Return type

torch.FloatTensor
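The normalization options are named but their formulas are not documented. The following NumPy sketch shows common formulas matching those names, and the exact formulas GRB uses may differ:

- "linearize" is taken as min-max scaling into [lim_min, lim_max].
- "arctan" and "tanh" are taken as squashing standardized values into (-1, 1).
- "standarize" (spelled as in the option list) is taken as plain zero-mean, unit-variance scaling.

Three choices are assumptions: statistics are computed over the whole matrix, `lim_min`/`lim_max` are used only by "linearize", and NumPy arrays stand in for torch tensors. The name `feat_normalize_sketch` is hypothetical.

```python
import numpy as np


def feat_normalize_sketch(features, norm=None, lim_min=-1.0, lim_max=1.0):
    """Illustrative formulas for the listed norms (assumed, not GRB's code)."""
    x = np.asarray(features, dtype=np.float64)
    if norm == "linearize":
        # Min-max scaling so the smallest value maps to lim_min and the largest to lim_max.
        k = (lim_max - lim_min) / (x.max() - x.min())
        return lim_min + k * (x - x.min())
    if norm in ("arctan", "tanh", "standarize"):
        # Standardize over all entries: zero mean, unit variance.
        z = (x - x.mean()) / x.std()
        if norm == "arctan":
            return 2.0 * np.arctan(z) / np.pi  # squashes into (-1, 1)
        if norm == "tanh":
            return np.tanh(z)  # squashes into (-1, 1)
        return z
    return x  # norm=None leaves the features unchanged


feats = np.array([[0.0, 2.0], [4.0, 6.0]])
lin = feat_normalize_sketch(feats, norm="linearize")
squashed = feat_normalize_sketch(feats, norm="arctan")
```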

grb.dataset.dataset.splitting(adj, range_min=(0.0, 0.05), range_max=(0.95, 1.0), range_easy=(0.05, 0.35), range_medium=(0.35, 0.65), range_hard=(0.65, 0.95), ratio_train=0.6, ratio_val=0.1, ratio_test=0.1, seed=42)

GRB splitting scheme designed for adversarial robustness evaluation.

Parameters

adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in the form of an N * N sparse matrix.
range_min (tuple of float, optional) – Range of nodes with the lowest degrees, which are ignored. Values are fractions of all nodes ranked by degree. Default: (0.0, 0.05).
range_max (tuple of float, optional) – Range of nodes with the highest degrees, which are ignored. Values are fractions of all nodes ranked by degree. Default: (0.95, 1.0).
range_easy (tuple of float, optional) – Range of nodes for easy difficulty. Values are fractions of all nodes ranked by degree. Default: (0.05, 0.35).
range_medium (tuple of float, optional) – Range of nodes for medium difficulty. Values are fractions of all nodes ranked by degree. Default: (0.35, 0.65).
range_hard (tuple of float, optional) – Range of nodes for hard difficulty. Values are fractions of all nodes ranked by degree. Default: (0.65, 0.95).
ratio_train (float, optional) – Ratio of train nodes. Default: 0.6.
ratio_val (float, optional) – Ratio of validation nodes. Default: 0.1.
ratio_test (float, optional) – Ratio of test nodes. Default: 0.1.
seed (int, optional) – Random seed. Default: 42.

Returns

index – Dictionary containing {"index_train", "index_val", "index_test", "index_test_easy", "index_test_medium", "index_test_hard"}.

Return type

dict
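A rough sketch of only the degree-ranking step implied by the range_* parameters, assuming nodes are ranked by ascending degree (so range_min holds the lowest-degree nodes) and each range is a pair of fractions of N. It does not show how train, validation, and test nodes are then sampled from these groups, which the description above leaves open. The function name `degree_ranges_sketch` and the group names are hypothetical. Degree is taken as the number of stored entries per row via SciPy's `getnnz(axis=1)`.

```python
import numpy as np
import scipy.sparse as sp


def degree_ranges_sketch(adj, range_min=(0.0, 0.05), range_max=(0.95, 1.0),
                         range_easy=(0.05, 0.35), range_medium=(0.35, 0.65),
                         range_hard=(0.65, 0.95)):
    """Group node indices by degree rank; each range is a pair of fractions of N."""
    num_nodes = adj.shape[0]
    degs = np.asarray(adj.getnnz(axis=1))
    order = np.argsort(degs, kind="stable")  # node ids sorted by ascending degree

    def take(r):
        return order[int(r[0] * num_nodes):int(r[1] * num_nodes)]

    return {
        "ignored_min": take(range_min),
        "easy": take(range_easy),
        "medium": take(range_medium),
        "hard": take(range_hard),
        "ignored_max": take(range_max),
    }


# Random symmetric toy graph with 20 nodes.
rng = np.random.default_rng(0)
dense = (rng.random((20, 20)) < 0.2).astype(np.float32)
adj = sp.csr_matrix(np.maximum(dense, dense.T))
groups = degree_ranges_sketch(adj)
print({name: len(idx) for name, idx in groups.items()})
```

With the default ranges, the five groups tile the degree ranking without gaps or overlap, so every node falls into exactly one of them.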