grb.dataset

grb.dataset.dataset

class grb.dataset.dataset.CogDLDataset(name, data_dir=None, mode='origin', verbose=True)[source]

Bases: object

property COGDL_GRAPH_CLASSIFICATION_DATASETS

static build_adj(attr, edge_index, adj_type='csr')[source]

static graph_splitting(num_graphs, train_ratio=0.8, val_ratio=0.1)[source]

class grb.dataset.dataset.CustomDataset(adj, features, labels, train_mask=None, val_mask=None, test_mask=None, name=None, data_dir=None, mode='full', feat_norm=None, save=False, verbose=True, seed=42)[source]

Bases: object

Class that helps build a customized dataset for GRB evaluation.

Parameters
  • adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in form of N * N sparse matrix.

  • features (torch.FloatTensor) – Features in form of N * D torch float tensor.

  • labels (torch.LongTensor) – Labels in form of N * L. L=1 for multi-class classification, otherwise for multi-label classification.

  • train_mask (torch.Tensor, optional) – Mask of train nodes in form of N * 1 torch bool tensor. Default: None. If None, the mask is generated by the default splitting scheme.

  • val_mask (torch.Tensor, optional) – Mask of validation nodes in form of N * 1 torch bool tensor. Default: None. If None, the mask is generated by the default splitting scheme.

  • test_mask (torch.Tensor, optional) – Mask of test nodes in form of N * 1 torch bool tensor. Default: None. If None, the mask is generated by the default splitting scheme.

  • name (str) – Name of dataset.

  • data_dir (str, optional) – Directory of dataset. If not provided, default is "./data/".

  • mode (str, optional) – Mode of dataset, a difficulty level determined according to the average degree of test nodes. Choose from ["easy", "medium", "hard", "full"]. Default: "full", which uses the entire test set.

  • feat_norm (str, optional) – Feature normalization that transforms all features to range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: None.

  • save (bool, optional) – Whether to save data as files. Default: False.

  • verbose (bool, optional) – Whether to display logs. Default: True.

  • seed (int, optional) – Random seed. Default: 42.

adj

Adjacency matrix in form of N * N sparse matrix.

Type

scipy.sparse.csr.csr_matrix

features

Features in form of N * D torch float tensor.

Type

torch.FloatTensor

labels

Labels in form of N * L. L=1 for multi-class classification, otherwise for multi-label classification.

Type

torch.LongTensor

num_nodes

Number of nodes N.

Type

int

num_edges

Number of edges.

Type

int

num_features

Dimension of features D.

Type

int

num_classes

Number of classes L.

Type

int

num_train

Number of train nodes.

Type

int

num_val

Number of validation nodes.

Type

int

num_test

Number of test nodes.

Type

int

mode

Mode of dataset. One of ["easy", "medium", "hard", "full"].

Type

str

index_train

Index of train nodes.

Type

np.ndarray

index_val

Index of validation nodes.

Type

np.ndarray

index_test

Index of test nodes.

Type

np.ndarray

train_mask

Mask of train nodes in form of N * 1 torch bool tensor.

Type

torch.Tensor

val_mask

Mask of validation nodes in form of N * 1 torch bool tensor.

Type

torch.Tensor

test_mask

Mask of test nodes in form of N * 1 torch bool tensor.

Type

torch.Tensor
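
The index_* and *_mask attributes above describe the same node sets in two forms. The following is a minimal sketch of that relationship, using NumPy arrays as a stand-in for the torch bool tensors described above. The helper names index_to_mask and mask_to_index are hypothetical and are not part of grb.

```python
import numpy as np


def index_to_mask(index, num_nodes):
    """Return a boolean mask of length num_nodes that is True at `index`.

    Hypothetical helper for illustration only; not a grb function.
    """
    mask = np.zeros(num_nodes, dtype=bool)
    mask[index] = True
    return mask


def mask_to_index(mask):
    """Return the sorted node indices at which `mask` is True."""
    return np.flatnonzero(mask)


# Toy graph with N = 10 nodes, four of which are train nodes.
num_nodes = 10
index_train = np.array([0, 2, 3, 7])
train_mask = index_to_mask(index_train, num_nodes)

# The number of train nodes is the number of True entries in the mask.
num_train = int(train_mask.sum())
```

Converting in either direction loses no information as long as the index array is sorted and free of duplicates.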

class grb.dataset.dataset.Dataset(name, data_dir=None, mode='easy', feat_norm='arctan', verbose=True, custom=False)[source]

Bases: object

Class that loads GRB datasets for evaluating adversarial robustness.

Parameters
  • name (str) – Name of dataset, supported datasets: ["grb-cora", "grb-citeseer", "grb-aminer", "grb-reddit", "grb-flickr"].

  • data_dir (str, optional) – Directory for dataset. If not provided, default is "./data/".

  • mode (str, optional) – Difficulty determined according to the average degree of test nodes. Choose from ["easy", "medium", "hard", "full"], where "full" uses the entire test set. Default: "easy".

  • feat_norm (str, optional) – Feature normalization that transforms all features to range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: "arctan".

  • verbose (bool, optional) – Whether to display logs. Default: True.

adj

Adjacency matrix in form of N * N sparse matrix.

Type

scipy.sparse.csr.csr_matrix

features

Features in form of N * D torch float tensor.

Type

torch.FloatTensor

labels

Labels in form of N * L. L=1 for multi-class classification, otherwise for multi-label classification.

Type

torch.LongTensor

num_nodes

Number of nodes N.

Type

int

num_edges

Number of edges.

Type

int

num_features

Dimension of features D.

Type

int

num_classes

Number of classes L.

Type

int

num_train

Number of train nodes.

Type

int

num_val

Number of validation nodes.

Type

int

num_test

Number of test nodes.

Type

int

mode

Mode of dataset. One of ["easy", "medium", "hard", "full"].

Type

str

index_train

Index of train nodes.

Type

np.ndarray

index_val

Index of validation nodes.

Type

np.ndarray

index_test

Index of test nodes.

Type

np.ndarray

train_mask

Mask of train nodes in form of N * 1 torch bool tensor.

Type

torch.Tensor

val_mask

Mask of validation nodes in form of N * 1 torch bool tensor.

Type

torch.Tensor

test_mask

Mask of test nodes in form of N * 1 torch bool tensor.

Type

torch.Tensor

Example

>>> import grb
>>> from grb.dataset import Dataset
>>> dataset = Dataset(name='grb-cora', mode='easy', feat_norm="arctan")

class grb.dataset.dataset.OGBDataset(name, data_dir=None, verbose=True)[source]

Bases: object

property OGB_GRAPH_CLASSIFICATION_DATASETS

property OGB_NODE_CLASSIFICATION_DATASETS

grb.dataset.dataset.feat_normalize(features, norm=None, lim_min=-1.0, lim_max=1.0)[source]

Feature normalization function.

Parameters
  • features (torch.FloatTensor) – Features in form of N * D torch float tensor.

  • norm (str, optional) – Type of normalization. Choose from ["linearize", "arctan", "tanh", "standarize"]. Default: None.

  • lim_min (float) – Minimum limit of feature value. Default: -1.0.

  • lim_max (float) – Maximum limit of feature value. Default: 1.0.

Returns

features – Normalized features in form of N * D torch float tensor.

Return type

torch.FloatTensor
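
The normalization options listed above can be illustrated with a short sketch. This is not the grb implementation: the formulas below (min-max scaling for "linearize", and standardizing before squashing for "arctan" and "tanh") are assumptions chosen so that the output falls within the documented limits, and the function name feat_normalize_sketch is hypothetical. NumPy arrays are used in place of torch tensors to keep the example self-contained.

```python
import numpy as np


def feat_normalize_sketch(features, norm=None, lim_min=-1.0, lim_max=1.0):
    """Illustrative normalization; the formulas are assumptions, not grb's code.

    Assumes the features are not all equal (non-zero range and std).
    """
    features = np.asarray(features, dtype=np.float64)
    if norm == "linearize":
        # Min-max scaling onto [lim_min, lim_max].
        scale = (lim_max - lim_min) / (features.max() - features.min())
        return lim_min + scale * (features - features.min())
    if norm == "arctan":
        # Standardize, then squash into (-1, 1) with a scaled arctan.
        z = (features - features.mean()) / features.std()
        return 2.0 * np.arctan(z) / np.pi
    if norm == "tanh":
        # Standardize, then squash into (-1, 1) with tanh.
        z = (features - features.mean()) / features.std()
        return np.tanh(z)
    # norm=None (or an unrecognized value) leaves the features unchanged.
    return features


x = np.array([[0.0, 5.0], [10.0, 20.0]])
x_lin = feat_normalize_sketch(x, norm="linearize")
x_atan = feat_normalize_sketch(x, norm="arctan")
```

In this sketch only "linearize" uses lim_min and lim_max. The arctan and tanh variants are bounded by (-1, 1) by construction.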

grb.dataset.dataset.splitting(adj, range_min=(0.0, 0.05), range_max=(0.95, 1.0), range_easy=(0.05, 0.35), range_medium=(0.35, 0.65), range_hard=(0.65, 0.95), ratio_train=0.6, ratio_val=0.1, ratio_test=0.1, seed=42)[source]

GRB splitting scheme designed for adversarial robustness evaluation.

Parameters
  • adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in form of N * N sparse matrix.

  • range_min (tuple of float, optional) – Range of nodes with minimum degrees to be ignored. Values are fractions in [0, 1]. Default: (0.0, 0.05).

  • range_max (tuple of float, optional) – Range of nodes with maximum degrees to be ignored. Values are fractions in [0, 1]. Default: (0.95, 1.0).

  • range_easy (tuple of float, optional) – Range of nodes for easy difficulty. Values are fractions in [0, 1]. Default: (0.05, 0.35).

  • range_medium (tuple of float, optional) – Range of nodes for medium difficulty. Values are fractions in [0, 1]. Default: (0.35, 0.65).

  • range_hard (tuple of float, optional) – Range of nodes for hard difficulty. Values are fractions in [0, 1]. Default: (0.65, 0.95).

  • ratio_train (float, optional) – Ratio of train nodes. Default: 0.6.

  • ratio_val (float, optional) – Ratio of validation nodes. Default: 0.1.

  • ratio_test (float, optional) – Ratio of test nodes. Default: 0.1.

  • seed (int, optional) – Random seed. Default: 42.

Returns

index – Dictionary containing {"index_train", "index_val", "index_test", "index_test_easy", "index_test_medium", "index_test_hard"}.

Return type

dict
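
A degree-based split of this kind can be sketched as follows. This is an illustration under stated assumptions, not the grb implementation: it assumes nodes are ranked by ascending degree, that ratio_test is applied to each difficulty band separately, and that train and validation nodes are drawn at random from the nodes not selected for testing. The function name splitting_sketch is hypothetical, and it takes a degree array rather than an adjacency matrix. For a scipy CSR matrix, the degrees could be obtained with adj.getnnz(axis=1).

```python
import numpy as np


def splitting_sketch(degrees, range_easy=(0.05, 0.35), range_medium=(0.35, 0.65),
                     range_hard=(0.65, 0.95), ratio_train=0.6, ratio_val=0.1,
                     ratio_test=0.1, seed=42):
    """Illustrative degree-based split; the sampling rules are assumptions."""
    rng = np.random.default_rng(seed)
    degrees = np.asarray(degrees)
    num_nodes = degrees.shape[0]

    # ASSUMPTION: rank nodes by ascending degree.
    order = np.argsort(degrees, kind="stable")

    # ASSUMPTION: ratio_test is the test fraction drawn from each band.
    num_test = int(ratio_test * num_nodes)

    def sample_band(bounds):
        # Take the slice of the degree ranking given by the fractional bounds,
        # then sample test nodes from it without replacement.
        lo, hi = int(bounds[0] * num_nodes), int(bounds[1] * num_nodes)
        return rng.choice(order[lo:hi], size=num_test, replace=False)

    index_test_easy = sample_band(range_easy)
    index_test_medium = sample_band(range_medium)
    index_test_hard = sample_band(range_hard)
    index_test = np.concatenate([index_test_easy, index_test_medium, index_test_hard])

    # Nodes outside the bands (lowest and highest degrees) are never test nodes;
    # here they remain available for training and validation.
    rest = rng.permutation(np.setdiff1d(np.arange(num_nodes), index_test))
    num_train = int(ratio_train * num_nodes)
    num_val = int(ratio_val * num_nodes)

    return {
        "index_train": rest[:num_train],
        "index_val": rest[num_train:num_train + num_val],
        "index_test": index_test,
        "index_test_easy": index_test_easy,
        "index_test_medium": index_test_medium,
        "index_test_hard": index_test_hard,
    }


# Toy degree sequence for 200 nodes.
degrees = np.random.default_rng(0).integers(1, 50, size=200)
index = splitting_sketch(degrees)
```

Because the three bands do not overlap, the easy, medium, and hard test sets are disjoint, and the train and validation sets are drawn only from the remaining nodes.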