grb.dataset
grb.dataset.dataset
- class grb.dataset.dataset.CogDLDataset(name, data_dir=None, mode='origin', verbose=True)
Bases: object
- class grb.dataset.dataset.CustomDataset(adj, features, labels, train_mask=None, val_mask=None, test_mask=None, name=None, data_dir=None, mode='full', feat_norm=None, save=False, verbose=True)
Bases: object
Class for building a customized dataset for GRB evaluation.
- Parameters
adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in form of N * N sparse matrix.
features (torch.FloatTensor) – Features in form of N * D torch float tensor.
labels (torch.LongTensor) – Labels in form of N * L. L=1 for multi-class classification; otherwise, multi-label classification.
train_mask (torch.Tensor, optional) – Mask of train nodes in form of N * 1 torch bool tensor. Default: None. If None, generated by the default splitting scheme.
val_mask (torch.Tensor, optional) – Mask of validation nodes in form of N * 1 torch bool tensor. Default: None. If None, generated by the default splitting scheme.
test_mask (torch.Tensor, optional) – Mask of test nodes in form of N * 1 torch bool tensor. Default: None. If None, generated by the default splitting scheme.
name (str, optional) – Name of dataset. Default: None.
data_dir (str, optional) – Directory of dataset. If not provided, default is "./data/".
mode (str, optional) – Difficulty determined according to the average degree of test nodes. Choose from ["easy", "medium", "hard", "full"]; "full" uses the entire test set. Default: "full".
feat_norm (str, optional) – Feature normalization that transforms all features to the range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: None.
save (bool, optional) – Whether to save data as files. Default: False.
verbose (bool, optional) – Whether to display logs. Default: True.
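The parameters above expect a CSR adjacency matrix plus node-level features and labels. The sketch below builds toy inputs in those documented shapes from an undirected edge list using NumPy and SciPy. The helper `build_custom_inputs` and its random toy data are hypothetical, and converting the arrays with `torch.from_numpy` before calling `CustomDataset` is an assumption rather than something the docs state.

```python
import numpy as np
import scipy.sparse as sp


def build_custom_inputs(edges, num_nodes, num_features, num_classes, seed=0):
    """Hypothetical helper: toy (adj, features, labels) in CustomDataset's documented shapes."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(edges).T
    # Add both directions of every edge so the graph is undirected.
    row = np.concatenate([src, dst])
    col = np.concatenate([dst, src])
    # COO -> CSR conversion sums duplicate entries; reset them to 1 for a binary matrix.
    adj = sp.csr_matrix((np.ones(len(row)), (row, col)), shape=(num_nodes, num_nodes))
    adj.data[:] = 1.0
    # N x D float32 features (torch.from_numpy would give a FloatTensor).
    features = rng.standard_normal((num_nodes, num_features)).astype(np.float32)
    # One int64 class id per node, the multi-class case (torch.from_numpy -> LongTensor).
    labels = rng.integers(0, num_classes, size=num_nodes)
    return adj, features, labels


adj, features, labels = build_custom_inputs([(0, 1), (1, 2), (1, 0)], 3, 4, 2)
# With torch and grb installed, the documented constructor could then be called as:
# import torch
# from grb.dataset.dataset import CustomDataset
# dataset = CustomDataset(adj, torch.from_numpy(features), torch.from_numpy(labels), name="toy")
```

Leaving `train_mask`, `val_mask` and `test_mask` unset defers to the default splitting scheme, as documented above.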
- adj
Adjacency matrix in form of N * N sparse matrix.
- Type
scipy.sparse.csr.csr_matrix
- features
Features in form of N * D torch float tensor.
- Type
torch.FloatTensor
- labels
Labels in form of N * L. L=1 for multi-class classification; otherwise, multi-label classification.
- Type
torch.LongTensor
- num_nodes
Number of nodes N.
- Type
int
- num_edges
Number of edges.
- Type
int
- num_features
Dimension of features D.
- Type
int
- num_classes
Number of classes L.
- Type
int
- num_train
Number of train nodes.
- Type
int
- num_val
Number of validation nodes.
- Type
int
- num_test
Number of test nodes.
- Type
int
- mode
Mode of dataset. One of ["easy", "medium", "hard", "full"].
- Type
str
- index_train
Indices of train nodes.
- Type
np.ndarray
- index_val
Indices of validation nodes.
- Type
np.ndarray
- index_test
Indices of test nodes.
- Type
np.ndarray
- train_mask
Mask of train nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- val_mask
Mask of validation nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- test_mask
Mask of test nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
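Each split is exposed twice in the attributes above: as an index array (`index_train`, `index_val`, `index_test`) and as a boolean mask (`train_mask`, `val_mask`, `test_mask`). Assuming both forms select the same nodes, converting between them is short in either direction. This sketch uses NumPy with 1-D masks for simplicity; `index_to_mask` and `mask_to_index` are hypothetical helpers, and `torch.from_numpy` would turn a mask into a torch bool tensor.

```python
import numpy as np


def index_to_mask(index, num_nodes):
    """Length-N boolean mask that is True exactly at the given node indices."""
    mask = np.zeros(num_nodes, dtype=bool)
    mask[index] = True
    return mask


def mask_to_index(mask):
    """Sorted indices of the nodes selected by a boolean mask."""
    return np.flatnonzero(mask)


train_mask = index_to_mask(np.array([3, 0, 5]), num_nodes=8)
index_train = mask_to_index(train_mask)  # sorted: [0, 3, 5]
```

Under this assumption, `num_train` equals both `len(index_train)` and `train_mask.sum()`.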
- class grb.dataset.dataset.Dataset(name, data_dir=None, mode='easy', feat_norm=None, verbose=True)
Bases: object
Class that loads GRB datasets for evaluating adversarial robustness.
- Parameters
name (str) – Name of dataset. Supported datasets: ["grb-cora", "grb-citeseer", "grb-aminer", "grb-reddit", "grb-flickr"].
data_dir (str, optional) – Directory for dataset. If not provided, default is "./data/".
mode (str, optional) – Difficulty determined according to the average degree of test nodes. Choose from ["easy", "medium", "hard", "full"]; "full" uses the entire test set. Default: "easy".
feat_norm (str, optional) – Feature normalization that transforms all features to the range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: None.
verbose (bool, optional) – Whether to display logs. Default: True.
- adj
Adjacency matrix in form of N * N sparse matrix.
- Type
scipy.sparse.csr.csr_matrix
- features
Features in form of N * D torch float tensor.
- Type
torch.FloatTensor
- labels
Labels in form of N * L. L=1 for multi-class classification; otherwise, multi-label classification.
- Type
torch.LongTensor
- num_nodes
Number of nodes N.
- Type
int
- num_edges
Number of edges.
- Type
int
- num_features
Dimension of features D.
- Type
int
- num_classes
Number of classes L.
- Type
int
- num_train
Number of train nodes.
- Type
int
- num_val
Number of validation nodes.
- Type
int
- num_test
Number of test nodes.
- Type
int
- mode
Mode of dataset. One of ["easy", "medium", "hard", "full"].
- Type
str
- index_train
Indices of train nodes.
- Type
np.ndarray
- index_val
Indices of validation nodes.
- Type
np.ndarray
- index_test
Indices of test nodes.
- Type
np.ndarray
- train_mask
Mask of train nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- val_mask
Mask of validation nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- test_mask
Mask of test nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
Example
>>> import grb
>>> from grb.dataset import Dataset
>>> dataset = Dataset(name='grb-cora', mode='easy', feat_norm="arctan")
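The `mode` parameter and the keys returned by `splitting` suggest how a difficulty level maps onto a test set. A minimal sketch, assuming the test indices are read straight from a `splitting`-style dictionary (the docs do not confirm that `Dataset` works this way); `select_test_index` is a hypothetical helper:

```python
def select_test_index(index, mode="full"):
    """Hypothetical: pick test indices for a difficulty mode from a splitting()-style dict."""
    if mode == "full":
        # "full" uses the entire test set.
        return index["index_test"]
    if mode in ("easy", "medium", "hard"):
        # Per-difficulty subsets use the keys documented for splitting().
        return index["index_test_" + mode]
    raise ValueError(f"unknown mode: {mode!r}")
```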
- grb.dataset.dataset.feat_normalize(features, norm=None, lim_min=-1.0, lim_max=1.0)
Feature normalization function.
- Parameters
features (torch.FloatTensor) – Features in form of N * D torch float tensor.
norm (str, optional) – Type of normalization. Choose from ["linearize", "arctan", "tanh", "standarize"]. Default: None.
lim_min (float) – Minimum limit of feature value. Default: -1.0.
lim_max (float) – Maximum limit of feature value. Default: 1.0.
- Returns
features – Normalized features in form of N * D torch float tensor.
- Return type
torch.FloatTensor
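The docstring names the normalization options but not their formulas. The NumPy sketch below shows one common reading of each option, using the option strings exactly as listed (including the spelling "standarize"). Standardizing before the `arctan`/`tanh` squash, and applying `lim_min`/`lim_max` only to `linearize`, are assumptions; `feat_normalize_sketch` is a hypothetical name, not GRB's code.

```python
import numpy as np


def feat_normalize_sketch(features, norm=None, lim_min=-1.0, lim_max=1.0):
    """Hypothetical NumPy sketch of the documented norm options (not GRB's implementation)."""
    x = np.asarray(features, dtype=np.float64)
    if norm is None:
        return x  # no normalization requested
    if norm == "linearize":
        # Affine map of [x.min(), x.max()] onto [lim_min, lim_max].
        span = x.max() - x.min()
        if span == 0:
            return np.full_like(x, lim_min)  # constant features: nothing to stretch
        return lim_min + (lim_max - lim_min) * (x - x.min()) / span
    if norm in ("arctan", "tanh", "standarize"):
        # Assumed first step: zero mean, unit variance (guarding against std == 0).
        z = (x - x.mean()) / (x.std() or 1.0)
        if norm == "arctan":
            return 2.0 / np.pi * np.arctan(z)  # mathematically in (-1, 1)
        if norm == "tanh":
            return np.tanh(z)  # mathematically in (-1, 1)
        return z  # standardization alone does not bound the range
    raise ValueError(f"unknown norm: {norm!r}")
```

Of the four options in this sketch, only `linearize` reaches the limits exactly; `arctan` and `tanh` approach them asymptotically, which is why they ignore `lim_min`/`lim_max` here.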
- grb.dataset.dataset.splitting(adj, range_min=(0.0, 0.05), range_max=(0.95, 1.0), range_easy=(0.05, 0.35), range_medium=(0.35, 0.65), range_hard=(0.65, 0.95), ratio_train=0.6, ratio_val=0.1, ratio_test=0.1, seed=42)
GRB splitting scheme designed for adversarial robustness evaluation.
- Parameters
adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in form of N * N sparse matrix.
range_min (tuple of float, optional) – Range of nodes with the lowest degrees to be ignored, given as fractions of all nodes (e.g. 0.05 = 5%). Default: (0.0, 0.05).
range_max (tuple of float, optional) – Range of nodes with the highest degrees to be ignored, given as fractions of all nodes. Default: (0.95, 1.0).
range_easy (tuple of float, optional) – Range of nodes for easy difficulty, given as fractions of all nodes. Default: (0.05, 0.35).
range_medium (tuple of float, optional) – Range of nodes for medium difficulty, given as fractions of all nodes. Default: (0.35, 0.65).
range_hard (tuple of float, optional) – Range of nodes for hard difficulty, given as fractions of all nodes. Default: (0.65, 0.95).
ratio_train (float, optional) – Ratio of train nodes. Default: 0.6.
ratio_val (float, optional) – Ratio of validation nodes. Default: 0.1.
ratio_test (float, optional) – Ratio of test nodes. Default: 0.1.
seed (int, optional) – Random seed. Default: 42.
- Returns
index – Dictionary with keys {"index_train", "index_val", "index_test", "index_test_easy", "index_test_medium", "index_test_hard"}.
- Return type
dict
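The docstring does not spell out how the degree ranges and ratios interact. Below is a NumPy sketch of one plausible reading, under these assumptions: nodes are ranked by ascending degree, the easy/medium/hard ranges are slices of that ranking, each difficulty contributes `ratio_test * N` test nodes, and all remaining nodes (including those in `range_min` and `range_max`, which fall outside every band) are shuffled into train and validation. With the defaults, 0.6 + 0.1 + 3 * 0.1 = 1.0, which is consistent with this reading but does not confirm it. `degree_splitting_sketch` is a hypothetical name, not GRB's implementation.

```python
import numpy as np


def degree_splitting_sketch(adj, range_easy=(0.05, 0.35), range_medium=(0.35, 0.65),
                            range_hard=(0.65, 0.95), ratio_train=0.6, ratio_val=0.1,
                            ratio_test=0.1, seed=42):
    """Hypothetical degree-based split; assumes a binary N x N adjacency (dense or scipy.sparse)."""
    num_nodes = adj.shape[0]
    # Degree of each node = row sum of a binary adjacency matrix.
    degs = np.asarray(adj.sum(axis=1)).ravel()
    # Node ids ordered by ascending degree (stable sort keeps ties reproducible).
    order = np.argsort(degs, kind="stable")
    rng = np.random.default_rng(seed)

    def band(r):
        # Nodes whose degree rank falls in the fractional range r = (lo, hi).
        return order[int(r[0] * num_nodes):int(r[1] * num_nodes)]

    num_test = int(ratio_test * num_nodes)
    index = {}
    for level, r in (("easy", range_easy), ("medium", range_medium), ("hard", range_hard)):
        # Sample test nodes without replacement from this difficulty band.
        index["index_test_" + level] = np.sort(rng.choice(band(r), size=num_test, replace=False))
    index["index_test"] = np.sort(np.concatenate(
        [index["index_test_easy"], index["index_test_medium"], index["index_test_hard"]]))

    # Every non-test node is shuffled, then cut into train and validation.
    rest = rng.permutation(np.setdiff1d(np.arange(num_nodes), index["index_test"]))
    num_train = int(ratio_train * num_nodes)
    num_val = int(ratio_val * num_nodes)
    index["index_train"] = np.sort(rest[:num_train])
    index["index_val"] = np.sort(rest[num_train:num_train + num_val])
    return index
```

Because the bands are disjoint slices of the degree ranking, every node in `index_test_easy` has a degree no higher than any node in `index_test_hard` under this sketch.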