grb.dataset
grb.dataset.dataset
- class grb.dataset.dataset.CogDLDataset(name, data_dir=None, mode='origin', verbose=True)
Bases: object
- property COGDL_GRAPH_CLASSIFICATION_DATASETS
- class grb.dataset.dataset.CustomDataset(adj, features, labels, train_mask=None, val_mask=None, test_mask=None, name=None, data_dir=None, mode='full', feat_norm=None, save=False, verbose=True, seed=42)
Bases: object
Class that helps build a customized dataset for GRB evaluation.
- Parameters
adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in form of N * N sparse matrix.
features (torch.FloatTensor) – Features in form of N * D torch float tensor.
labels (torch.LongTensor) – Labels in form of N * L. L=1 for multi-class classification, otherwise for multi-label classification.
train_mask (torch.Tensor, optional) – Mask of train nodes in form of N * 1 torch bool tensor. If None, generated by the default splitting scheme. Default: None.
val_mask (torch.Tensor, optional) – Mask of validation nodes in form of N * 1 torch bool tensor. If None, generated by the default splitting scheme. Default: None.
test_mask (torch.Tensor, optional) – Mask of test nodes in form of N * 1 torch bool tensor. If None, generated by the default splitting scheme. Default: None.
name (str, optional) – Name of dataset. Default: None.
data_dir (str, optional) – Directory of dataset. If not provided, default is "./data/".
mode (str, optional) – Mode of dataset, i.e. the difficulty determined according to the average degree of test nodes. One of ["easy", "medium", "hard", "full"]; "full" uses the entire test set. Default: "full".
feat_norm (str, optional) – Feature normalization that transforms all features to range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: None.
save (bool, optional) – Whether to save data as files. Default: False.
verbose (bool, optional) – Whether to display logs. Default: True.
seed (int, optional) – Random seed. Default: 42.
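The parameter list above fixes a shape contract: adj is N * N, features is N * D, labels has one row per node, and masks have one boolean entry per node. The following is a minimal sketch of checking that contract before building a CustomDataset. `check_custom_inputs` is a hypothetical helper, not part of GRB; NumPy arrays stand in for the torch tensors, and deriving the number of classes as `labels.max() + 1` assumes single-column integer class labels.

```python
import numpy as np


def check_custom_inputs(adj, features, labels, train_mask=None):
    """Hypothetical check of the documented CustomDataset input shapes (not part of GRB)."""
    num_nodes = adj.shape[0]
    # adj: N * N (works for dense arrays and scipy.sparse matrices alike).
    if adj.shape != (num_nodes, num_nodes):
        raise ValueError("adj must have shape N * N")
    # features: N * D.
    if features.ndim != 2 or features.shape[0] != num_nodes:
        raise ValueError("features must have shape N * D")
    # labels: one row per node.
    if labels.shape[0] != num_nodes:
        raise ValueError("labels must have N rows")
    # Optional mask: one boolean entry per node.
    if train_mask is not None and (train_mask.dtype != bool
                                   or train_mask.shape[0] != num_nodes):
        raise ValueError("train_mask must be a bool array with N entries")
    # Assumption: multi-class labels are integers 0..C-1, so C = max + 1.
    num_classes = int(labels.max()) + 1
    return {"num_nodes": num_nodes,
            "num_features": features.shape[1],
            "num_classes": num_classes}


adj = np.eye(4)                         # stand-in for a 4 * 4 csr_matrix
features = np.ones((4, 3), dtype=np.float32)
labels = np.array([0, 1, 2, 1])
print(check_custom_inputs(adj, features, labels))
# {'num_nodes': 4, 'num_features': 3, 'num_classes': 3}
```

For multi-label data (L > 1), the class count would come from the label width rather than from the maximum label value.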
- adj
Adjacency matrix in form of N * N sparse matrix.
- Type
scipy.sparse.csr.csr_matrix
- features
Features in form of N * D torch float tensor.
- Type
torch.FloatTensor
- labels
Labels in form of N * L. L=1 for multi-class classification, otherwise for multi-label classification.
- Type
torch.LongTensor
- num_nodes
Number of nodes N.
- Type
int
- num_edges
Number of edges.
- Type
int
- num_features
Dimension of features D.
- Type
int
- num_classes
Number of classes L.
- Type
int
- num_train
Number of train nodes.
- Type
int
- num_val
Number of validation nodes.
- Type
int
- num_test
Number of test nodes.
- Type
int
- mode
Mode of dataset. One of ["easy", "medium", "hard", "full"].
- Type
str
- index_train
Indices of train nodes.
- Type
np.ndarray
- index_val
Indices of validation nodes.
- Type
np.ndarray
- index_test
Indices of test nodes.
- Type
np.ndarray
- train_mask
Mask of train nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- val_mask
Mask of validation nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- test_mask
Mask of test nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
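Each split is exposed twice in the attributes above: as an index array (index_train, index_val, index_test) and as a per-node boolean mask (train_mask, val_mask, test_mask). A minimal sketch of converting between the two, assuming they describe the same node set; `index_to_mask` and `mask_to_index` are hypothetical helpers, the masks here are flat length-N NumPy arrays rather than N * 1 torch tensors, and `torch.from_numpy` could convert them if needed.

```python
import numpy as np


def index_to_mask(index, num_nodes):
    """Boolean mask with True at the given node indices."""
    mask = np.zeros(num_nodes, dtype=bool)
    mask[index] = True
    return mask


def mask_to_index(mask):
    """Sorted node indices where the mask is True."""
    return np.flatnonzero(mask)


index_train = np.array([0, 2, 5])
train_mask = index_to_mask(index_train, num_nodes=6)
print(train_mask)                 # [ True False  True False False  True]
print(mask_to_index(train_mask))  # [0 2 5]
```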
- class grb.dataset.dataset.Dataset(name, data_dir=None, mode='easy', feat_norm='arctan', verbose=True, custom=False)
Bases: object
Class that loads GRB datasets for evaluating adversarial robustness.
- Parameters
name (str) – Name of dataset, supported datasets: ["grb-cora", "grb-citeseer", "grb-aminer", "grb-reddit", "grb-flickr"].
data_dir (str, optional) – Directory for dataset. If not provided, default is "./data/".
mode (str, optional) – Difficulty determined according to the average degree of test nodes. Choose from ["easy", "medium", "hard", "full"]; "full" uses the entire test set. Default: "easy".
feat_norm (str, optional) – Feature normalization that transforms all features to range [-1, 1]. Choose from ["arctan", "sigmoid", "tanh"]. Default: "arctan".
verbose (bool, optional) – Whether to display logs. Default: True.
- adj
Adjacency matrix in form of N * N sparse matrix.
- Type
scipy.sparse.csr.csr_matrix
- features
Features in form of N * D torch float tensor.
- Type
torch.FloatTensor
- labels
Labels in form of N * L. L=1 for multi-class classification, otherwise for multi-label classification.
- Type
torch.LongTensor
- num_nodes
Number of nodes N.
- Type
int
- num_edges
Number of edges.
- Type
int
- num_features
Dimension of features D.
- Type
int
- num_classes
Number of classes L.
- Type
int
- num_train
Number of train nodes.
- Type
int
- num_val
Number of validation nodes.
- Type
int
- num_test
Number of test nodes.
- Type
int
- mode
Mode of dataset. One of ["easy", "medium", "hard", "full"].
- Type
str
- index_train
Indices of train nodes.
- Type
np.ndarray
- index_val
Indices of validation nodes.
- Type
np.ndarray
- index_test
Indices of test nodes.
- Type
np.ndarray
- train_mask
Mask of train nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- val_mask
Mask of validation nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
- test_mask
Mask of test nodes in form of N * 1 torch bool tensor.
- Type
torch.Tensor
Example
>>> import grb
>>> from grb.dataset import Dataset
>>> dataset = Dataset(name='grb-cora', mode='easy', feat_norm='arctan')
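The mode argument chooses which test nodes are evaluated. A plausible reading, stated here as an assumption rather than GRB's confirmed behavior, is that each mode picks one of the keys returned by `splitting`, with "full" selecting the whole test set. The sketch below uses a hypothetical `select_test_index` helper and a toy index dictionary; it does not call GRB.

```python
import numpy as np

# Hypothetical mapping from the documented `mode` values to keys returned by `splitting`.
MODE_TO_KEY = {
    "easy": "index_test_easy",
    "medium": "index_test_medium",
    "hard": "index_test_hard",
    "full": "index_test",
}


def select_test_index(index, mode="full"):
    """Pick the test indices for one difficulty mode (illustrative only)."""
    if mode not in MODE_TO_KEY:
        raise ValueError(f"mode must be one of {sorted(MODE_TO_KEY)}")
    return index[MODE_TO_KEY[mode]]


index = {
    "index_test_easy": np.array([1, 4]),
    "index_test_medium": np.array([2, 7]),
    "index_test_hard": np.array([3, 9]),
}
# Assumption: "full" is the union of the three difficulty subsets.
index["index_test"] = np.sort(np.concatenate(list(index.values())))
print(select_test_index(index, "hard"))  # [3 9]
print(select_test_index(index, "full"))  # [1 2 3 4 7 9]
```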
- class grb.dataset.dataset.OGBDataset(name, data_dir=None, verbose=True)
Bases: object
- property OGB_GRAPH_CLASSIFICATION_DATASETS
- property OGB_NODE_CLASSIFICATION_DATASETS
- grb.dataset.dataset.feat_normalize(features, norm=None, lim_min=-1.0, lim_max=1.0)
Feature normalization function.
- Parameters
features (torch.FloatTensor) – Features in form of N * D torch float tensor.
norm (str, optional) – Type of normalization. Choose from ["linearize", "arctan", "tanh", "standarize"]. Default: None.
lim_min (float) – Minimum limit of feature value. Default: -1.0.
lim_max (float) – Maximum limit of feature value. Default: 1.0.
- Returns
features – Normalized features in form of N * D torch float tensor.
- Return type
torch.FloatTensor
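The four norm options can be illustrated with standard transforms. This is a sketch under stated assumptions, not GRB's implementation: "linearize" is taken to be min-max scaling onto [lim_min, lim_max]; "arctan" and "tanh" squash a z-score into (-1, 1) and then map that interval onto (lim_min, lim_max); "standarize" is taken to be a plain z-score (ignoring the limits). Statistics are computed over the whole matrix rather than per column, which is also an assumption. `feat_normalize_sketch` is a hypothetical name and works on NumPy arrays.

```python
import numpy as np


def feat_normalize_sketch(features, norm=None, lim_min=-1.0, lim_max=1.0):
    """Illustrative normalizations; statistics are taken over the whole matrix (assumption)."""
    x = np.asarray(features, dtype=np.float64)
    if norm is None:
        return x
    if norm == "linearize":
        # Min-max scaling onto [lim_min, lim_max]; assumes features are not constant.
        span = x.max() - x.min()
        return lim_min + (lim_max - lim_min) * (x - x.min()) / span
    # The remaining options start from a z-score.
    z = (x - x.mean()) / x.std()
    if norm == "standarize":  # option name as spelled in the docs above
        return z              # lim_min / lim_max are ignored here
    if norm == "arctan":
        squashed = 2.0 / np.pi * np.arctan(z)  # values in (-1, 1)
    elif norm == "tanh":
        squashed = np.tanh(z)                  # values in (-1, 1)
    else:
        raise ValueError(f"unknown norm: {norm!r}")
    # Map (-1, 1) onto (lim_min, lim_max); this is the identity for the defaults.
    return lim_min + (lim_max - lim_min) * (squashed + 1.0) / 2.0


feats = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
print(feat_normalize_sketch(feats, "linearize"))
```

The arctan variant is scaled by 2/pi so that its range is exactly (-1, 1), matching the tanh variant before the final mapping onto the limits.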
- grb.dataset.dataset.splitting(adj, range_min=(0.0, 0.05), range_max=(0.95, 1.0), range_easy=(0.05, 0.35), range_medium=(0.35, 0.65), range_hard=(0.65, 0.95), ratio_train=0.6, ratio_val=0.1, ratio_test=0.1, seed=42)
GRB splitting scheme designed for adversarial robustness evaluation.
- Parameters
adj (scipy.sparse.csr.csr_matrix) – Adjacency matrix in form of N * N sparse matrix.
range_min (tuple of float, optional) – Range of lowest-degree nodes to be ignored, given as fractions of all nodes (0.05 = 5%). Default: (0.0, 0.05).
range_max (tuple of float, optional) – Range of highest-degree nodes to be ignored, given as fractions of all nodes. Default: (0.95, 1.0).
range_easy (tuple of float, optional) – Range of nodes for easy difficulty, given as fractions of all nodes. Default: (0.05, 0.35).
range_medium (tuple of float, optional) – Range of nodes for medium difficulty, given as fractions of all nodes. Default: (0.35, 0.65).
range_hard (tuple of float, optional) – Range of nodes for hard difficulty, given as fractions of all nodes. Default: (0.65, 0.95).
ratio_train (float, optional) – Ratio of train nodes. Default: 0.6.
ratio_val (float, optional) – Ratio of validation nodes. Default: 0.1.
ratio_test (float, optional) – Ratio of test nodes. Default: 0.1.
seed (int, optional) – Random seed. Default: 42.
- Returns
index – Dictionary containing the keys {"index_train", "index_val", "index_test", "index_test_easy", "index_test_medium", "index_test_hard"}.
- Return type
dict
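The default ratios fit together neatly if test nodes are sampled separately from each difficulty range: 0.1 * N from each of the three ranges gives 0.3 * N test nodes, leaving 0.7 * N = 0.6 * N train + 0.1 * N validation nodes. The sketch below follows that reading, which is an assumption and not GRB's confirmed implementation. It also assumes nodes are ranked by ascending degree, and it handles range_min / range_max implicitly: nodes in those extreme ranges are never sampled as test nodes but may still be used for training or validation. `degree_splitting` is a hypothetical name; it accepts a dense NumPy array or a SciPy sparse matrix.

```python
import numpy as np


def degree_splitting(adj, range_easy=(0.05, 0.35), range_medium=(0.35, 0.65),
                     range_hard=(0.65, 0.95), ratio_train=0.6, ratio_val=0.1,
                     ratio_test=0.1, seed=42):
    """Hypothetical degree-based split; an illustration, not GRB's implementation."""
    rng = np.random.default_rng(seed)
    # Node degrees; works for NumPy arrays and SciPy sparse matrices alike.
    degs = np.asarray(adj.sum(axis=1)).ravel()
    num_nodes = degs.shape[0]
    # Assumption: nodes are ranked by ascending degree.
    order = np.argsort(degs, kind="stable")

    def segment(bounds):
        # Nodes whose degree rank lies in [lo, hi), given as fractions of N.
        lo, hi = bounds
        return order[round(lo * num_nodes):round(hi * num_nodes)]

    # Sample ratio_test * N test nodes from each difficulty segment. Nodes outside
    # the three segments (the range_min / range_max extremes) are never test nodes.
    num_test = round(ratio_test * num_nodes)
    index = {}
    for mode, bounds in (("easy", range_easy), ("medium", range_medium),
                         ("hard", range_hard)):
        picked = rng.choice(segment(bounds), size=num_test, replace=False)
        index["index_test_" + mode] = np.sort(picked)
    index["index_test"] = np.sort(np.concatenate(
        [index["index_test_" + m] for m in ("easy", "medium", "hard")]))

    # Train and validation nodes are drawn from all remaining nodes.
    rest = rng.permutation(np.setdiff1d(np.arange(num_nodes), index["index_test"]))
    num_train = round(ratio_train * num_nodes)
    num_val = round(ratio_val * num_nodes)
    index["index_train"] = np.sort(rest[:num_train])
    index["index_val"] = np.sort(rest[num_train:num_train + num_val])
    return index


# Toy graph whose node i has degree i (row sums 0..99).
adj = np.tril(np.ones((100, 100)), k=-1)
split = degree_splitting(adj)
print({key: len(value) for key, value in split.items()})
# {'index_test_easy': 10, 'index_test_medium': 10, 'index_test_hard': 10, 'index_test': 30, 'index_train': 60, 'index_val': 10}
```

Because the test subsets are drawn from disjoint degree ranges, a "full" evaluation in this sketch is simply the union of the easy, medium, and hard subsets.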