
Diffusion convolutional neural network (DCNN) regards graph convolutions as a diffusion process. It assumes that information is transferred from one node to one of its neighboring nodes with a certain transition probability so that the information distribution can reach equilibrium after several rounds. DCNN defines the diffusion graph convolution as

      (5.49) H^{(k)} = f\left(W^{(k)} \odot P^{k} X\right)

      where f(·) is an activation function, and the probability transition matrix P ∈ Rn × n is computed by P = D−1 A. Note that in DCNN the hidden representation matrix H(k) has the same dimensions as the input feature matrix X and is not a function of the previous hidden representation matrix H(k − 1). DCNN concatenates H(1), H(2), ⋯, H(K) together as the final model output. Because the stationary distribution of a diffusion process is a sum of the power series of the probability transition matrix, diffusion graph convolution (DGC) [49] sums up the outputs at each diffusion step instead of concatenating them. It defines the diffusion graph convolution by

      (5.50) H = \sum_{k=0}^{K} f\left(P^{k} X W^{(k)}\right)

      where W(k) ∈ RD × F and f(·) is an activation function. Using the power of a transition probability matrix implies that distant neighbors contribute very little information to a central node.
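      As a concrete illustration, below is a minimal NumPy sketch of the two diffusion convolutions above, Eqs. (5.49) and (5.50). It assumes a dense adjacency matrix and illustrative weight shapes (for DCNN, each W(k) is taken to have the same shape as X); the function names are hypothetical and not taken from the original implementations.

      import numpy as np

      def transition_matrix(A):
          """P = D^{-1} A: the row-normalized (random-walk) transition matrix."""
          deg = A.sum(axis=1, keepdims=True)
          return A / np.clip(deg, 1e-12, None)

      def dcnn(A, X, W_list, f=np.tanh):
          """Eq. (5.49): H^(k) = f(W^(k) ⊙ P^k X), k = 1..K; the outputs are concatenated."""
          P = transition_matrix(A)
          Pk = np.eye(A.shape[0])
          outputs = []
          for W in W_list:                     # each W has the same shape as X (n × D) here
              Pk = Pk @ P                      # P^k
              outputs.append(f(W * (Pk @ X)))  # element-wise gating of the diffused features
          return np.concatenate(outputs, axis=1)

      def dgc(A, X, W_list, f=np.tanh):
          """Eq. (5.50): H = sum_{k=0}^{K} f(P^k X W^(k)), with W^(k) ∈ R^{D×F}."""
          P = transition_matrix(A)
          Pk = np.eye(A.shape[0])
          H = 0.0
          for W in W_list:                     # W_list[k] maps D input features to F outputs
              H = H + f(Pk @ X @ W)
              Pk = Pk @ P                      # advance to the next diffusion step
          return H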

      Parametric graph convolution (PGC‐DGCNN) [12, 39] increases the contributions of distant neighbors based on shortest paths. It defines a shortest‐path adjacency matrix S(j): if the shortest path from node v to node u has length j, then the (v, u) entry of S(j) is 1; otherwise it is 0. With a hyperparameter r to control the receptive field size, PGC‐DGCNN introduces a graph convolutional operation as follows:

      (5.51) H^{(k)} = \big\Vert_{j=0}^{r} f\left(\left(\tilde{D}^{(j)}\right)^{-1} S^{(j)} H^{(k-1)} W^{(j,k)}\right)

      where H(0) = X, D̃(j) is the diagonal degree matrix of S(j), W(j,k) is a learnable parameter matrix, and ‖ denotes the concatenation of vectors.
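      The sketch below, under the same assumptions as before (dense adjacency matrix, hypothetical helper names), builds the shortest‐path adjacency matrices with a breadth‐first search and applies the concatenated hop‐wise convolution of Eq. (5.51).

      import numpy as np
      from collections import deque

      def shortest_path_adjacency(A, r):
          """S^(0), ..., S^(r): S^(j)[v, u] = 1 iff the shortest path from v to u has length j."""
          n = A.shape[0]
          S = [np.zeros((n, n)) for _ in range(r + 1)]
          for v in range(n):
              dist = {v: 0}
              queue = deque([v])
              while queue:                        # breadth-first search up to depth r
                  w = queue.popleft()
                  if dist[w] == r:
                      continue
                  for u in np.nonzero(A[w])[0]:
                      if u not in dist:
                          dist[u] = dist[w] + 1
                          queue.append(u)
              for u, d in dist.items():
                  S[d][v, u] = 1.0
          return S

      def pgc_dgcnn_layer(A, H, W_list, r, f=np.tanh):
          """Eq. (5.51): concatenate the hop-wise convolutions (D̃^(j))^{-1} S^(j) H W^(j,k), j = 0..r."""
          S = shortest_path_adjacency(A, r)
          outputs = []
          for j in range(r + 1):
              deg = S[j].sum(axis=1, keepdims=True)
              norm = S[j] / np.clip(deg, 1e-12, None)   # (D̃^(j))^{-1} S^(j)
              outputs.append(f(norm @ H @ W_list[j]))
          return np.concatenate(outputs, axis=1)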

      Partition graph convolution (PGC) [50] partitions a node’s neighbors into Q groups based on certain criteria, not limited to shortest paths. PGC constructs Q adjacency matrices, one for the neighborhood defined by each group. Then, PGC applies GCN [14] with a different parameter matrix to each neighbor group and sums the results:

      (5.52) H^{(k)} = \sum_{q=1}^{Q} \bar{A}^{(q)} H^{(k-1)} W^{(q,k)}

      where H(0) = X, Ã(q) = A(q) + I, and Ā(q) = (D̃(q))−1/2 Ã(q) (D̃(q))−1/2, with D̃(q) the diagonal degree matrix of Ã(q).
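      A compact sketch of Eq. (5.52), assuming the Q group adjacency matrices have already been constructed; gcn_normalize and pgc_layer are illustrative names.

      import numpy as np

      def gcn_normalize(A_q):
          """Symmetric GCN normalization: Ā = (D̃)^{-1/2} (A + I) (D̃)^{-1/2}."""
          A_tilde = A_q + np.eye(A_q.shape[0])
          d = A_tilde.sum(axis=1)
          d_inv_sqrt = 1.0 / np.sqrt(np.clip(d, 1e-12, None))
          return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

      def pgc_layer(A_groups, H, W_groups):
          """Eq. (5.52): sum GCN-style convolutions over the Q neighbor-group adjacency matrices."""
          return sum(gcn_normalize(A_q) @ H @ W_q for A_q, W_q in zip(A_groups, W_groups))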

      MPNN [51] outlines a general framework of spatial‐based ConvGNNs. It treats graph convolutions as a message passing process in which information can be passed from one node to another along edges directly. MPNN runs K‐step message passing iterations to let information propagate further. The message passing function (namely the spatial graph convolution) is defined as

      (5.53) h_v^{(k)} = U_k\left(h_v^{(k-1)}, \sum_{u \in N(v)} M_k\left(h_v^{(k-1)}, h_u^{(k-1)}, x_{vu}^{e}\right)\right)

      where hv(0) = xv, x_{vu}^{e} denotes the edge features (if any) between nodes v and u, and Uk(·) and Mk(·) are functions with learnable parameters. After deriving the hidden representation of each node, hv(K) can be passed to an output layer to perform node‐level prediction tasks or to a readout function to perform graph‐level prediction tasks. The readout function generates a representation of the entire graph based on the node hidden representations. It is generally defined as

      (5.54) h_G = R\left(\left\{ h_v^{(K)} \mid v \in G \right\}\right)

      where R(·) represents the readout function with learnable parameters. MPNN can cover many existing GNNs by assuming different forms of Uk(·), Mk(·), and R(·).
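      The sketch below illustrates the generic message passing loop of Eqs. (5.53) and (5.54) in NumPy. For brevity it reuses the same message and update functions at every step (MPNN allows step-specific Mk and Uk) and omits edge features; the argument names are illustrative.

      import numpy as np

      def mpnn(A, X, K, message_fn, update_fn, readout_fn):
          """Generic message passing (Eq. 5.53) followed by a readout (Eq. 5.54).

          message_fn(h_v, h_u) -> message vector       (plays the role of M_k)
          update_fn(h_v, m_v)  -> new hidden state     (plays the role of U_k)
          readout_fn(H)        -> graph representation (plays the role of R)
          """
          H = X.copy()                                  # h_v^(0) = x_v
          n = A.shape[0]
          for _ in range(K):
              H_new = np.zeros_like(H)
              for v in range(n):
                  neighbors = np.nonzero(A[v])[0]
                  m_v = sum(message_fn(H[v], H[u]) for u in neighbors)
                  H_new[v] = update_fn(H[v], m_v)
              H = H_new
          return H, readout_fn(H)

      # Example instantiation: identity messages, additive tanh update, mean readout.
      A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
      X = np.random.randn(3, 4)
      H, h_G = mpnn(A, X, K=2,
                    message_fn=lambda hv, hu: hu,
                    update_fn=lambda hv, mv: np.tanh(hv + mv),
                    readout_fn=lambda H: H.mean(axis=0))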

      Graph Isomorphism Network (GIN): Reference [52] finds that previous MPNN‐based methods are incapable of distinguishing different graph structures based on the graph embeddings they produce. To remedy this drawback, GIN adjusts the weight of the central node by a learnable parameter ε(k). It performs graph convolutions by

      (5.55) h_v^{(k)} = \mathrm{MLP}\left(\left(1 + \epsilon^{(k)}\right) h_v^{(k-1)} + \sum_{u \in N(v)} h_u^{(k-1)}\right)

      where MLP(·) represents a multi‐layer perceptron.

      As the number of neighbors of a node can vary from one to a thousand or even more, it is inefficient to take the full size of a node’s neighborhood. GraphSAGE [23] adopts sampling to obtain a fixed number of neighbors for each node. It performs graph convolutions by

      (5.56) h_v^{(k)} = \sigma\left(W^{(k)} \cdot f_k\left(h_v^{(k-1)}, \left\{h_u^{(k-1)}, \forall u \in S_{N(v)}\right\}\right)\right)

      where hv(0) = xv, fk(·) is an aggregation function, and SN(v) is a random sample of the node v’s neighbors. The aggregation function should be invariant to permutations of node orderings, such as the mean, sum, or max function.
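      The NumPy sketch below illustrates the two node updates above, Eqs. (5.55) and (5.56), assuming a two‐layer MLP for GIN and a mean aggregator with uniform neighbor sampling for GraphSAGE; the function names and weight shapes are illustrative, and every node is assumed to have at least one neighbor.

      import numpy as np

      rng = np.random.default_rng(0)

      def gin_update(A, H, eps, W1, W2):
          """GIN, Eq. (5.55): MLP((1 + ε) h_v + sum of neighbor states); MLP = two dense layers."""
          agg = (1.0 + eps) * H + A @ H                # A @ H sums the neighbor representations
          return np.maximum(0.0, agg @ W1) @ W2        # ReLU between the two layers

      def graphsage_update(A, H, W, num_samples):
          """GraphSAGE, Eq. (5.56): aggregate a fixed-size neighbor sample, then transform."""
          n = A.shape[0]
          H_new = np.zeros((n, W.shape[1]))
          for v in range(n):
              nbrs = np.nonzero(A[v])[0]
              sample = rng.choice(nbrs, size=min(num_samples, len(nbrs)), replace=False)
              agg = np.vstack([H[v][None, :], H[sample]]).mean(axis=0)   # mean aggregator f_k
              H_new[v] = np.tanh(agg @ W)              # σ(W^(k) · f_k(·)), with W stored as (D, F)
          return H_new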

      Graph attention network (GAT) assumes that the contributions of neighboring nodes to the central node are neither identical nor predetermined; it adopts an attention mechanism to learn the relative weights between two connected nodes. It performs graph convolutions by

      (5.57) h_v^{(k)} = \sigma\left(\sum_{u \in N(v) \cup \{v\}} \alpha_{vu}^{(k)} W^{(k)} h_u^{(k-1)}\right)

      where hv(0) = xv. The attention weight αvu(k) measures the connective strength between the node v and its neighbor u:

      (5.58) \alpha_{vu}^{(k)} = \mathrm{softmax}\left(g\left(\mathbf{a}^{T}\left[W^{(k)} h_v^{(k-1)} \,\Vert\, W^{(k)} h_u^{(k-1)}\right]\right)\right)

      where g(·) is a LeakyReLU activation function and a is a vector of learnable parameters. The softmax function ensures that the attention weights sum up to one over all neighbors of the node v.
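      A single‐head sketch of Eqs. (5.57) and (5.58) in NumPy, assuming a dense adjacency matrix; leaky_relu and gat_layer are illustrative names, and multi‐head attention is omitted.

      import numpy as np

      def leaky_relu(x, slope=0.2):
          return np.where(x > 0, x, slope * x)

      def gat_layer(A, H, W, a):
          """Single-head graph attention, Eqs. (5.57)-(5.58).

          W : (D, F) projection matrix, a : (2F,) attention vector; self-loops are included."""
          n = A.shape[0]
          Z = H @ W                                    # W^(k) h_u for every node
          A_hat = A + np.eye(n)                        # N(v) ∪ {v}
          H_new = np.zeros_like(Z)
          for v in range(n):
              nbrs = np.nonzero(A_hat[v])[0]
              # e_vu = LeakyReLU(aᵀ [W h_v ‖ W h_u])
              e = np.array([leaky_relu(a @ np.concatenate([Z[v], Z[u]])) for u in nbrs])
              alpha = np.exp(e - e.max())
              alpha = alpha / alpha.sum()              # softmax over v's neighborhood
              H_new[v] = np.tanh((alpha[:, None] * Z[nbrs]).sum(axis=0))
          return H_new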

      Graph pooling modules: After a GNN generates node features, we can use them for the final task. But using all of these features directly can be computationally challenging, so a downsampling strategy is needed. Depending on the objective and the role it plays in the network, different names are given to this strategy: (i) the pooling operation aims to reduce the number of parameters by downsampling the nodes to generate smaller representations, thereby avoiding overfitting, permutation invariance, and computational complexity issues, and (ii) the readout operation is mainly used to generate a graph‐level representation based on node representations (see Figure 5.5). Their mechanisms are very similar. In this section, we use pooling to refer to all kinds of downsampling strategies applied to GNNs. Nowadays, mean/max/sum pooling, already illustrated in Section 5.1, is the most primitive and effective way to implement downsampling, since calculating the mean/max/sum value is fast.
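      For reference, a minimal readout of this kind can be written as below; the function name is illustrative.

      import numpy as np

      def readout(H, mode="mean"):
          """Graph-level readout by mean/max/sum pooling over the node representations H (n × F)."""
          if mode == "mean":
              return H.mean(axis=0)
          if mode == "max":
              return H.max(axis=0)
          if mode == "sum":
              return H.sum(axis=0)
          raise ValueError(f"unknown pooling mode: {mode}")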
