Скачать книгу

of relations is very large, r‐GCN (Relational Data with Graph Convolutional Networks) [7] introduces two kinds of regularization to reduce the number of parameters for modeling relations: basis‐ and block‐diagonal‐decomposition. With the basis decomposition, each Wr is defined as follows:(5.7)

      Here each Wr is a linear combination of basis transformations images with coefficients arb . In the blockdiagonal decomposition, r‐GCN defines each Wr through the direct sum over a set of low‐dimensional matrices, which needs more parameters than the first one.

      Dynamic graphs: These have a static graph structure and dynamic input signals. To capture both kinds of information, diffusion convolutional recurrent NN (DCRNN) [8] and spatial‐temporal graph convolutional networks (STGCN) [9] first collect spatial information by GNNs, then feed the outputs into a sequence model like sequence‐to‐sequence model or convolutional neural networks (CNNs). On the other hand, structural‐recurrent NN [10] and STGCN [11] collect spatial and temporal messages at the same time. They extend the static graph structure with temporal connections so that they can apply traditional GNNs to the extended graphs.

      5.1.2 Propagation Types

      The propagation step and output step in the model define the hidden states of nodes (or edges). Several major modifications have been made to the propagation step from the original GNN model, whereas in the output step a simple feedforward neural network setting is the most popular. The variants utilize different aggregators to gather information from each node’s neighbors and specific updaters to update nodes’ hidden states.

      Convolution has been generalized to the graph domain as well. Advances in this direction are often categorized as spectral approaches and non‐spectral (spatial) approaches. Spectral approaches work with a spectral representation of the graphs.

      Spectral network: The spectral network was proposed in Bruna et al. [12]. The convolution operation is defined in the Fourier domain by computing the eigen decomposition of the graph Laplacian. For the basics of these techniques, see Appendix 5.A. The operation can be defined as the multiplication of a signal x ∈ N (a scalar for each node) with a filter gθ = diag(θ) parameterized by θN:

      (5.8)equation

      where U is the matrix of the eigenvectors of the normalized graph Laplacian images (D is the degree matrix, and A is the adjacency matrix of the graph), with a diagonal matrix of its eigenvalues Λ.

      ChebyshevNet [13] truncates gθ(Λ) in terms of Chebyshev polynomials Tk(x) up to Kth order as

      (5.9)equation

      images Parameter λmax denotes the largest eigenvalue of L. θK is now a vector of Chebyshev coefficients. The Chebyshev polynomials are defined as Tk(x) = 2xTk − 1(x) − Tk − 2(x), with T0(x) = 1 and T1(x) = x. It can be observed that the operation is K‐localized since it is a Kth‐order polynomial in the Laplacian.

      Graph convolutional network (GCN) [14] : This limits the layer‐wise convolution operation to K = 1 to alleviate the problem of overfitting on local neighborhood structures for graphs with very wide node degree distributions. By approximating λmax ≈ 2, the equation simplifies to

      (5.10)equation

      Non‐spectral approaches define convolutions directly on the graph, operating on spatially close neighbors. The major challenge with non‐spectral approaches is defining the convolution operation with differently sized neighborhoods and maintaining the local invariance of CNNs.

      Neural fingerprints (FP) [15] use different weight matrices for nodes with different degrees,

      (5.12)equation

      where images is the weight matrix for nodes with degree images at layer t. The main drawback of the method is that it cannot be applied to large‐scale graphs with more node degrees.

      Diffusion‐convolutional neural networks (DCNNs) were proposed in [16].Transition matrices are used to define the neighborhood for nodes in DCNN. For node classification, it has the form

      (5.13)equation

      where X is an N × F tensor of input features (N is the number of nodes, and F is the number of features). P* is an N × K × N tensor that contains the power series {P, P2, …, PK} of matrix P, and P is the degree‐normalized transition matrix from the graph adjacency matrix A. Each entity is transformed to a diffusion convolutional representation that is a K × F matrix defined by K hops of graph diffusion over F features. It will then be defined by a K × F weight matrix and a nonlinear activation function f. Finally, H (which is N × K × F) denotes the diffusion representations of each node in the graph. As for graph classification, DCNN simply takes the average of nodes’ representation,

      (5.14)equation

      and 1N here is an N × 1 vector of ones. DCNN can also be applied to edge classification tasks, which requires converting edges to nodes and augmenting the adjacency matrix.