
      (5.16)  $H' = \rho\left(D_P^{-1/2} X_P D_P^{-1/2} H \Theta\right)$

      where $X_P$ is the positive pointwise mutual information (PPMI) matrix, and $D_P$ is the diagonal degree matrix of $X_P$.
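
      As a minimal sketch of how $X_P$ and $D_P$ might be used, the code below builds a PPMI matrix from a co-occurrence count matrix and applies one symmetrically normalized propagation step. The input `freq`, the helper names, and the choice of ReLU for the nonlinearity are assumptions made for illustration; they are not taken from the text.

```python
import math

def ppmi(freq):
    """Positive pointwise mutual information of a square count matrix."""
    total = sum(sum(row) for row in freq)
    row_sum = [sum(row) for row in freq]
    col_sum = [sum(col) for col in zip(*freq)]
    out = []
    for i, row in enumerate(freq):
        out_row = []
        for j, f in enumerate(row):
            if f == 0:
                out_row.append(0.0)  # PMI would be -inf; PPMI clips it to 0
            else:
                pmi = math.log(f * total / (row_sum[i] * col_sum[j]))
                out_row.append(max(pmi, 0.0))
        out.append(out_row)
    return out

def normalize(x_p):
    """D_P^{-1/2} X_P D_P^{-1/2}, with D_P the diagonal degree matrix of X_P."""
    deg = [sum(row) for row in x_p]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * v * inv_sqrt[j] for j, v in enumerate(row)]
            for i, row in enumerate(x_p)]

def matmul(a, b):
    """Plain list-of-rows matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def dgcn_layer(x_p, h, theta):
    """One step H' = ReLU(D_P^{-1/2} X_P D_P^{-1/2} H Theta) (ReLU is assumed)."""
    z = matmul(matmul(normalize(x_p), h), theta)
    return [[max(v, 0.0) for v in row] for row in z]

# Hypothetical co-occurrence counts for a 3-node path graph.
freq = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
x_p = ppmi(freq)
h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
theta = [[1.0, -1.0], [0.0, 1.0], [1.0, 0.0]]
h_next = dgcn_layer(x_p, h, theta)
```

      Plain Python lists are used so that the sketch runs without dependencies; in practice a sparse matrix library would be the natural choice.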

      GraphSAGE (SAmple and aggreGatE) [23] is a general inductive framework. The framework generates embeddings by sampling and aggregating features from a node’s local neighborhood.

      The propagation step of GraphSAGE is

      (5.17)  $h_{N_v}^t = \mathrm{AGGREGATE}_t\left(\{h_u^{t-1}, \forall u \in N_v\}\right), \qquad h_v^t = \sigma\left(W^t \cdot \left[h_v^{t-1} \,\|\, h_{N_v}^t\right]\right)$

      where $N_v$ is the sampled neighborhood of node $v$ and $\|$ denotes concatenation.

      The mean aggregator can be viewed as an approximation of the convolutional operation in the transductive GCN framework [14], so the inductive variant of the GCN can be derived as

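      (5.18)  $h_v^t = \sigma\left(W \cdot \mathrm{MEAN}\left(\{h_v^{t-1}\} \cup \{h_u^{t-1}, \forall u \in N_v\}\right)\right)$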

      The mean aggregator differs from the other aggregators in that it does not perform the concatenation of $h_v^{t-1}$ and $h_{N_v}^t$ in Eq. (5.17). That concatenation can be viewed as a form of “skip connection” [24] and can improve performance.
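
      The mean aggregation described above can be sketched in a few lines. The function name, the dictionary-based graph representation, and the use of a sigmoid for $\sigma$ are assumptions made for illustration only.

```python
import math

def mean_aggregate(h_prev, neighbors, v, W):
    """New state of node v: sigmoid(W * mean of v's state and its neighbors' states).

    h_prev: dict node -> previous hidden state (list of floats)
    neighbors: dict node -> list of neighbor nodes
    W: weight matrix as a list of rows (hypothetical parameter)
    """
    # The node's own state is averaged together with its neighbors' states,
    # so no separate concatenation step is needed.
    vectors = [h_prev[v]] + [h_prev[u] for u in neighbors[v]]
    dim = len(h_prev[v])
    mean = [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dim)]
    pre = [sum(w * m for w, m in zip(row, mean)) for row in W]
    return [1.0 / (1.0 + math.exp(-z)) for z in pre]

h_prev = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = mean_aggregate(h_prev, {0: [1, 2]}, 0, identity)
```

      Because the mean does not depend on the order of the neighbors, the result is the same for any ordering of the neighbor list.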

      The long short‐term memory (LSTM) aggregator, which has a larger expressive capability, is also used. However, LSTMs process their inputs sequentially, so they are not permutation invariant. Reference [23] adapts LSTMs to operate on an unordered set by applying them to a random permutation of the node’s neighbors.

      Pooling aggregator: In the pooling aggregator, each neighbor’s hidden state is fed through a fully connected layer, after which a max‐pooling operation is applied to the set of the node’s neighbors:

      (5.19)  $h_{N_v}^t = \max\left(\{\sigma\left(W_{\mathrm{pool}} h_u^{t-1} + b\right), \forall u \in N_v\}\right)$
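
      A minimal sketch of the pooling aggregator follows. Each neighbor state passes through one fully connected layer, and an element-wise maximum is then taken over the neighbors. The function name, the data layout, and the choice of ReLU as the layer's nonlinearity are illustrative assumptions.

```python
def pool_aggregate(h_prev, neighbors, v, W_pool, b):
    """Element-wise max over neighbors of ReLU(W_pool h_u + b)."""
    transformed = []
    for u in neighbors[v]:
        # Fully connected layer applied to one neighbor's hidden state.
        pre = [sum(w * x for w, x in zip(row, h_prev[u])) + b_i
               for row, b_i in zip(W_pool, b)]
        transformed.append([max(z, 0.0) for z in pre])  # ReLU (assumed)
    # Max-pooling across the set of neighbors, one dimension at a time.
    return [max(col) for col in zip(*transformed)]

h_prev = {0: [5.0, 5.0], 1: [0.0, 1.0], 2: [2.0, -1.0]}
W_pool = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
pooled = pool_aggregate(h_prev, {0: [1, 2]}, 0, W_pool, b)
```

      As with the mean, the maximum is symmetric in its arguments, so the aggregator is permutation invariant without any shuffling of the neighbors.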

      Gate: Several works have attempted to use a gate mechanism such as gated recurrent units (GRUs) [25] or LSTM [26] in the propagation step to mitigate the restrictions of earlier GNN models and improve the long‐term propagation of information across the graph structure.

      Gated graph neural network (GGNN) [27] uses GRUs in the propagation step, unrolls the recurrence for a fixed number of steps T, and uses backpropagation through time in order to compute gradients. So, the propagation model can be presented as

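      (5.20)
      $$\begin{aligned}
      a_v^t &= A_v^{\top} \left[h_1^{t-1} \ldots h_N^{t-1}\right]^{\top} + b \\
      z_v^t &= \sigma\left(W^z a_v^t + U^z h_v^{t-1}\right) \\
      r_v^t &= \sigma\left(W^r a_v^t + U^r h_v^{t-1}\right) \\
      \tilde{h}_v^t &= \tanh\left(W a_v^t + U \left(r_v^t \odot h_v^{t-1}\right)\right) \\
      h_v^t &= \left(1 - z_v^t\right) \odot h_v^{t-1} + z_v^t \odot \tilde{h}_v^t
      \end{aligned}$$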
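
      The GRU-style propagation can be sketched as follows. This is a simplified illustration, not the GGNN reference implementation: it assumes a single adjacency matrix `adj` (no separate incoming and outgoing edge types), and the parameter dictionary `p` with its key names is hypothetical.

```python
import math

def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ggnn_step(h, adj, p):
    """One GRU-style update of every node state."""
    n, d = len(h), len(h[0])
    new_h = []
    for v in range(n):
        # a_v: neighbor states summed according to the adjacency row, plus bias.
        a = [sum(adj[v][u] * h[u][k] for u in range(n)) + p["b"][k]
             for k in range(d)]
        # Update gate z and reset gate r.
        z = [sigmoid(x + y) for x, y in
             zip(matvec(p["Wz"], a), matvec(p["Uz"], h[v]))]
        r = [sigmoid(x + y) for x, y in
             zip(matvec(p["Wr"], a), matvec(p["Ur"], h[v]))]
        # Candidate state uses the reset-gated previous state.
        rh = [ri * hi for ri, hi in zip(r, h[v])]
        cand = [math.tanh(x + y) for x, y in
                zip(matvec(p["W"], a), matvec(p["U"], rh))]
        # Interpolate between the previous state and the candidate.
        new_h.append([(1.0 - zi) * hi + zi * ci
                      for zi, hi, ci in zip(z, h[v], cand)])
    return new_h

def ggnn_propagate(h, adj, p, T):
    """Unroll the recurrence for a fixed number of steps T."""
    for _ in range(T):
        h = ggnn_step(h, adj, p)
    return h
```

      With all parameters set to zero, both gates equal 0.5 and the candidate state is zero, so each step simply halves every node state; this makes a convenient sanity check for the update rule.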

      Figure: Operation of GraphSAGE: (a) sample the neighborhood, (b) aggregate feature information from neighbors, (c) predict the graph context and label using the aggregated information.

      Source: Hamilton et al. [23].

      Two extensions of the LSTM architecture, referred to as the Child‐Sum Tree‐LSTM and the N‐ary Tree‐LSTM, are presented in [28]. As in standard LSTM units, each Tree‐LSTM unit (indexed by $v$) contains input and output gates $i_v$ and $o_v$, a memory cell $c_v$, and a hidden state $h_v$. Instead of a single forget gate, the Tree‐LSTM unit contains one forget gate $f_{vk}$ for each child $k$, allowing the unit to selectively incorporate information from each child. The Child‐Sum Tree‐LSTM transition equations are given as

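      (5.21)
      $$\begin{aligned}
      \tilde{h}_v^{t-1} &= \sum_{k \in N_v} h_k^{t-1} \\
      i_v^t &= \sigma\left(W^i x_v^t + U^i \tilde{h}_v^{t-1} + b^i\right) \\
      f_{vk}^t &= \sigma\left(W^f x_v^t + U^f h_k^{t-1} + b^f\right) \\
      o_v^t &= \sigma\left(W^o x_v^t + U^o \tilde{h}_v^{t-1} + b^o\right) \\
      u_v^t &= \tanh\left(W^u x_v^t + U^u \tilde{h}_v^{t-1} + b^u\right) \\
      c_v^t &= i_v^t \odot u_v^t + \sum_{k \in N_v} f_{vk}^t \odot c_k^{t-1} \\
      h_v^t &= o_v^t \odot \tanh\left(c_v^t\right)
      \end{aligned}$$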
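
      A minimal sketch of one Child‐Sum Tree‐LSTM unit is given below, following the description above: the input, output, and update gates see the sum of the children's hidden states, while a separate forget gate is computed for each child. The function names and the parameter dictionary `p` (with keys such as `W_i`, `U_i`, `b_i`) are hypothetical.

```python
import math

def _affine(W, x, U, h, b):
    """W x + U h + b for list-of-rows matrices."""
    return [sum(w * xi for w, xi in zip(w_row, x)) +
            sum(u * hi for u, hi in zip(u_row, h)) + bi
            for w_row, u_row, bi in zip(W, U, b)]

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def child_sum_cell(x, children, p):
    """Return (h_v, c_v) for a node with input x.

    children: list of (h_k, c_k) pairs, one per child (empty for a leaf).
    """
    d = len(p["b_i"])
    # Sum of the children's hidden states (zero vector for a leaf).
    h_sum = [sum(h_k[j] for h_k, _ in children) for j in range(d)]
    i = [_sigmoid(z) for z in _affine(p["W_i"], x, p["U_i"], h_sum, p["b_i"])]
    o = [_sigmoid(z) for z in _affine(p["W_o"], x, p["U_o"], h_sum, p["b_o"])]
    u = [math.tanh(z) for z in _affine(p["W_u"], x, p["U_u"], h_sum, p["b_u"])]
    c = [ij * uj for ij, uj in zip(i, u)]
    for h_k, c_k in children:
        # One forget gate per child, computed from that child's own state.
        f = [_sigmoid(z) for z in _affine(p["W_f"], x, p["U_f"], h_k, p["b_f"])]
        c = [cj + fj * ckj for cj, fj, ckj in zip(c, f, c_k)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

      Since the children enter only through sums, the unit does not depend on child order, which is why this variant suits trees with unordered children or a variable branching factor.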

      Here $x_v^t$ is the input vector at time $t$ in the standard LSTM setting. If the branching factor of a tree is at most $K$ and all children of a node are ordered (that is, they can be indexed from 1 to $K$), then the N‐ary Tree‐LSTM can be used. For node $v$, $h_{vk}^t$ and $c_{vk}^t$ denote the hidden state and memory cell of its $k$‐th child at time $t$, respectively. The transition equations are now

      (5.22)
      $$\begin{aligned}
      i_v^t &= \sigma\left(W^i x_v^t + \sum_{l=1}^{K} U_l^i h_{vl}^{t-1} + b^i\right) \\
      f_{vk}^t &= \sigma\left(W^f x_v^t + \sum_{l=1}^{K} U_{kl}^f h_{vl}^{t-1} + b^f\right) \\
      o_v^t &= \sigma\left(W^o x_v^t + \sum_{l=1}^{K} U_l^o h_{vl}^{t-1} + b^o\right) \\
      u_v^t &= \tanh\left(W^u x_v^t + \sum_{l=1}^{K} U_l^u h_{vl}^{t-1} + b^u\right) \\
      c_v^t &= i_v^t \odot u_v^t + \sum_{l=1}^{K} f_{vl}^t \odot c_{vl}^{t-1} \\
      h_v^t &= o_v^t \odot \tanh\left(c_v^t\right)
      \end{aligned}$$

      The introduction of separate parameter matrices for each child k allows the model to learn more fine‐grained representations conditioning on the states of a unit’s children than the Child‐Sum Tree‐LSTM.

      The two types of Tree‐LSTMs can be easily adapted to graphs. The graph‐structured LSTM in [29] is an example of the N‐ary Tree‐LSTM applied to a graph. However, it is a simplified version, since each node in the graph has at most two incoming edges (from its parent and its sibling predecessor). Reference [30] proposed another variant, the Graph LSTM, based on the relation extraction task. The main difference between graphs and trees is that the edges of graphs have labels, so the work in [30] uses different weight matrices to represent different labels:

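      (5.23)
      $$\begin{aligned}
      i_v^t &= \sigma\left(W^i x_v^t + \sum_{k \in N_v} U_{m(v,k)}^i h_k^{t-1} + b^i\right) \\
      f_{vk}^t &= \sigma\left(W^f x_v^t + U_{m(v,k)}^f h_k^{t-1} + b^f\right) \\
      o_v^t &= \sigma\left(W^o x_v^t + \sum_{k \in N_v} U_{m(v,k)}^o h_k^{t-1} + b^o\right) \\
      u_v^t &= \tanh\left(W^u x_v^t + \sum_{k \in N_v} U_{m(v,k)}^u h_k^{t-1} + b^u\right) \\
      c_v^t &= i_v^t \odot u_v^t + \sum_{k \in N_v} f_{vk}^t \odot c_k^{t-1} \\
      h_v^t &= o_v^t \odot \tanh\left(c_v^t\right)
      \end{aligned}$$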

      where $m(v, k)$ denotes the edge label between nodes $v$ and $k$.
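
      To illustrate label‐specific weight matrices, the sketch below computes only the input gate of such a unit, selecting the recurrent matrix by edge label. The label names `"subj"` and `"obj"`, the function name, and the data layout are hypothetical and chosen purely for illustration.

```python
import math

def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def labeled_input_gate(x, neighbors, W_i, U_i, b_i):
    """Input gate with one recurrent matrix per edge label.

    neighbors: list of (label, h_k) pairs, where label plays the role of m(v, k)
    U_i: dict mapping each edge label to its own weight matrix
    """
    pre = [w + b for w, b in zip(matvec(W_i, x), b_i)]
    for label, h_k in neighbors:
        # The matrix applied to h_k depends on the label of edge (v, k).
        contrib = matvec(U_i[label], h_k)
        pre = [s + c for s, c in zip(pre, contrib)]
    return [1.0 / (1.0 + math.exp(-z)) for z in pre]

W_i = [[0.0, 0.0], [0.0, 0.0]]
b_i = [0.0, 0.0]
U_i = {"subj": [[1.0, 0.0], [0.0, 1.0]],
       "obj": [[-1.0, 0.0], [0.0, -1.0]]}
x = [0.3, 0.7]
gate_one = labeled_input_gate(x, [("subj", [1.0, 2.0])], W_i, U_i, b_i)
gate_both = labeled_input_gate(
    x, [("subj", [1.0, 2.0]), ("obj", [1.0, 2.0])], W_i, U_i, b_i)
```

      The same neighbor state contributes differently depending on the label of the connecting edge; in this toy setup the two contributions cancel exactly, which would not happen with a single shared matrix.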

      The attention mechanism has been successfully used in many sequence‐based tasks such as machine translation [31–33] and machine reading [34]. Work in [35]
