      As a start, in this section we present a brief overview of the graph neural network (GNN) and then, in the subsequent sections, discuss some of the most popular variants in more detail.

      The basic idea here is to extend existing neural networks so that they can process data represented in graph domains [1]. In a graph, each node is defined by its own features and the features of the related nodes. The target of a GNN is to learn a state embedding $h_v \in \mathbb{R}^s$ that contains information on the neighborhood of each node. The state embedding $h_v$ is an s‐dimensional vector of node $v$ and can be used to produce an output $o_v$. Let $f$ be a parametric function, called the local transition function, that is shared among all nodes and updates the node state according to the input neighborhood. Let $g$ be the local output function that describes how the output is produced. Then, $h_v$ and $o_v$ are defined as

      (5.1) $h_v = f\left(x_v, x_{co[v]}, h_{ne[v]}, x_{ne[v]}\right), \quad o_v = g\left(h_v, x_v\right)$

      where $x_v$, $x_{co[v]}$, $h_{ne[v]}$, and $x_{ne[v]}$ are the features of $v$, the features of its edges, the states of the nodes in the neighborhood of $v$, and the features of those nodes, respectively. If $H$, $O$, $X$, and $X_N$ are the vectors constructed by stacking all the states, all the outputs, all the features, and all the node features, respectively, then we can write

      (5.2) $H = F(H, X), \quad O = G(H, X_N)$

      where $F$ and $G$ are the stacked versions of $f$ and $g$ over all nodes.

      (5.4) [equation missing in the source]

      Learning then proceeds in the following steps:

      1 The states are iteratively updated by Eq. (5.1) until a time $T$. They approach the fixed‐point solution of Eq. (5.2), $H(T) \approx H$.

      2 The gradient of weights W is computed from the loss.

      3 The weights W are updated according to the gradient computed in the last step.
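
      The three steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the method of [1]: the transition function `transition`, the output function `output`, the squared‐error loss, and the finite‐difference gradient are all hypothetical choices made for brevity. The transition is deliberately built as a contraction (a tanh applied to half of the mean neighbor state), so repeated application converges to a fixed point.

```python
import math

def transition(x_v, neighbor_states, w, mu=0.5):
    """Hypothetical local transition f: combines the node feature x_v with
    the mean neighbor state. mu < 1 keeps the update a contraction."""
    s = len(w)
    n = max(len(neighbor_states), 1)
    mean = [sum(h[k] for h in neighbor_states) / n for k in range(s)]
    return [math.tanh(w[k] * x_v + mu * mean[k]) for k in range(s)]

def iterate_states(features, neighbors, w, T=50, tol=1e-9):
    """Step 1: update all states synchronously for at most T rounds,
    stopping early once the largest change falls below tol."""
    s = len(w)
    H = {v: [0.0] * s for v in features}
    for _ in range(T):
        H_new = {v: transition(features[v], [H[u] for u in neighbors[v]], w)
                 for v in features}
        delta = max(abs(a - b) for v in features
                    for a, b in zip(H_new[v], H[v]))
        H = H_new
        if delta < tol:
            break
    return H

def output(h_v):
    """Hypothetical local output g: sum of the state components."""
    return sum(h_v)

def loss(w, features, neighbors, targets):
    """Assumed squared-error loss over the supervised nodes in targets."""
    H = iterate_states(features, neighbors, w)
    return sum((targets[v] - output(H[v])) ** 2 for v in targets)

def train_step(w, features, neighbors, targets, lr=0.1, eps=1e-5):
    """Steps 2 and 3: approximate the gradient by central finite
    differences (a stand-in for backpropagation), then update w."""
    grad = []
    for k in range(len(w)):
        w_plus = w[:k] + [w[k] + eps] + w[k + 1:]
        w_minus = w[:k] + [w[k] - eps] + w[k + 1:]
        diff = (loss(w_plus, features, neighbors, targets)
                - loss(w_minus, features, neighbors, targets))
        grad.append(diff / (2 * eps))
    return [wk - lr * gk for wk, gk in zip(w, grad)]

# Toy graph: node 0 is linked to nodes 1 and 2; only node 0 is supervised.
features = {0: 1.0, 1: -0.5, 2: 0.3}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
targets = {0: 0.5}
w = train_step([0.2, -0.1], features, neighbors, targets)
```

      Because the states are recomputed to convergence inside every loss evaluation, the gradient is taken with respect to the fixed point rather than to a single update.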

      5.1.1 Classification of Graphs

      Directed graphs: Directed edges can carry more information than undirected edges. For example, in a knowledge graph where an edge starts from the head entity and ends at the tail entity, the head entity is the parent class of the tail entity, which suggests that information propagated from parent classes and from child classes should be treated differently. Here, two kinds of weight matrices, $W_p$ and $W_c$, are used to incorporate more precise structural information. The propagation rule is [3]

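      (5.5) $H^{t} = \sigma\left(D_p^{-1} A_p \, \sigma\left(D_c^{-1} A_c H^{t-1} W_c\right) W_p\right)$

      where $D_p^{-1} A_p$ and $D_c^{-1} A_c$ are the normalized adjacency matrices for parents and children, respectively, and $\sigma$ denotes a nonlinear activation function.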
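
      A propagation step with separate parent and child weights can be sketched as follows. The sketch assumes the nested form H' = σ(Â_p σ(Â_c H W_c) W_p), where Â_p and Â_c are row‐normalized adjacency matrices for parents and children. The use of ReLU for σ, the self‐loops in the example, and all function names are assumptions made for illustration, not details taken from [3].

```python
def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def row_normalize(A):
    """Divide each row by its sum (D^-1 A); all-zero rows stay zero."""
    out = []
    for row in A:
        d = sum(row)
        out.append([a / d if d else 0.0 for a in row])
    return out

def relu(M):
    """Elementwise ReLU, used here as the assumed nonlinearity sigma."""
    return [[max(0.0, x) for x in row] for row in M]

def directed_layer(H, A_p, A_c, W_p, W_c):
    """One propagation step: aggregate over children with W_c first,
    then aggregate the result over parents with W_p."""
    child_msg = relu(matmul(matmul(row_normalize(A_c), H), W_c))
    return relu(matmul(matmul(row_normalize(A_p), child_msg), W_p))

# Chain 0 -> 1 -> 2 (0 is the parent of 1, 1 is the parent of 2).
# A_p[i][j] = 1 if j is a parent of i; A_c is its transpose.
# Self-loops are added so that every row has a nonzero sum.
A_p = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
A_c = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the example readable
H_next = directed_layer(H, A_p, A_c, I2, I2)
```

      With identity weights the result is simply two rounds of neighborhood averaging, first over children and then over parents, which makes the asymmetry between the two directions easy to inspect by hand.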

      Heterogeneous graphs: These have several kinds of nodes. The simplest way to process a heterogeneous graph is to convert the type of each node to a one‐hot feature vector that is concatenated with the original feature. GraphInception [4] introduces the concept of the metapath into propagation on the heterogeneous graph. With metapaths, neighbors can be grouped according to their node types and distances. GraphInception treats each neighbor group as a subgraph in a homogeneous graph, performs propagation on it, and concatenates the propagation results from the different homogeneous graphs to arrive at a collective node representation. In [5], the heterogeneous graph attention network (HAN) was proposed, which utilizes node‐level and semantic‐level attention and can therefore consider node importance and metapaths simultaneously.
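
      The simplest option mentioned above, concatenating a one‐hot encoding of the node type with the original feature vector, can be sketched as follows. The function names, the dictionary layout, and the example node types ("author", "paper") are hypothetical and chosen only for illustration.

```python
def one_hot(index, size):
    """Vector of length size with a 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def add_type_features(features, node_types):
    """Append a one-hot encoding of each node's type to its features.
    Types are indexed in sorted order so the layout is deterministic."""
    type_names = sorted(set(node_types.values()))
    type_index = {name: i for i, name in enumerate(type_names)}
    return {
        v: list(x) + one_hot(type_index[node_types[v]], len(type_names))
        for v, x in features.items()
    }

# Two node types; each node starts with a two-dimensional feature.
features = {"a1": [0.2, 0.7], "p1": [1.0, 0.0]}
node_types = {"a1": "author", "p1": "paper"}
augmented = add_type_features(features, node_types)
# augmented["a1"] is [0.2, 0.7, 1.0, 0.0]; augmented["p1"] is [1.0, 0.0, 0.0, 1.0]
```

      After this step every node has a feature vector of the same length, so the graph can be passed to a model that expects homogeneous node features.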

      Graphs with edge information: Here, each edge carries additional information, such as its weight or its type. There are two ways to handle this kind of graph:

      1 We can convert the graph to a bipartite graph in which the original edges also become nodes and each original edge is split into two new edges, so that there are two new edges between the edge node and the begin/end nodes. The encoder of G2S (Graph to Sequence) [6] uses the following aggregation function for neighbors:

      (5.6) $h_v^{t} = \rho\left(\frac{1}{|\mathcal{N}_v|} \sum_{u \in \mathcal{N}_v} W_r \left(r_v^{t} \odot h_u^{t-1}\right) + b_r\right)$

      where $W_r$ and $b_r$ are the propagation parameters for different types of edges (relations $r$), $\rho$ is a nonlinearity, $\odot$ stands for the Hadamard product, and $\mathcal{N}_v$ is the set of neighboring nodes of $v$.

      2 We can
