
Figure 3.18 Nonlinear IIR filter structures. (a) A recurrent nonlinear neural filter; (b) a recurrent linear/nonlinear neural filter structure.

(3.70)
$$
\begin{aligned}
\Theta_j(k) &= \frac{\partial \mathbf{y}(k)}{\partial \boldsymbol{\omega}_j(k)}, \quad \mathbf{y}(k) = \left[ y_1(k), \ldots, y_N(k) \right], \quad j = 1, 2, \ldots, N \\[4pt]
\mathbf{U}_j(k) &= \begin{bmatrix} \mathbf{0} \\ \vdots \\ \mathbf{u}^{T}(k) \\ \vdots \\ \mathbf{0} \end{bmatrix} \leftarrow j\text{th row}, \quad j = 1, 2, \ldots, N \\[4pt]
\mathbf{F}(k) &= \operatorname{diag}\!\left[ \Phi'\!\left( \mathbf{u}^{T}(k)\, \boldsymbol{\omega}_1(k) \right), \ldots, \Phi'\!\left( \mathbf{u}^{T}(k)\, \boldsymbol{\omega}_N(k) \right) \right]
\end{aligned}
$$

      With this notation, the gradient update for the recurrent neurons can be expressed compactly as

(3.71)
$$
\Theta_j(k+1) = \mathbf{F}(k) \left[ \mathbf{U}_j(k) + \mathbf{W}_{\alpha}(k)\, \Theta_{\alpha}(k) \right], \quad j = 1, 2, \ldots, N
$$

      where Wα denotes the set of those entries in W that correspond to the feedback connections.
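The recursion in (3.71) can be sketched numerically. The sketch below is illustrative only: it assumes a tanh nonlinearity for Φ, stores the sensitivities Θ_j(k) = ∂y(k)/∂ω_j(k) as an (N, N, M) array, and takes Θ_α(k) to be the sensitivity of the fed-back outputs, here identified with Θ_j(k); the function and variable names are not from the text.

```python
import numpy as np

def rtrl_sensitivity_step(Theta, W, W_fb, u):
    """One step of the sensitivity recursion (3.71):
    Theta_j(k+1) = F(k) [ U_j(k) + W_alpha(k) Theta_j(k) ].

    W     : (N, M) weight matrix; row i is omega_i(k)^T
    W_fb  : (N, N) feedback entries W_alpha of W (assumption: square)
    u     : (M,)  combined input vector u(k)
    Theta : (N, N, M) array with Theta[j] = d y(k) / d omega_j(k)
    """
    # F(k) = diag[Phi'(u^T omega_1), ..., Phi'(u^T omega_N)], tanh assumed
    phi_prime = 1.0 - np.tanh(W @ u) ** 2
    N, M = W.shape
    Theta_next = np.empty_like(Theta)
    for j in range(N):
        U_j = np.zeros((N, M))
        U_j[j] = u                                # u(k) in the jth row
        # F(k) is diagonal, so it acts as a row-wise scaling
        Theta_next[j] = phi_prime[:, None] * (U_j + W_fb @ Theta[j])
    return Theta_next
```

Starting from zero sensitivities, the first step leaves only the jth row of Θ_j nonzero (the direct term F(k)U_j(k)); subsequent steps mix rows through the feedback term.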

      3.4.3 Advanced RNN Architectures

Schematic illustration of a long short-term memory (LSTM) memory cell.

      Note that when we use vector notation, we are referring to the values of the nodes in an entire layer of cells. For example, s is a vector containing the value of sc at each memory cell c in a layer. When the subscript c appears, it indexes an individual memory cell.

      Input node: This unit, labeled gc , is a node that takes activation in the standard way from the input layer x(t) at the current time step and (along recurrent edges) from the hidden layer at the previous time step h(t − 1). Typically, the summed weighted input is run through a tanh activation function, although in the original LSTM paper, the activation function is a sigmoid.

      Input gate: Gates are a distinctive feature of the LSTM approach. A gate is a sigmoidal unit that, like the input node, takes activation from the current data point x(t) as well as from the hidden layer at the previous time step. A gate is so called because its value is used to multiply the value of another node. It is a gate in the sense that if its value is 0, then flow from the other node is cut off. If the value of the gate is 1, all flow is passed through. The value of the input gate ic multiplies the value of the input node.

      Internal state: At the heart of each memory cell is a node sc with linear activation, which is referred to in the original work as the “internal state” of the cell. The internal state sc has a self‐connected recurrent edge with fixed unit weight. Because this edge spans adjacent time steps with constant weight, error can flow across time steps without vanishing or exploding. This edge is often called the constant error carousel. In vector notation, the update for the internal state is s(t) = g(t) ⊙ i(t) + s(t − 1), where ⊙ denotes pointwise multiplication.
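The effect of the constant error carousel can be checked with a toy calculation (illustrative, not from the text): backpropagating through k steps of a linear self-connection multiplies the error by the self-weight k times, so a fixed unit weight preserves it exactly, while any other weight vanishes or explodes geometrically.

```python
steps = 1000
unit = 1.0 ** steps    # constant error carousel: fixed unit self-weight
decay = 0.9 ** steps   # ordinary recurrent weight < 1: gradient vanishes
growth = 1.1 ** steps  # ordinary recurrent weight > 1: gradient explodes
# unit stays exactly 1.0; decay underflows toward 0; growth blows up
```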

      Forget gate: These gates fc were introduced to provide a method by which the network can learn to flush the contents of the internal state. This is especially useful in continuously running networks. With forget gates, the equation to calculate the internal state on the forward pass is s(t) = g(t) ⊙ i(t) + f(t) ⊙ s(t − 1).

      Output gate: The value vc ultimately produced by a memory cell is the value of the internal state sc multiplied by the value of the output gate oc . It is customary that the internal state first be run through a tanh activation function, as this gives the output of each cell the same dynamic range as an ordinary tanh hidden unit. However, in other neural network research, rectified linear units, which have a greater dynamic range, have proven easier to train. So it seems plausible that the nonlinear function on the internal state might be omitted.
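The pieces above assemble into the forward pass of one LSTM memory-cell layer. The sketch below follows the equations in the text (input node g, gates i, f, o, internal state s, output v = h); the parameter names (Wg, Ug, bg, …) and the use of a dict for the weights are illustrative assumptions, not notation from the original.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, s_prev, p):
    """One forward step of a layer of LSTM memory cells.

    p holds input weights W*, recurrent weights U*, and biases b*
    for the input node (g) and the input/forget/output gates (i, f, o).
    """
    g = np.tanh(p["Wg"] @ x + p["Ug"] @ h_prev + p["bg"])  # input node
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + p["bi"])  # input gate
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["bf"])  # forget gate
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + p["bo"])  # output gate
    s = g * i + f * s_prev   # internal state: s(t) = g(t) ⊙ i(t) + f(t) ⊙ s(t-1)
    h = np.tanh(s) * o       # cell output: v_c = tanh(s_c) * o_c
    return h, s
```

Because the output is tanh(s) scaled by a sigmoidal gate, each component of h lies in (−1, 1), matching the dynamic range of an ordinary tanh hidden unit as noted above.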

      In the original paper and in most subsequent work, the input node is labeled g. We adhere to this convention but note that it may be confusing, as g does not stand for gate. In the original paper, the gates
