Скачать книгу

      Computation of the state: Banach’s fixed‐point theorem does not only ensure the existence and the uniqueness of the solution of Eq. (5.74), but it also suggests the following classic iterative scheme for computing the state:

Schematic illustration of graph (on the top, left), the corresponding encoding network (top right), and the network obtained by unfolding the encoding network (at the bottom).The nodes (the circles) of the graph are replaced, in the encoding network, by units computing fw and gw (the squares).

      The learning algorithm is based on a gradient‐descent strategy and is composed of the following steps:

      1 The states xn(t) are iteratively updated by Eq. (5.78) until at time T they approach the fixed‐point solution of Eq. (5.75): x(T) ≈ x.

      2 The gradient ∂ew(T)/∂w is computed.

      3 The weights w are updated according to the gradient computed in Step 2.

      Backpropagation through time consists of carrying out the traditional backpropagation step (see Chapter 3) on the unfolded network to compute the gradient of the cost function at time T with respect to all the instances of fw and gw . Then, ∂ew(T)/∂w is obtained by summing the gradients of all instances. However, backpropagation through time requires storing the states of every instance of the units. When the graphs and Tt0 are large, the memory required may be considerable. On the other hand, in this case, a more efficient approach is possible, based on the Almeida–Pineda algorithm [65, 66]. Since Eq. (5.78) has reached a stable point x before the gradient computation, we can assume that x(t) = x holds for

Скачать книгу