
Let $w_{ij}^l$ denote the weights on the links between neuron i in the previous layer and neuron j in layer l. The output of the j‐th neuron in layer l is represented by the variable $a_j^l$. The outputs $a_i^L$ in the last, L‐th, layer represent the overall outputs of the network. Here, we use the notation $y_i$ for the outputs, with $y_i = a_i^L$. The parameters $x_i$, defined as inputs to the network, may be viewed as a 0‐th layer with the notation $x_i = a_i^0$. These definitions are summarized in Table 3.1.

| Symbol | Definition |
| --- | --- |
| $w_{ij}^l$ | Weight connecting neuron i in layer l − 1 to neuron j in layer l |
| $w_{bj}^l$ | Bias weight for neuron j in layer l |
| $s_j^l = \sum_i w_{ij}^l a_i^{l-1} + w_{bj}^l$ | Summing junction for neuron j in layer l |
| $a_j^l = \tanh(s_j^l)$ | Activation (output) value for neuron j in layer l |
| $x_i = a_i^0$ | i‐th external input to network |
| $y_i = a_i^L$ | i‐th output to network |

      Define an input vector $\mathbf{x} = [x_0, x_1, x_2, \ldots, x_N]$ and an output vector $\mathbf{y} = [y_0, y_1, y_2, \ldots, y_M]$. The network maps the input $\mathbf{x}$ to the outputs $\mathbf{y}$ using the weights $\mathbf{w}$, which we write as $\mathbf{y} = N(\mathbf{w}, \mathbf{x})$. Since the weights are fixed, this mapping is static; there are no internal dynamics. Still, this network is a powerful tool for computation.
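      As a minimal sketch of the static mapping y = N(w, x), the following Python function applies the summing junction and tanh activation from Table 3.1 layer by layer. The nested-list storage of the weights, the function name `forward`, and the small example network are assumptions made for illustration only.

```python
import math

def forward(weights, biases, x):
    """Compute y = N(w, x) for a feedforward network of tanh neurons.

    weights[k][i][j] stands for w_ij of layer k + 1 and biases[k][j] for the
    bias weight w_bj of that layer (this storage layout is an assumption).
    """
    a = list(x)                                    # a^0 = x
    for W, b in zip(weights, biases):
        # Summing junction: s_j = sum_i w_ij * a_i + w_bj
        s = [sum(W[i][j] * a[i] for i in range(len(a))) + b[j]
             for j in range(len(b))]
        a = [math.tanh(s_j) for s_j in s]          # activation a_j = tanh(s_j)
    return a                                       # y = a^L

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output
weights = [
    [[0.5, -0.3], [0.8, 0.2]],    # layer 1
    [[1.0], [-1.0]],              # layer 2
]
biases = [[0.0, 0.1], [0.0]]
y = forward(weights, biases, [1.0, 2.0])
```

      Because the weights are fixed, calling `forward` twice with the same input returns the same output, which is the static behaviour described above.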

      It has been shown that with two or more layers and a sufficient number of internal neurons, any uniformly continuous function can be represented with acceptable accuracy. The performance rests on the ways in which this “universal function approximator” is utilized.

      3.1.2 Weights Optimization

      (3.3) $\mathbf{e} = \mathbf{d} - \mathbf{y}.$

      The overall objective function to be minimized over the training set is the sum of squared errors

      (3.4) $J = \sum_{p=1}^{P} \mathbf{e}_p^T \mathbf{e}_p.$

      The training should find the set of weights w that minimizes the cost J subject to the constraint of the network topology. We see that training a neural network is a standard optimization problem.
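      The cost in (3.4) can be evaluated directly once the network outputs are known. The sketch below assumes that d in (3.3) is the desired (target) output for each training pattern; the function name and the two-pattern data set are hypothetical.

```python
def squared_error_cost(desired, outputs):
    """J = sum over patterns p of e_p^T e_p, with e_p = d_p - y_p."""
    J = 0.0
    for d, y in zip(desired, outputs):
        e = [d_k - y_k for d_k, y_k in zip(d, y)]   # error vector, as in (3.3)
        J += sum(e_k * e_k for e_k in e)            # inner product e^T e
    return J

# Hypothetical training set with P = 2 patterns and two outputs each
desired = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.5, 0.0], [0.0, 0.0]]
J = squared_error_cost(desired, outputs)            # 0.25 + 1.0
```

      Training then amounts to searching for the weights that make this number as small as possible.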

      Stochastic gradient descent (SGD) is one option for the optimization method. For each sample from the training set, the weights are adapted as

      (3.5) $\Delta\mathbf{w} = -\mu \hat{\nabla},$

      where $\hat{\nabla} = \partial(\mathbf{e}^T \mathbf{e})/\partial\mathbf{w}$ is the error gradient for the current input pattern, and $\mu$ is the learning rate.
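      To illustrate one SGD step, the sketch below moves the weights against the gradient of the squared error for a single sample, scaled by the learning rate. The central-difference gradient estimate stands in for an analytic gradient and is an assumption of this example, as are the function names and the one-weight tanh neuron.

```python
import math

def sgd_step(w, grad, mu):
    """Return the updated weights w + dw, with dw = -mu * grad."""
    return [w_i - mu * g_i for w_i, g_i in zip(w, grad)]

def numerical_gradient(loss, w, h=1e-6):
    """Central-difference estimate of d(loss)/dw (stand-in for an analytic gradient)."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += h
        w_minus = list(w)
        w_minus[i] -= h
        grad.append((loss(w_plus) - loss(w_minus)) / (2.0 * h))
    return grad

# Hypothetical sample for a one-weight neuron y = tanh(w * x)
x, d = 0.5, 0.3

def loss(w):
    e = d - math.tanh(w[0] * x)     # error for the current input pattern
    return e * e                    # e^T e for a scalar error

w = [1.0]
before = loss(w)
w = sgd_step(w, numerical_gradient(loss, w), mu=0.1)
after = loss(w)                     # smaller than `before` for this small step
```

      In practice the gradient is computed analytically, which is what the chain rule is used for in the multi‐layer case.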

      Single neuron case – Consider first a single linear neuron, which we may describe compactly as

      (3.6) $y = \sum_{i=0}^{N} w_i x_i = \mathbf{w}^T \mathbf{x},$

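      where $\mathbf{w} = [w_0, w_1, \ldots, w_N]$ and $\mathbf{x} = [1, x_1, \ldots, x_N]$. In this simple setup, with the scalar error $e = d - y$,

      (3.7) $\hat{\nabla} = \dfrac{\partial e^2}{\partial \mathbf{w}} = 2e\,\dfrac{\partial e}{\partial \mathbf{w}} = -2e\,\mathbf{x},$

      so that $\Delta\mathbf{w} = 2\mu e \mathbf{x}$. From this, we have $\Delta w_i = 2\mu e x_i$, which is the least mean square (LMS) algorithm.

[Figure: Schematic illustration of supervised learning.]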
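      The LMS rule for the single linear neuron can be sketched in a few lines of Python. The update inside the loop is the per-sample rule Δw_i = 2μe x_i; the function name, the learning rate, the number of epochs, and the noiseless training data are assumptions chosen for illustration.

```python
import random

def lms_train(samples, n_inputs, mu=0.05, epochs=200, seed=0):
    """Train a single linear neuron y = w^T x with the update dw = 2*mu*e*x."""
    rng = random.Random(seed)
    w = [0.0] * (n_inputs + 1)               # w[0] is the bias weight (x_0 = 1)
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)                    # visit the samples in random order
        for inputs, d in data:
            x = [1.0] + list(inputs)         # x = [1, x_1, ..., x_N]
            y = sum(w_i * x_i for w_i, x_i in zip(w, x))
            e = d - y                        # error for the current sample
            w = [w_i + 2.0 * mu * e * x_i for w_i, x_i in zip(w, x)]
    return w

# Hypothetical noiseless data generated by d = 1 + 2*x1 - 3*x2
samples = [((x1, x2), 1.0 + 2.0 * x1 - 3.0 * x2)
           for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
w = lms_train(samples, n_inputs=2)
```

      With this consistent data set the learned weights approach [1, 2, −3]. A learning rate that is too large makes the iteration diverge, so μ has to be chosen with the scale of the inputs in mind.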

      In a multi‐layer network, we just formally extend this procedure. For this we use the chain rule

      (3.8) ∂( …