
For layer $l$:
$\mathbf{a}_i^l(k) = [a_i^l(k),\, a_i^l(k-1),\, \ldots,\, a_i^l(k - M^l + 1)]$ : vector of delayed activation values
$x_i(k) = a_i^0(k)$ : $i$-th external input to the network
$y_i(k) = a_i^L(k)$ : $i$-th output of the network

Figure: Schematic illustration of finite impulse response (FIR) network unfolding.
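As a concrete illustration of the tap-delay notation above, here is a minimal NumPy sketch that builds the delayed activation vector $\mathbf{a}_i^l(k)$ from a stored activation history; the names `delay_vector`, `history`, and `M` are illustrative, not taken from the text.

```python
import numpy as np

def delay_vector(history, k, M):
    """Return [a(k), a(k-1), ..., a(k-M+1)] for one node's activation history.

    history : 1-D array holding a_i^l(0), ..., a_i^l(K-1) for a single node
    k       : current time index
    M       : number of taps (M^l in the text)
    Missing past samples (k - m < 0) are taken as zero.
    """
    return np.array([history[k - m] if k - m >= 0 else 0.0 for m in range(M)])

# Example: activation history of one node and its delay vector at k = 4 with 3 taps
a_hist = np.array([0.1, 0.4, 0.3, 0.7, 0.2])
print(delay_vector(a_hist, k=4, M=3))   # [0.2, 0.7, 0.3]
```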


      3.2.3 Adaptation

      For supervised learning with input sequence x(k), the difference between the desired output at time k and the actual output of the network is the error

(3.17) $\mathbf{e}(k) = \mathbf{d}(k) - \mathbf{y}(k).$

      The total squared error over the sequence is given by

(3.18) $J = \sum_{k=1}^{K} \mathbf{e}^T(k)\,\mathbf{e}(k).$
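A minimal sketch of (3.17) and (3.18), assuming the desired and actual output sequences are available as arrays `d` and `y` of shape (K, n_outputs); the array names are hypothetical.

```python
import numpy as np

def total_squared_error(d, y):
    """J = sum_k e(k)^T e(k), with e(k) = d(k) - y(k)  (Eqs. 3.17-3.18)."""
    e = d - y                         # error sequence, shape (K, n_outputs)
    return float(np.sum(e * e))       # sums e(k)^T e(k) over all k

# Example with K = 3 time steps and 2 outputs
d = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
y = np.array([[0.8, 0.1], [0.4, 0.6], [0.2, 0.9]])
print(total_squared_error(d, y))      # approximately 0.12
```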

      For instantaneous gradient descent, FIR filters may be updated at each time slot as

(3.19) $\mathbf{w}_{ij}^l(k+1) = \mathbf{w}_{ij}^l(k) - \mu \dfrac{\partial\, \mathbf{e}^T(k)\mathbf{e}(k)}{\partial \mathbf{w}_{ij}^l(k)},$

where $\partial\, \mathbf{e}^T(k)\mathbf{e}(k) / \partial \mathbf{w}_{ij}^l(k)$ is the instantaneous estimate of the gradient, and $\mu$ is the learning rate. However, deriving an explicit expression for this gradient leads to a large number of overlapping chain rules, so a simple backpropagation-like formulation no longer exists.
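The update in (3.19) has the shape of ordinary per-time-step gradient descent applied to each FIR filter. The sketch below makes that shape explicit; `instantaneous_gradient` is a hypothetical placeholder, since, as noted above, computing that gradient exactly is what is awkward for FIR networks.

```python
# Sketch of the per-time-step update (3.19) for one FIR filter w_ij^l.
# `instantaneous_gradient(w, k)` is a placeholder: it would have to return
# d(e^T(k) e(k)) / d w_ij^l(k), the quantity that is hard to obtain directly.
def update_filter(w, k, mu, instantaneous_gradient):
    return w - mu * instantaneous_gradient(w, k)
```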

      Temporal backpropagation is an alternative approach that can be used to avoid the above problem. To discuss it, let us consider two alternative forms of the true gradient of the cost function:

(3.20) $\dfrac{\partial J}{\partial \mathbf{w}_{ij}^l} = \sum_{k=1}^{K} \dfrac{\partial\, \mathbf{e}^T(k)\mathbf{e}(k)}{\partial \mathbf{w}_{ij}^l} = \sum_{k=1}^{K} \dfrac{\partial J}{\partial s_j^l(k)} \dfrac{\partial s_j^l(k)}{\partial \mathbf{w}_{ij}^l}.$

      Note that

$\dfrac{\partial J}{\partial s_j^l(k)} \dfrac{\partial s_j^l(k)}{\partial \mathbf{w}_{ij}^l} \neq \dfrac{\partial\, \mathbf{e}^T(k)\mathbf{e}(k)}{\partial \mathbf{w}_{ij}^l},$

only their sums over all $k$ are equal. Based on this expansion, each term in the sum is used to form the following stochastic algorithm:

(3.21) $\mathbf{w}_{ij}^l(k+1) = \mathbf{w}_{ij}^l(k) - \mu \dfrac{\partial J}{\partial s_j^l(k)} \dfrac{\partial s_j^l(k)}{\partial \mathbf{w}_{ij}^l}.$

      For small learning rates, the total accumulated weight change is approximately equal to the true gradient. This training algorithm is termed temporal backpropagation.
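As a sketch of how the per-term update (3.21) is applied online, the loop below assumes two hypothetical helpers: `delta(l, j, k)` returning $\partial J / \partial s_j^l(k)$ and `s_grad(l, i, j, k)` returning $\partial s_j^l(k) / \partial \mathbf{w}_{ij}^l$; in the full algorithm both would come from the temporal-backpropagation recursions.

```python
# Online temporal-backpropagation-style training: one weight change per time step,
# using a single term of the sum in (3.20) rather than the full gradient.
def train_sequence(weights, K, mu, delta, s_grad):
    """weights[(l, i, j)] holds the FIR coefficient vector w_ij^l (a NumPy array)."""
    for k in range(K):
        for (l, i, j), w in weights.items():
            weights[(l, i, j)] = w - mu * delta(l, j, k) * s_grad(l, i, j, k)
    return weights
```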

      To complete the algorithm, recall the summing junction is defined as

(3.22) $s_j^l(k) = \sum_i \mathbf{w}_{ij}^l \cdot \mathbf{a}_i^{l-1}(k),$

that is, the sum over input nodes $i$ of the inner products between the FIR coefficient vectors $\mathbf{w}_{ij}^l$ and the delayed activation vectors $\mathbf{a}_i^{l-1}(k)$ defined above.
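A minimal sketch of the summing junction (3.22): each synapse is an FIR filter, so the junction output is a sum of inner products between coefficient vectors and delayed activation vectors. The function and argument names are illustrative.

```python
import numpy as np

def summing_junction(W, prev_histories, k):
    """s_j^l(k) = sum_i  w_ij^l . a_i^{l-1}(k)   (Eq. 3.22, sketch)

    W              : list of FIR coefficient vectors w_ij^l, one per input node i
    prev_histories : list of 1-D arrays a_i^{l-1}(0), ..., a_i^{l-1}(k), one per node i
    """
    s = 0.0
    for w, hist in zip(W, prev_histories):
        M = len(w)   # number of taps
        # delayed activation vector [a(k), a(k-1), ..., a(k-M+1)], zero-padded at the start
        a_vec = np.array([hist[k - m] if k - m >= 0 else 0.0 for m in range(M)])
        s += np.dot(w, a_vec)
    return s
```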