
Baseline EndFraction left-parenthesis x 0 right-parenthesis left-parenthesis x Subscript left-parenthesis d right-parenthesis Baseline minus x Subscript 0 left-parenthesis d right-parenthesis Baseline right-parenthesis EndLayout"/>

      The Taylor base point x0 is a free parameter in this setup. As stated above, in classification we want the contribution of each pixel relative to the state of maximal uncertainty of the prediction, which is given by the set of points with f(x0) = 0, since f(x) > 0 denotes the presence and f(x) < 0 the absence of the learned structure. Hence, x0 should be chosen as a root of the predictor f, so that the zeroth‐order term f(x0) vanishes and the above equation simplifies to

      (4.14) $f(x) \approx \sum_{d=1}^{V} \frac{\partial f}{\partial x_{(d)}}(x_0)\,\bigl(x_{(d)} - x_{0(d)}\bigr) \quad \text{such that } f(x_0) = 0$

      The pixel‐wise decomposition depends nonlinearly on the prediction point x beyond the Taylor series itself, because a root point x0 close to x must be found. The whole pixel‐wise decomposition is therefore not a linear but a locally linear algorithm, since the root point x0 depends on the prediction point x.
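      The decomposition in Eq. (4.14) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not part of the original text: the toy predictor f(x) = tanh(w·x + b), the numerical values, and the choice of the root point as the orthogonal projection of x onto the hyperplane w·x + b = 0. For this predictor the roots are known in closed form; a trained network would generally need a numerical search for a nearby root.

```python
import math

def f(x, w, b):
    """Toy predictor f(x) = tanh(w.x + b) (hypothetical, for illustration only)."""
    return math.tanh(sum(wd * xd for wd, xd in zip(w, x)) + b)

def nearest_root(x, w, b):
    """Project x orthogonally onto the hyperplane w.x + b = 0, where f vanishes."""
    z = sum(wd * xd for wd, xd in zip(w, x)) + b
    norm_sq = sum(wd * wd for wd in w)
    return [xd - z * wd / norm_sq for wd, xd in zip(w, x)]

def taylor_relevances(x, w, b):
    """First-order terms of Eq. (4.14): df/dx_(d)(x0) * (x_(d) - x0_(d))."""
    x0 = nearest_root(x, w, b)
    # At a root the pre-activation is 0 and tanh'(0) = 1,
    # so the gradient of this toy predictor at x0 is simply w.
    grad_at_x0 = list(w)
    return [g * (xd - x0d) for g, xd, x0d in zip(grad_at_x0, x, x0)]

# Assumed example values.
w, b = [0.5, -0.25, 0.1], 0.05
x = [0.4, 0.2, 0.3]

x0 = nearest_root(x, w, b)
relevances = taylor_relevances(x, w, b)

print("f(x0)           =", f(x0, w, b))       # numerically zero: x0 is a root
print("f(x)            =", f(x, w, b))
print("sum of R_d      =", sum(relevances))   # first-order approximation of f(x)
print("per-dimension R =", relevances)
```

      Because x0 is recomputed for every x, the relevances depend on x through x0 as well as through the factor (x − x0), which is the locally linear behavior described above. The sum of the relevances only approximates f(x); the gap is the higher‐order Taylor remainder.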

      4.2.2 Pixel‐wise Decomposition for Multilayer NN

      Pixel‐wise decomposition for multilayer networks: In the previous chapter, we discussed neural networks built as a set of interconnected neurons organized in a layered structure. Combined with each other, the neurons define a mathematical function that maps the first‐layer neurons (input) to the last‐layer neurons (output). In this section, we denote each neuron by xi, where i is an index for the neuron. By convention, we use different indices for each layer of the network. We denote by ∑i the summation over all neurons of a given layer, and by ∑j the summation over all neurons of another layer. We denote by x(d) the neurons corresponding to the pixel activations (i.e., those in terms of which we would like to obtain a decomposition of the classification decision). A common mapping from one layer to the next consists of a linear projection followed by a nonlinear function: zij = xi wij, zj = ∑i zij + bj, xj = g(zj), where wij is a weight connecting neuron xi to neuron xj, bj is a bias term, and g is a nonlinear activation function. Multilayer networks stack several of these layers, each composed of a large number of neurons. Common nonlinear functions are the hyperbolic tangent g(t) = tanh(t) and the rectification function g(t) = max(0, t).
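      The layer mapping just described can be written out directly. The following is a minimal sketch in plain Python; the function name layer_forward, the helper relu, and all numerical values are assumptions chosen for illustration. The nested list w[i][j] holds the weight from neuron i of the lower layer to neuron j of the upper layer.

```python
import math

def layer_forward(x, w, b, g):
    """One layer: z_ij = x_i * w_ij, z_j = sum_i z_ij + b_j, x_j = g(z_j).

    x: activations x_i of the lower layer
    w: w[i][j] is the weight connecting neuron i to neuron j
    b: biases b_j of the upper layer
    g: nonlinear activation function
    """
    n_in, n_out = len(x), len(b)
    z_ij = [[x[i] * w[i][j] for j in range(n_out)] for i in range(n_in)]
    z_j = [sum(z_ij[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    x_j = [g(z) for z in z_j]
    return z_ij, z_j, x_j

def relu(t):
    """Rectification function g(t) = max(0, t)."""
    return max(0.0, t)

# Assumed example values: 2 input neurons, 3 output neurons.
x = [1.0, -2.0]
w = [[0.5, -1.0, 0.25],
     [0.25, 0.5, 1.0]]
b = [0.1, 0.0, -0.5]

z_ij, z_j, x_relu = layer_forward(x, w, b, relu)
_, _, x_tanh = layer_forward(x, w, b, math.tanh)

print("z_j     =", z_j)
print("relu    =", x_relu)
print("tanh    =", x_tanh)
```

      Keeping the individual products z_ij, and not only their sum z_j, is a deliberate choice in this sketch: they record how much each lower‐layer neuron contributes to each upper‐layer pre‐activation.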

      Taylor‐type decomposition: Denoting by f : ℝ^M → ℝ^N the vector‐valued multivariate function implementing the mapping between input and output of the network, a first possible explanation of the classification decision x ↦ f(x) can be obtained by Taylor expansion at a near root point x0 of the decision function f:

      (4.15) $R_d^{(1)} = (x - x_0)_{(d)} \cdot \frac{\partial f}{\partial x_{(d)}}(x_0)$

      The derivative ∂f(x)/∂x(d) required for pixel‐wise decomposition can be computed efficiently by reusing the network topology with the backpropagation algorithm discussed in the previous chapter. Having backpropagated the derivatives down to a certain layer j, we can compute the derivatives with respect to the neurons of the preceding layer i using the chain rule:

      (4.16) $\frac{\partial f}{\partial x_i} = \sum_j \frac{\partial f}{\partial x_j} \cdot \frac{\partial x_j}{\partial x_i} = \sum_j \frac{\partial f}{\partial x_j} \cdot w_{ij} \cdot g'(z_j).$
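      The chain rule in Eq. (4.16) can be checked numerically on a small network. The sketch below is an illustration under stated assumptions: a single tanh hidden layer, a linear output f = ∑j xj vj + c (so that ∂f/∂xj = vj), and arbitrary example weights. The names forward, input_gradient, and finite_difference are hypothetical helpers, not part of any library.

```python
import math

def forward(x, w, b, v, c):
    """Hidden layer x_j = tanh(sum_i x_i w_ij + b_j), linear output f = sum_j x_j v_j + c."""
    z = [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
    h = [math.tanh(zj) for zj in z]
    return z, sum(hj * vj for hj, vj in zip(h, v)) + c

def input_gradient(x, w, b, v, c):
    """Eq. (4.16): df/dx_i = sum_j df/dx_j * w_ij * g'(z_j)."""
    z, _ = forward(x, w, b, v, c)
    df_dxj = list(v)  # the output is linear in the hidden activations (assumption)
    g_prime = [1.0 - math.tanh(zj) ** 2 for zj in z]  # derivative of tanh
    return [sum(df_dxj[j] * w[i][j] * g_prime[j] for j in range(len(z)))
            for i in range(len(x))]

def finite_difference(x, w, b, v, c, eps=1e-6):
    """Central differences, used only to sanity-check the analytic gradient."""
    grads = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        f_plus = forward(x_plus, w, b, v, c)[1]
        f_minus = forward(x_minus, w, b, v, c)[1]
        grads.append((f_plus - f_minus) / (2.0 * eps))
    return grads

# Assumed example values: 2 inputs, 3 hidden neurons, scalar output.
x = [0.3, -0.7]
w = [[0.5, -1.0, 0.25],
     [0.25, 0.5, 1.0]]
b = [0.1, 0.0, -0.5]
v = [1.0, -0.5, 2.0]
c = 0.2

analytic = input_gradient(x, w, b, v, c)
numeric = finite_difference(x, w, b, v, c)
print("analytic gradient:", analytic)
print("numeric gradient: ", numeric)
```

      For deeper networks the same step is repeated layer by layer, with the gradient with respect to one layer taking the role of ∂f/∂xj for the layer below it.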

Schematic illustration of relevance propagation.

      (4.17) $\sum_i R_{i \leftarrow j}^{(l,\,l+1)} = R_j^{(l+1)}$

      These relevances Ri ← j are easily shown to approximate the conservation properties, in particular: