
      $$\sum_i R_{i \leftarrow j}^{(l,l+1)} = R_j^{(l+1)} \cdot \left(1 - \frac{b_j}{z_j}\right)$$

      $$R_{i \leftarrow j}^{(l,l+1)} = \begin{cases} \dfrac{z_{ij}}{z_j + \epsilon} \cdot R_j^{(l+1)} & z_j \ge 0 \\[1ex] \dfrac{z_{ij}}{z_j - \epsilon} \cdot R_j^{(l+1)} & z_j < 0 \end{cases} \tag{4.20}$$

      The conservation law then becomes

      $$\sum_i R_{i \leftarrow j}^{(l,l+1)} = \begin{cases} R_j^{(l+1)} \cdot \left(1 - \dfrac{b_j + \epsilon}{z_j + \epsilon}\right) & z_j \ge 0 \\[1ex] R_j^{(l+1)} \cdot \left(1 - \dfrac{b_j - \epsilon}{z_j - \epsilon}\right) & z_j < 0 \end{cases} \tag{4.21}$$

      where we can observe that some further relevance is absorbed by the stabilizer. In particular, relevance is fully absorbed if the stabilizer ε becomes very large.
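      The absorption effect can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function names `epsilon_messages` and `column_sum`, the toy weights, and the definition z_ij = x_i · w_ij are assumptions made for illustration. It computes the messages of Eq. (4.20) for one dense layer and prints the fraction of relevance that each upper neuron passes down as ε grows.

```python
def epsilon_messages(x, W, b, R_upper, eps=0.01):
    """Return (R, z_tot): R[i][j] is the relevance sent from upper neuron j
    to lower neuron i under the epsilon-stabilized rule of Eq. (4.20)."""
    n_in, n_out = len(x), len(b)
    # Assumption: weighted activations are z_ij = x_i * w_ij.
    z = [[x[i] * W[i][j] for j in range(n_out)] for i in range(n_in)]
    # Pre-activations z_j = sum_i z_ij + b_j.
    z_tot = [sum(z[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    R = [[0.0] * n_out for _ in range(n_in)]
    for j in range(n_out):
        # The stabilizer pushes the denominator away from zero.
        denom = z_tot[j] + eps if z_tot[j] >= 0 else z_tot[j] - eps
        for i in range(n_in):
            R[i][j] = z[i][j] / denom * R_upper[j]
    return R, z_tot


def column_sum(R, j):
    """Total relevance that upper neuron j passes down: sum_i R[i][j]."""
    return sum(row[j] for row in R)


if __name__ == "__main__":
    # Hypothetical toy layer: 2 inputs, 2 outputs.
    x = [1.0, 2.0]
    W = [[0.5, -1.0], [0.25, 0.5]]
    b = [0.5, -0.5]
    R_upper = [1.0, 1.0]
    for eps in (0.0, 0.1, 10.0, 1e6):
        R, _ = epsilon_messages(x, W, b, R_upper, eps)
        print(eps, [column_sum(R, j) for j in range(len(b))])
```

      For each upper neuron, the printed column sum should agree with the right-hand side of Eq. (4.21) and shrink toward zero as ε increases.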

      An alternative stabilizing method that does not leak relevance consists of treating negative and positive pre‐activations separately. Let $z_j^+ = \sum_i z_{ij}^+ + b_j^+$ and $z_j^- = \sum_i z_{ij}^- + b_j^-$, where the superscripts $-$ and $+$ denote the negative and positive parts of $z_{ij}$ and $b_j$. Relevance propagation is now defined as

      $$R_{i \leftarrow j}^{(l,l+1)} = R_j^{(l+1)} \cdot \left(\alpha \cdot \frac{z_{ij}^+}{z_j^+} + \beta \cdot \frac{z_{ij}^-}{z_j^-}\right) \tag{4.22}$$

      where α + β = 1. For example, for α = β = 1/2, the conservation law becomes

      $$\sum_i R_{i \leftarrow j}^{(l,l+1)} = R_j^{(l+1)} \cdot \left(1 - \frac{b_j^+}{2 z_j^+} - \frac{b_j^-}{2 z_j^-}\right) \tag{4.23}$$
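      As a rough illustration of Eqs. (4.22) and (4.23), the sketch below splits the weighted activations and the bias into positive and negative parts and forms the α/β messages. The function name `alpha_beta_messages`, the toy numbers, the definition z_ij = x_i · w_ij, and the choice to drop a term whose denominator is zero are all assumptions of this example rather than part of the method as stated in the text.

```python
def pos(v):
    """Positive part of a scalar."""
    return max(v, 0.0)


def neg(v):
    """Negative part of a scalar."""
    return min(v, 0.0)


def alpha_beta_messages(x, W, b, R_upper, alpha=0.5, beta=0.5):
    """Return R with R[i][j] following Eq. (4.22), where alpha + beta = 1."""
    n_in, n_out = len(x), len(b)
    # Assumption: weighted activations are z_ij = x_i * w_ij.
    z = [[x[i] * W[i][j] for j in range(n_out)] for i in range(n_in)]
    R = [[0.0] * n_out for _ in range(n_in)]
    for j in range(n_out):
        z_plus = sum(pos(z[i][j]) for i in range(n_in)) + pos(b[j])
        z_minus = sum(neg(z[i][j]) for i in range(n_in)) + neg(b[j])
        for i in range(n_in):
            # Assumption: a term with an empty (zero) denominator is skipped.
            p = alpha * pos(z[i][j]) / z_plus if z_plus > 0 else 0.0
            n = beta * neg(z[i][j]) / z_minus if z_minus < 0 else 0.0
            R[i][j] = R_upper[j] * (p + n)
    return R


if __name__ == "__main__":
    # Hypothetical toy layer: 2 inputs, 2 outputs.
    x = [1.0, 2.0]
    W = [[0.5, -1.0], [-0.25, 0.5]]
    b = [0.5, -0.5]
    R = alpha_beta_messages(x, W, b, R_upper=[1.0, 1.0])
    for j in range(len(b)):
        print(j, sum(row[j] for row in R))
```

      With both biases set to zero and both denominators nonzero, each column of messages sums exactly to the upper-layer relevance; with nonzero biases, the sum falls short by the bias terms of Eq. (4.23) only, since no stabilizer is involved.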

      Once a rule for relevance propagation has been selected, the overall relevance of each neuron in the lower layer is determined by summing up the relevances coming from all upper‐layer neurons in agreement with Eqs. (4.8) and (4.9):

      $$R_i^{(l)} = \sum_j R_{i \leftarrow j}^{(l,l+1)} \tag{4.24}$$
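      The aggregation step of Eq. (4.24) is a row sum over the message matrix. The short sketch below uses a hypothetical helper `aggregate_relevance` and made-up message values; it only illustrates the bookkeeping and does not depend on which propagation rule produced the messages.

```python
def aggregate_relevance(messages):
    """Eq. (4.24): R_i = sum_j R[i][j] for each lower-layer neuron i.

    messages[i][j] holds the relevance sent from upper neuron j to lower neuron i.
    """
    return [sum(row) for row in messages]


if __name__ == "__main__":
    # Hypothetical messages for 3 lower-layer and 2 upper-layer neurons.
    messages = [[0.25, 0.5], [0.5, -0.25], [0.25, 0.75]]
    R_lower = aggregate_relevance(messages)
    print(R_lower)
    # Summing over i then j or over j then i gives the same total relevance.
    print(sum(R_lower))
```

      Because the total over all lower-layer neurons equals the total over all messages, any relevance lost between layers must already be missing from the messages themselves (through the bias or the stabilizer), not from this summation.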

Schematic illustration of relevance propagation (heat map; relevance is presented by the intensity of the red color).

      Source: Montavon et al. [92].

      In this section, we consider long short‐term memory networks (LSTMs), which were discussed in Chapter 3, and describe an approach for tracking the importance of a given input to the LSTM for a given output. By identifying consistently important patterns of words, we are able to distill state‐of‐the‐art LSTMs on sentiment analysis and question answering into a set of representative phrases. This representation is then quantitatively validated by using the extracted phrases to construct a simple rule‐based classifier that approximates the output of the LSTM.

      Word importance scores in LSTMs: Here, we present a decomposition of the output of an LSTM into a product of factors, where each factor in the product can be interpreted as the contribution of a particular word. Thus, we can assign importance scores to words according to their contribution to the LSTM’s prediction. The basics of LSTM networks were introduced in Chapter 3. Given a sequence of word embeddings $x_1, \ldots, x_T \in \mathbb{R}^d$, an LSTM processes one word at a time, keeping track of cell and state vectors $(c_1, h_1), \ldots, (c_T, h_T)$, where $(c_t, h_t)$ contains information about the sentence up to word $t$. $h_t$ and $c_t$ are computed as a function of $x_t$, $c_{t-1}$ using the updates given by Eq.
