For an input sequence $x_1, \dots, x_T$, the LSTM updates are

\[
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + V_f h_{t-1} + b_f\right), & i_t &= \sigma\!\left(W_i x_t + V_i h_{t-1} + b_i\right),\\
o_t &= \sigma\!\left(W_o x_t + V_o h_{t-1} + b_o\right), & g_t &= \tanh\!\left(W_g x_t + V_g h_{t-1} + b_g\right),\\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, & h_t &= o_t \odot \tanh(c_t).
\end{aligned}
\tag{4.25}
\]
As initial values, we define $c_0 = h_0 = 0$. After processing the full sequence, a probability distribution over the $C$ classes is specified by $p$, with

\[ p_i = \frac{\exp\left(W_i h_T\right)}{\sum_{c=1}^{C} \exp\left(W_c h_T\right)}, \qquad i = 1, \ldots, C, \tag{4.26} \]

where $W_i$ is the $i$-th row of the matrix $W$.
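To make these quantities concrete, the short NumPy sketch below (illustrative only; the toy dimensions, random weights, and variable names are our own assumptions, not code from the book) runs the recursion of Eq. (4.25) over a toy sequence, keeps the gate and cell values needed for the decompositions that follow, and evaluates the class probabilities of Eq. (4.26).

```python
import numpy as np

rng = np.random.default_rng(0)
d, emb, C, T = 8, 5, 2, 6               # hidden size, embedding size, classes, sequence length

# Randomly initialized parameters stand in for a trained model.
Wf, Wi, Wo, Wg = (rng.normal(scale=0.3, size=(d, emb)) for _ in range(4))
Vf, Vi, Vo, Vg = (rng.normal(scale=0.3, size=(d, d)) for _ in range(4))
bf, bi, bo, bg = (np.zeros(d) for _ in range(4))
W = rng.normal(scale=0.3, size=(C, d))  # output matrix; W[i] is its i-th row

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x = rng.normal(size=(T, emb))           # word embeddings x_1 ... x_T

c, h = np.zeros(d), np.zeros(d)         # c_0 = h_0 = 0
cs, fs, is_, gs, os_ = [c], [], [], [], []   # keep states for the decompositions below
for t in range(T):
    f = sigmoid(Wf @ x[t] + Vf @ h + bf)
    i = sigmoid(Wi @ x[t] + Vi @ h + bi)
    o = sigmoid(Wo @ x[t] + Vo @ h + bo)
    g = np.tanh(Wg @ x[t] + Vg @ h + bg)
    c = f * c + i * g                   # Eq. (4.25): cell update
    h = o * np.tanh(c)                  # Eq. (4.25): hidden state
    cs.append(c); fs.append(f); is_.append(i); gs.append(g); os_.append(o)

scores = W @ h                          # W_i h_T for each class i
p = np.exp(scores) / np.exp(scores).sum()   # Eq. (4.26)
print("class probabilities:", p)
```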
Decomposing the output of an LSTM: We now decompose the numerator of $p_i$ in Eq. (4.26) into a product of factors and show that those factors can be interpreted as the contributions of individual words to the predicted probability of class $i$. Define

\[ \beta_{i,j} = \exp\Bigl( W_i \bigl( o_T \odot \bigl( \tanh(c_j) - \tanh(c_{j-1}) \bigr) \bigr) \Bigr), \tag{4.27} \]

so that, since $h_T = o_T \odot \tanh(c_T)$ and $\tanh(c_0) = 0$, the sum over $j$ telescopes and

\[ \exp\left(W_i h_T\right) = \exp\!\left( \sum_{j=1}^{T} W_i \bigl( o_T \odot \bigl( \tanh(c_j) - \tanh(c_{j-1}) \bigr) \bigr) \right) = \prod_{j=1}^{T} \beta_{i,j}. \]
Since $\tanh(c_j) - \tanh(c_{j-1})$ can be viewed as the update resulting from word $j$, $\beta_{i,j}$ can be interpreted as the multiplicative contribution of word $j$ to $p_i$.
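As a numerical illustration (a sketch under the same toy setup as above, not code from [93]), the function below computes every $\beta_{i,j}$ of Eq. (4.27) from the stored cell states and checks that their product over $j$ recovers $\exp(W_i h_T)$.

```python
import numpy as np

def beta_scores(W, o_T, cs):
    """Eq. (4.27): beta[i, j-1] = exp( W_i ( o_T * (tanh(c_j) - tanh(c_{j-1})) ) )."""
    C, T = W.shape[0], len(cs) - 1        # cs = [c_0, c_1, ..., c_T]
    beta = np.empty((C, T))
    for j in range(1, T + 1):
        update = o_T * (np.tanh(cs[j]) - np.tanh(cs[j - 1]))  # word j's update to h_T
        beta[:, j - 1] = np.exp(W @ update)
    return beta

# Reusing W, os_, cs, and h from the forward-pass sketch above:
#   beta = beta_scores(W, os_[-1], cs)
#   np.allclose(beta.prod(axis=1), np.exp(W @ h))   # product over words recovers exp(W_i h_T)
```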
An additive decomposition of the LSTM cell: We will show below that $\beta_{i,j}$ captures some notion of the importance of a word to the LSTM's output. However, these terms fail to account for how the information contributed by word $j$ is affected by the LSTM's forget gates between words $j$ and $T$. Consequently, it was found empirically [93] that the importance scores obtained in this way often yield a considerable number of false positives. A more nuanced approach is obtained by considering the additive decomposition of $c_T$ in Eq. (4.28), where each term $e_j$ can be interpreted as the contribution of word $j$ to the cell state $c_T$. By iterating the cell update $c_j = f_j \odot c_{j-1} + i_j \odot g_j$ down to $c_0 = 0$, we obtain

\[ c_T = \sum_{j=1}^{T} \Bigl( \prod_{k=j+1}^{T} f_k \Bigr) \odot i_j \odot g_j = \sum_{j=1}^{T} e_j . \tag{4.28} \]
This suggests a natural definition of an alternative score to $\beta_{i,j}$, corresponding to augmenting the $c_j$ terms with the products of the forget gates to reflect the upstream changes made to $c_j$ after word $j$ is initially processed:

\[ \gamma_{i,j} = \exp\!\left( W_i \left( o_T \odot \left( \tanh\!\Bigl( \Bigl(\prod_{k=j+1}^{T} f_k\Bigr) \odot c_j \Bigr) - \tanh\!\Bigl( \Bigl(\prod_{k=j}^{T} f_k\Bigr) \odot c_{j-1} \Bigr) \right) \right) \right) . \tag{4.29} \]
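The same bookkeeping gives these forget-gate-adjusted scores. The sketch below (again an illustrative assumption, reusing the `cs`, `fs`, `os_`, `W`, and `h` produced by the forward-pass sketch) builds the products $\prod_{k=j+1}^{T} f_k$, evaluates Eq. (4.29), and notes how the same products yield the terms $e_j$ of Eq. (4.28).

```python
import numpy as np

def gamma_scores(W, o_T, cs, fs):
    """Eq. (4.29): beta-like factors with c_j rescaled by the later forget gates."""
    C, T, d = W.shape[0], len(fs), cs[0].shape[0]
    # fprod[j] = f_{j+1} * f_{j+2} * ... * f_T  (elementwise), with fprod[T] = 1
    fprod = [np.ones(d)]
    for f in reversed(fs):
        fprod.append(fprod[-1] * f)
    fprod = fprod[::-1]                    # index j runs from 0 to T
    gamma = np.empty((C, T))
    for j in range(1, T + 1):
        delta = np.tanh(fprod[j] * cs[j]) - np.tanh(fprod[j - 1] * cs[j - 1])
        gamma[:, j - 1] = np.exp(W @ (o_T * delta))
    return gamma

# The same forget-gate products give the additive terms of Eq. (4.28),
#   e_j = (f_{j+1} * ... * f_T) * i_j * g_j   with   sum_j e_j = c_T,
# so gamma credits word j only with the part of its update that survives to step T.
# Reusing the forward-pass variables above:
#   gamma = gamma_scores(W, os_[-1], cs, fs)
#   gamma.prod(axis=1) again equals exp(W @ h).
```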
We now introduce a technique for using our variable importance scores to extract phrases from a trained LSTM. To do so, we search for phrases that consistently provide a large contribution to the prediction of a particular class relative to other classes. The utility of these patterns is validated by using them as input for a rules‐based classifier. For simplicity, we focus on the binary classification case.
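A rough sketch of such a search is given below for the binary case. The aggregation used here, averaging the log-ratio of the two classes' $\gamma$ scores over all occurrences of an $n$-gram, is only an illustrative assumption, and the helper name `rank_ngrams` is hypothetical; the exact scoring and validation procedure of [93] is described in the text that follows.

```python
import numpy as np
from collections import defaultdict

def rank_ngrams(docs, gamma_per_doc, n=2):
    """Rank n-grams by how strongly, on average, they favor class 1 over class 2.

    docs          : list of token lists, one per document
    gamma_per_doc : matching list of arrays of shape (2, T_doc) with the scores of Eq. (4.29)
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for tokens, gamma in zip(docs, gamma_per_doc):
        rel = np.log(gamma[0]) - np.log(gamma[1])   # per-word pull toward class 1 vs. class 2
        for start in range(len(tokens) - n + 1):
            phrase = tuple(tokens[start:start + n])
            totals[phrase] += rel[start:start + n].sum()
            counts[phrase] += 1
    return sorted(((totals[p] / counts[p], p) for p in totals), reverse=True)

# Phrases at the top of the returned list consistently push the prediction toward class 1
# and can seed the rules-based classifier mentioned above.
```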
Phrase extraction: A phrase can reasonably be described as predictive if, whenever it occurs, it causes a document both to be labeled as a particular class and not to be labeled as any other. Since the importance scores introduced above correspond to the contributions of individual words to class predictions, they can be used to score candidate patterns by looking at a pattern's average contribution to the prediction of a given class relative to the other classes. In other words, given a collection of $D$ documents