Computational Statistics in Data Science
Author: Group of authors
ISBN: 9781119561088
Genre: Mathematics
Publisher: John Wiley & Sons Limited
where $W$ and $b$ are the weight matrix and bias, and $\sigma$ is the sigmoid function.
The two hidden states $c_t$ and $h_t$ are calculated by
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{13}$$
$$h_t = o_t \odot \tanh(c_t) \tag{14}$$
where $\odot$ represents the elementwise product between matrices. In Equation (13), the first term multiplies $f_t$ with $c_{t-1}$, controlling what information in the previous cell state can be passed to the current cell state. As for the second term, $\tilde{c}_t$ stores the information passed from $h_{t-1}$ and $x_t$, and $i_t$ controls how much information from the current state is preserved in the cell state. The hidden state $h_t$ depends on the current cell state and $o_t$, which decides how much information from the current cell state will be passed to the hidden state $h_t$.
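As a rough illustration of the cell-state and hidden-state updates described above, the following minimal sketch runs one LSTM time step in plain Python. It assumes the standard LSTM formulation: three sigmoid gates (forget, input, output) and a tanh candidate state, each an affine function of the concatenation of the previous hidden state and the current input. The function names (`affine`, `gate`, `lstm_step`), the layout of the `params` dictionary, and the gate keys `"f"`, `"i"`, `"o"`, `"c"` are hypothetical choices for this example, not notation taken from the chapter.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W v + b, with W a list of rows and v, b plain lists."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) + b_i
            for row, b_i in zip(W, b)]


def gate(params, name, z, activation):
    """Apply `activation` elementwise to the affine map stored under `name`."""
    W, b = params[name]
    return [activation(a) for a in affine(W, z, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step (assumed standard formulation).

    x_t, h_prev, c_prev are lists; params maps each of "f", "i", "o", "c"
    to a (W, b) pair acting on the concatenation [h_prev, x_t].
    """
    z = h_prev + x_t                              # list concatenation
    f_t = gate(params, "f", z, sigmoid)           # forget gate
    i_t = gate(params, "i", z, sigmoid)           # input gate
    o_t = gate(params, "o", z, sigmoid)           # output gate
    c_tilde = gate(params, "c", z, math.tanh)     # candidate cell state
    # Cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Hidden state: the output gate filters the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy usage: hidden size 1, input size 1, all weights and biases zero.
zero_params = {k: ([[0.0, 0.0]], [0.0]) for k in ("f", "i", "o", "c")}
h_t, c_t = lstm_step([3.0], [1.0], [2.0], zero_params)
print(h_t, c_t)
```

With all parameters set to zero, every gate equals 0.5 and the candidate state is 0, so the new cell state is half the previous one (1.0 here) and the hidden state is about 0.38.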
Figure 9 Architecture of long short‐term memory network (LSTM).
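In LSTM, if the loss $\mathcal{L}_\tau$ is evaluated at time step $\tau$, the gradient w.r.t. $c_t$ calculated via backpropagation can be written as
$$\frac{\partial \mathcal{L}_\tau}{\partial c_t} = \frac{\partial \mathcal{L}_\tau}{\partial c_\tau} \prod_{k=t+1}^{\tau} \frac{\partial c_k}{\partial c_{k-1}} = \frac{\partial \mathcal{L}_\tau}{\partial c_\tau} \prod_{k=t+1}^{\tau} \left( \operatorname{diag}(f_k) + A_k \right) \tag{15}$$
where $A_k$ represents other terms in the partial derivative calculation. Since the sigmoid function is used when calculating the values of $i_k$, $f_k$, and $o_k$, their entries lie between 0 and 1 and, when the sigmoid saturates, are close to either 0 or 1. When $f_k$ is close to 1, the gradient does not vanish, and when it is close to 0, it means that the previous information is not useful for the current state and should be forgotten.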
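To make the role of the forget gate concrete, the sketch below considers a scalar simplification of Equation (15) in which the other terms are dropped (an assumption made only for illustration), so the factor multiplying the gradient reduces to the product of the forget-gate values between the two time steps. The function name `cell_gradient_factor`, the horizon of 50 steps, and the gate values 0.99 and 0.5 are hypothetical.

```python
def cell_gradient_factor(forget_gates):
    """Product of scalar forget-gate values over the intermediate time steps.

    Assumption: the other terms in the Jacobian are ignored, so this is only
    the forget-gate contribution to d c_tau / d c_t.
    """
    factor = 1.0
    for f_k in forget_gates:
        factor *= f_k
    return factor


steps = 50
open_gates = cell_gradient_factor([0.99] * steps)  # gates close to 1
half_gates = cell_gradient_factor([0.5] * steps)   # gates well below 1

print(f"forget gate 0.99 over {steps} steps: {open_gates:.3e}")
print(f"forget gate 0.50 over {steps} steps: {half_gates:.3e}")
```

Under this simplification, gates held at 0.99 leave a factor of roughly 0.6 after 50 steps, whereas gates held at 0.5 shrink it below 1e-15, which matches the observation that the gradient survives only while the forget gate stays near 1.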
7 Conclusion
In this chapter, we discussed the architectures of four types of neural networks and their extensions. Many other neural networks have been proposed in recent years, but the ones discussed here are the classical ones that served as foundations for many other works. Although DNNs have achieved breakthroughs in many fields, their performance in many of them is still far from perfect. Developing new architectures that improve performance on various tasks or solve new problems is an important research direction. Analyzing the properties and problems of existing architectures is also of great interest to the community.