
in the following way: initialized with random weights or values, the connections between neurons update their weights or values through the back propagation algorithm repeatedly until the model performs with sufficient precision. In the end, the knowledge that a neural network has learned is stored in the connections in a digital manner. Most research on neural networks tries to change the way the network learns (with different algorithms or different structures), aiming to improve the generalization ability of the model.

      The basic units of neural networks are neurons, which receive a series of inputs and return the corresponding output. A classic neuron is shown in Figure 3.1. The neuron receives n inputs x1, x2, …, xn with corresponding weights w1, w2, …, wn and an offset b. The weighted summation $y = \sum_{i=1}^{n} w_i x_i + b$ then passes through an activation function f, and the neuron returns the output z = f(y). Note that the output will be the input of the next neuron. The activation function is a kind of function that maps a real number to a number between 0 and 1 (with some exceptions, such as Tanh and ReLU below), which represents the activation of the neuron, where 0 indicates deactivated and 1 indicates fully activated. Several useful activation functions are shown as follows.

      • Sigmoid Function (Figure 3.2):

$$S(x) = \frac{1}{1 + e^{-x}}$$

      • Tanh Function (Figure 3.3):

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

      • ReLU (Rectified Linear Unit) (Figure 3.4):

$$\mathrm{ReLU}(x) = \begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases}$$
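      The neuron and the three activation functions listed above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names (`sigmoid`, `tanh`, `relu`, `neuron`) and the sample inputs are assumptions chosen for this example.

```python
import math

def sigmoid(x: float) -> float:
    """Sigmoid: maps any real x into the interval (0, 1)."""
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def tanh(x: float) -> float:
    """Tanh: maps any real x into the interval (-1, 1)."""
    return math.tanh(x)

def relu(x: float) -> float:
    """ReLU: returns x for positive x and 0 otherwise."""
    return x if x > 0 else 0.0

def neuron(inputs, weights, bias, f):
    """Weighted summation y = sum(w_i * x_i) + b, followed by z = f(y)."""
    y = sum(w * x for w, x in zip(weights, inputs)) + bias
    return f(y)

# Hypothetical inputs and weights: y = 0.5*1.0 + 0.25*2.0 + 0.0 = 1.0
z = neuron([1.0, 2.0], [0.5, 0.25], 0.0, sigmoid)  # z = sigmoid(1.0)
```

      Passing the activation as the argument `f` keeps the weighted summation separate from the choice of activation, which mirrors the description of z = f(y).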

      Figure 3.1: A classic neuron structure.

      Figure 3.2: The Sigmoid function.

      In fact, there are many other activation functions, and each has its corresponding derivative. Remember that a good activation function is generally smooth (that is, continuously differentiable) and easy to compute (in order to minimize the computational complexity of the neural network). During the training of a neural network, the choice of activation function is usually essential to the outcome.
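      To make the role of the derivatives concrete, the following sketch writes out the well-known closed-form derivatives of the Sigmoid, Tanh, and ReLU functions and compares one of them against a central finite difference. The helper names are assumptions for this example, and the value returned for the ReLU derivative at exactly x = 0 (where ReLU is not differentiable) is a convention chosen here, not something the text prescribes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x):
    """Derivative of the sigmoid: S(x) * (1 - S(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t

def d_relu(x):
    """Derivative of ReLU: 1 for x > 0, 0 for x < 0 (0 chosen at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

def numeric_derivative(f, x, h=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# The analytic and numeric derivatives should agree closely at a sample point.
gap = abs(d_sigmoid(0.3) - numeric_derivative(sigmoid, 0.3))
```

      All three derivatives reuse values already computed in the forward pass, which is one reason these functions are cheap to train with.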

      During the training of a neural network, the back propagation algorithm is most commonly used. It is an algorithm based on gradient descent that optimizes the parameters of a model. Let’s take the single neuron model illustrated above as an example. Suppose the optimization target for the output z is z0, which will be approached by adjusting the parameters w1, w2, …, wn, b.

      Figure 3.3: The Tanh function.

      Figure 3.4: The ReLU (Rectified Linear Unit) function.

      By the chain rule, we can deduce the derivatives of z with respect to wi and b:

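$$\frac{\partial z}{\partial w_i} = \frac{\partial z}{\partial y} \frac{\partial y}{\partial w_i} = \frac{\partial f(y)}{\partial y} x_i$$

$$\frac{\partial z}{\partial b} = \frac{\partial z}{\partial y} \frac{\partial y}{\partial b} = \frac{\partial f(y)}{\partial y}$$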

      With a learning rate of η, the update for each parameter will be:

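$$\Delta w_i = \eta (z_0 - z) \frac{\partial z}{\partial w_i} = \eta (z_0 - z)\, x_i \frac{\partial f(y)}{\partial y}$$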

      Figure 3.5: Feedforward neural network.

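$$\Delta b = \eta (z_0 - z) \frac{\partial z}{\partial b} = \eta (z_0 - z) \frac{\partial f(y)}{\partial y}$$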
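      The single-neuron update can be tried out numerically. The sketch below assumes a Sigmoid activation and assumes that each parameter moves by η (z0 − z) times the derivative of z with respect to that parameter, which corresponds to gradient descent on the squared error ½ (z0 − z)². The inputs, initial weights, target, and learning rate are hypothetical values picked for illustration.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def train_step(x, w, b, z0, eta):
    """One forward pass and one parameter update for a single sigmoid neuron."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted summation
    z = sigmoid(y)                                 # output z = f(y)
    dz_dy = z * (1.0 - z)                          # f'(y) for the sigmoid
    err = z0 - z                                   # distance from the target
    # Assumed update: parameter += eta * (z0 - z) * dz/dparameter
    new_w = [wi + eta * err * dz_dy * xi for wi, xi in zip(w, x)]
    new_b = b + eta * err * dz_dy
    return new_w, new_b, z

# Hypothetical example values
x = [1.0, 2.0]
w, b = [0.1, -0.2], 0.0
z0, eta = 0.9, 0.5

outputs = []
for _ in range(200):
    w, b, z = train_step(x, w, b, z0, eta)
    outputs.append(z)   # output computed before each update
```

      With these values the recorded outputs move from roughly 0.43 toward the target 0.9. A larger η takes larger steps toward the target but can overshoot it.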

      In summary, the back propagation process consists of the following two steps.

      • Forward calculation: given a set of parameters and an input, the neural network computes the values at each neuron in a forward order.

      • Backward propagation: compute the error at each variable to be optimized, and update the parameters with their corresponding partial derivatives in a backward order.

      The above two steps are repeated until the optimization target is reached.
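      The two steps can be sketched for a small network with one hidden layer, where the backward order becomes visible: the error at the output neuron is computed first and then passed back to the hidden neurons. Everything specific here is an assumption made for illustration: the Sigmoid activations, the squared-error objective ½ (z0 − z)², the two-hidden-neuron layout, the initial weights, the learning rate, and the toy data (logical OR).

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def forward(x, W1, b1, W2, b2):
    """Forward calculation: hidden activations first, then the output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    z = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, z

def backward(x, h, z, z0, W1, b1, W2, b2, eta):
    """Backward propagation: output error first, then the hidden layer."""
    delta_out = (z0 - z) * z * (1.0 - z)
    # Hidden errors use the output weights as they were before this update.
    delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    new_W2 = [W2[j] + eta * delta_out * h[j] for j in range(len(h))]
    new_b2 = b2 + eta * delta_out
    new_W1 = [[W1[j][i] + eta * delta_hid[j] * x[i] for i in range(len(x))]
              for j in range(len(h))]
    new_b1 = [b1[j] + eta * delta_hid[j] for j in range(len(h))]
    return new_W1, new_b1, new_W2, new_b2

def total_error(data, W1, b1, W2, b2):
    """Sum of the squared errors 0.5 * (z0 - z)^2 over the data set."""
    return sum(0.5 * (z0 - forward(x, W1, b1, W2, b2)[1]) ** 2 for x, z0 in data)

# Toy data (logical OR) and fixed initial parameters, both hypothetical
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
W1, b1 = [[0.5, -0.4], [-0.3, 0.2]], [0.0, 0.0]
W2, b2 = [0.3, -0.1], 0.0
eta = 0.5

error_before = total_error(data, W1, b1, W2, b2)
for epoch in range(2000):
    for x, z0 in data:
        h, z = forward(x, W1, b1, W2, b2)                            # step 1
        W1, b1, W2, b2 = backward(x, h, z, z0, W1, b1, W2, b2, eta)  # step 2
error_after = total_error(data, W1, b1, W2, b2)
```

      Comparing `error_before` with `error_after` shows whether the repeated forward and backward passes moved the outputs closer to their targets. A fixed number of epochs stands in here for a proper stopping criterion.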

      Recently, the field of machine learning (especially deep learning) has developed rapidly, marked by the appearance of a variety of neural network structures. Though they vary widely, current neural network structures can be classified into several categories: feedforward neural networks, convolutional neural networks, recurrent neural networks, and GNNs.

      • Feedforward neural network: The feedforward neural network (FNN) (Figure 3.5) is the first and simplest architecture of artificial neural networks. The FNN usually contains an input layer, several hidden layers, and an output layer. It has a clear hierarchical structure consisting of multiple layers of neurons, and each layer is connected only to its neighboring layers. There are no loops in this network.

      • Convolutional neural network: Convolutional neural networks (CNNs) are special versions of FNNs. FNNs are usually fully connected networks, while CNNs preserve local connectivity. The CNN architecture usually contains convolutional layers, pooling layers, and several fully connected layers. There exist several classical CNN architectures such as LeNet5 [LeCun et al., 1998], AlexNet [Krizhevsky et al., 2012] (Figure 3.6), VGG [Simonyan and Zisserman, 2014], and GoogLeNet [Szegedy et al., 2015]. CNNs are widely used in computer vision and have proven effective in many other research fields.

      • Recurrent neural network: In comparison with the FNN, the neurons in a recurrent neural network (RNN) receive not only signals and inputs from other neurons but also their own historical information. The memory mechanism in the RNN helps the model process sequential data effectively. However, the RNN usually suffers from the problem of long-term dependencies [Bengio et al., 1994, Hochreiter et al., 2001]. Several variants, such as GRU [Cho et al., 2014] and LSTM [Hochreiter and Schmidhuber, 1997], have been proposed to solve this problem by incorporating a gate mechanism. The RNN is widely used in speech and natural language processing.

      • Graph neural network: The GNN is designed specifically to handle graph-structured data, such as social networks, molecular structures, and knowledge graphs. Detailed descriptions of GNNs will be covered in the later chapters of this book.
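      The structural differences between the first three categories can be illustrated with a few lines of Python. This is a rough sketch under several assumptions: Tanh is used as the activation throughout, the convolution is one-dimensional with a single kernel shared across positions, the recurrent state is a single scalar, and all function names are hypothetical. GNNs are left out because they are described in later chapters.

```python
import math

def dense_layer(x, W, b):
    """Fully connected (FNN) layer: every output neuron sees every input."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]

def conv1d_layer(x, kernel, bias):
    """1-D convolutional layer: each output sees only len(kernel) neighboring
    inputs (local connectivity), and one kernel is reused at every position.
    As is common in deep learning, the kernel is applied without flipping."""
    k = len(kernel)
    return [math.tanh(sum(kernel[j] * x[i + j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """Recurrent step: the new state depends on the current input and on the
    neuron's own previous state (its historical information)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(xs, w_x, w_h, b, h0=0.0):
    """Apply rnn_step along a sequence and collect the state at each step."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Hypothetical example values
dense_out = dense_layer([1.0, 2.0], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0.0, 0.0, 0.0])
conv_out = conv1d_layer([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 0.5, 0.5], 0.0)
rnn_states = run_rnn([1.0, 0.0], w_x=1.0, w_h=1.0, b=0.0)
```

      In `run_rnn`, the second input is 0, yet the second state is nonzero because it carries information from the first step. This is the memory mechanism in its simplest form.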

      Figure
