
parameters of AlexNet.

    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
0   X  -  -  -  X  X  X  -  -  X  X  X  X  -  X  X
1   X  X  -  -  -  X  X  X  -  -  X  X  X  X  -  X
2   X  X  X  -  -  -  X  X  X  -  -  X  -  X  X  X
3   -  X  X  X  -  -  X  X  X  X  -  -  X  -  X  X
4   -  -  X  X  X  -  -  X  X  X  X  -  X  X  -  X
5   -  -  -  X  X  X  -  -  X  X  X  X  -  X  X  X

      First Layer: AlexNet accepts a 227 × 227 × 3 RGB image as input. The image is fed to the first convolutional layer, which applies 96 kernels (filters) of size 11 × 11 × 3 with a stride of 4 and produces 96 feature maps of size 55 × 55. The following max-pooling (sub-sampling) layer uses a 3 × 3 window with a stride of 2 and produces an output of size 27 × 27 × 96.
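
      The sizes quoted for the first layer follow from the usual output-size rule for a sliding window, floor((input + 2 × padding - kernel) / stride) + 1. The sketch below checks them; the function name and the zero-padding default are assumptions made for illustration, not something the text specifies.

```python
def conv_output_size(input_size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window.

    Uses floor division, i.e. windows that do not fit completely are dropped.
    The default padding of 0 is an assumption for this first layer.
    """
    return (input_size + 2 * padding - kernel) // stride + 1


# First convolution: 227 x 227 input, 11 x 11 kernel, stride 4.
conv1 = conv_output_size(227, kernel=11, stride=4)

# Max pooling that follows: 3 x 3 window, stride 2.
pool1 = conv_output_size(conv1, kernel=3, stride=2)

print(conv1, pool1)  # 55 27
```

      The same rule applies to every convolutional and pooling layer described in this section.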

      Second Layer: The second convolutional layer filters the 27 × 27 × 96 input with 256 kernels of size 5 × 5 and a stride of 1 pixel. It is followed by a max-pooling layer with a 3 × 3 window and a stride of 2, which reduces the output to 256 feature maps of size 13 × 13.

      Third, Fourth, and Fifth Layers: The third, fourth, and fifth convolutional layers use 3 × 3 filters with a stride of 1. The third and fourth layers have 384 feature maps each, and the fifth has 256. They are followed by a max-pooling layer with a 3 × 3 window and a stride of 2, which outputs 256 feature maps of size 6 × 6.

      Sixth Layer: The 6 × 6 × 256 output is flattened into a fully connected layer of 9,216 neurons (feature maps of size 1 × 1).

      Seventh and Eighth Layers: The seventh and eighth layers are fully connected layers with 4,096 neurons each.

      Output Layer: The output layer uses softmax activation over 1,000 classes.
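
      The layer-by-layer description above can be traced in a few lines of code. The following is a minimal sketch, not a reference implementation: the padding values (2 for the 5 × 5 convolution, 1 for the 3 × 3 convolutions) are assumptions chosen so that the sizes match the text, the layer names are hypothetical, and the convolutions are treated as a single ungrouped tower, so the parameter counts may differ from implementations that split layers across devices.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kind, output channels, kernel, stride, padding)
# Padding values are assumptions chosen to reproduce the sizes in the text.
FEATURE_LAYERS = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool5", "pool", None, 3, 2, 0),
]

# Fully connected part: two layers of 4,096 neurons, then 1,000 classes.
DENSE_UNITS = [4096, 4096, 1000]


def trace_alexnet(size=227, channels=3):
    """Return per-layer output shapes and parameter counts."""
    shapes, params = {}, {}
    for name, kind, out_ch, kernel, stride, padding in FEATURE_LAYERS:
        size = out_size(size, kernel, stride, padding)
        if kind == "conv":
            # Weights plus one bias per filter; pooling has no parameters.
            params[name] = kernel * kernel * channels * out_ch + out_ch
            channels = out_ch
        shapes[name] = (size, size, channels)

    inputs = size * size * channels  # flattened feature vector
    shapes["flatten"] = (inputs,)
    for i, units in enumerate(DENSE_UNITS, start=1):
        params[f"dense{i}"] = inputs * units + units
        shapes[f"dense{i}"] = (units,)
        inputs = units
    return shapes, params


shapes, params = trace_alexnet()
for name, shape in shapes.items():
    print(f"{name:8s} {str(shape):16s} {params.get(name, 0):>12,d}")
print("total parameters:", f"{sum(params.values()):,d}")
```

      Under these assumptions the trace reproduces the 27 × 27, 13 × 13, and 6 × 6 feature-map sizes and the 9,216-element flattened vector, and it shows that the first fully connected layer holds most of the parameters.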

      1.2.3 ZFNet

      The architecture of ZFNet, introduced by Zeiler [3], is the same as that of AlexNet, except that the first convolutional layer uses a smaller 7 × 7 kernel with a stride of 2 (instead of 11 × 11 with a stride of 4). The smaller kernel and stride let the network obtain better hyper-parameters and retain more features, at the cost of some computational efficiency. The number of filters in the third, fourth, and fifth convolutional layers is increased to 512, 1024, and 512, respectively. A new visualization technique, deconvolution (which maps features back to pixels), is used to analyze the feature maps of the first and second layers.
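
      To make the effect of the smaller kernel and stride concrete, the sketch below compares the first convolutional layer in the two configurations. It assumes the same 227 × 227 × 3 input and 96 filters as in the AlexNet description and no padding; the exact input size and padding used for ZFNet are not given here, so the numbers are illustrative only. The function name is hypothetical.

```python
def first_layer_stats(kernel, stride, input_size=227, in_channels=3, filters=96):
    """Output size, weight count, and multiply-accumulates of one conv layer.

    input_size, in_channels, and filters are assumed values taken from the
    AlexNet description; padding is assumed to be zero.
    """
    side = (input_size - kernel) // stride + 1
    weights = kernel * kernel * in_channels * filters
    macs = side * side * weights  # one multiply-accumulate per weight per position
    return {"output_side": side, "weights": weights, "macs": macs}


alexnet = first_layer_stats(kernel=11, stride=4)
zfnet = first_layer_stats(kernel=7, stride=2)

for name, stats in (("AlexNet", alexnet), ("ZFNet", zfnet)):
    print(name, stats)
```

      With these assumptions, the 7 × 7, stride-2 layer has fewer weights but produces a feature map roughly twice as wide (111 versus 55), so it keeps finer spatial detail while performing more multiply-accumulate operations.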

Sl. no. Layer Kernel size Stride Activation shape Weights Bias # Parameters Activation # Connections
