
Using the terms $\partial\,\mathrm{vec}(x^{i+1})/\partial(\mathrm{vec}(w^{i}))^{T}$ and $\partial\,\mathrm{vec}(x^{i+1})/\partial(\mathrm{vec}(x^{i}))^{T}$, we can easily get Eq. (3.79). These two terms are much easier to compute than directly computing $\partial z/\partial(\mathrm{vec}(w^{i}))^{T}$ and $\partial z/\partial(\mathrm{vec}(x^{i}))^{T}$, because $x^{i}$ is directly related to $x^{i+1}$ through a function with parameters $w^{i}$. The details of these partial derivatives will be discussed in the following sections.
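For reference, the chain-rule decomposition that Eq. (3.79) refers to presumably has the following form (a reconstruction from the terms named above, since the equation itself is not reproduced in this excerpt):

$$\frac{\partial z}{\partial(\mathrm{vec}(w^{i}))^{T}}=\frac{\partial z}{\partial(\mathrm{vec}(x^{i+1}))^{T}}\,\frac{\partial\,\mathrm{vec}(x^{i+1})}{\partial(\mathrm{vec}(w^{i}))^{T}},\qquad \frac{\partial z}{\partial(\mathrm{vec}(x^{i}))^{T}}=\frac{\partial z}{\partial(\mathrm{vec}(x^{i+1}))^{T}}\,\frac{\partial\,\mathrm{vec}(x^{i+1})}{\partial(\mathrm{vec}(x^{i}))^{T}}.$$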

      3.6.2 Layers in CoNN

      Suppose we are considering the l‐th layer, whose inputs form an order‐3 tensor $x^{l}$ with $x^{l}\in\mathbb{R}^{H^{l}\times W^{l}\times D^{l}}$. A triplet index set $(i^{l}, j^{l}, d^{l})$ is used to locate any specific element in $x^{l}$: the triplet $(i^{l}, j^{l}, d^{l})$ refers to the element of $x^{l}$ in the $d^{l}$‐th channel, at spatial location $(i^{l}, j^{l})$ (the $i^{l}$‐th row and $j^{l}$‐th column). In actual CoNN learning, the mini‐batch strategy is usually used; in that case, $x^{l}$ becomes an order‐4 tensor in $\mathbb{R}^{H^{l}\times W^{l}\times D^{l}\times N}$, where N is the mini‐batch size. For simplicity we assume for the moment that N = 1. The results in this section, however, are easy to adapt to mini‐batch versions. In order to simplify the notation that will appear later, we follow the zero‐based indexing convention, which specifies that $0\le i^{l}<H^{l}$, $0\le j^{l}<W^{l}$, and $0\le d^{l}<D^{l}$. In the l‐th layer, a function transforms the input $x^{l}$ into an output $y=x^{l+1}$. We assume the output has size $H^{l+1}\times W^{l+1}\times D^{l+1}$, and an element in the output is indexed by a triplet $(i^{l+1}, j^{l+1}, d^{l+1})$ with $0\le i^{l+1}<H^{l+1}$, $0\le j^{l+1}<W^{l+1}$, $0\le d^{l+1}<D^{l+1}$.
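As a concrete illustration of this tensor layout and the zero‐based indexing convention, here is a minimal NumPy sketch (the particular sizes $H^{l}=4$, $W^{l}=5$, $D^{l}=3$ and batch size N = 8 are arbitrary choices for illustration, not values from the book):

import numpy as np

# Order-3 input tensor x^l of shape H^l x W^l x D^l (single example, N = 1).
H_l, W_l, D_l = 4, 5, 3
x_l = np.random.randn(H_l, W_l, D_l)

# Zero-based triplet (i^l, j^l, d^l): row i^l, column j^l, channel d^l.
i_l, j_l, d_l = 2, 3, 1
element = x_l[i_l, j_l, d_l]

# With a mini-batch of N examples, x^l becomes an order-4 tensor
# of shape H^l x W^l x D^l x N.
N = 8
x_l_batch = np.random.randn(H_l, W_l, D_l, N)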

      The Rectified Linear Unit (ReLU) layer: A ReLU layer does not change the size of the input; that is, $x^{l}$ and y share the same size. The ReLU can be regarded as a truncation performed individually on every element of the input: $y_{i,j,d}=\max\{0,\,x^{l}_{i,j,d}\}$, with $0\le i<H^{l}=H^{l+1}$, $0\le j<W^{l}=W^{l+1}$, and $0\le d<D^{l}=D^{l+1}$. There is no parameter inside a ReLU layer, and hence there is no need for parameter learning in this layer.
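A minimal NumPy sketch of this element‐wise truncation (the function name relu is chosen here for illustration and is not the book's notation):

import numpy as np

def relu(x_l):
    # Element-wise truncation: y_{i,j,d} = max{0, x^l_{i,j,d}}.
    # The output has exactly the same shape as the input, and the
    # layer has no learnable parameters.
    return np.maximum(0.0, x_l)

x_l = np.random.randn(3, 4, 2)   # an arbitrary H^l x W^l x D^l input
y = relu(x_l)
assert y.shape == x_l.shape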

[Figure: Schematic illustration of the convolution operation; schematic illustration of an RGB image (three channels) and three kernels.]

      (3.80) $y_{i^{l+1},\,j^{l+1},\,d}=\sum_{i=0}^{H}\sum_{j=0}^{W}\sum_{d^{l}=0}^{D^{l}} f_{i,\,j,\,d^{l},\,d}\times x^{l}_{i^{l+1}+i,\;j^{l+1}+j,\;d^{l}}.$
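A direct, loop-based NumPy sketch of Eq. (3.80), assuming a kernel bank f of shape H × W × D^l × D (one H × W × D^l kernel per output channel d), stride 1 and no padding, so that H^{l+1} = H^l - H + 1 and W^{l+1} = W^l - W + 1; the function name conv_forward is chosen here for illustration:

import numpy as np

def conv_forward(x_l, f):
    # x_l: input of shape (H^l, W^l, D^l)
    # f:   kernels of shape (H, W, D^l, D)
    H_l, W_l, D_l = x_l.shape
    H, W, _, D = f.shape
    H_out, W_out = H_l - H + 1, W_l - W + 1   # stride 1, no padding
    y = np.zeros((H_out, W_out, D))
    # Literal transcription of Eq. (3.80): sum over i, j, and d^l for every
    # output location (i^{l+1}, j^{l+1}) and every output channel d.
    for i_out in range(H_out):
        for j_out in range(W_out):
            for d in range(D):
                y[i_out, j_out, d] = np.sum(
                    f[:, :, :, d] * x_l[i_out:i_out + H, j_out:j_out + W, :]
                )
    return y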

      Convolution as matrix product: There is a way to expand $x^{l}$ and simplify the convolution into a matrix product. Let us consider a special case with $D^{l}=D=1$, H = W = 2, and $H^{l}=3$, $W^{l}=4$. That is, we consider convolving a small single‐channel 3 × 4 matrix (or image) with one 2 × 2 kernel.
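A minimal NumPy sketch of this expansion for the 3 × 4 single‐channel example above: every 2 × 2 patch of the input is unrolled into one row of a matrix (the well‐known im2col idea), so the convolution reduces to a single matrix-vector product with the unrolled kernel. The helper name im2col and the particular kernel values are illustrative choices, not the book's:

import numpy as np

def im2col(x, H, W):
    # Unroll every H x W patch of a single-channel image x into one row.
    H_in, W_in = x.shape
    rows = []
    for i in range(H_in - H + 1):
        for j in range(W_in - W + 1):
            rows.append(x[i:i + H, j:j + W].ravel())
    return np.array(rows)                      # shape: (H_out * W_out, H * W)

x = np.arange(12, dtype=float).reshape(3, 4)   # the 3 x 4 single-channel input
f = np.array([[1.0, 0.0], [0.0, -1.0]])        # an arbitrary 2 x 2 kernel

y = im2col(x, 2, 2) @ f.ravel()                # convolution as a matrix product
y = y.reshape(2, 3)                            # output size H^{l+1} x W^{l+1} = 2 x 3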
