
Figure 3.23, we have

      where the first matrix is denoted as A, and * is the convolution operator.

      The MATLAB command B = im2col(A, [2 2]) gives the B matrix, which is an expanded version of A:

      Note that the first column of B corresponds to the first 2 × 2 region in A, in column-first order, corresponding to $(i^{l+1}, j^{l+1}) = (0, 0)$. Similarly, the second through last columns of B correspond to the regions in A with $(i^{l+1}, j^{l+1})$ equal to (1, 0), (0, 1), (1, 1), (0, 2), and (1, 2), respectively. That is, the MATLAB im2col function explicitly expands the elements required for each individual convolution into one column of the matrix B. The transpose, $B^T$, is called the im2row expansion of A. If we vectorize the convolution kernel itself (in the same column-first order) into the vector $(1, 1, 1, 1)^T$, we find that
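The expansion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not MATLAB's implementation: the helper `im2col` and the 3 × 4 input matrix are assumptions chosen only so that six 2 × 2 regions exist, matching the six index pairs listed in the text. Stride 1 and no padding are assumed.

```python
def im2col(a, kh, kw):
    """Expand every kh x kw region of matrix a into one column (column-first)."""
    rows, cols = len(a), len(a[0])
    out_h, out_w = rows - kh + 1, cols - kw + 1
    columns = []
    # Regions are visited column-first: (0,0), (1,0), (0,1), (1,1), ...
    for j0 in range(out_w):
        for i0 in range(out_h):
            # Elements inside each region are also read column-first.
            columns.append([a[i0 + i][j0 + j]
                            for j in range(kw) for i in range(kh)])
    # columns[k] is the k-th column of B; transpose so B is a list of rows.
    return [list(row) for row in zip(*columns)]


# Hypothetical 3 x 4 input, used purely for illustration.
A = [[1, 2, 3, 1],
     [4, 5, 6, 1],
     [7, 8, 9, 1]]

B = im2col(A, 2, 2)                 # 4 x 6: one column per 2 x 2 region
B_T = [list(r) for r in zip(*B)]    # im2row expansion: one row per region
kernel = [1, 1, 1, 1]               # all-ones 2 x 2 kernel, vectorized

# Each convolution output is the dot product of one row of B^T with the kernel.
conv = [sum(x * k for x, k in zip(row, kernel)) for row in B_T]
print(B)
print(conv)
```

Each entry of `conv` is the sum over one 2 × 2 region, listed in the same column-first order as the columns of B.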

      If $D^l > 1$ (that is, $x^l$ has more than one channel, e.g., the three-channel RGB image in Figure 3.24), the expansion operator can first expand the first channel of $x^l$, then the second, and so on until all $D^l$ channels are expanded. The expanded channels are stacked together; that is, one row of the im2row expansion has $H \times W \times D^l$ elements rather than $H \times W$.

      Suppose $x^l$ is a third-order tensor in $\mathbb{R}^{H^l \times W^l \times D^l}$, with one element of $x^l$ indexed by $(i^l, j^l, d^l)$, and f is a set of convolution kernels whose spatial extents are all $H \times W$. Then the expansion operator (im2row) converts $x^l$ into a matrix $\phi(x^l)$ with elements indexed by $(p, q)$. The expansion operator copies the element at $(i^l, j^l, d^l)$ in $x^l$ to the $(p, q)$-th entry of $\phi(x^l)$. From the description of the expansion process, given a fixed $(p, q)$, we can calculate its corresponding $(i^l, j^l, d^l)$ triplet from the relation

      For example, dividing $q$ by $HW$ and taking the integer part of the quotient determines which channel ($d^l$) the element belongs to.
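As an illustration of how such an index mapping can be inverted, the following sketch assumes zero-based indices and the column-first layout described earlier, namely $p = i^{l+1} + (H^l - H + 1)\, j^{l+1}$ and $q = i + H j + H W d^l$, with $i^l = i^{l+1} + i$ and $j^l = j^{l+1} + j$. The function name `decode` and its argument names are hypothetical, and stride 1 with no padding is assumed.

```python
def decode(p, q, H_l, H, W):
    """Map entry (p, q) of phi(x^l) back to (i^l, j^l, d^l), zero-based.

    Assumes column-first ordering, stride 1 and no padding.
    """
    out_h = H_l - H + 1          # number of vertical kernel positions
    i_out = p % out_h            # i^{l+1}: row of the output location
    j_out = p // out_h           # j^{l+1}: column of the output location
    d = q // (H * W)             # channel: integer part of q / (HW)
    r = q % (H * W)              # offset inside the H x W window
    i, j = r % H, r // H         # row and column offsets inside the window
    return i_out + i, j_out + j, d


# Hypothetical sizes: input height 3, kernel 2 x 2.
print(decode(3, 6, H_l=3, H=2, W=2))
```

With these sizes, `q = 6` falls in the second block of `H * W = 4` entries, so the element comes from channel 1; the remainder locates it inside the kernel window.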

      \[ \operatorname{vec}(y) = \operatorname{vec}(x^{l+1}) = \operatorname{vec}\left(\phi(x^l)\, F\right). \tag{3.85} \]
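A small numerical check can make the matrix form of Equation (3.85) concrete. The sketch below is an illustration under stated assumptions, not the book's code: it assumes that column k of F is the k-th kernel vectorized in the same column-first, channel-by-channel order as a row of $\phi(x^l)$, that the convolution uses stride 1, no padding, and no kernel flipping, and that all helper names and test data are hypothetical.

```python
def im2row(x, H, W):
    """phi(x): one row per output location, channels stacked along the row."""
    H_l, W_l, D_l = len(x), len(x[0]), len(x[0][0])
    out_h, out_w = H_l - H + 1, W_l - W + 1
    return [[x[i0 + i][j0 + j][d]
             for d in range(D_l) for j in range(W) for i in range(H)]
            for j0 in range(out_w) for i0 in range(out_h)]


def kernels_to_matrix(f):
    """F: column k holds kernel k, vectorized in the same order as a phi row."""
    H, W, D_l, D = len(f), len(f[0]), len(f[0][0]), len(f[0][0][0])
    return [[f[i][j][d][k] for k in range(D)]
            for d in range(D_l) for j in range(W) for i in range(H)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*b)]
            for row in a]


def direct_conv(x, f):
    """Reference result computed straight from the sliding-window definition."""
    H, W, D_l, D = len(f), len(f[0]), len(f[0][0]), len(f[0][0][0])
    out_h, out_w = len(x) - H + 1, len(x[0]) - W + 1
    return [[sum(f[i][j][d][k] * x[i0 + i][j0 + j][d]
                 for i in range(H) for j in range(W) for d in range(D_l))
             for k in range(D)]
            for j0 in range(out_w) for i0 in range(out_h)]


# Hypothetical data: a 3 x 3 x 2 input and two 2 x 2 x 2 kernels.
x = [[[i + 3 * j + 9 * d for d in range(2)] for j in range(3)]
     for i in range(3)]
f = [[[[1, i + 2 * j + 4 * d] for d in range(2)] for j in range(2)]
     for i in range(2)]

Y = matmul(im2row(x, 2, 2), kernels_to_matrix(f))   # phi(x) F
print(Y == direct_conv(x, f))
```

Row p of `Y` holds the outputs of all kernels at one spatial location, so stacking the columns of `Y` gives the vectorized output.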

      The Kronecker product: Given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{p \times q}$, the Kronecker product $A \otimes B$ is an $mp \times nq$ matrix, defined as the block matrix

      \[ A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}. \tag{3.86} \]
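The block structure in Equation (3.86) translates directly into index arithmetic. The following is a minimal sketch in plain Python; the helper name `kron` and the example matrices are assumptions made for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    m, n = len(a), len(a[0])
    p, q = len(b), len(b[0])
    # Entry (r, c) lies in block (r // p, c // q), at position (r % p, c % q)
    # inside that block, so it equals a[r // p][c // q] * b[r % p][c % q].
    return [[a[r // p][c // q] * b[r % p][c % q] for c in range(n * q)]
            for r in range(m * p)]


# Hypothetical example: A is 2 x 3 and B is 2 x 2, so the result is 4 x 6.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1]]

K = kron(A, B)
for row in K:
    print(row)
```

Because B is the identity here, each 2 × 2 block of `K` is simply the corresponding entry of A times the identity, which makes the block layout easy to read off.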

      The Kronecker product has the following properties that will be useful for us: