[Experience sharing] Comparison of convolution output shape calculation in PyTorch and darknet

    This post records a comparison of how PyTorch and darknet calculate the convolution output shape. There are some differences between them.

    Convolution is one of the most commonly used operators in deep learning, and working with neural networks often involves deriving operator output shapes. Here we focus on shape derivation for the conv layer and compare PyTorch with darknet.

1. Derivation of the PyTorch convolution output shape

    Look at the source code. The answer is in the docstring of class Conv2d(_ConvNd) in torch/nn/modules/conv.py:

Shape:
    - Input: :math:`(N, C_{in}, H_{in}, W_{in})`
    - Output: :math:`(N, C_{out}, H_{out}, W_{out})` where

      .. math::
          H_{out} = \left\lfloor\frac{H_{in}  + 2 \times \text{padding}[0] - \text{dilation}[0]
                    \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

      .. math::
          W_{out} = \left\lfloor\frac{W_{in}  + 2 \times \text{padding}[1] - \text{dilation}[1]
                    \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

    The formula above is written in LaTeX, a TeX-based typesetting system created by the American computer scientist Leslie Lamport. LaTeX can typeset complex tables and mathematical formulas, but the raw markup is hard to read. Written out in plain form, H_{out} is:

    H_out = floor((H_in + 2 * padding[0] - dilation[0] * (kernel_size[0] - 1) - 1) / stride[0] + 1)

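    This formula accounts for the dilation rate of dilated (atrous) convolution. Without dilation (dilation = 1), the shape derivation simplifies to:

    H_out = floor((H_in + 2 * padding[0] - (kernel_size[0] - 1) - 1) / stride[0] + 1)
          = floor((H_in + 2 * padding[0] - kernel_size[0]) / stride[0] + 1)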
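    As a rough sketch, the Conv2d shape formula can be written as a small Python helper for one spatial axis. The function name conv_out_size and its argument names are made up for illustration and are not part of PyTorch. It only mirrors the arithmetic in the docstring.

```python
def conv_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one axis, following the Conv2d docstring formula.

    Hypothetical helper for illustration only; not a PyTorch API.
    """
    numerator = size_in + 2 * padding - dilation * (kernel_size - 1) - 1
    # Python's // rounds down, which matches the floor in the formula
    # (adding the integer 1 after flooring gives the same result).
    return numerator // stride + 1


if __name__ == "__main__":
    print(conv_out_size(256, kernel_size=3, stride=1, padding=1))              # 256
    print(conv_out_size(256, kernel_size=3, stride=2, padding=1))              # 128
    print(conv_out_size(256, kernel_size=3, stride=1, padding=2, dilation=2))  # 256
```

    With dilation left at 1, the helper reduces to (size_in + 2 * padding - kernel_size) // stride + 1, which is the form compared against darknet.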

    That is the shape derivation for convolution in PyTorch. Now let's look at darknet.

2. Derivation of the darknet convolution output shape

    Also look at the source code. In parser.c:

int size = option_find_int(options, "size", 1);
int pad = option_find_int_quiet(options, "pad", 0);
int padding = option_find_int_quiet(options, "padding", 0);
if(pad) padding = size/2;

    Here option_find_int and option_find_int_quiet are lookup functions over a linked list of options. Note the distinction between the two keys: pad is a flag, while padding is the actual padding value. As the code shows, the padding that is actually used is not necessarily the value read from the model file. If the pad flag is set, padding is overwritten with size/2.
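    The flag handling above can be sketched in Python as follows. This is only an illustration: the function name resolve_padding is hypothetical, and a plain dict stands in for darknet's linked list of options.

```python
def resolve_padding(options):
    """Return the effective padding for a convolutional section.

    Hypothetical sketch of the parser.c snippet; `options` is a dict
    standing in for darknet's option list.
    """
    size = int(options.get("size", 1))        # kernel size, default 1
    pad = int(options.get("pad", 0))          # flag, default 0
    padding = int(options.get("padding", 0))  # explicit padding, default 0
    if pad:
        # The flag overrides any explicit padding value.
        padding = size // 2
    return padding


if __name__ == "__main__":
    print(resolve_padding({"size": 1, "pad": 1}))                # 0
    print(resolve_padding({"size": 3, "pad": 1}))                # 1
    print(resolve_padding({"size": 5, "padding": 1}))            # 1
    print(resolve_padding({"size": 5, "pad": 1, "padding": 1}))  # 2
```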

    After this processing, the convolution output shape is calculated. The code is in convolutional_layer.c:

///Call interface
/// int out_h = convolutional_out_height(l);
/// int out_w = convolutional_out_width(l);

int convolutional_out_height(convolutional_layer l)
{
    return (l.h + 2*l.pad - l.size) / l.stride_y + 1;
}

int convolutional_out_width(convolutional_layer l)
{
    return (l.w + 2*l.pad - l.size) / l.stride_x + 1;
}

    Written as a formula, the out_h calculation above is:

    out_h = (h + 2 * pad - size) / stride_y + 1
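    The same arithmetic can be mirrored in Python as a rough sketch. The names c_int_div and darknet_out_size are hypothetical. The helper c_int_div imitates C division of two ints, which truncates toward zero, whereas Python's // rounds down. The two only differ when the quotient is negative.

```python
def c_int_div(a, b):
    """Imitate C's int / int, which truncates toward zero (hypothetical helper)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def darknet_out_size(size_in, size, stride, pad):
    """Mirror convolutional_out_height / convolutional_out_width.

    `pad` is the already-resolved padding value stored in l.pad.
    """
    return c_int_div(size_in + 2 * pad - size, stride) + 1


if __name__ == "__main__":
    print(darknet_out_size(256, size=1, stride=1, pad=0))  # 256
    print(darknet_out_size(256, size=3, stride=2, pad=1))  # 128
    print(c_int_div(-3, 2), -3 // 2)                       # -1 -2
```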

    You may wonder why l.pad is used here rather than l.padding. The answer is in this assignment:

l.pad = padding;

    This makes sense. Let's take a concrete case: a convolutional layer with size = 1, stride = 1 and pad = 1, as seen in a model visualization.

    If the layout is n c h w, the input to this layer is [n, 16, 256, 256], and the visualization shows that the shape after this convolution is still [n, 16, 256, 256]. If we apply the PyTorch formula, Hout = (Hin + 2 * padding - (size - 1) - 1) / stride + 1, and assume that pad is the padding value, we get Hout = (256 + 2 * 1 - (1 - 1) - 1) / 1 + 1 = 258. This result is clearly wrong, so pad cannot be used as the padding directly. Instead, follow darknet: treat pad as a flag and compute padding first. If padding is not written in the operator parameters, padding defaults to 0. The calculation is as follows:

size = 1
padding = 0
pad = 1
    
/// if(pad) padding = size / 2;
padding = 1 / 2 = 0

/// Hout = (Hin + 2 * padding - size) / stride + 1
Hout = (256 + 2 * 0 - 1) / 1 + 1 = 256

    In C, / applied to two int operands is integer division: the fractional part is discarded (truncation toward zero), which for non-negative values is the same as rounding down. So 1 / 2 evaluates to 0 here, and the result matches the shape shown in the visualization.
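    To see when the misreading matters, the sketch below compares two interpretations of pad = 1 for a 256-wide input with stride 1: treating pad as the padding value itself, and treating it as a flag that sets padding = size / 2. The function names are hypothetical, and dilation is assumed to be 1.

```python
def out_size(size_in, size, stride, padding):
    """Output length without dilation: (in + 2*padding - size) / stride + 1."""
    # All operands are non-negative here, so // matches C integer division.
    return (size_in + 2 * padding - size) // stride + 1


def compare(size_in=256, stride=1, sizes=(1, 3, 5)):
    """Return (size, out_if_pad_is_padding, out_if_pad_is_flag) rows for pad = 1."""
    rows = []
    for size in sizes:
        naive = out_size(size_in, size, stride, padding=1)            # pad read as the value
        resolved = out_size(size_in, size, stride, padding=size // 2)  # pad read as a flag
        rows.append((size, naive, resolved))
    return rows


if __name__ == "__main__":
    for row in compare():
        print(row)
    # (1, 258, 256)
    # (3, 256, 256)
    # (5, 254, 256)
```

    The two readings agree only for size = 3, where size / 2 happens to equal 1, which may be why the mistake can go unnoticed until a 1x1 or 5x5 layer appears.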

    This post compared how PyTorch and darknet calculate the convolution output shape. It is a small pitfall that is easy to run into. I hope it is of some help to you.

Tags: Algorithm AI neural networks Pytorch Deep Learning

Posted on Thu, 11 Nov 2021 19:50:14 -0500 by todding01