Study notes

Research 1 essays

Week 3

import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 5)
        self.pool1 = nn.MaxPool2d(2, 2)  # kernel_size=2, stride=2; dilation=1 is the default
        self.conv2 = nn.Conv2d(16, 32, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(32*5*5, 120)  # fully connected layer
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):            # x: [batch, channel, height, width]
        x = F.relu(self.conv1(x))    # input(3, 32, 32) output(16, 28, 28)
                                     # N = (W-F+2P)/S+1 = (32-5+0)/1 + 1 = 28
        x = self.pool1(x)            # output(16, 14, 14)
        x = F.relu(self.conv2(x))    # output(32, 10, 10)
        x = self.pool2(x)            # output(32, 5, 5)
        # print(x.size())            # torch.Size([36, 32, 5, 5])
                                     # [batch, channel, height, width]
        x = x.view(-1, 32*5*5)       # output(32*5*5); "-1" lets PyTorch infer the batch dimension
                                     # same as x.view(36, 32*5*5) for a batch of 36 (or 10000); each sample is flattened
        x = F.relu(self.fc1(x))      # output(120)
        x = F.relu(self.fc2(x))      # output(84)
        x = self.fc3(x)              # output(10)
        return x

Dimension order of a PyTorch tensor: [batch, channel, height, width]

Layer definitions in __init__ (conv1 to fc3):


torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

By default (stride=None) the stride equals the pooling window size, so the windows do not overlap. (In practice, whether the pooling windows overlap matters little.)

dilation=1 by default, i.e. no gaps between the elements of the pooling window. Note: do not confuse this with dilated ("hole") convolution

Tensor concatenation

X = torch.cat((X, X + 1), 1)  # cat along dim 1; the number of dimensions is unchanged
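
A minimal sketch of tensor concatenation with torch.cat, assuming a small 4-D tensor X whose shape is chosen only for illustration. torch.cat joins tensors along an existing dimension, so the number of dimensions stays the same while the size of that dimension grows:

```python
import torch

# Hypothetical input: 1 sample, 1 channel, 4x4 (shape chosen for illustration)
X = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Concatenate X and X + 1 along dim 1 (the channel dimension)
Y = torch.cat((X, X + 1), 1)

print(X.shape)  # torch.Size([1, 1, 4, 4])
print(Y.shape)  # torch.Size([1, 2, 4, 4])
```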

Role of the pooling layer

  • 1. Reduce the amount of computation

  • 2. Make the convolution output less sensitive to position
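
Both effects can be seen in a small pure-Python sketch. The helper max_pool2d is a hypothetical name written for this note, not a PyTorch function; it applies a 2x2 window with stride 2 to a nested list:

```python
def max_pool2d(img, k=2, s=2):
    """Max-pool a 2-D list of numbers with a k x k window and stride s (no padding)."""
    h, w = len(img), len(img[0])
    return [
        [max(img[i + di][j + dj] for di in range(k) for dj in range(k))
         for j in range(0, w - k + 1, s)]
        for i in range(0, h - k + 1, s)
    ]

# The same feature (value 9) at two positions one pixel apart
a = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
b = [[0, 0, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

pooled_a = max_pool2d(a)
pooled_b = max_pool2d(b)
print(pooled_a)              # [[9, 0], [0, 0]]
print(pooled_a == pooled_b)  # True
```

The 4x4 input shrinks to 2x2, so later layers see a quarter of the values, and the one-pixel shift disappears because both positions fall inside the same window. A shift that crosses a window boundary would still change the output.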

Why pooling is used less and less

  • 1. A convolution layer with a stride can take the place of a pooling layer and reduce position sensitivity

  • 2. Augmenting the data itself (shifting, scaling, rotation) removes position sensitivity, so the convolution does not overfit to position


  • The convolution layer cross-correlates the input with the kernel matrix and adds the bias to get the output

  • The convolution kernel and the bias are learnable parameters

  • The size of the convolution kernel is a hyperparameter
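
The first point can be sketched in pure Python. corr2d is a hypothetical helper written for this note; the input and kernel values are made up for illustration:

```python
def corr2d(x, k, bias=0):
    """2-D cross-correlation of input x with kernel k, plus a scalar bias."""
    kh, kw = len(k), len(k[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw)) + bias
         for j in range(out_w)]
        for i in range(out_h)
    ]

x = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]
k = [[0, 1],
     [2, 3]]

print(corr2d(x, k))          # [[19, 25], [37, 43]]
print(corr2d(x, k, bias=1))  # [[20, 26], [38, 44]]
```

In a real layer the entries of k and the bias are what training updates, while the 2x2 kernel size is fixed in advance.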

Tensor.view() & torch.reshape()

Anything view() can do, reshape() can do as well. When view() fails (for example on a non-contiguous tensor), use reshape() instead

x = x.view(-1, 32*5*5)       # output(32*5*5); "-1" lets PyTorch infer that dimension, same as x.view(36, 32*5*5) for a batch of 36 (or 10000); each sample in the batch is flattened

Refer to test1_official_demo -> -> line 44
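
A small sketch of where view and reshape differ, assuming a toy 2x3 tensor that is not from the original demo. Transposing makes the tensor non-contiguous, which is the usual case where view raises an error and reshape still works:

```python
import torch

x = torch.arange(6).reshape(2, 3)

# Contiguous tensor: view and reshape give the same result
print(x.view(-1).shape)     # torch.Size([6])
print(x.reshape(-1).shape)  # torch.Size([6])

# Transposing makes the tensor non-contiguous in memory
t = x.t()
print(t.is_contiguous())    # False

try:
    t.view(-1)              # view needs a compatible memory layout
    view_ok = True
except RuntimeError:
    view_ok = False
print(view_ok)              # False

flat = t.reshape(-1)        # reshape copies the data when it has to
print(flat.tolist())        # [0, 3, 1, 4, 2, 5]
```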

Week 5


a = torch.unsqueeze(a, dim=0) # insert a new dimension at position 0

>>> import torch
>>> a = torch.arange(0, 6)
>>> print(a.shape)
torch.Size([6])
>>> a = torch.unsqueeze(a, 0)
>>> print(a.shape)
torch.Size([1, 6])
>>> print(a)
tensor([[0, 1, 2, 3, 4, 5]])


Flatten the tensor

x = torch.flatten(x, start_dim=1)  # the layout is [batch, channel, height, width], so flatten from dim 1; the batch dim 0 is left untouched

You can also use the earlier Tensor.view() to flatten the tensor (refer to test2_alexnet -> model.py -> line 38)


nn.Sequential can package a series of operations into one module

self.features = nn.Sequential(
    nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # ... remaining layers omitted
)

Use the GPU (if available)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Review: formula for the output size after convolution

N = (W - F + 2P) / S + 1

  • Input image size W * W
  • Filter size F * F
  • Stride S
  • Padding P (number of pixels)

If (W - F + 2P) is not divisible by S, the result is rounded down, i.e. the leftover rows and columns are discarded
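
The formula above can be checked with a few lines of Python. conv_out_size is a hypothetical helper name; the layer settings are those of the LeNet example earlier in these notes, and the rounding uses integer division:

```python
def conv_out_size(w, f, p=0, s=1):
    """N = (W - F + 2P) / S + 1, rounded down when not divisible."""
    return (w - f + 2 * p) // s + 1

# LeNet with a 32x32 input
n = conv_out_size(32, 5)        # conv1: 5x5 kernel        -> 28
n = conv_out_size(n, 2, s=2)    # pool1: 2x2 window, S = 2 -> 14
n = conv_out_size(n, 5)         # conv2: 5x5 kernel        -> 10
n = conv_out_size(n, 2, s=2)    # pool2: 2x2 window, S = 2 -> 5
print(n)  # 5

# Not divisible: (8 - 3) / 2 = 2.5 is rounded down to 2, so N = 3
print(conv_out_size(8, 3, s=2))  # 3
```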

Review: Python relative paths

"/": the root directory. On Windows, it is the root directory of the current drive, such as "E:\";

"./": the current directory; (for the current directory you can also drop "./" and write the file name or subdirectory directly)

"../": the parent directory.

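A short sketch of how these prefixes resolve, assuming a made-up current directory and made-up file names. posixpath is used instead of os.path so the printed results are the same on any operating system:

```python
import posixpath

# Hypothetical current directory, used only for this illustration
cwd = "/home/user/project"

# "./" stays in the current directory and can be omitted
print(posixpath.normpath(posixpath.join(cwd, "./data/train.txt")))
# /home/user/project/data/train.txt
print(posixpath.normpath(posixpath.join(cwd, "data/train.txt")))
# /home/user/project/data/train.txt

# "../" moves up to the parent directory
print(posixpath.normpath(posixpath.join(cwd, "../weights/model.pth")))
# /home/user/weights/model.pth

# A leading "/" starts from the root and ignores cwd
print(posixpath.join(cwd, "/etc/hosts"))
# /etc/hosts
```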


Causes of overfitting

  • Too many feature dimensions

  • The model hypothesis is too complex

  • Too many parameters

  • Too little training data and too much noise

Consequences of overfitting

The model predicts the training data almost perfectly, but predicts poorly on new data such as the test set

It overfits the training data at the expense of generalization ability.

AlexNet's approach

AlexNet uses Dropout: during forward propagation in training, some neurons are randomly deactivated (this effectively reduces the number of parameters being trained, which reduces overfitting)
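
A pure-Python sketch of the idea, not AlexNet's actual code. The dropout function below is hypothetical and works on a plain list; it follows the "inverted dropout" convention that nn.Dropout also uses, where survivors are scaled by 1 / (1 - p) during training and nothing is changed at evaluation time:

```python
import random

def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations.

    In training, each value is zeroed with probability p and the survivors
    are scaled by 1 / (1 - p); in evaluation the input passes through unchanged.
    """
    assert 0.0 <= p < 1.0
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # fixed seed so the run is repeatable
acts = [1.0] * 1000

train_out = dropout(acts, p=0.5, training=True, rng=rng)
eval_out = dropout(acts, p=0.5, training=False)

print(train_out.count(0.0))          # roughly half of the 1000 values
print(set(train_out) <= {0.0, 2.0})  # True
print(eval_out == acts)              # True
```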

Structure of AlexNet

layer_name  kernel_size  kernel_num  padding  stride
Conv1       11           96          [1, 2]   4
Maxpool1    3            None        0        2
Conv2       5            256         [2, 2]   1
Maxpool2    3            None        0        2
Conv3       3            384         [1, 1]   1
Conv4       3            384         [1, 1]   1
Conv5       3            256         [1, 1]   1
Maxpool3    3            None        0        2
FC1         2048         None        None     None
FC2         2048         None        None     None
FC3         1000         None        None     None
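
The spatial sizes implied by the table can be traced with the convolution size formula. This sketch assumes a 224 x 224 input, which the table does not state, and reads a padding entry such as [1, 2] as the padding before and after along each axis. out_size is a hypothetical helper:

```python
def out_size(w, f, pad=(0, 0), s=1):
    """Output size with possibly asymmetric padding (before, after), rounded down."""
    return (w - f + pad[0] + pad[1]) // s + 1

# (name, kernel_size, padding, stride) copied from the table
layers = [
    ("Conv1",    11, (1, 2), 4),
    ("Maxpool1",  3, (0, 0), 2),
    ("Conv2",     5, (2, 2), 1),
    ("Maxpool2",  3, (0, 0), 2),
    ("Conv3",     3, (1, 1), 1),
    ("Conv4",     3, (1, 1), 1),
    ("Conv5",     3, (1, 1), 1),
    ("Maxpool3",  3, (0, 0), 2),
]

size = 224  # assumed input size, not stated in the table
sizes = {}
for name, f, pad, s in layers:
    size = out_size(size, f, pad, s)
    sizes[name] = size
    print(name, size)
```

Under these assumptions the feature map goes 55, 27, 27, 13, 13, 13, 13, 6, so the last pooling layer hands a 6 x 6 map per channel to the fully connected layers.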


Concatenation along the depth (channel) dimension

ResNet residual network

Stacking residual blocks addresses:

  • Vanishing and exploding gradients
  • The degradation problem

The outputs of the main branch and the side branch (the shortcut) must have the same shape, because they are added element-wise at matching positions
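
A minimal sketch of a residual block with an identity shortcut, to show why the two branches need matching shapes. The class name ResidualBlock and the layer sizes are illustrative assumptions, not taken from a specific ResNet implementation:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative block: two 3x3 convs on the main branch, identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # side branch (shortcut)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))       # main branch keeps the shape of x
        out = out + identity                  # element-wise add, shapes must match
        return self.relu(out)

block = ResidualBlock(64)
x = torch.randn(2, 64, 8, 8)
y = block(x)
print(y.shape)  # torch.Size([2, 64, 8, 8])
```

With padding=1 and stride 1 the 3x3 convolutions preserve height and width, so the addition works without any extra projection on the shortcut.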

Week 8


self.conv1 = nn.Conv2d(64, 64, 3, padding=1)
self.bn1 = nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
self.relu1 = nn.ReLU(inplace=True)

Order: convolution --> batch normalization --> ReLU

Concatenation operation

After a new tensor is concatenated onto the original one, the size of the corresponding dimension increases

Code implementation of attention mechanism

        # in __init__: one 3x3 convolution per scale produces a 1-channel attention map
        self.att_1 = nn.Conv2d(32, 1, 3, padding=1)
        self.att_2 = nn.Conv2d(32, 1, 3, padding=1)
        self.att_3 = nn.Conv2d(32, 1, 3, padding=1)
        self.agg1 = aggregation_add(channel)

        # in forward: weight each feature map by its sigmoid attention map
        x3 = self.rfb2_1(h_nopool3)         # rfb3_1
        att_3 = self.att_1(x3)
        x3 = x3 * torch.sigmoid(att_3)      # torch.sigmoid replaces the deprecated F.sigmoid
        x4 = self.rfb3_1(h_nopool4)         # rfb4_1
        att_4 = self.att_2(x4)
        x4 = x4 * torch.sigmoid(att_4)      # some attention mechanisms also add x itself back
        x5 = self.rfb4_1(h)                 # rfb5_1
        att_5 = self.att_3(x5)
        x5 = x5 * torch.sigmoid(att_5)
        detection_map = self.agg1(x5, x4, x3)

Question: why does introducing noise during supervision give better results?

Next week's task (A2dele code adjustment: set the value of λ directly to 0 and try to reproduce the ablation experiment)


Too few YOLO labels (unbalanced samples)

  • Put the object on a rotating display stand and take photos

  • Take screenshots from video using video software

  • Use transfer learning, transferring "light" ==> "switch" (unrealistic)

  • Or directly annotate with bounding boxes and then adjust them (in progress)

Pitfalls when installing CUDA + cuDNN + apex in conda

Installing CUDA

The same version of CUDA must be installed both in the virtual environment and on the host machine; otherwise apex will not work.

Download for previous CUDA versions

Download for previous cuDNN versions

Install apex

apex may report strange errors, most likely because the latest version of apex introduced incompatibilities

Use the following git command to roll back to an earlier version of apex

git checkout f3a960f80244cf9e80558ab30f7f7e8cbf03c0a0 

Using apex

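mixed_precision = True
try:  # Mixed precision training
    from apex import amp
except ImportError:
    print('Apex recommended for faster mixed precision training:')
    mixed_precision = False  # apex not installed

if mixed_precision:  # after building model and optimizer
    model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0)

if mixed_precision:  # in the training loop, replaces loss.backward()
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
else:
    loss.backward()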

Risks of using apex

Training in half precision may give worse training results

a. Mixed precision training

b. Loss scaling


ImportError: cannot import name 'container_abcs' from 'torch._six'

This is because container_abcs was removed from torch._six after PyTorch 1.8.

The solution is as follows

# if TORCH_MAJOR == 0:
import collections.abc as container_abcs
# else:
#     from torch._six import container_abcs

P2 pycocotools

P2 installation package for pycocotools (extraction code: i5d7). After downloading, unzip it and copy the two pycocotools folders into the conda environment's ...\Lib\site-packages

yolov3 pitfalls

The folder and image names used for annotation must not contain Chinese characters; otherwise it takes a long time to sort out in the software afterwards


To change the number of classes, you need to modify four places: two filters entries and two classes entries (127, 135, 171, 177)
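
For the two filters entries, Darknet-style YOLOv3 configs usually set the value in the convolution layer just before each [yolo] layer to (classes + 5) * 3, that is 3 anchors per scale, each predicting 4 box coordinates, 1 objectness score, and one score per class. A small sketch, assuming that convention applies to the config being edited (yolo_filters is a hypothetical helper):

```python
def yolo_filters(num_classes, anchors_per_scale=3):
    """filters for the conv layer before a [yolo] layer: anchors * (classes + 5)."""
    return anchors_per_scale * (num_classes + 5)

print(yolo_filters(80))  # 255
print(yolo_filters(1))   # 18
print(yolo_filters(2))   # 21
```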

Posted on Sun, 31 Oct 2021 12:22:12 -0400 by wesnoel