I wrote this article a while ago. It mainly picks out the details and records the points I did not understand while learning.
This article draws on the official PyTorch tutorial and on the CSDN blog post "PyTorch plays a strange way (I): PyTorch performs CIFAR-10 classification (1): CIFAR-10 data loading and processing" by Morning flower & evening pick-up.
Corrections of any mistakes are welcome.
(1) Data loading and processing
import torch
import torchvision
import torchvision.transforms as transforms
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))])
● ToTensor converts a PIL.Image (RGB) or a numpy.ndarray of shape (H x W x C) with uint8 values in [0, 255] into a FloatTensor of shape (C x H x W) with values in [0.0, 1.0].
This rescaling is a simple form of normalization that puts all pixel values on the same scale.
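● Normalize(mean, std) performs per-channel standardization: $\text{output}_c = (\text{input}_c - \text{mean}_c) / \text{std}_c$. If mean and std are the actual per-channel statistics of the data, each channel ends up with zero mean and unit standard deviation; with mean = std = 0.5 as used here, it simply maps the [0, 1] range produced by ToTensor onto [-1, 1].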
After standardization the data is centered around zero, which usually makes gradient-based training better behaved and can help the model generalize.
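To see what the two transforms do numerically, here is a minimal sketch that reproduces them by hand with plain torch operations on a tiny, made-up uint8 image in H x W x C layout. It mirrors, rather than calls, transforms.ToTensor and transforms.Normalize, and assumes uint8 input (the case in which ToTensor divides by 255):

```python
import torch

# A fake 2 x 2 RGB image in H x W x C layout with uint8 values in [0, 255]
img = torch.tensor([[[0, 0, 0], [255, 255, 255]],
                    [[128, 64, 32], [255, 0, 128]]], dtype=torch.uint8)

# ToTensor-style step: HWC -> CHW and scale [0, 255] -> [0.0, 1.0]
x = img.permute(2, 0, 1).float() / 255.0

# Normalize-style step: per-channel (x - mean) / std with mean = std = 0.5
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)
y = (x - mean) / std

print(x.shape)           # torch.Size([3, 2, 2])
print(y.min(), y.max())  # pixel 0 maps to -1, pixel 255 maps to 1
```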
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
● The CIFAR-10 dataset is already wrapped in torchvision.datasets; download=True downloads it into root if it is not there yet. train=True selects the training set (train=False selects the test set), and transform is applied to every image.
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
● trainset is large, so it is processed in mini-batches. DataLoader wraps the dataset and yields batches of batch_size samples; shuffle=True reshuffles the samples at every epoch.
● The length of a DataLoader is determined by batch_size: with 1000 training samples and batch_size=10, len(trainloader) is 100 (if batch_size does not divide the dataset size, the last batch is smaller and the length rounds up, unless drop_last=True). num_workers sets how many subprocesses load data in parallel (ps: running on CPU under Windows, I could only set it to 0; on Windows, multi-process loading generally requires the training code to sit under an `if __name__ == '__main__':` guard).
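The batching behaviour can be checked without downloading CIFAR-10. The sketch below uses a random TensorDataset as a stand-in for trainset (an assumption made only so the example runs offline) and shows how len(loader) follows from batch_size:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for trainset: 1000 fake 3x32x32 "images" with labels in 0..9
images = torch.randn(1000, 3, 32, 32)
labels = torch.randint(0, 10, (1000,))
dataset = TensorDataset(images, labels)

# num_workers=0 loads data in the main process (no subprocesses)
loader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=0)
print(len(loader))   # 1000 / 10 = 100 mini-batches

# If batch_size does not divide 1000, the last batch is smaller and len() rounds up
loader3 = DataLoader(dataset, batch_size=3, shuffle=True)
print(len(loader3))  # ceil(1000 / 3) = 334

batch_images, batch_labels = next(iter(loader))
print(batch_images.shape)  # torch.Size([10, 3, 32, 32])
```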
(2) Model establishment
First, some basics of convolutional networks:
The general architecture of a CNN is as follows: the input image first passes through a convolution layer, then max pooling (pooling is subsampling: it divides the feature map into blocks and keeps only the maximum of each block, which shrinks the image), then another convolution, then another max pooling.
This convolution + pooling stage can be repeated several times; the number of repetitions is a design choice fixed in advance. Finally, the output is flattened and fed into an ordinary fully connected network, which produces the recognition result.
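For the Net defined below (two 5x5 convolutions, each followed by 2x2 max pooling, on 32x32 inputs), the spatial size at each stage follows the standard output-size formula floor((W + 2P - K) / S) + 1. A short plain-Python sketch with a hypothetical helper conv_out traces where the 16 * 5 * 5 input size of the first fully connected layer comes from:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of Conv2d or MaxPool2d (dilation=1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                    # CIFAR-10 images are 32 x 32
size = conv_out(size, 5)     # conv1, 5x5 kernel       -> 28
size = conv_out(size, 2, 2)  # max pool 2x2, stride 2  -> 14
size = conv_out(size, 5)     # conv2, 5x5 kernel       -> 10
size = conv_out(size, 2, 2)  # max pool 2x2, stride 2  -> 5
print(16 * size * size)      # 16 channels * 5 * 5 = 400 flattened features
```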
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):  # When defining a network, we usually subclass torch.nn.Module
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # First convolution layer, created with nn.Conv2d()
        self.pool = nn.MaxPool2d(2, 2)  # Max pooling layer
        self.conv2 = nn.Conv2d(6, 16, 5)  # Second convolution layer
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # Then three fully connected layers
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Only the forward pass is defined here. Why no backward method? Backpropagation is
        # handled by the torch.autograd module; defining the network itself does not need
        # any autograd knowledge, so we will come back to it later.
        x = self.pool(F.relu(self.conv1(x)))  # F is torch.nn.functional; F.relu() applies ReLU
        x = self.pool(F.relu(self.conv2(x)))
        # .view() is a tensor method that changes the shape but keeps the total number of elements.
        # The argument -1 means this dimension is inferred from the others: once the total number
        # of elements and the number of columns are fixed, the number of rows follows.
        # Why fix only the number of columns? The next layer is fully connected, which is just a
        # matrix multiplication, and fc1 expects 16 * 5 * 5 input features, so x must be
        # reshaped to [batch, 16 * 5 * 5] before the multiplication.
        # For more Tensor methods, see: http://pytorch.org/docs/0.3.0/tensors.html
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
● nn.Module is the base class for all neural network modules; any network we define ourselves should subclass nn.Module.
● Conv2d(in_channels, out_channels, kernel_size, stride=1,padding=0, dilation=1, groups=1,bias=True, padding_mode='zeros')
The input is an image with in_channels channels and the output has out_channels channels, i.e. the layer has out_channels convolution kernels (filters), each of spatial size kernel_size x kernel_size.
(ps: difference between Conv1d and Conv2d:
The input of Conv1d is three-dimensional, [batch, channels, length]; the kernel is one-dimensional and slides only along the third (length) dimension, covering all channels of the second dimension.
The input of Conv2d is four-dimensional, [batch, channels, H, W], which is the usual case for images. The kernel is two-dimensional (a square, or a rectangle if kernel_size is given as a tuple) and slides along the third and fourth dimensions (H and W).
For example, a CIFAR-10 mini-batch here has shape [4, 3, 32, 32]: 4 images per batch, 3 color channels, and 32 x 32 pixels.
Therefore nn.Conv2d(3, 6, 5) maps the 3 input channels to 6 output channels using 5x5 convolution kernels.
In short, image convolution uses Conv2d.)
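The shape rules for the two layers can be checked directly. This sketch feeds random tensors of the shapes described above through nn.Conv2d(3, 6, 5) and, for comparison, an nn.Conv1d(3, 6, 5) chosen only for illustration:

```python
import torch
import torch.nn as nn

# Conv2d: input [batch, channels, H, W]; the kernel slides over H and W
conv2d = nn.Conv2d(3, 6, 5)         # 3 in-channels, 6 out-channels, 5x5 kernels
images = torch.randn(4, 3, 32, 32)  # a CIFAR-10-sized mini-batch
print(conv2d(images).shape)         # torch.Size([4, 6, 28, 28])
print(conv2d.weight.shape)          # torch.Size([6, 3, 5, 5]): 6 kernels, each 3 x 5 x 5

# Conv1d: input [batch, channels, length]; the kernel slides over length only
conv1d = nn.Conv1d(3, 6, 5)
signals = torch.randn(4, 3, 32)
print(conv1d(signals).shape)        # torch.Size([4, 6, 28])
print(conv1d.weight.shape)          # torch.Size([6, 3, 5])
```

Note that each kernel spans all input channels, which is why the weight's second dimension equals in_channels.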
● torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
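Usually only kernel_size and stride are set (stride defaults to kernel_size). MaxPool2d(2, 2) uses a 2 x 2 window with stride 2, so it takes the maximum of each non-overlapping 2 x 2 block and halves the height and width: a 4 x 4 input becomes 2 x 2.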
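A small worked example of MaxPool2d(2, 2) on a 4 x 4 input holding the values 1 to 16, shaped as [batch, channels, H, W]:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, 2)  # 2x2 window, stride 2

# One single-channel 4x4 "image": rows are [1..4], [5..8], [9..12], [13..16]
x = torch.arange(1, 17, dtype=torch.float32).view(1, 1, 4, 4)
y = pool(x)
print(y)  # values [[6, 8], [14, 16]]: the maximum of each 2x2 block
```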
● Fully connected layer:
Convolution extracts local features; a fully connected layer then combines all of these local features into a global representation through a weight matrix.
Because every output unit is connected to every input feature, the layer is called fully connected.
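A fully connected layer is just a matrix multiplication plus a bias. The sketch below builds a layer the size of fc1 and checks that its output equals x W^T + b computed by hand, which also shows why every output depends on every input feature:

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(16 * 5 * 5, 120)  # 400 inputs, 120 outputs
x = torch.randn(4, 400)           # a batch of 4 flattened feature vectors

# The layer stores a [out_features, in_features] weight matrix and a bias vector
print(fc1.weight.shape)           # torch.Size([120, 400])
y = fc1(x)
manual = x @ fc1.weight.T + fc1.bias
print(torch.allclose(y, manual, atol=1e-5))  # True
```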
● x = x.view(-1, 16 * 5 * 5)
This reshapes the four-dimensional [batch, 16, 5, 5] tensor into a two-dimensional [batch, 400] tensor, which can then be used as the input of the fully connected layer.
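A quick check of the reshape, using a random tensor with the shape the second pooling layer produces for a batch of 4. torch.flatten(x, 1), which flattens every dimension after the batch dimension, gives the same result:

```python
import torch

x = torch.randn(4, 16, 5, 5)   # output of the second pooling layer for a batch of 4
flat = x.view(-1, 16 * 5 * 5)  # -1 lets PyTorch infer the batch dimension
print(flat.shape)              # torch.Size([4, 400])

# view keeps the elements and their order; only the shape changes
print(torch.equal(flat, torch.flatten(x, 1)))  # True
```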
(3) Loss function and optimizer
import torch.nn as nn
import torch.optim as optim
net = Net()  # an instance of the network defined above
criterion = nn.CrossEntropyLoss()  # cross-entropy loss from the neural network toolbox torch.nn
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)  # SGD (stochastic gradient descent) with momentum from torch.optim
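● Cross entropy: nn.CrossEntropyLoss combines LogSoftmax and NLLLoss. For raw scores (logits) $x$ and target class $y$, the per-sample loss is $\ell(x, y) = -\log\frac{\exp(x_y)}{\sum_j \exp(x_j)}$, and by default it is averaged over the batch.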
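nn.CrossEntropyLoss applies log-softmax to raw scores and then takes the negative log-probability of the target class. The sketch compares it on hand-picked logits with the same quantity computed from F.log_softmax; the targets are class indices, as CIFAR-10 labels are, and the network's last layer outputs raw scores with no softmax (as fc3 does in Net):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 1.0, 0.1],  # raw scores for 2 samples, 3 classes
                       [0.5, 2.5, 0.3]])
targets = torch.tensor([0, 1])           # class indices, not one-hot vectors

loss = criterion(logits, targets)

# Same value by hand: log-softmax, pick the target class, negate, average over the batch
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
print(loss.item(), manual.item())
```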
● PyTorch provides the optimization methods commonly used in deep learning (SGD, Adam, RMSprop and others) in torch.optim.
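The criterion and optimizer are used together in the training loop. Below is a minimal sketch of one training step with SGD (lr=0.001, momentum=0.9), assuming a hypothetical tiny nn.Linear model and random data in place of Net and CIFAR-10 so that it runs on its own:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(8, 3)  # hypothetical tiny model standing in for Net
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

inputs = torch.randn(4, 8)
labels = torch.randint(0, 3, (4,))
before = model.weight.detach().clone()

optimizer.zero_grad()                    # clear gradients left over from the previous step
loss = criterion(model(inputs), labels)  # forward pass + loss
loss.backward()                          # autograd fills .grad for every parameter
optimizer.step()                         # SGD with momentum updates the parameters in place

print(loss.item())
print((model.weight - before).abs().max().item())  # small but non-zero change
```

In a real training loop these steps would typically run once per mini-batch drawn from trainloader.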