Many network structures have been proposed for CNNs. Here we focus on LeNet, which was first proposed in 1998 as a network for handwritten digit recognition. As shown in the figure below, it consists of alternating convolution and pooling layers (strictly speaking, subsampling layers that thin out the elements), and finally outputs the result through fully connected layers.
In the original LeNet, a 32 * 32 image is input, and the first convolution layer outputs a feature map with 6 channels and size 28 * 28. After a subsampling (pooling) layer with stride = 2 the size becomes 14 * 14; the next convolution changes the output channels to 16 and the size to 10 * 10. After another subsampling layer the feature map finally becomes 5 * 5, which is passed to the fully connected layers and processed into the final output. The specific processing flow is as follows:
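As a sanity check, the size arithmetic above can be verified with a short sketch (the helper names conv_out and pool_out are ours, not part of the original code; the original LeNet uses 5x5 convolutions without padding and 2x2 subsampling with stride 2):

```python
# Feature-map size after a convolution / pooling layer.
def conv_out(size, kernel=5, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    return (size - kernel) // stride + 1

size = 32
size = conv_out(size)  # C1: 32 -> 28
size = pool_out(size)  # S2: 28 -> 14
size = conv_out(size)  # C3: 14 -> 10
size = pool_out(size)  # S4: 10 -> 5
print(size)  # 5
```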
Compared with today's CNNs, LeNet has a few differences. The first is the activation function: LeNet uses the sigmoid function, while today's CNNs mainly use ReLU. In addition, the original LeNet used subsampling (averaging) to reduce the size of intermediate data, whereas max pooling is the mainstream today.
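Both differences can be illustrated with a tiny sketch (the numbers are arbitrary):

```python
import math

# Difference 1: activation function.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

print(sigmoid(2.0))  # ~0.881 (saturates toward 1 for large inputs)
print(relu(2.0))     # 2.0 (passes positive values through unchanged)

# Difference 2: pooling. The original LeNet averaged each window
# (subsampling); modern CNNs usually take the maximum instead.
window = [1.0, 3.0, 2.0, 8.0]
print(sum(window) / len(window))  # 3.5 (average / subsampling)
print(max(window))                # 8.0 (max pooling)
```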
Next, we implement recognition of the MNIST dataset based on the LeNet-5 network.
First, let's create the dataset. Downloading a simple dataset like this with torchvision.datasets could hardly be easier:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

mnist_train = datasets.MNIST('MNIST', True,
                             transform=transforms.Compose([
                                 transforms.Resize((28, 28)),
                                 transforms.ToTensor()
                             ]), download=True)
mnist_train = DataLoader(mnist_train, batch_size=batch_size, shuffle=True)

mnist_test = datasets.MNIST('MNIST', False,
                            transform=transforms.Compose([
                                transforms.Resize((28, 28)),
                                transforms.ToTensor()
                            ]), download=True)
mnist_test = DataLoader(mnist_test, batch_size=batch_size, shuffle=True)
Let's fetch one batch from the downloaded dataset and see what it looks like (batch_size = 32):
x, label = next(iter(mnist_train))
print('x:', x.shape, ' label:', label.shape)
# Output: x: torch.Size([32, 1, 28, 28])  label: torch.Size([32])
Next, let's build the LeNet-5 network:
import torch
from torch import nn

class lenet5(nn.Module):
    """LeNet-5 for the MNIST dataset."""
    def __init__(self):
        super(lenet5, self).__init__()
        # convolution + pooling layers
        self.cov_unit = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        )
        # fully connected layers
        self.fc_unit = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10)
        )

    def forward(self, x):
        batchsz = x.size(0)
        x = self.cov_unit(x)
        x = x.view(batchsz, 16 * 5 * 5)  # flatten before the fully connected layers
        logits = self.fc_unit(x)
        return logits
In an earlier post, I analyzed the parameters of the input data and hidden layers and explained why full connection is not used throughout. My own understanding of how the kernel_size parameter is chosen is still incomplete; here is a reference:
The LeNet network is now built. Below we define the optimizer and loss function, with GPU acceleration:
from torch import optim

device = torch.device('cuda')
model = lenet5().to(device)
criteon = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
Here we can print the model to the console and observe the LeNet network as a whole:
# Console printout of the model:
lenet5(
  (cov_unit): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc_unit): Sequential(
    (0): Linear(in_features=400, out_features=120, bias=True)
    (1): ReLU()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): ReLU()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)
This matches the network we defined. With the network built and the optimizer and parameters set, let's start training:
for epoch in range(epochs):  # epochs is set beforehand, e.g. epochs = 15
    model.train()
    for batchidx, (x, label) in enumerate(mnist_train):
        x, label = x.to(device), label.to(device)
        logits = model(x)
        loss = criteon(logits, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('epoch:', epoch, ' loss:', loss.item())
The name logits originally comes from the logit (log-odds) function, the inverse of the sigmoid; here it is used, somewhat loosely, for the raw output of the final fully connected layer rather than in its original sense. At the end of each epoch, the value of the loss function is printed to the console.
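To make the naming concrete, here is a small sketch showing that the logit (log-odds) function is the inverse of the sigmoid (the function names are ours):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    # log-odds: the inverse of the sigmoid
    return math.log(p / (1.0 - p))

x = 2.0
p = sigmoid(x)
print(abs(logit(p) - x) < 1e-9)  # True: logit(sigmoid(x)) recovers x
```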
Then we run the test:
model.eval()
with torch.no_grad():
    total_num = 0
    total_correct = 0
    for x, label in mnist_test:
        x, label = x.to(device), label.to(device)
        logits = model(x)
        pred = logits.argmax(dim=1)
        total_correct += torch.eq(pred, label).float().sum().item()
        total_num += x.size(0)
    acc = total_correct / total_num
    print('epoch:', epoch, ' accuracy:', acc)
Here pred = logits.argmax(dim=1): argmax returns the index of the maximum value, i.e. the class with the highest predicted score after training. pred is then compared element-wise with the ground-truth label; the number of matches is accumulated into total_correct, from which acc is finally computed.
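As an illustration, here is the same argmax/eq accuracy computation on a toy batch (the logits and labels are made-up numbers):

```python
import torch

# Toy logits for a batch of 3 samples and 4 classes.
logits = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                       [1.5, 0.2, 0.1, 0.4],
                       [0.0, 0.1, 0.2, 3.0]])
label = torch.tensor([1, 0, 2])

pred = logits.argmax(dim=1)  # index of the largest score per row
correct = torch.eq(pred, label).float().sum().item()
acc = correct / label.size(0)
print(pred.tolist(), acc)  # [1, 0, 3] -> 2 of 3 correct
```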
Before training and before testing, model.train() and model.eval() are called respectively:
model.train() enables BatchNormalization and Dropout, putting them into training mode.
model.eval() disables them, putting BatchNormalization and Dropout into evaluation mode.
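A minimal sketch of this mode switch (the tiny model here is just for illustration): calling model.train() or model.eval() flips the training flag of every submodule, which is what Dropout and BatchNorm layers consult.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))

model.train()
print(model[1].training)  # True  -> Dropout is active

model.eval()
print(model[1].training)  # False -> Dropout is disabled
```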
The idea of BatchNormalization is to adjust the distribution of each layer's activations so that it has an appropriate spread. Inserting normalization layers into the neural network speeds up learning, reduces dependence on weight initialization, and suppresses overfitting to some extent.
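A small sketch of this effect (the input values are made up): in training mode, nn.BatchNorm1d normalizes each feature of the batch to roughly zero mean and unit variance.

```python
import torch
from torch import nn

bn = nn.BatchNorm1d(3)
bn.train()

# A batch of 4 samples whose 3 features have very different scales.
x = torch.tensor([[10.0, 0.1, -5.0],
                  [12.0, 0.3, -7.0],
                  [14.0, 0.5, -9.0],
                  [16.0, 0.7, -11.0]])
y = bn(x)
print(y.mean(dim=0))                  # ~0 for every feature
print(y.std(dim=0, unbiased=False))   # ~1 for every feature
```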
Dropout is a method that randomly deletes neurons during learning. By randomly selecting neurons to delete and stopping their forward signal transmission, Dropout reduces the gap between the recognition accuracy on training data and on test data, so overfitting can be suppressed even in highly expressive networks.
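A minimal sketch of this behavior: in training mode, nn.Dropout zeroes each element with probability p and scales the survivors by 1/(1-p) so the expected activation is unchanged; in eval mode it passes the input through untouched.

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y = drop(x)
print((y == 0).float().mean().item())  # ~0.5 of the elements are zeroed
print(y.max().item())                  # 2.0: survivors scaled by 1/(1-p)

drop.eval()
print(torch.equal(drop(x), x))  # True: dropout is a no-op in eval mode
```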
Finally, to display and monitor the data better, we use Visdom for visualization:
from visdom import Visdom

viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
global_step = 0
The train_loss curve starts from the coordinate (0, 0); after each epoch is executed, global_step += 1 and the new loss value is appended:
global_step += 1
viz.line([loss.item()], [global_step], win='train_loss', update='append')
Draw the loss calculated in this epoch as a line chart
viz.images(x.view(-1, 1, 28, 28), win='x')
viz.text(str(pred.detach().cpu().numpy()), win='pred', opts=dict(title='pred'))
At this point x.shape is [16, 1, 28, 28], and str(pred.detach().cpu().numpy()) converts the predicted values to a NumPy array and renders them as text.
After 15 epochs, the recognition accuracy is already very high. Let's look at the Visdom visualization results:
Although it fluctuates slightly, train_loss is still decreasing steadily. We extracted 10 samples for display, and as can be seen, the predictions are also very accurate.
Conclusion: LeNet, first proposed in 1998, was one of the earliest CNNs. Although it differs somewhat from today's CNNs, the differences are not large. Considering how early it was proposed, LeNet is still quite remarkable.