Handwritten numeral recognition neural network based on LeNet5

As for CNNs, various network structures have been proposed over the years. Here we focus on LeNet, one of the earliest CNNs, first proposed in 1998 as a network for handwritten digit recognition. As shown in the figure below, it alternates convolution layers with pooling layers (strictly speaking, subsampling layers that "thin out" the elements), and finally outputs the result through fully connected layers.

In the original LeNet, a 32 * 32 image is input; the first convolution layer outputs a feature map with 6 channels and size 28 * 28. After a subsampling (pooling) layer with stride 2, the size shrinks to 14 * 14. A second convolution changes the output channels to 16 and the size to 10 * 10, and after another subsampling layer the feature map becomes 5 * 5. This is then passed to the fully connected layers, which produce the final output. The specific processing flow is as follows:
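The size changes above can be checked with the standard output-size formula out = (in - k + 2p) // s + 1. A minimal sketch in plain Python (assuming no padding, as in the original LeNet):

```python
# Sketch: verify the classic LeNet shape flow with the conv/pool size formula
def conv_out(size, kernel, stride=1, padding=0):
    return (size - kernel + 2 * padding) // stride + 1

size = 32                    # input image 32x32
size = conv_out(size, 5)     # conv 5x5 -> 28x28 (6 channels)
size = conv_out(size, 2, 2)  # subsampling 2x2, stride 2 -> 14x14
size = conv_out(size, 5)     # conv 5x5 -> 10x10 (16 channels)
size = conv_out(size, 2, 2)  # subsampling -> 5x5
print(16 * size * size)      # 400 features feed the fully connected layers
```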

Compared with today's CNNs, LeNet has a few differences. The first is the activation function: LeNet uses the sigmoid function, while modern CNNs mainly use ReLU. In addition, the original LeNet used subsampling to reduce the size of intermediate data, whereas max pooling is mainstream in CNNs today.
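To make the two differences concrete, here is a small sketch comparing the activations and the two pooling operations in PyTorch (the tensor values are arbitrary examples):

```python
import torch
import torch.nn as nn

x = torch.tensor([[-1.0, 2.0], [3.0, -4.0]])

# Original LeNet used sigmoid; modern CNNs mostly use ReLU
print(torch.sigmoid(x))  # squashes everything into (0, 1)
print(torch.relu(x))     # zeroes negatives, keeps positives unchanged

# Original LeNet used subsampling (close to average pooling);
# max pooling is the mainstream choice today
patch = x.view(1, 1, 2, 2)
print(nn.AvgPool2d(2)(patch))  # mean of the 2x2 patch: 0.0
print(nn.MaxPool2d(2)(patch))  # max of the 2x2 patch: 3.0
```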

Next, we implement recognition of the MNIST dataset based on the LeNet5 network.

First, let's prepare the dataset. For a simple dataset like this, torchvision's datasets module makes downloading it very convenient:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

mnist_train = datasets.MNIST('MNIST', True, transform=transforms.Compose([
    transforms.ToTensor()]), download=True)
mnist_train = DataLoader(mnist_train, batch_size=batch_size, shuffle=True)

mnist_test = datasets.MNIST('MNIST', False, transform=transforms.Compose([
    transforms.ToTensor()]), download=True)
mnist_test = DataLoader(mnist_test, batch_size=batch_size, shuffle=True)

Let's output the downloaded data set and see what happens (batch_size = 32)

x, label = next(iter(mnist_train))
print('x:', x.shape, ' label:', label.shape)

# Output result: x: torch.Size([32, 1, 28, 28])  label: torch.Size([32])

Let's establish a LeNet network:

from torch import nn

class lenet5(nn.Module):
    def __init__(self):
        super(lenet5, self).__init__()
        # convolution + pooling layers
        self.cov_unit = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
            nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0),
        )
        self.fc_unit = nn.Sequential(
            nn.Linear(16*5*5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        batchsz = x.size(0)
        x = self.cov_unit(x)
        x = x.view(batchsz, 16*5*5)  # flatten to [b, 400]
        logits = self.fc_unit(x)
        return logits

Here we can refer to "Reading the LeNet paper: LeNet structure and calculation of the number of parameters", silent56_th's blog on CSDN: https://blog.csdn.net/silent56_th/article/details/53456522

In that blog, the author analyzes the parameters of the input data and hidden layers, and explains why full connection is not adopted between every pair of feature maps. I did not fully understand the selection of the kernel_size parameter myself, so the blog serves as a reference here.
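As a rough cross-check of the parameter-count discussion, here is a sketch counting the parameters of the lenet5 layers used in this post. Note this assumes full connectivity between the two convolution stages, unlike the original paper's partial S2-to-C3 connection table, so the total differs slightly from the paper's:

```python
# conv params = out_ch * (in_ch * k * k + 1); linear params = out * (in + 1)
# (the "+ 1" in each term is the bias)
def conv_params(in_ch, out_ch, k):
    return out_ch * (in_ch * k * k + 1)

def linear_params(n_in, n_out):
    return n_out * (n_in + 1)

total = (conv_params(1, 6, 5)       # 156
         + conv_params(6, 16, 5)    # 2416
         + linear_params(400, 120)  # 48120
         + linear_params(120, 84)   # 10164
         + linear_params(84, 10))   # 850
print(total)  # 61706
```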

With the LeNet network built, we next define the loss function and optimizer, moving everything onto the GPU for acceleration:

import torch
from torch import optim

device = torch.device('cuda')
model = lenet5().to(device)
criteon = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

Here, we can print the model on the console to observe the LeNet network as a whole:

# Console printout
model: lenet5(
  (cov_unit): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
    (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1), padding=(1, 1))
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fc_unit): Sequential(
    (0): Linear(in_features=400, out_features=120, bias=True)
    (1): ReLU()
    (2): Linear(in_features=120, out_features=84, bias=True)
    (3): ReLU()
    (4): Linear(in_features=84, out_features=10, bias=True)
  )
)

We can see that this matches the network model we set up. Now that the network is built and the optimizer and parameters are set, let's start training:

for epoch in range(15):  # 15 epochs, as used later in this post
    model.train()
    for batchidx, (x, label) in enumerate(mnist_train):
        x, label = x.to(device), label.to(device)
        logits = model(x)
        loss = criteon(logits, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('epoch:', epoch, ' loss:', loss.item())

The name logits originally refers to the input of the logistic (sigmoid) function, but here it is used to denote the raw output of the final fully connected layer rather than in its original sense. At the end of each epoch, the value of the loss function is printed on the console.
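Returning raw logits works because CrossEntropyLoss applies log-softmax internally. A small sketch verifying the equivalence (the logit values are arbitrary):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # raw FC output: one sample, 3 classes
label = torch.tensor([0])

# cross_entropy(logits, label) == negative log-softmax at the true class
loss = F.cross_entropy(logits, label)
manual = -F.log_softmax(logits, dim=1)[0, 0]
print(torch.allclose(loss, manual))  # True
```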

The following tests were performed:

model.eval()
with torch.no_grad():
    total_num = 0
    total_correct = 0
    for x, label in mnist_test:
        x, label = x.to(device), label.to(device)
        logits = model(x)
        pred = logits.argmax(dim=1)
        total_correct += torch.eq(pred, label).float().sum().item()
        total_num += x.size(0)
    acc = total_correct / total_num
    print('epoch:', epoch, ' accuracy:', acc)

Here, pred = logits.argmax(dim=1). The argmax function returns the index of the maximum value, i.e. the class the trained model predicts with the highest probability. pred is then compared with the supervision label; the number of equal entries is accumulated into total_correct, and finally acc is calculated.
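A toy example of this accuracy computation (made-up logits for three samples and two classes):

```python
import torch

logits = torch.tensor([[0.1, 0.9],
                       [0.8, 0.2],
                       [0.3, 0.7]])
label = torch.tensor([1, 0, 0])  # the last sample is misclassified

pred = logits.argmax(dim=1)      # index of the max per row: tensor([1, 0, 1])
correct = torch.eq(pred, label).float().sum().item()
print(correct / label.size(0))   # 2 of 3 correct
```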

Before training and testing, model.train() and model.eval() are called respectively:

(1). model.train()
Puts the model in training mode: BatchNormalization updates its running statistics and Dropout is active.
(2). model.eval()
Puts the model in evaluation mode: BatchNormalization uses its stored statistics and Dropout is disabled.

The idea of BatchNormalization is to adjust the distribution of each layer's activations so that it has an appropriate spread. Inserting a layer that normalizes the data distribution into the neural network makes learning faster, less dependent on the initial values, and suppresses overfitting to a certain extent.

Dropout is a method of randomly deleting neurons during learning. By randomly selecting neurons and stopping them from transmitting signals forward, dropout reduces the gap between the recognition accuracy on training data and on test data; even in highly expressive networks, overfitting can be suppressed.
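A small sketch of what model.train() / model.eval() changes for a Dropout layer, using a standalone nn.Dropout with p=0.5 (the seed is arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()         # training mode: dropout active (what model.train() sets)
train_out = drop(x)  # some entries zeroed, survivors scaled by 1/(1-p) = 2
print(train_out)

drop.eval()          # evaluation mode: dropout disabled (what model.eval() sets)
eval_out = drop(x)
print(eval_out)      # identical to x
```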

Finally, in order to better display the data, we use visdom for visualization:

from visdom import Visdom

viz = Visdom()
viz.line([0.], [0.], win='train_loss', opts=dict(title='train_loss'))
global_step = 0

For train_loss, the curve starts from the coordinate [0, 0]; after each epoch is executed, global_step is incremented and the new loss value is appended:

  global_step += 1
  viz.line([loss.item()],[global_step],win='train_loss', update='append')

This draws the loss calculated in each epoch as a line chart.

viz.images(x.view(-1, 1, 28, 28), win='x')
viz.text(str(pred.detach().cpu().numpy()), win='pred',
         opts=dict(title='pred'))
At this point, x.shape is [16, 1, 28, 28], and str(pred.detach().cpu().numpy()) converts the predicted values to a numpy array and renders them as text.

After 15 epochs, we can see that the recognition accuracy is already very high. Let's take a look at the visualization results in visdom:

It can be seen that although train_loss fluctuates slightly, it still decreases steadily. We also extracted 10 samples for display, and as can be seen, the predicted results are very accurate.

Conclusion: LeNet, one of the earliest CNNs, was proposed in 1998. Although it differs somewhat from today's CNNs, the differences are not large. Considering how early it was proposed, LeNet is still quite amazing.

Tags: neural networks Deep Learning

Posted on Fri, 24 Sep 2021 22:16:21 -0400 by sdallas411