Introduction to PyTorch convolutional neural networks

Building a CNN for training

Defining the network

nn.Conv2d

nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
  • in_channels is the number of input channels; here the input is a 3 × 64 × 64 RGB image, so in_channels = 3
  • out_channels is the number of output channels (the number of convolution filters)
  • kernel_size is the size of the convolution kernel
  • stride is the step size of each slide of the kernel
  • padding is the length of edge filling. If it is left at the default of 0, edge pixels that do not fill a complete kernel window are simply not covered by the convolution.

Convolution output size formula:

$$W_2 = \frac{W_1 - F + 2P}{S} + 1$$

W: width (W1 input, W2 output)   F: convolution kernel size   S: stride   P: padding
If the division is not exact, the result is rounded down (floor).
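
As a quick check of the formula, here is a minimal sketch using a dummy tensor and the first convolution layer of the network defined below:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # one 64x64 RGB image
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=11, stride=4, padding=2)
# W2 = floor((64 - 11 + 2*2) / 4) + 1 = 15
print(conv(x).shape)  # torch.Size([1, 64, 15, 15])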

Pooling

Pooling combined with convolution effectively reduces the size of the feature maps (and therefore the number of downstream parameters) and speeds up convergence.
Here we use max pooling (nn.MaxPool2d); PyTorch also provides average pooling (nn.AvgPool2d), as sketched below.
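
A minimal sketch of both pooling layers on a dummy feature map (the shape is chosen for illustration):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 15, 15)  # e.g. the feature map after the first convolution
max_pool = nn.MaxPool2d(kernel_size=3, stride=2)  # keeps the maximum of each 3x3 window
avg_pool = nn.AvgPool2d(kernel_size=3, stride=2)  # keeps the mean of each 3x3 window
print(max_pool(x).shape)  # torch.Size([1, 64, 7, 7])
print(avg_pool(x).shape)  # torch.Size([1, 64, 7, 7])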

Dropout

During training, dropout randomly zeroes out some neurons so that they do not participate in that pass, which helps prevent overfitting. A small sketch follows.
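
A minimal sketch of the behavior (the default drop probability is p=0.5):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the elements are zeroed; the rest are scaled by 1/(1-p) = 2

drop.eval()
print(drop(x))  # in evaluation mode dropout does nothing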

nn.Sequential()

A chain of layers can be defined through nn.Sequential(), which lets the network be broken into more logical blocks.
Here we set up a feature-extraction block and a classification block.
The example assumes 64 × 64 input images and batch_size = 64.

class CNNNet(nn.Module):
    def __init__(self, num_classes=2):
        super(CNNNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # Output: number of channels: 64 output size: 15
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # Output: number of channels: 64, output size: 7
            nn.Conv2d(64, 192, kernel_size=5, padding=2),  # Output: number of channels: 192, output size: 7
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # Output: number of channels: 192, output size: 3
            nn.Conv2d(192, 384, kernel_size=3, padding=1),  # Output: number of channels: 384, output size: 3
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # Output: number of channels: 256, output size: 3
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),  # Output: number of channels: 256, output size: 3
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # Output: number of channels: 256, output size: 1 (per sample: 256 x 1 x 1; batch_size = 64)
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))  # Output: 256 * 6 * 6
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)  # 64,256,6,6
        x = torch.flatten(x, 1)  # 64,9216
        x = self.classifier(x)
        return x
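
A quick shape check (assuming torch and torch.nn are already imported) with a dummy batch of 64 images of size 64 × 64:

model = CNNNet(num_classes=2)
dummy = torch.randn(64, 3, 64, 64)
print(model(dummy).shape)  # torch.Size([64, 2]) -- one score per class for each image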

The following is the complete code.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torch.nn.functional as F
import torchvision
from torchvision import transforms
from PIL import Image


# A simple fully connected network (defined for comparison; not used in the training below)
class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.fc1 = nn.Linear(12288, 84)  # 64*64*3
        self.fc2 = nn.Linear(84, 30)
        self.fc3 = nn.Linear(30, 84)
        self.fc4 = nn.Linear(84, 2)

    def forward(self, x):
        x = x.view(-1, 12288)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x


# convnet
class CNNNet(nn.Module):
    def __init__(self, num_classes=2):
        super(CNNNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),  # Output: number of channels: 64 output size: 15
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # Output: number of channels: 64, output size: 7
            nn.Conv2d(64, 192, kernel_size=5, padding=2),  # Output: number of channels: 192, output size: 7
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # Output: number of channels: 192, output size: 3
            nn.Conv2d(192, 384, kernel_size=3, padding=1),  # Output: number of channels: 384, output size: 3
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),  # Output: number of channels: 256, output size: 3
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),  # Output: number of channels: 256, output size: 3
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),  # Output: number of channels: 256, output size: 1 (per sample: 256 x 1 x 1; batch_size = 64)
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))  # Output: 256 * 6 * 6
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)  # 64,256,6,6
        x = torch.flatten(x, 1)  # 64,9216
        x = self.classifier(x)
        return x


cnnnet = CNNNet()


def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device="cpu"):
    for epoch in range(epochs):
        training_loss = 0.0
        valid_loss = 0.0
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            inputs, targets = batch
            inputs = inputs.to(device)
            targets = targets.to(device)
            output = model(inputs)
            loss = loss_fn(output, targets)
            loss.backward()
            optimizer.step()
            training_loss += loss.item() * inputs.size(0)
        training_loss /= len(train_loader.dataset)

        model.eval()
        num_correct = 0
        num_examples = 0
        for batch in val_loader:
            inputs, targets = batch
            inputs = inputs.to(device)
            output = model(inputs)
            targets = targets.to(device)
            loss = loss_fn(output, targets)
            valid_loss += loss.item() * inputs.size(0)
            correct = torch.eq(torch.max(F.softmax(output, dim=1), dim=1)[1], targets).view(-1)
            num_correct += torch.sum(correct).item()
            num_examples += correct.shape[0]
        valid_loss /= len(val_loader.dataset)

        print(
            'Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(epoch, training_loss,
                                                                                                  valid_loss,
                                                                                                  num_correct / num_examples))


def check_image(path):
    # Used as is_valid_file below to skip files that PIL cannot open.
    try:
        Image.open(path)
        return True
    except Exception:
        return False


img_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
train_data_path = "./train/"
train_data = torchvision.datasets.ImageFolder(root=train_data_path, transform=img_transforms, is_valid_file=check_image)
val_data_path = "./val/"
val_data = torchvision.datasets.ImageFolder(root=val_data_path, transform=img_transforms, is_valid_file=check_image)
batch_size = 64
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_data_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size, shuffle=True)
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

cnnnet.to(device)
optimizer = optim.Adam(cnnnet.parameters(), lr=0.001)
train(cnnnet, optimizer, torch.nn.CrossEntropyLoss(), train_data_loader, val_data_loader, epochs=40, device=device)

Training results

Epoch: 0, Training Loss: 0.33, Validation Loss: 0.41, accuracy = 0.76
Epoch: 1, Training Loss: 0.40, Validation Loss: 0.38, accuracy = 0.82
Epoch: 2, Training Loss: 0.34, Validation Loss: 0.43, accuracy = 0.78
Epoch: 3, Training Loss: 0.28, Validation Loss: 0.44, accuracy = 0.76
Epoch: 4, Training Loss: 0.33, Validation Loss: 0.45, accuracy = 0.84
Epoch: 5, Training Loss: 0.31, Validation Loss: 0.41, accuracy = 0.84
Epoch: 6, Training Loss: 0.24, Validation Loss: 0.42, accuracy = 0.83
Epoch: 7, Training Loss: 0.24, Validation Loss: 0.39, accuracy = 0.82
Epoch: 8, Training Loss: 0.22, Validation Loss: 0.50, accuracy = 0.84
Epoch: 9, Training Loss: 0.22, Validation Loss: 0.73, accuracy = 0.69
Epoch: 10, Training Loss: 0.26, Validation Loss: 0.48, accuracy = 0.78
Epoch: 11, Training Loss: 0.19, Validation Loss: 0.80, accuracy = 0.79
Epoch: 12, Training Loss: 0.19, Validation Loss: 0.50, accuracy = 0.81
Epoch: 13, Training Loss: 0.15, Validation Loss: 0.72, accuracy = 0.76
Epoch: 14, Training Loss: 0.15, Validation Loss: 0.72, accuracy = 0.80
Epoch: 15, Training Loss: 0.22, Validation Loss: 0.40, accuracy = 0.81
Epoch: 16, Training Loss: 0.20, Validation Loss: 0.72, accuracy = 0.78
Epoch: 17, Training Loss: 0.11, Validation Loss: 0.86, accuracy = 0.83
Epoch: 18, Training Loss: 0.09, Validation Loss: 1.20, accuracy = 0.77
Epoch: 19, Training Loss: 0.09, Validation Loss: 0.87, accuracy = 0.76
Epoch: 20, Training Loss: 0.09, Validation Loss: 0.90, accuracy = 0.84
Epoch: 21, Training Loss: 0.14, Validation Loss: 0.75, accuracy = 0.81
Epoch: 22, Training Loss: 0.09, Validation Loss: 1.06, accuracy = 0.83
Epoch: 23, Training Loss: 0.09, Validation Loss: 0.95, accuracy = 0.83
Epoch: 24, Training Loss: 0.13, Validation Loss: 0.70, accuracy = 0.83
Epoch: 25, Training Loss: 0.08, Validation Loss: 1.14, accuracy = 0.74
Epoch: 26, Training Loss: 0.08, Validation Loss: 0.77, accuracy = 0.81
Epoch: 27, Training Loss: 0.03, Validation Loss: 1.45, accuracy = 0.85
Epoch: 28, Training Loss: 0.09, Validation Loss: 1.16, accuracy = 0.79
Epoch: 29, Training Loss: 0.10, Validation Loss: 0.94, accuracy = 0.85
Epoch: 30, Training Loss: 0.08, Validation Loss: 0.78, accuracy = 0.85
Epoch: 31, Training Loss: 0.08, Validation Loss: 0.78, accuracy = 0.83
Epoch: 32, Training Loss: 0.03, Validation Loss: 2.64, accuracy = 0.72
Epoch: 33, Training Loss: 0.24, Validation Loss: 0.38, accuracy = 0.86
Epoch: 34, Training Loss: 0.17, Validation Loss: 0.89, accuracy = 0.81
Epoch: 35, Training Loss: 0.05, Validation Loss: 1.04, accuracy = 0.83
Epoch: 36, Training Loss: 0.04, Validation Loss: 1.08, accuracy = 0.83
Epoch: 37, Training Loss: 0.02, Validation Loss: 1.49, accuracy = 0.83
Epoch: 38, Training Loss: 0.06, Validation Loss: 0.80, accuracy = 0.82
Epoch: 39, Training Loss: 0.07, Validation Loss: 1.55, accuracy = 0.76

CNN models in history

LeNet-5

LeNet-5 was proposed by Yann LeCun and colleagues in the 1990s (the LeNet-5 paper was published in 1998).

AlexNet

AlexNet was released in 2012 and took first place in that year's ImageNet competition with a top-5 error rate of 15.3%.

It popularized max pooling, dropout, and the ReLU activation function, demonstrated that deep learning works at scale, and became a milestone in the history of deep learning.

Inception/GoogLeNet

This is the winner of the 2014 ImageNet competition, and it addresses several shortcomings of AlexNet.

Its Inception module applies convolution kernels of different sizes to the same input in parallel to extract different features, then combines their outputs. The total number of parameters is smaller than AlexNet's, and the top-5 error rate was 6.67%.

VGG

In 2014, the runner-up in ImageNet was the VGG network proposed by Oxford University's Visual Geometry Group. It deepens a simple structure by stacking many convolutional layers and achieves good results, with a top-5 error rate of 8.8%.

Its disadvantage is that the final fully connected layers make the parameter count huge: about 138 million parameters, compared with GoogLeNet's roughly 7 million.
Because of its simple structure it is still widely used, for example in style transfer (converting a photo into a Van Gogh-style image).

ResNet

In 2015, Microsoft's ResNet took first place with a top-5 error rate of 4.49% (3.57% for a ResNet variant), essentially surpassing human-level performance on this benchmark.
The innovation is the residual block: alongside an ordinary stack of convolutional layers, the block's input is added directly to its output through a skip connection, as sketched below.

This design effectively alleviates the vanishing gradient problem.
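
Since the original figure is not reproduced here, the following minimal sketch of a residual block (modeled on the BasicBlock that appears in the ResNet printout later in this post, without downsampling) shows the idea: the block keeps its input and adds it to the output of its convolutions.

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                               # the skip connection keeps the input
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                       # add the input to the block's output
        return self.relu(out)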

Other structures

After 2015, a number of other architectures further improved ImageNet accuracy, for example DenseNet (which builds on the ResNet idea and allows networks of up to 1,000 layers). SqueezeNet and MobileNet offer reasonable accuracy at a much lower computational and memory cost, although their accuracy is lower than that of VGG, ResNet, or Inception.
Google's AutoML system designed its own architecture, NASNet, and achieved state-of-the-art results.

Using pretrained models

For example, this task can call AlexNet:

import torchvision.models as models
alexnet = models.alexnet(num_classes=2)

The torchvision API likewise provides definitions of VGG, ResNet, Inception, DenseNet, and SqueezeNet variants. Calling

models.alexnet(pretrained=True) 

downloads and loads a set of trained (ImageNet) parameters along with the model.
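
To reuse those ImageNet weights for the two-class task in this post, one common approach (a sketch, not from the original text) is to load the pretrained model and replace only its final classification layer, which is layer (6) of the classifier in the structure printed below:

import torch.nn as nn
import torchvision.models as models

alexnet = models.alexnet(pretrained=True)      # 1000-class ImageNet weights
alexnet.classifier[6] = nn.Linear(4096, 2)     # swap the head for a 2-class layer

Only the new layer then starts from random weights; the rest of the network keeps its pretrained parameters.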

Example

import torchvision.models as models
alexnet = models.alexnet(num_classes=1000, pretrained=True)

Running this downloads the pretrained model first.

Print the model structure:

print(alexnet)
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Download ResNet-18 and print it:

resnet = models.resnet18(pretrained=True)
print(resnet)
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)

Batch normalization

Abbreviated as BN, a BN layer has two learnable parameters per channel and participates in network training. Its job is to normalize the samples of each batch passing through the layer to mean 0 and variance 1. BN has little impact on small networks, but in large networks (for example, more than 20 layers) the repeated multiplications between layers have a large effect and can eventually make the gradients vanish or explode.
The use of BN layers is part of what lets a network like ResNet-152 train without its gradients exploding or vanishing.
In the earlier example we normalized the images at input time (transforms.Normalize). Training is also possible without that preprocessing, but it takes longer; with BN layers in the network, the inputs to each layer are normalized automatically during training.
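
A minimal sketch of a convolution block with a BN layer (the shapes are chosen to match the 64 × 64 inputs used above):

import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # a bias is redundant before BN
    nn.BatchNorm2d(64),  # learns a scale and shift per channel on top of the batch statistics
    nn.ReLU(inplace=True),
)
x = torch.randn(8, 3, 64, 64)
print(block(x).shape)  # torch.Size([8, 64, 64, 64])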

Model selection

The book mentions NASNet and PNAS but does not fully recommend them, because they consume a lot of memory; applying the transfer learning covered in Chapter 4 to hand-designed architectures such as ResNet is usually more efficient.
You can also use torch.hub.list('pytorch/vision:v0.4.2') to list all models that can be downloaded.
For example:

['alexnet', 'deeplabv3_resnet101', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'fcn_resnet101', 'googlenet', 'inception_v3', 'mobilenet_v2', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext50_32x4d', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'wide_resnet101_2', 'wide_resnet50_2']
model = torch.hub.load('pytorch/vision:v0.4.2', 'resnet50', pretrained=True)
