Building a CNN for training
Define network
nn.Conv2d
nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
- in_channels: the number of input channels; here the input is a 3 x 64 x 64 image, so in_channels is 3
- out_channels: the number of output channels
- kernel_size: the size of the convolution kernel
- stride: the step size of each slide of the kernel
- padding: the amount of edge padding. With the default of 0, edge positions that do not have enough elements for a full kernel window are simply skipped.
Convolution output size formula:

W_2 = \frac{W_1 - F + 2P}{S} + 1
W: width, F: convolution kernel size, S: stride, P: padding
If the division is not exact, the result is rounded down (floor).
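As a quick sanity check of the formula (a minimal sketch; the layer parameters match the first convolution of the network defined later), you can compare the computed size with the actual output of an nn.Conv2d layer:

import torch
import torch.nn as nn

def conv_out_size(w_in, kernel_size, stride=1, padding=0):
    # W_2 = floor((W_1 - F + 2P) / S) + 1
    return (w_in - kernel_size + 2 * padding) // stride + 1

conv = nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
x = torch.randn(1, 3, 64, 64)        # one 3-channel 64 x 64 image
print(conv(x).shape)                 # torch.Size([1, 64, 15, 15])
print(conv_out_size(64, kernel_size=11, stride=4, padding=2))   # 15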
Pooling
Pooling combined with convolution effectively shrinks the feature maps, reducing the number of parameters downstream and speeding up convergence.
Here we use max pooling; PyTorch also provides average pooling.
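A minimal sketch of both pooling layers on the same feature map (the 2 x 2 window and stride 2 are just illustrative values):

import torch
import torch.nn as nn

x = torch.randn(1, 64, 8, 8)                      # a 64-channel 8 x 8 feature map
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
print(max_pool(x).shape)   # torch.Size([1, 64, 4, 4]); keeps the largest value in each 2 x 2 window
print(avg_pool(x).shape)   # torch.Size([1, 64, 4, 4]); keeps the mean of each 2 x 2 window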
Dropout
During training, dropout randomly deactivates some neurons so that they do not participate in that pass, which helps prevent overfitting.
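A minimal sketch of that behaviour (p=0.5 is nn.Dropout's default drop probability):

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()      # training mode: roughly half the values are zeroed, survivors are scaled by 1 / (1 - p)
print(drop(x))

drop.eval()       # evaluation mode: dropout does nothing
print(drop(x))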
nn.Sequential()
A chain of layers can be defined with nn.Sequential(), which lets the network be broken into more logical arrangements.
Here we will set up a feature extraction layer and a classification layer.
The example below assumes 64 x 64 input images and a batch size of 64.
class CNNNet(nn.Module):
    def __init__(self, num_classes=2):
        super(CNNNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),   # output: 64 channels, 15 x 15
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output: 64 channels, 7 x 7
            nn.Conv2d(64, 192, kernel_size=5, padding=2),            # output: 192 channels, 7 x 7
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output: 192 channels, 3 x 3
            nn.Conv2d(192, 384, kernel_size=3, padding=1),           # output: 384 channels, 3 x 3
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),           # output: 256 channels, 3 x 3
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),           # output: 256 channels, 3 x 3
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output: 256 channels, 1 x 1 (with batch_size = 64: shape (64, 256, 1, 1))
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))                  # output: 256 x 6 x 6
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)        # (64, 256, 6, 6)
        x = torch.flatten(x, 1)    # (64, 9216)
        x = self.classifier(x)
        return x
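To sanity-check the output sizes noted in the comments, you can push a dummy batch of the assumed shape (64, 3, 64, 64) through the model, for example:

# Minimal shape check (assumes the imports from the complete code below)
model = CNNNet(num_classes=2)
x = torch.randn(64, 3, 64, 64)
print(model.features(x).shape)   # torch.Size([64, 256, 1, 1])
print(model(x).shape)            # torch.Size([64, 2])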
The following is the complete code.
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torch.nn.functional as F
import torchvision
from torchvision import transforms
from PIL import Image

# Defining neural networks
class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.fc1 = nn.Linear(12288, 84)   # 64 * 64 * 3 = 12288 input features
        self.fc2 = nn.Linear(84, 30)
        self.fc3 = nn.Linear(30, 84)
        self.fc4 = nn.Linear(84, 2)

    def forward(self, x):
        x = x.view(-1, 12288)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

# convnet
class CNNNet(nn.Module):
    def __init__(self, num_classes=2):
        super(CNNNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),   # output: 64 channels, 15 x 15
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output: 64 channels, 7 x 7
            nn.Conv2d(64, 192, kernel_size=5, padding=2),            # output: 192 channels, 7 x 7
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output: 192 channels, 3 x 3
            nn.Conv2d(192, 384, kernel_size=3, padding=1),           # output: 384 channels, 3 x 3
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),           # output: 256 channels, 3 x 3
            nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),           # output: 256 channels, 3 x 3
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),                   # output: 256 channels, 1 x 1
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))                  # output: 256 x 6 x 6
        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)        # (64, 256, 6, 6)
        x = torch.flatten(x, 1)    # (64, 9216)
        x = self.classifier(x)
        return x

cnnnet = CNNNet()

def train(model, optimizer, loss_fn, train_loader, val_loader, epochs=20, device="cpu"):
    for epoch in range(epochs):
        training_loss = 0.0
        valid_loss = 0.0

        # Training phase
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            inputs, targets = batch
            inputs = inputs.to(device)
            targets = targets.to(device)
            output = model(inputs)
            loss = loss_fn(output, targets)
            loss.backward()
            optimizer.step()
            training_loss += loss.data.item() * inputs.size(0)
        training_loss /= len(train_loader.dataset)

        # Validation phase
        model.eval()
        num_correct = 0
        num_examples = 0
        for batch in val_loader:
            inputs, targets = batch
            inputs = inputs.to(device)
            output = model(inputs)
            targets = targets.to(device)
            loss = loss_fn(output, targets)
            valid_loss += loss.data.item() * inputs.size(0)
            correct = torch.eq(torch.max(F.softmax(output, dim=1), dim=1)[1], targets).view(-1)
            num_correct += torch.sum(correct).item()
            num_examples += correct.shape[0]
        valid_loss /= len(val_loader.dataset)

        print('Epoch: {}, Training Loss: {:.2f}, Validation Loss: {:.2f}, accuracy = {:.2f}'.format(
            epoch, training_loss, valid_loss, num_correct / num_examples))

def check_image(path):
    # Skip files that PIL cannot open
    try:
        im = Image.open(path)
        return True
    except:
        return False

img_transforms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

train_data_path = "./train/"
train_data = torchvision.datasets.ImageFolder(root=train_data_path,
                                              transform=img_transforms,
                                              is_valid_file=check_image)
val_data_path = "./val/"
val_data = torchvision.datasets.ImageFolder(root=val_data_path,
                                            transform=img_transforms,
                                            is_valid_file=check_image)

batch_size = 64
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
val_data_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size, shuffle=True)

if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

cnnnet.to(device)
optimizer = optim.Adam(cnnnet.parameters(), lr=0.001)

train(cnnnet, optimizer, torch.nn.CrossEntropyLoss(),
      train_data_loader, val_data_loader, epochs=40, device=device)
Training results
Epoch: 0, Training Loss: 0.33, Validation Loss: 0.41, accuracy = 0.76
Epoch: 1, Training Loss: 0.40, Validation Loss: 0.38, accuracy = 0.82
Epoch: 2, Training Loss: 0.34, Validation Loss: 0.43, accuracy = 0.78
Epoch: 3, Training Loss: 0.28, Validation Loss: 0.44, accuracy = 0.76
Epoch: 4, Training Loss: 0.33, Validation Loss: 0.45, accuracy = 0.84
Epoch: 5, Training Loss: 0.31, Validation Loss: 0.41, accuracy = 0.84
Epoch: 6, Training Loss: 0.24, Validation Loss: 0.42, accuracy = 0.83
Epoch: 7, Training Loss: 0.24, Validation Loss: 0.39, accuracy = 0.82
Epoch: 8, Training Loss: 0.22, Validation Loss: 0.50, accuracy = 0.84
Epoch: 9, Training Loss: 0.22, Validation Loss: 0.73, accuracy = 0.69
Epoch: 10, Training Loss: 0.26, Validation Loss: 0.48, accuracy = 0.78
Epoch: 11, Training Loss: 0.19, Validation Loss: 0.80, accuracy = 0.79
Epoch: 12, Training Loss: 0.19, Validation Loss: 0.50, accuracy = 0.81
Epoch: 13, Training Loss: 0.15, Validation Loss: 0.72, accuracy = 0.76
Epoch: 14, Training Loss: 0.15, Validation Loss: 0.72, accuracy = 0.80
Epoch: 15, Training Loss: 0.22, Validation Loss: 0.40, accuracy = 0.81
Epoch: 16, Training Loss: 0.20, Validation Loss: 0.72, accuracy = 0.78
Epoch: 17, Training Loss: 0.11, Validation Loss: 0.86, accuracy = 0.83
Epoch: 18, Training Loss: 0.09, Validation Loss: 1.20, accuracy = 0.77
Epoch: 19, Training Loss: 0.09, Validation Loss: 0.87, accuracy = 0.76
Epoch: 20, Training Loss: 0.09, Validation Loss: 0.90, accuracy = 0.84
Epoch: 21, Training Loss: 0.14, Validation Loss: 0.75, accuracy = 0.81
Epoch: 22, Training Loss: 0.09, Validation Loss: 1.06, accuracy = 0.83
Epoch: 23, Training Loss: 0.09, Validation Loss: 0.95, accuracy = 0.83
Epoch: 24, Training Loss: 0.13, Validation Loss: 0.70, accuracy = 0.83
Epoch: 25, Training Loss: 0.08, Validation Loss: 1.14, accuracy = 0.74
Epoch: 26, Training Loss: 0.08, Validation Loss: 0.77, accuracy = 0.81
Epoch: 27, Training Loss: 0.03, Validation Loss: 1.45, accuracy = 0.85
Epoch: 28, Training Loss: 0.09, Validation Loss: 1.16, accuracy = 0.79
Epoch: 29, Training Loss: 0.10, Validation Loss: 0.94, accuracy = 0.85
Epoch: 30, Training Loss: 0.08, Validation Loss: 0.78, accuracy = 0.85
Epoch: 31, Training Loss: 0.08, Validation Loss: 0.78, accuracy = 0.83
Epoch: 32, Training Loss: 0.03, Validation Loss: 2.64, accuracy = 0.72
Epoch: 33, Training Loss: 0.24, Validation Loss: 0.38, accuracy = 0.86
Epoch: 34, Training Loss: 0.17, Validation Loss: 0.89, accuracy = 0.81
Epoch: 35, Training Loss: 0.05, Validation Loss: 1.04, accuracy = 0.83
Epoch: 36, Training Loss: 0.04, Validation Loss: 1.08, accuracy = 0.83
Epoch: 37, Training Loss: 0.02, Validation Loss: 1.49, accuracy = 0.83
Epoch: 38, Training Loss: 0.06, Validation Loss: 0.80, accuracy = 0.82
Epoch: 39, Training Loss: 0.07, Validation Loss: 1.55, accuracy = 0.76
CNN models in history
LeNet-5
LeNet-5, proposed by Yann LeCun and colleagues in the 1990s, was one of the earliest CNNs and was used for handwritten digit recognition.
AlexNet
AlexNet was released in 2012 and took first place in that year's ImageNet competition with a top-5 error rate of 15.3%.
It popularized max pooling, dropout, and the ReLU activation function, demonstrated that deep learning works at scale, and became a milestone in the history of deep learning.
Inception/GoogLeNet
GoogLeNet won the 2014 ImageNet competition and addressed some of AlexNet's shortcomings.
Its Inception modules apply convolution kernels of different sizes in parallel to extract different features and then concatenate the outputs. The total number of parameters is smaller than AlexNet's, and the top-5 error rate was 6.67%.
VGG
The 2014 ImageNet runner-up was VGG, the network proposed by Oxford University's Visual Geometry Group. It deepens a very simple structure by stacking many convolutional layers and achieves good results, with a top-5 error rate of 8.8%.
Its drawback is that the final fully connected layers make the parameter count huge: about 138 million, compared with roughly 7 million for GoogLeNet.
Because of its simple structure it is still widely liked, and it is also used in style transfer (for example, converting a photo into a Van Gogh-style image).
ResNet
In 2015, Microsoft's ResNet took first place with a top-5 error rate of 4.49%, and a ResNet variant reached 3.57% (essentially surpassing human performance).
Its innovation is that, in each otherwise ordinary CNN block, the block's input is also added to its output (a skip connection), as in the sketch below.
This design effectively alleviates the vanishing gradient problem.
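As an illustration, here is a minimal sketch of a residual block (simplified: torchvision's BasicBlock additionally handles stride and channel changes with a downsample branch):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)    # skip connection: the input is added to the block's output

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)   # torch.Size([1, 64, 32, 32])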
Other structures
Since 2015, a number of other architectures have further improved ImageNet accuracy, for example DenseNet, which extends the ResNet idea and can be built up to 1,000 layers deep. SqueezeNet and MobileNet offer reasonable accuracy with far fewer parameters, though it is still lower than that of VGG, ResNet, or Inception.
Google used its AutoML system to design NASNet, which achieved state-of-the-art results.
Using pretrained models
For example, for this task we can call AlexNet:
import torchvision.models as models
alexnet = models.alexnet(num_classes=2)
The same API also provides definitions of VGG, ResNet, Inception, DenseNet, and SqueezeNet variants. Calling
models.alexnet(pretrained=True)
downloads a set of pretrained weights for the model.
Example
import torchvision.models as models
alexnet = models.alexnet(num_classes=1000, pretrained=True)
Running this first downloads the pretrained model.
Then print the model structure:
print(alexnet)
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
Download ResNet-18 and print it:
resnet = models.resnet18(pretrained=True)
print(resnet)
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
Batch normalization
Abbreviated as BN, a batch normalization layer has two learnable parameters and is trained together with the rest of the network. Its job is to normalize each batch of samples passing through the layer to mean 0 and variance 1. BN has little effect on small networks, but in large ones (say, more than 20 layers) the repeated multiplications between layers compound and can eventually make gradients vanish or explode.
BN layers are part of what lets networks as deep as ResNet-152 train without their gradients exploding or vanishing.
In the earlier example we normalized the images in the input transform. The network can also be trained without that preprocessing, but training takes longer, because the first BN layer then has to learn to normalize the input samples on its own.
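A minimal sketch of where a BN layer usually sits in a convolutional stack (the common Conv -> BN -> ReLU ordering; the two learnable parameters per channel are the scale weight and the shift bias):

import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # the conv bias is redundant when BN follows
    nn.BatchNorm2d(64),   # normalizes each channel over the batch, then applies a learned scale and shift
    nn.ReLU(),
)

x = torch.randn(16, 3, 64, 64)
print(block(x).shape)                                # torch.Size([16, 64, 64, 64])
print(block[1].weight.shape, block[1].bias.shape)    # torch.Size([64]) torch.Size([64])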
Model selection
The book mentions NASNet and PNAS but does not wholeheartedly recommend them: they consume a lot of memory, and transfer learning with hand-designed architectures such as ResNet (covered in Chapter 4) is just as effective and more efficient.
You can also use torch.hub.list('pytorch/vision:v0.4.2') to list all the models available for download.
For example:
['alexnet', 'deeplabv3_resnet101', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'fcn_resnet101', 'googlenet', 'inception_v3', 'mobilenet_v2', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext50_32x4d', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'squeezenet1_0', 'squeezenet1_1', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'wide_resnet101_2', 'wide_resnet50_2']
model = torch.hub.load('pytorch/vision:v0.4.2', 'resnet50', pretrained=True)