Section two

This paper covers two applications: linear regression (the Kaggle house price prediction competition) and convolutional neural networks.

1, Linear regression practice

The website address of the competition is https://www.kaggle.com/c/house-prices-advanced-regression-techniques
The project steps are as follows: read and preprocess the data, define the neural network, define the loss function and the evaluation function (as required by the competition), define the training function, run k-fold cross validation, and finally train on all the data to predict the results.

1.1 data reading and preprocessing

The competition data is split into a training set and a test set. Both include features of each house, such as street type, year of construction, roof type, basement condition, and so on. The feature values may be continuous numbers, discrete labels, or even missing values ("NA"). We first read the data with pandas.

import pandas as pd
train_data = pd.read_csv('/home/jiahui/PycharmProjects/house-prices-advanced-regression-techniques/train.csv')
test_data = pd.read_csv('/home/jiahui/PycharmProjects/house-prices-advanced-regression-techniques/test.csv')
# The data read here are pandas DataFrames

The second step is to look at what the data tables contain.

train_data.shape # Output (1460, 81)
test_data.shape # Output (1459, 80)
train_data.iloc[0:4, [0, 1, 2, 3, -3, -2, -1]]  # Preview the first four rows and a few feature columns

The third step is to concatenate the features of the training set and the test set.

all_features = pd.concat((train_data.iloc[:, 1:-1], test_data.iloc[:, 1:]))
#The last column of the training set is the label

Finally, we standardize the data. Missing numerical features are replaced with the mean: after z-score standardization each feature has mean 0, so filling missing values with 0 does exactly that. Non-numeric features are one-hot encoded, introducing an additional 0/1 indicator column per category (including one for missing values, via dummy_na=True). I use z-score standardization here, which gives each feature mean 0 and standard deviation 1.

numeric_features = all_features.dtypes[all_features.dtypes != 'object'].index
all_features[numeric_features] = all_features[numeric_features].apply(lambda x: (x - x.mean()) / (x.std()))
# After standardization the mean is 0, so filling missing values with 0 fills them with the mean
all_features[numeric_features] = all_features[numeric_features].fillna(0)

all_features = pd.get_dummies(all_features, dummy_na=True)
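
One-hot encoding widens the table considerably. A quick check (the exact column count may vary slightly with the pandas version):

print(all_features.shape)  # roughly (2919, 331): 79 raw feature columns expand to ~331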

1.2 convert the pandas data to tensors and separate the training and test sets

import torch

n_train = train_data.shape[0]  # number of training rows
n_features = all_features.shape[1]  # number of feature columns
train_features = torch.tensor(all_features[:n_train].values, dtype=torch.float)
test_features = torch.tensor(all_features[n_train:].values, dtype=torch.float)
# Reshape the labels to a column vector so they match the network's (n, 1) output
train_labels = torch.tensor(train_data.SalePrice.values, dtype=torch.float).view(-1, 1)

1.3 define neural network and loss function

For this exercise I use the mean squared error loss, and Adam as the optimizer (an adaptive variant of SGD that is not particularly sensitive to the learning rate).

from torch import nn

# Mean squared error loss
loss = nn.MSELoss()

# Evaluation metric: root mean squared error between log prices (the competition metric)
def log_rmse(net, features, labels):
    with torch.no_grad():
        # Clamp predictions below 1 to 1, making the value more stable when taking the logarithm
        clipped_preds = torch.max(net(features), torch.tensor(1.0))
        # nn.MSELoss already returns the mean, so no extra factor or .mean() is needed
        rmse = torch.sqrt(loss(clipped_preds.log(), labels.log()))
    return rmse.item()
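
The network itself is never shown in the post, although get_net() is called in the k-fold and final-training code below. A minimal sketch, assuming the single-linear-layer baseline used in the d2l house-price tutorial this code follows:

# Hypothetical get_net(): not shown in the original post. A single linear
# layer, matching the d2l house-price baseline.
def get_net():
    net = nn.Linear(n_features, 1)  # n_features = all_features.shape[1], defined above
    for param in net.parameters():
        torch.nn.init.normal_(param, mean=0, std=0.01)
    return net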


# Training function (initialization + optimization). Arguments: net, training/test features and labels,
# number of epochs, learning rate, weight decay, and mini-batch size (each step uses a mini-batch rather than the full dataset)
def train(net, train_features, train_labels, test_features, test_labels,
          num_epochs, learning_rate, weight_decay, batch_size):
    #Training error and generalization error (for drawing)
    train_ls, test_ls = [], []
    dataset = torch.utils.data.TensorDataset(train_features, train_labels)
    train_iter = torch.utils.data.DataLoader(dataset, batch_size, shuffle=True)
    # Adam optimizer: an adaptive variant of SGD
    optimizer = torch.optim.Adam(params=net.parameters(), lr=learning_rate, weight_decay=weight_decay)
    net = net.float()  # make sure all parameters are float
    for epoch in range(num_epochs):
        for X, y in train_iter:
            l = loss(net(X.float()), y.float())
            optimizer.zero_grad()  # remember to zero the gradients each step
            l.backward()
            optimizer.step()
        train_ls.append(log_rmse(net, train_features, train_labels))
        if test_labels is not None:
            test_ls.append(log_rmse(net, test_features, test_labels))
    return train_ls, test_ls

1.4 k-fold cross validation

With this method, the training set is divided into k parts: one part serves as the validation set and the remaining k-1 parts as the training set, so we can see how well the model generalizes while training.

#k-fold cross validation
def get_k_fold_data(k, i, X, y):
    # Return the training and validation data required for the i-fold cross validation
    assert k > 1
    fold_size = X.shape[0] // k  # split into k folds; each fold contains fold_size samples
    X_train, y_train = None, None
    for j in range(k):
        idx = slice(j * fold_size, (j + 1) * fold_size)
        X_part, y_part = X[idx, :], y[idx]
        if j == i:
            X_valid, y_valid = X_part, y_part
        elif X_train is None:
            X_train, y_train = X_part, y_part
        else:
            X_train = torch.cat((X_train, X_part), dim=0)
            y_train = torch.cat((y_train, y_part), dim=0)
    return X_train, y_train, X_valid, y_valid

def k_fold(k, X_train, y_train, num_epochs,learning_rate, weight_decay, batch_size):
    train_l_sum, valid_l_sum = 0, 0
    for i in range(k):
        data = get_k_fold_data(k, i, X_train, y_train)
        net = get_net()
        train_ls, valid_ls = train(net, *data, num_epochs, learning_rate,
                                   weight_decay, batch_size)
        train_l_sum += train_ls[-1]
        valid_l_sum += valid_ls[-1]
        if i == 0:
            # d2l.semilogy comes from the Dive-into-DL-PyTorch helper package (imported as d2l)
            d2l.semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'rmse',
                         range(1, num_epochs + 1), valid_ls,
                         ['train', 'valid'])
        print('fold %d, train rmse %f, valid rmse %f' % (i, train_ls[-1], valid_ls[-1]))
    return train_l_sum / k, valid_l_sum / k


k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64  # batch size is usually a power of 2
train_l, valid_l = k_fold(k, train_features, train_labels, num_epochs, lr, weight_decay, batch_size)
#print('%d-fold validation: avg train rmse %f, avg valid rmse %f' % (k, train_l, valid_l))

1.5 model training and output results

Train on all the data, then predict on the test set and write the submission file.

def train_and_pred(train_features, test_features, train_labels, test_data,
                   num_epochs, lr, weight_decay, batch_size):
    net = get_net()
    train_ls, _ = train(net, train_features, train_labels, None, None,
                        num_epochs, lr, weight_decay, batch_size)
    d2l.semilogy(range(1, num_epochs + 1), train_ls, 'epochs', 'rmse')
    print('train rmse %f' % train_ls[-1])
    preds = net(test_features).detach().numpy()#Prediction results
    test_data['SalePrice'] = pd.Series(preds.reshape(1, -1)[0])
    submission = pd.concat([test_data['Id'], test_data['SalePrice']], axis=1)  # pd.concat is the pandas analogue of torch.cat()
    submission.to_csv('./submission.csv', index=False)
k, num_epochs, lr, weight_decay, batch_size = 5, 100, 5, 0, 64
train_and_pred(train_features, test_features, train_labels, test_data, num_epochs, lr, weight_decay, batch_size)

2, Convolutional neural network CNN


When doing image classification, we might first think of a linear model: flatten a 1*28*28 image into a 1*784 vector. But this destroys the spatial structure of the image and turns it into ordinary one-dimensional data. We therefore need a way to extract image features locally, much as human vision works, where each neuron is responsible only for part of the input. This is the idea behind convolutional neural networks, which capture the features of objects region by region. A CNN is mainly made of three kinds of layers: convolutional layers, pooling layers, and fully connected layers, with a softmax classifier producing the final output. Several effective network models are presented below; first, a short sketch of the flattening problem.
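
An added illustration (not part of the original code): flattening discards the 2-D layout, while a convolutional layer preserves it.

import torch
from torch import nn

img = torch.rand(1, 1, 28, 28)   # batch x channels x height x width
flat = img.view(1, -1)           # shape (1, 784): the spatial layout is gone
conv = nn.Conv2d(1, 6, kernel_size=3, padding=1)
print(conv(img).shape)           # torch.Size([1, 6, 28, 28]): height and width preserved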

2.1 LeNet

LeNet is divided into a convolutional block and a fully connected block.

The basic unit of the convolutional block is a convolutional layer followed by an average pooling layer: the convolutional layer recognizes spatial patterns in the image, such as lines and local object parts, and the average pooling layer that follows reduces the convolutional layer's sensitivity to location.

The convolutional block stacks two of these basic units. Each convolutional layer uses a 5 * 5 window and applies the sigmoid activation function to its output. The first convolutional layer has 6 output channels, and the second increases this to 16.

The fully connected block consists of three fully connected layers with 120, 84, and 10 outputs respectively, where 10 is the number of output classes.

2.1.1 defining the network

#net
import torch
from torch import nn

class Flatten(torch.nn.Module):  # Flattening operation
    def forward(self, x):
        return x.view(x.shape[0], -1)

class Reshape(torch.nn.Module):  # Reshape image size
    def forward(self, x):
        return x.view(-1, 1, 28, 28)  # (B x C x H x W)

net = torch.nn.Sequential(  # LeNet
    Reshape(),
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2),  # 5 * 5 convolutional layer, padding 2, default stride 1
    nn.Sigmoid(),  # Activation function
    nn.AvgPool2d(kernel_size=2, stride=2),  # Pooling layer
    nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5),
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),
    Flatten(),
    nn.Linear(in_features=16*5*5, out_features=120),
    nn.Sigmoid(),
    nn.Linear(120, 84),
    nn.Sigmoid(),
    nn.Linear(84, 10)
)
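
To check the layer arithmetic, we can push a dummy batch through the network and print each layer's output shape (the same trick the VGG section uses later; this check is an addition, not part of the original post):

X = torch.rand(1, 1, 28, 28)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__, 'output shape:', X.shape)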

2.1.2 example display

Let's implement the LeNet model. We again use Fashion-MNIST as the training dataset.

import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l
import torch
import torch.nn as nn
import torch.optim as optim
import time
import matplotlib.pyplot as plt

# Lenet network initialization
class Flatten(torch.nn.Module):  # Flattening operation
    def forward(self, x):
        return x.view(x.shape[0], -1)


class Reshape(torch.nn.Module):  # Reshape image size
    def forward(self, x):
        return x.view(-1, 1, 28, 28)  # (B x C x H x W)


net = torch.nn.Sequential(  # LeNet
    Reshape(),
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2, stride=1),  # b*1*28*28  =>b*6*28*28
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),  # b*6*28*28  =>b*6*14*14
    nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5),  # b*6*14*14  =>b*16*10*10
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),  # b*16*10*10  => b*16*5*5
    Flatten(),  # b*16*5*5   => b*400
    nn.Linear(in_features=16 * 5 * 5, out_features=120),
    nn.Sigmoid(),
    nn.Linear(120, 84),
    nn.Sigmoid(),
    nn.Linear(84, 10)
)


def try_gpu():
    """If GPU is available, return torch.device as cuda:0; else return torch.device as cpu."""
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu')
    return device

device = try_gpu()

#Dataset loading
batch_size = 256 #Batch size
train_iter, test_iter = d2l.load_data_fashion_mnist(
    batch_size=batch_size, root='/home/kesci/input/FashionMNIST2065')
print(len(train_iter))


#Data display
def show_fashion_mnist(images, labels):
    d2l.use_svg_display()
    # Here "UU" means we ignore (do not use) variables
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

for Xdata,ylabel in train_iter:
    break
X, y = [], []
for i in range(10):
    print(Xdata[i].shape,ylabel[i].numpy())
    X.append(Xdata[i]) # Add the i th feature to X
    y.append(ylabel[i].numpy()) # Add the i-th label to y
show_fashion_mnist(X, y)

#Calculation accuracy
'''
(1) net.train()
  Enables BatchNorm updates and Dropout (training-mode behavior)
(2) net.eval()
  Disables BatchNorm updates and Dropout (evaluation-mode behavior)
'''

def evaluate_accuracy(data_iter, net,device=torch.device('cpu')):
    """Evaluate accuracy of a model on the given data set."""
    acc_sum,n = torch.tensor([0],dtype=torch.float32,device=device),0
    for X,y in data_iter:
        X,y = X.to(device),y.to(device)
        net.eval()
        with torch.no_grad():
            y = y.long()
            acc_sum += torch.sum((torch.argmax(net(X), dim=1) == y))  #[[0.2 ,0.4 ,0.5 ,0.6 ,0.8] ,[ 0.1,0.2 ,0.4 ,0.3 ,0.1]] => [ 4 , 2 ]
            n += y.shape[0]
    return acc_sum.item()/n


def train_ch5(net, train_iter, test_iter, criterion, num_epochs, batch_size, device, lr=None):

    net.to(device)
    optimizer = optim.SGD(net.parameters(), lr=lr)
    for epoch in range(num_epochs):
        train_l_sum = torch.tensor([0.0], dtype=torch.float32, device=device)
        train_acc_sum = torch.tensor([0.0], dtype=torch.float32, device=device)
        n, start = 0, time.time()
        for X, y in train_iter:
            net.train()

            optimizer.zero_grad()
            X, y = X.to(device), y.to(device)
            y_hat = net(X)
            loss = criterion(y_hat, y)
            loss.backward()
            optimizer.step()

            with torch.no_grad():
                y = y.long()
                train_l_sum += loss.float()
                train_acc_sum += (torch.sum((torch.argmax(y_hat, dim=1) == y))).float()
                n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net, device)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, '
              'time %.1f sec'
              % (epoch + 1, train_l_sum / n, train_acc_sum / n, test_acc,
                 time.time() - start))

#train
lr, num_epochs = 0.9, 10

def init_weights(m):
    if type(m) == nn.Linear or type(m) == nn.Conv2d:
        torch.nn.init.xavier_uniform_(m.weight)

net.apply(init_weights)
net = net.to(device)

criterion = nn.CrossEntropyLoss()   # Cross entropy measures the distance between two probability distributions: the smaller it is, the closer they are
train_ch5(net, train_iter, test_iter, criterion,num_epochs, batch_size,device, lr)


All of the above was run in a local CPU environment; each training run is slow, but the accuracy is good.

2.2 AlexNet

LeNet does not scale well to large real-world datasets, for two reasons:
1. The computation required by the network was expensive for the hardware of the time.
2. Parameter initialization and non-convex optimization algorithms were not yet well understood.

AlexNet features:

1. 8 learned layers: 5 convolutional layers, 2 fully connected hidden layers, and 1 fully connected output layer.
2. Replaces the sigmoid activation function with the simpler ReLU activation function.
3. Uses dropout to control the model complexity of the fully connected layers.
4. Introduces data augmentation, such as flipping, cropping, and color changes, to enlarge the dataset and mitigate overfitting (a sketch follows this list).
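
A sketch of the augmentations described in point 4, using standard torchvision transforms (illustrative only; the training script below does not use them, and the parameter values are my own):

import torchvision.transforms as T

# Hypothetical augmentation pipeline: flips, crops, and color changes
augment = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.ToTensor(),
])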

import time
import torch
from torch import nn, optim
import torchvision
import numpy as np
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l
import os
import torch.nn.functional as F


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 96, 11, 4), # in_channels, out_channels, kernel_size, stride
            nn.ReLU(),
            nn.MaxPool2d(3, 2), # kernel_size, stride
            # Smaller convolution window; padding 2 keeps the input and output height/width equal; more output channels
            nn.Conv2d(96, 256, 5, 1, 2),
            nn.ReLU(),
            nn.MaxPool2d(3, 2),
            # Three consecutive convolutional layers with even smaller windows. The number of output
            # channels is increased further, except in the last convolutional layer.
            # No pooling after the first two of them, so height and width are not reduced
            nn.Conv2d(256, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 384, 3, 1, 1),
            nn.ReLU(),
            nn.Conv2d(384, 256, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(3, 2)
        )
        # The fully connected part has far more outputs than LeNet's. Dropout layers mitigate overfitting
        self.fc = nn.Sequential(
            nn.Linear(256*5*5, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            # The network is simplified here to run on a CPU; on a GPU this layer can be added back
            #nn.Linear(4096, 4096),
            #nn.ReLU(),
            #nn.Dropout(0.5),

            # Output layer. Since Fashion-MNIST is used here, the number of classes is 10 instead of the paper's 1000
            nn.Linear(4096, 10),
        )

    def forward(self, img):

        feature = self.conv(img)
        output = self.fc(feature.view(img.shape[0], -1))
        return output

net = AlexNet()



def load_data_fashion_mnist(batch_size, resize=None, root='/home/kesci/input/FashionMNIST2065'):
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())

    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_iter, test_iter



batch_size = 16
# If an "out of memory" error appears, reduce batch_size or resize
train_iter, test_iter = load_data_fashion_mnist(batch_size, 224)
for X, Y in train_iter:
    print('X =', X.shape,'\nY =', Y.type(torch.int32))
    break


lr, num_epochs = 0.001, 3
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

2.3 VGG11

VGG builds a deep model by repeating a simple basic block. A VGG block consists of:
1. Several consecutive convolutional layers with 3 x 3 windows and padding 1, each followed by ReLU.
2. A max pooling layer with a 2 x 2 window and stride 2.
The convolutional layers keep the input height and width unchanged (output size = (H + 2*1 - 3)/1 + 1 = H), while the pooling layer halves them.

import time
import torch
from torch import nn, optim
import torchvision
import numpy as np
import sys
sys.path.append("/home/jiahui/PycharmProjects/deep_learning")
from deep_learning.d2lzh_pytorch import utils as d2l

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # needed by train_ch5 below


def load_data_fashion_mnist(batch_size, resize=None, root='/home/jiahui/PycharmProjects/deep_learning/section05'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())

    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_iter, test_iter


def vgg_block(num_convs, in_channels, out_channels): #Number of convolution layers, number of input channels, number of output channels
    blk = []
    for i in range(num_convs):
        if i == 0:
            blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        else:
            blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        blk.append(nn.ReLU())
    blk.append(nn.MaxPool2d(kernel_size=2, stride=2)) # It's going to halve the width and height
    return nn.Sequential(*blk)

conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))
# After five vgg_blocks, the height and width are halved five times: 224 / 2**5 = 7
fc_features = 512 * 7 * 7  # c * w * h
fc_hidden_units = 4096  # chosen arbitrarily


def vgg(conv_arch, fc_features, fc_hidden_units=4096):
    net = nn.Sequential()
    # Convolution layer part
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        # Each vgg_block halves the height and width
        net.add_module("vgg_block_" + str(i+1), vgg_block(num_convs, in_channels, out_channels))
    # Full connection layer part
    net.add_module("fc", nn.Sequential(d2l.FlattenLayer(),
                                 nn.Linear(fc_features, fc_hidden_units),
                                 nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(fc_hidden_units, fc_hidden_units),
                                 nn.ReLU(),
                                 nn.Dropout(0.5),
                                 nn.Linear(fc_hidden_units, 10)
                                ))
    return net


net = vgg(conv_arch, fc_features, fc_hidden_units)
X = torch.rand(1, 1, 224, 224)

# named_children returns the first-level submodules and their names (named_modules returns all submodules, including nested ones)
for name, blk in net.named_children():
    X = blk(X)
    print(name, 'output shape: ', X.shape)

ratio = 8
small_conv_arch = [(1, 1, 64//ratio), (1, 64//ratio, 128//ratio), (2, 128//ratio, 256//ratio),
                   (2, 256//ratio, 512//ratio), (2, 512//ratio, 512//ratio)]
net = vgg(small_conv_arch, fc_features // ratio, fc_hidden_units // ratio)
print(net)


batch_size = 16
# If an "out of memory" error appears, reduce batch_size or resize
train_iter, test_iter = load_data_fashion_mnist(batch_size, 224)
for X, Y in train_iter:
    print('X =', X.shape,'\nY =', Y.type(torch.int32))
    break

lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

2.4 NiN

LeNet, AlexNet, and VGG all follow the same pattern: first fully extract spatial features with a module made of convolutional layers, then output the classification result with a module made of fully connected layers.
NiN instead builds a deep network by stacking several small networks in series, each made of a convolutional layer plus 1 x 1 convolutions that act like fully connected layers applied per pixel.
The number of output channels of the last block equals the number of label classes; a global average pooling layer then averages all elements in each channel, and the result is used directly for classification.

The role of the 1 x 1 convolution kernel:
1. Channel control: the number of channels can be reduced by controlling the number of convolution kernels.
2. Increased nonlinearity: the computation of a 1 x 1 convolution is equivalent to that of a fully connected layer applied at each pixel position, and a nonlinear activation function follows it, so the nonlinearity of the network increases.
3. Fewer parameters (see the check after this list).
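
A small check of points 2 and 3 (an added sketch, not from the original post): a 1 x 1 convolution has exactly the same parameter count as a fully connected layer applied across channels at each pixel position.

import torch
from torch import nn

conv1x1 = nn.Conv2d(192, 96, kernel_size=1)
fc = nn.Linear(192, 96)
print(sum(p.numel() for p in conv1x1.parameters()))  # 192*96 + 96 = 18528
print(sum(p.numel() for p in fc.parameters()))       # 18528 as well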

import time
import torch
from torch import nn, optim
import torch.nn.functional as F  # used by GlobalAvgPool2d below
import torchvision
import numpy as np
import sys
sys.path.append("/home/jiahui/PycharmProjects/deep_learning")
from deep_learning.d2lzh_pytorch import utils as d2l

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # needed by train_ch5 below

def load_data_fashion_mnist(batch_size, resize=None, root='/home/jiahui/PycharmProjects/deep_learning/section05'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())

    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_iter, test_iter


def nin_block(in_channels, out_channels, kernel_size, stride, padding):
    blk = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
                        nn.ReLU(),
                        nn.Conv2d(out_channels, out_channels, kernel_size=1),
                        nn.ReLU(),
                        nn.Conv2d(out_channels, out_channels, kernel_size=1),
                        nn.ReLU())
    return blk

class GlobalAvgPool2d(nn.Module):
    # The global average pooling layer can be achieved by setting the pool window shape to the input height and width
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])

net = nn.Sequential(
    nin_block(1, 96, kernel_size=11, stride=4, padding=0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nin_block(96, 256, kernel_size=5, stride=1, padding=2),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nin_block(256, 384, kernel_size=3, stride=1, padding=1),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Dropout(0.5),
    # The number of label categories is 10
    nin_block(384, 10, kernel_size=3, stride=1, padding=1),
    GlobalAvgPool2d(),
    # Convert the four-dimensional output to the two-dimensional output in the shape of (batch size, 10)
    d2l.FlattenLayer())

X = torch.rand(1, 1, 224, 224)
for name, blk in net.named_children():
    X = blk(X)
    print(name, 'output shape: ', X.shape)


batch_size = 128
train_iter, test_iter = load_data_fashion_mnist(batch_size, 224)
for X, Y in train_iter:
    print('X =', X.shape,'\nY =', Y.type(torch.int32))
    break


lr, num_epochs = 0.002, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)

2.5 GoogLeNet

1. GoogLeNet is composed of Inception basic blocks.
2. An Inception block is a small network with four parallel branches. It extracts information through convolutional layers and a max pooling layer with different window shapes, and uses 1 x 1 convolutional layers to reduce the number of channels, lowering the model complexity.
3. The tunable hyperparameters are the numbers of output channels of each layer, which is how the model complexity is controlled.

Complete model structure

import time
import torch
from torch import nn, optim
import torchvision
import numpy as np
import sys
sys.path.append("/home/jiahui/PycharmProjects/deep_learning")
from deep_learning.d2lzh_pytorch import utils as d2l
import os
import torch.nn.functional as F

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # needed by train_ch5 below


def load_data_fashion_mnist(batch_size, resize=None, root='/home/jiahui/PycharmProjects/deep_learning/section05'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())

    transform = torchvision.transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)

    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_iter, test_iter


class Inception(nn.Module):
    # c1 - c4 are the numbers of output channels of each branch
    def __init__(self, in_c, c1, c2, c3, c4):
        super(Inception, self).__init__()
        # Branch 1: a single 1 x 1 convolution
        self.p1_1 = nn.Conv2d(in_c, c1, kernel_size=1)
        # Branch 2: 1 x 1 convolution followed by a 3 x 3 convolution
        self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=1)
        self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)
        # Branch 3: 1 x 1 convolution followed by a 5 x 5 convolution
        self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=1)
        self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)
        # Branch 4: 3 x 3 max pooling followed by a 1 x 1 convolution
        self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.p4_2 = nn.Conv2d(in_c, c4, kernel_size=1)

    def forward(self, x):
        p1 = F.relu(self.p1_1(x))
        p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))
        p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))
        p4 = F.relu(self.p4_2(self.p4_1(x)))
        return torch.cat((p1, p2, p3, p4), dim=1)  # Concatenate the outputs along the channel dimension

b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1),
                   nn.Conv2d(64, 192, kernel_size=3, padding=1),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

b3 = nn.Sequential(Inception(192, 64, (96, 128), (16, 32), 32),
                   Inception(256, 128, (128, 192), (32, 96), 64),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

b4 = nn.Sequential(Inception(480, 192, (96, 208), (16, 48), 64),
                   Inception(512, 160, (112, 224), (24, 64), 64),
                   Inception(512, 128, (128, 256), (24, 64), 64),
                   Inception(512, 112, (144, 288), (32, 64), 64),
                   Inception(528, 256, (160, 320), (32, 128), 128),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

b5 = nn.Sequential(Inception(832, 256, (160, 320), (32, 128), 128),
                   Inception(832, 384, (192, 384), (48, 128), 128),
                   d2l.GlobalAvgPool2d())

net = nn.Sequential(b1, b2, b3, b4, b5,
                    d2l.FlattenLayer(), nn.Linear(1024, 10))
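
As a sanity check on the channel bookkeeping: each Inception block outputs c1 + c2[1] + c3[1] + c4 channels, so the first block of b3 outputs 64 + 128 + 32 + 32 = 256 channels, exactly the in_c of the next block, and the last block of b5 outputs 384 + 384 + 128 + 128 = 1024, matching the final nn.Linear(1024, 10).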


X = torch.rand(1, 1, 96, 96)

for blk in net.children():
    X = blk(X)
    print('output shape: ', X.shape)

batch_size = 16
train_iter, test_iter = load_data_fashion_mnist(batch_size, 224)
for X, Y in train_iter:
    print('X =', X.shape,'\nY =', Y.type(torch.int32))
    break

lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)