PyTorch 4.1: fine-tuning a model

4.1 Fine-tuning a model

4.1.1 What is fine-tuning

What can you do when you have little training data for a task? You find a model that others have already trained on a similar task, take its pre-trained weights, feed it your own data, adjust the parameters, and train again. This is fine-tuning.

Why fine-tune

  • When the data set is small (a few thousand images), training a large neural network with tens of millions of parameters from scratch is unrealistic: larger models need more data, and with too little data overfitting is hard to avoid. If you still want the strong feature-extraction ability of a large network, fine-tuning a pre-trained model is the practical option.
  • It reduces training cost: if the pre-trained model is used only to derive feature vectors for transfer learning, the later training is cheap enough to run on a CPU, with no need for a dedicated deep learning machine.
  • A model trained by others is likely to be stronger than one you build from scratch. There is no need to reinvent the wheel.
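
The feature-vector approach mentioned in the second point can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method used later in this post: `backbone` is a hypothetical toy stand-in for a frozen pre-trained network, and the data is random. The idea is that the expensive network runs once without gradients, and only a small classifier is trained on the cached features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical frozen "pre-trained" feature extractor (a toy stand-in).
backbone = nn.Sequential(nn.Linear(20, 8), nn.ReLU())
backbone.eval()

x = torch.rand(64, 20)            # random inputs, for illustration only
y = torch.randint(0, 3, (64,))    # random labels for 3 classes

# Run the backbone once, without gradients, and cache the feature vectors.
with torch.no_grad():
    features = backbone(x)

# Afterwards only this small classifier is trained, on the cached features.
classifier = nn.Linear(8, 3)
optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = criterion(classifier(features), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(features.shape, losses[0], losses[-1])
```

Because the backbone never takes part in the backward pass, each training step only touches the small classifier, which is why this variant is cheap.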

Transfer Learning

The original goal of transfer learning is to save the time spent labeling samples by hand: a model is transferred from a domain that already has labeled data to a domain without labels, so that a model suited to the new domain can be trained. Learning directly in the target domain is too expensive, so we use existing related knowledge to help learn the new knowledge faster.
By learning method, transfer learning can be divided into sample-based, feature-based, model-based, and relation-based transfer, among others.

4.1.2 How to fine-tune

Fine-tuning methods differ between fields. For example, in speech recognition the first layers are generally fine-tuned, while in image recognition the later layers are fine-tuned. (to be studied further)
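
For the image case, fine-tuning only the later layers might look like the sketch below. The network is a hypothetical toy model, not a real pre-trained one, and which layers to unfreeze is an assumption made for illustration.

```python
import torch.nn as nn

# Hypothetical toy network standing in for a pre-trained image model.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # index 0: early layer
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # index 2: later layer
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),                           # index 6: classifier head
)

# Freeze everything, then unfreeze only the last conv layer and the head.
for param in model.parameters():
    param.requires_grad = False
for layer in (model[2], model[6]):
    for param in layer.parameters():
        param.requires_grad = True

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Only parameters with `requires_grad = True` receive gradients, so the early layer keeps its weights during training.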

4.1.3 Precautions

  • If the new data set is similar to the original data set, you can fine-tune only the last FC layer or replace it with a new classifier.
  • If the new data set is relatively small and differs considerably from the original data set, you can start training from the middle of the model and fine-tune only the last few layers.
  • If the new data set is relatively small, differs considerably from the original data set, and the approach above still does not work, it is best to retrain the whole model and use the pre-trained weights only as the initialization of the new model.
  • The input size of the new data must match that of the original data. For example, the images fed to a CNN must have the size the network expects, otherwise an error is raised.
  • If the input size differs, you can add a convolution or pooling layer before the last FC layer so that its output matches what the FC layer expects, but this noticeably reduces accuracy, so it is not recommended.
  • Different learning rates can be set for different layers. It is generally recommended that layers initialized from the pre-trained weights use a smaller learning rate (typically about 10 times smaller) than newly initialized layers, so that the pre-trained weights are not distorted too quickly while the new layers, trained at the full learning rate, converge quickly.
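
Per-layer learning rates can be expressed with optimizer parameter groups. The following is a minimal sketch: `backbone` and `head` are hypothetical stand-ins for the pre-trained layers and the new layer, and the value of `base_lr` is an assumption.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: `backbone` plays the pre-trained layers,
# `head` plays the newly added classifier.
backbone = nn.Linear(32, 16)
head = nn.Linear(16, 4)

base_lr = 1e-3  # assumed learning rate for the new layer
optimizer = torch.optim.Adam([
    {'params': backbone.parameters(), 'lr': base_lr / 10},  # pre-trained: 10x smaller
    {'params': head.parameters()},                          # new layer: falls back to lr below
], lr=base_lr)

for group in optimizer.param_groups:
    print(group['lr'])
```

A group without its own `'lr'` key uses the default passed to the optimizer, so only the exceptions need to be spelled out.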

4.1.4 fine tuning examples

Import related libraries

%matplotlib inline
import torch,os,torchvision
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, models, transforms
from PIL import Image
from sklearn.model_selection import StratifiedShuffleSplit

Here we use the official pre-trained resnet50 on the Dog Breed Identification competition on Kaggle as a simple fine-tuning example.
First, download the official data and unzip it, keeping the directory structure unchanged. Then specify the location of the data directory and load the label file

DATA_ROOT = 'data'
all_labels_df = pd.read_csv(os.path.join(DATA_ROOT,'labels.csv'))

Code on Kaggle:
If you run the notebook on Kaggle, you can attach the data of the Dog Breed Identification competition directly

DATA_ROOT = '../input/dog-breed-identification'
all_labels_df = pd.read_csv(os.path.join(DATA_ROOT,'labels.csv'))

Get the list of dog breeds and assign a number to each breed
Two dictionaries are defined here, one mapping breed name to index and one mapping index to breed name, for later processing

breeds = all_labels_df.breed.unique()
breed2idx = dict((breed,idx) for idx,breed in enumerate(breeds))
idx2breed = dict((idx,breed) for idx,breed in enumerate(breeds))

Add the label index to the table as a new column

all_labels_df['label_idx'] = [breed2idx[b] for b in all_labels_df.breed]
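
To see what these mappings produce, here is a small sketch on a made-up table. The ids and breeds in `toy_df` are invented for illustration and only mimic the `id` and `breed` columns of labels.csv.

```python
import pandas as pd

# Toy stand-in for labels.csv; the rows here are made up.
toy_df = pd.DataFrame({
    'id': ['a1', 'b2', 'c3', 'd4'],
    'breed': ['beagle', 'pug', 'beagle', 'collie'],
})

breeds = toy_df.breed.unique()  # unique values in order of first appearance
breed2idx = {breed: idx for idx, breed in enumerate(breeds)}
idx2breed = {idx: breed for idx, breed in enumerate(breeds)}

toy_df['label_idx'] = [breed2idx[b] for b in toy_df.breed]
print(toy_df)
print(idx2breed[2])
```

The index-to-name dictionary is what turns a predicted class number back into a breed name later on.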

Since our dataset is not in the official format, we define a dataset ourselves

class DogDataset(Dataset):
    def __init__(self, labels_df, img_path, transform=None):
        self.labels_df = labels_df    # DataFrame with 'id' and 'label_idx' columns
        self.img_path = img_path      # directory that contains the <id>.jpg files
        self.transform = transform

    def __len__(self):
        return self.labels_df.shape[0]

    def __getitem__(self, idx):
        row = self.labels_df.iloc[idx]
        # The image file is named after the 'id' column of labels.csv
        image_name = os.path.join(self.img_path, row['id']) + '.jpg'
        img = Image.open(image_name).convert('RGB')
        label = int(row['label_idx'])

        if self.transform:
            img = self.transform(img)
        return img, label
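
How a DataLoader uses `__len__` and `__getitem__` can be checked without any image files. The sketch below uses a hypothetical in-memory `ToyDataset` with random tensors in place of images; it is only meant to show the batching behavior.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical in-memory dataset: item i is a random 'image' with label i % 3."""
    def __init__(self, n_items):
        self.images = torch.rand(n_items, 3, 8, 8)
        self.labels = [i % 3 for i in range(n_items)]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

loader = DataLoader(ToyDataset(10), batch_size=4, shuffle=False)
batches = [(x.shape, y.tolist()) for x, y in loader]
for shape, labels in batches:
    print(shape, labels)
```

The loader stacks the individual items into batch tensors, and the last batch is smaller when the data set size is not a multiple of the batch size.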

# Define some hyperparameters

IMG_SIZE = 224 # resnet50 takes 224x224 input, so all pictures are resized to this size
BATCH_SIZE = 256 # This batch size needs about 4.6-5 GB of GPU memory. Reduce it if memory is insufficient; with more than 10 GB it can be raised to 512
IMG_MEAN = [0.485, 0.456, 0.406]
IMG_STD = [0.229, 0.224, 0.225]
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
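# Define picture transformation rules for training and validation data

train_transforms = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),  # unify the picture size
    transforms.ToTensor(),                    # PIL image -> float tensor in [0, 1]
    transforms.Normalize(IMG_MEAN, IMG_STD)
])

val_transforms = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(IMG_MEAN, IMG_STD)
])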
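
The normalization step subtracts the per-channel mean and divides by the per-channel standard deviation. The sketch below writes that arithmetic out by hand on a made-up image whose pixels are all 0.5, so the effect on each channel is easy to read.

```python
import torch

IMG_MEAN = [0.485, 0.456, 0.406]
IMG_STD = [0.229, 0.224, 0.225]

# A fake 3-channel image with every pixel equal to 0.5
# (values in [0, 1], as produced by ToTensor).
img = torch.full((3, 2, 2), 0.5)

# Per-channel normalization written out by hand: (x - mean) / std.
mean = torch.tensor(IMG_MEAN).view(3, 1, 1)
std = torch.tensor(IMG_STD).view(3, 1, 1)
normalized = (img - mean) / std

print(normalized[:, 0, 0])
```

The mean and standard deviation values are the ImageNet statistics used in the listing above, which is why the same constants are applied to both training and validation data.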

We hold out only 10% of the data as validation data during training

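# Stratified split: 90% training, 10% validation, keeping the breed proportions
stratified_split = StratifiedShuffleSplit(n_splits=1, test_size=0.1)
train_idx, val_idx = next(stratified_split.split(all_labels_df.id, all_labels_df.breed))
train_df = all_labels_df.iloc[train_idx].reset_index(drop=True)
val_df = all_labels_df.iloc[val_idx].reset_index(drop=True)

# Use the official data loader to load data

dataset_names = ['train', 'valid']
image_transforms = {'train':train_transforms, 'valid':val_transforms}

train_dataset = DogDataset(train_df, os.path.join(DATA_ROOT,'train'), transform=image_transforms['train'])
val_dataset = DogDataset(val_df, os.path.join(DATA_ROOT,'train'), transform=image_transforms['valid'])
image_dataset = {'train':train_dataset, 'valid':val_dataset}

image_dataloader = {x:DataLoader(image_dataset[x],batch_size=BATCH_SIZE,shuffle=True,num_workers=0) for x in dataset_names}
dataset_sizes = {x:len(image_dataset[x]) for x in dataset_names}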

Now configure the network. ImageNet has 1000 classes, while our dog data set has only 120 breeds, so we replace the last layer of the model, the fully connected layer, and change its output size from 1000 to 120

model_ft = models.resnet50(pretrained=True) # The official pre-trained weights are downloaded automatically here (newer torchvision versions use the weights= argument instead)
# Freeze all parameter layers
for param in model_ft.parameters():
    param.requires_grad = False
# Print the information of the fully connected layer here
print(model_ft.fc)
num_fc_ftr = model_ft.fc.in_features # Number of input features of the fc layer
model_ft.fc = nn.Linear(num_fc_ftr, len(breeds)) # Define a new FC layer (trainable by default)
model_ft = model_ft.to(DEVICE) # Put it on the device
print(model_ft) # Finally, print the new model

Set training parameters

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam([
    {'params':model_ft.fc.parameters()}
], lr=0.001)#Specifies the learning rate of the newly added fc layer

Define training function

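def train(model,device, train_loader, epoch):
    model.train()
    for batch_idx, data in enumerate(train_loader):
        x,y= data
        x, y = x.to(device), y.to(device)  # move the batch to the same device as the model
        optimizer.zero_grad()              # clear gradients from the previous step
        y_hat= model(x)
        loss = criterion(y_hat, y)
        loss.backward()                    # compute gradients
        optimizer.step()                   # update the trainable parameters
    # Loss of the last batch in this epoch
    print ('Train Epoch: {}\t Loss: {:.6f}'.format(epoch,loss.item()))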
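
The effect of one optimization step on a partly frozen model can be sketched on toy layers. Everything below is a hypothetical stand-in (`frozen` for the pre-trained layers, `head` for the new fc layer, random data), intended only to show that the frozen weights stay fixed while the new layer changes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
frozen = nn.Linear(4, 4)   # stands in for the frozen pre-trained layers
head = nn.Linear(4, 2)     # stands in for the new fc layer
for p in frozen.parameters():
    p.requires_grad = False
model = nn.Sequential(frozen, nn.ReLU(), head)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(head.parameters(), lr=0.01)

x = torch.rand(8, 4)
y = torch.randint(0, 2, (8,))
frozen_before = frozen.weight.clone()
head_before = [p.clone() for p in head.parameters()]

model.train()
optimizer.zero_grad()            # clear old gradients
loss = criterion(model(x), y)    # forward pass and loss
loss.backward()                  # gradients reach only `head`
optimizer.step()                 # update the trainable parameters

frozen_unchanged = torch.equal(frozen.weight, frozen_before)
head_changed = any(not torch.equal(p, b) for p, b in zip(head.parameters(), head_before))
print(frozen_unchanged, head_changed)
```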

Define test function

def test(model, device, test_loader):
    model.eval()  # switch BatchNorm and Dropout to inference mode
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for i,data in enumerate(test_loader):
            x,y= data
            x, y = x.to(device), y.to(device)
            y_hat = model(x)
            test_loss += criterion(y_hat, y).item() * x.size(0) # criterion returns the batch mean, so weight it by the batch size
            pred = y_hat.max(1, keepdim=True)[1] # get the index of the largest output (the predicted class)
            correct += pred.eq(y.view_as(pred)).sum().item()
    n_samples = len(test_loader.dataset)
    test_loss /= n_samples
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, n_samples,
        100. * correct / n_samples))
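
The accuracy bookkeeping in the test function can be traced on a few hand-written values. The logits and labels below are made up for illustration.

```python
import torch

# Toy outputs for 4 samples and 3 classes, plus their true labels.
y_hat = torch.tensor([[2.0, 0.1, 0.3],
                      [0.2, 0.1, 3.0],
                      [0.5, 1.5, 0.2],
                      [1.0, 0.9, 0.8]])
y = torch.tensor([0, 2, 0, 0])

pred = y_hat.max(1, keepdim=True)[1]             # shape (4, 1): index of the largest value per row
correct = pred.eq(y.view_as(pred)).sum().item()  # labels reshaped to (4, 1) before comparing

print(pred.squeeze(1).tolist(), correct)
```

The third sample is predicted as class 1 while its label is 0, so three of the four predictions count as correct.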

Train for 9 epochs to see the effect

for epoch in range(1, 10):
    %time train(model=model_ft,device=DEVICE, train_loader=image_dataloader["train"],epoch=epoch)
    test(model=model_ft, device=DEVICE, test_loader=image_dataloader["valid"])

Tags: Pytorch Deep Learning

Posted on Wed, 01 Dec 2021 18:53:47 -0500 by sols