PyTorch-4.1 Fine-tuning a model

4.1.1 What is fine-tuning

What can you do when you have little training data for a task? Find a model that someone else has already trained on a similar task, take its pretrained weights, swap in your own data, adjust the parameters, and train again. This is fine-tuning.

Why fine-tune

When the data set is small (a few thousand images), training a large neural network with tens of millions of parameters from scratch is unrealistic: larger models need more data, and overfitting is hard to avoid. If you still want the strong feature extraction ability of a large network, fine-tuning an already trained model is the practical option.
It reduces training cost: if you export feature vectors from the pretrained model and use them for transfer learning, the later training is cheap enough to run comfortably on a CPU, with no deep learning machine required.
A model trained by others is likely to be stronger than one you build from scratch, so there is no need to reinvent the wheel.
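
The feature-vector approach mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions: the tiny randomly initialized `backbone` is a hypothetical stand-in for a real pretrained network, and random tensors stand in for images. The point is that the frozen backbone runs once, and only a small classifier is trained on the cached features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained backbone
# (in practice, e.g. a ResNet with its final fc layer removed).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU())
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False

# Toy data: 32 "images" of shape 3x8x8, 4 classes.
images = torch.randn(32, 3, 8, 8)
labels = torch.randint(0, 4, (32,))

# Step 1: run the frozen backbone once and cache the feature vectors.
with torch.no_grad():
    features = backbone(images)  # shape (32, 16)

# Step 2: train only a small classifier on the cached features.
classifier = nn.Linear(16, 4)
optimizer = torch.optim.Adam(classifier.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
for step in range(20):
    optimizer.zero_grad()
    loss = criterion(classifier(features), labels)
    loss.backward()
    optimizer.step()
```

Because the backbone never takes part in the backward pass, each training step only touches the small classifier, which is why this variant is cheap enough for a CPU.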

Transfer Learning

The original motivation for transfer learning is to save the time spent labeling samples by hand: a model is transferred from an existing labeled domain to an unlabeled domain, so that a model suited to the new domain can be trained. Learning directly in the target domain is too expensive, so we use existing related knowledge to help learn the new knowledge as quickly as possible.
(Recommended blog: https://blog.csdn.net/dakenz/article/details/85954548)
By learning method, transfer learning can be divided into instance-based, feature-based, model-based, and relation-based transfer, among others.

4.1.2 How to fine-tune

Fine-tuning methods differ between fields. For example, in speech recognition the first few layers are generally fine-tuned, whereas in image recognition the later layers are fine-tuned. (To be studied further.)

4.1.3 Precautions

If the new data set is similar to the original data set, you can fine-tune just the last FC layer or attach a new classifier.
If the new data set is relatively small and quite different from the original data set, you can train from the middle of the model onward and fine-tune only the last few layers.
If the new data set is relatively small and quite different from the original data set, and the approach above still does not work, it is best to retrain the whole model and use the pretrained weights only as the initialization of the new model.
The input size of the new data must match that of the original data. For example, images fed to a CNN must have the same size as those used in pretraining, so that the shapes reaching the fc layer line up and no error is raised.
If the input sizes differ, you can add a convolution or pooling layer before the last fc layer so that its output matches what the fc layer expects, but this significantly reduces accuracy and is not recommended.
Different layers can use different learning rates. In general, layers initialized from pretrained weights should use a learning rate smaller than the initial learning rate (typically about 10 times smaller), so that the pretrained weights are not distorted too quickly, while the new layers, which use the initial learning rate, can converge quickly.
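
Two of the points above (fine-tuning only the later layers, and using a smaller learning rate for pretrained layers) can be sketched with optimizer parameter groups. This is a minimal sketch, not a recommended recipe: `TinyNet` and its `early`/`late`/`fc` attributes are hypothetical stand-ins for a real pretrained network, and the factor of 10 is just the rule of thumb from the text.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained network with early layers,
# late layers, and a classifier head.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.early = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
        self.late = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
        self.fc = nn.Linear(16, 10)

    def forward(self, x):
        return self.fc(self.late(self.early(x)))

model = TinyNet()

# Freeze the early layers so they are not updated at all.
for p in model.early.parameters():
    p.requires_grad = False

# Replace the head with a freshly initialized layer for 5 new classes.
model.fc = nn.Linear(16, 5)

base_lr = 1e-3
optimizer = torch.optim.Adam(
    [
        # Pretrained layers that are still trained: 10x smaller learning rate.
        {"params": model.late.parameters(), "lr": base_lr / 10},
        # New head: no "lr" key, so it falls back to the default below.
        {"params": model.fc.parameters()},
    ],
    lr=base_lr,
)

for group in optimizer.param_groups:
    print(group["lr"])
```

Frozen parameters are simply left out of the optimizer; any group without its own `lr` uses the value passed to the optimizer constructor.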

4.1.4 Fine-tuning example

Import related libraries

%matplotlib inline
import torch, os, torchvision
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, models, transforms
from PIL import Image
from sklearn.model_selection import StratifiedShuffleSplit
torch.__version__

Here we use the official pretrained resnet50 on Kaggle's Dog Breed Identification competition as a simple fine-tuning example.
First, download the official data and unzip it, keeping the directory structure of the data intact. Then specify the location of the directory and look at the contents

DATA_ROOT = 'data'
all_labels_df = pd.read_csv(os.path.join(DATA_ROOT, 'labels.csv'))
all_labels_df.head()

Code on Kaggle:
On Kaggle, you can attach the data of the Dog Breed Identification competition directly

DATA_ROOT = '../input/dog-breed-identification'
# ../input/dog-breed-identification/labels.csv
all_labels_df = pd.read_csv(os.path.join(DATA_ROOT, 'labels.csv'))
all_labels_df.head()

Collect the dog breeds and assign a number to each breed
Two dictionaries are defined here, one mapping breed name to index and one mapping index to breed name, for later processing

breeds = all_labels_df.breed.unique()
breed2idx = dict((breed, idx) for idx, breed in enumerate(breeds))
idx2breed = dict((idx, breed) for idx, breed in enumerate(breeds))
len(breeds)  # 120

Add the breed index to the table as a new column

all_labels_df['label_idx'] = [breed2idx[b] for b in all_labels_df.breed]
all_labels_df.head()

Since our dataset is not in the official format, we define a dataset ourselves

class DogDataset(Dataset):
    def __init__(self, labels_df, img_path, transform=None):
        self.labels_df = labels_df
        self.img_path = img_path
        self.transform = transform

    def __len__(self):
        return self.labels_df.shape[0]

    def __getitem__(self, idx):
        # Image files are named <id>.jpg
        image_name = os.path.join(self.img_path, self.labels_df.id[idx]) + '.jpg'
        img = Image.open(image_name)
        label = self.labels_df.label_idx[idx]
        if self.transform:
            img = self.transform(img)
        return img, label

# Define some hyperparameters
IMG_SIZE = 224  # resnet50 expects 224x224 input, so all images are resized to this size
# This batch size needs about 4.6-5 GB of GPU memory. Reduce it if you have less;
# with more than 10 GB of memory it can be raised to 512
BATCH_SIZE = 256
IMG_MEAN = [0.485, 0.456, 0.406]
IMG_STD = [0.229, 0.224, 0.225]
CUDA = torch.cuda.is_available()
DEVICE = torch.device("cuda" if CUDA else "cpu")

# Define image transformation rules for training and validation data
train_transforms = transforms.Compose([
    transforms.Resize(IMG_SIZE),
    transforms.RandomResizedCrop(IMG_SIZE),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(30),
    transforms.ToTensor(),
    transforms.Normalize(IMG_MEAN, IMG_STD)
])
val_transforms = transforms.Compose([
    transforms.Resize(IMG_SIZE),
    transforms.CenterCrop(IMG_SIZE),
    transforms.ToTensor(),
    transforms.Normalize(IMG_MEAN, IMG_STD)
])

We hold out only 10% of the data as validation data during training
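
The split itself is not shown in this section. A minimal sketch of one way to do it, assuming the `StratifiedShuffleSplit` imported earlier is what produces `train_df` and `val_df`: the toy `toy_df` below is a hypothetical stand-in for `all_labels_df`, and `random_state=0` is an arbitrary choice.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Toy stand-in for all_labels_df: 100 rows, 4 breeds with 25 images each.
toy_df = pd.DataFrame({
    "id": ["img{}".format(i) for i in range(100)],
    "breed": ["breed{}".format(i % 4) for i in range(100)],
})

# One stratified split that keeps 10% of the rows for validation,
# preserving the breed proportions as closely as possible.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
train_idx, val_idx = next(splitter.split(toy_df.id, toy_df.breed))

# reset_index(drop=True) renumbers rows from 0, which matters when a Dataset
# looks rows up by position, e.g. labels_df.id[idx].
train_df = toy_df.iloc[train_idx].reset_index(drop=True)
val_df = toy_df.iloc[val_idx].reset_index(drop=True)
print(len(train_df), len(val_df))
```

Stratifying on the breed column keeps rare breeds represented in both parts, which a plain random split does not guarantee.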

# Use the official data loader to load data
image_transforms = {'train': train_transforms, 'valid': val_transforms}

train_dataset = DogDataset(train_df, os.path.join(DATA_ROOT, 'train'), transform=image_transforms['train'])
val_dataset = DogDataset(val_df, os.path.join(DATA_ROOT, 'train'), transform=image_transforms['valid'])
image_dataset = {'train': train_dataset, 'valid': val_dataset}

image_dataloader =
dataset_sizes =
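
The right-hand sides of the `image_dataloader` and `dataset_sizes` assignments are missing above. Since `image_dataloader["train"]` and `image_dataloader["valid"]` are passed to the training and test functions later, one plausible shape is a dictionary with one `DataLoader` per split. The sketch below is an assumption, not the original code: it uses toy `TensorDataset` objects in place of `DogDataset`, a small batch size, and shuffles only the training split.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 4  # small value for the toy data; the tutorial uses 256

# Toy stand-ins for train_dataset and val_dataset.
toy_datasets = {
    "train": TensorDataset(torch.randn(10, 3, 8, 8), torch.randint(0, 3, (10,))),
    "valid": TensorDataset(torch.randn(6, 3, 8, 8), torch.randint(0, 3, (6,))),
}

# One DataLoader per split; shuffling only the training split is an assumption.
image_dataloader = {
    name: DataLoader(ds, batch_size=BATCH_SIZE, shuffle=(name == "train"))
    for name, ds in toy_datasets.items()
}

# Number of samples in each split.
dataset_sizes = {name: len(ds) for name, ds in toy_datasets.items()}
print(dataset_sizes)
```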

Now configure the network. ImageNet has 1000 classes, but our dog classification has only 120, so we need to replace the last layer of the model, the fully connected layer, and change its output from 1000 to 120

# The official pretrained model is downloaded automatically here.
# (In torchvision >= 0.13, `pretrained=True` is deprecated in favor of
# `weights=models.ResNet50_Weights.IMAGENET1K_V1`.)
model_ft = models.resnet50(pretrained=True)

# Freeze all parameter layers
for param in model_ft.parameters():
    param.requires_grad = False

# Print the information of the fully connected layer
print(model_ft.fc)

num_fc_ftr = model_ft.fc.in_features  # Number of input features of the fc layer
model_ft.fc = nn.Linear(num_fc_ftr, len(breeds))  # Define a new FC layer
model_ft = model_ft.to(DEVICE)  # Move the model to the device
print(model_ft)  # Finally, print the new model
print(model_ft.fc)

Set training parameters

criterion = nn.CrossEntropyLoss()
# Only the parameters of the newly added fc layer are optimized; lr is their learning rate
optimizer = torch.optim.Adam([
    {'params': model_ft.fc.parameters()}
], lr=0.001)

Define training function

def train(model, device, train_loader, epoch):
    model.train()
    for batch_idx, data in enumerate(train_loader):
        x, y = data
        x = x.to(device)
        y = y.to(device)
        optimizer.zero_grad()
        y_hat = model(x)
        loss = criterion(y_hat, y)
        loss.backward()
        optimizer.step()
    # Report the loss of the last batch in this epoch
    print('Train Epoch: {}\t Loss: {:.6f}'.format(epoch, loss.item()))

Define test function

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    n_samples = len(test_loader.dataset)
    with torch.no_grad():
        for i, data in enumerate(test_loader):
            x, y = data
            x = x.to(device)
            y = y.to(device)
            y_hat = model(x)
            # criterion returns the batch mean, so weight it by the batch size
            # to accumulate the sum of per-sample losses
            test_loss += criterion(y_hat, y).item() * x.size(0)
            pred = y_hat.max(1, keepdim=True)[1]  # index of the largest output score
            correct += pred.eq(y.view_as(pred)).sum().item()
    test_loss /= n_samples
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, n_samples, 100. * correct / n_samples))

Train for 9 epochs to see the effect

for epoch in range(1, 10):
    %time train(model=model_ft, device=DEVICE, train_loader=image_dataloader["train"], epoch=epoch)
    test(model=model_ft, device=DEVICE, test_loader=image_dataloader["valid"])