PyTorch model save / load, parameter initialization, and fine-tuning

After using PyTorch for some time, I can't put it down (0-0). I hear there is already a C++ interface as well. In practice, you will inevitably need fine-tuning, parameter initialization, model loading, and so on.

Model save / load

1. All model parameters

Training sometimes stops for various reasons, so it pays to save the model at every epoch (typically both the best model so far and the model from the current epoch). The storage method recommended by PyTorch is generally used: it saves only the model's parameters (its state_dict).

# Save the model parameters to checkpoint.pth.tar
torch.save(mymodel.state_dict(), 'checkpoint.pth.tar')

The corresponding way to load the model is load_state_dict. The saved file must first be deserialized with torch.load to obtain the parameter dictionary, and that dictionary is then passed to load_state_dict on a model that has already been constructed.
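A minimal sketch of the save / load round trip described above. The nn.Linear model and the temporary directory are stand-ins chosen for illustration, not part of the original post.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the real network.
model = nn.Linear(4, 2)

path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth.tar')

# Save only the parameters (the state_dict), not the whole module object.
torch.save(model.state_dict(), path)

# To load, build a model with the same architecture first,
# then copy the saved parameters into it.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))

# Compare every saved tensor with the restored one.
same = all(torch.equal(a, b) for a, b in
           zip(model.state_dict().values(), restored.state_dict().values()))
print(same)
```

Because only tensors are stored, the file does not depend on the Python class that defined the model, which is one reason this route tends to be preferred.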


Building on the save above, here is an example of how to store extra information and/or resume training.

# Save the model state; extra entries can be stored for later use
state = {'epoch': epoch + 1,                   # epoch to resume from
         'state_dict': mymodel.state_dict(),   # trained parameters
         'optimizer': optimizer.state_dict(),  # optimizer state, needed to resume
         'best_pred': best_pred}               # best accuracy so far

# Save the state to checkpoint.pth.tar
filename = directory + 'checkpoint.pth.tar'
torch.save(state, filename)
# If this is the best model so far, keep a copy
if is_best:
    shutil.copyfile(filename, directory + 'model_best.pth.tar')

# Resume: load the checkpoint and restore each component
checkpoint = torch.load(directory + 'model_best.pth.tar')
mymodel.load_state_dict(checkpoint['state_dict'])    # model parameters
optimizer.load_state_dict(checkpoint['optimizer'])   # optimizer state
epoch = checkpoint['epoch']                          # epoch, useful for updating the learning rate, etc.

# With this in place, training can continue from where it stopped instead of starting over.
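To make the resume flow concrete, here is a self-contained sketch. The helper name save_checkpoint, the tiny nn.Linear model, the temporary directory, and the numbers stored in the dictionary are all assumptions made for illustration.

```python
import os
import shutil
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim


def save_checkpoint(state, is_best, directory):
    """Write the latest checkpoint and, if it is the best so far, copy it."""
    filename = os.path.join(directory, 'checkpoint.pth.tar')
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, os.path.join(directory, 'model_best.pth.tar'))
    return filename


model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
directory = tempfile.mkdtemp()

# Pretend that epoch index 2 has just finished with a new best score.
save_checkpoint({'epoch': 3,
                 'state_dict': model.state_dict(),
                 'optimizer': optimizer.state_dict(),
                 'best_pred': 0.9},
                is_best=True, directory=directory)

# Later, in a fresh process: rebuild the objects, then restore their state.
new_model = nn.Linear(4, 2)
new_optimizer = optim.SGD(new_model.parameters(), lr=0.1, momentum=0.9)
checkpoint = torch.load(os.path.join(directory, 'model_best.pth.tar'))
new_model.load_state_dict(checkpoint['state_dict'])
new_optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch']
```

Restoring the optimizer matters when it carries state such as momentum buffers. Without it, the first steps after resuming behave differently from an uninterrupted run.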

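The above is the approach suggested by PyTorch. There is also a second approach, which saves the whole model object. It is less flexible and is not recommended.

# Save the whole model object (the class is pickled together with the parameters)
torch.save(mymodel, 'checkpoint.pth.tar')
# Load it back (on recent PyTorch releases, where torch.load defaults to weights_only=True, pass weights_only=False)
mymodel = torch.load('checkpoint.pth.tar')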

2. Some model parameters

Often we load a trained model whose architecture is not exactly the same as the model we define, and we only want to reuse the parameters of the layers the two have in common.

There are several solutions:

(1) Build your own model directly from the trained model: load the trained model first, then define your own model on top of its layers;

model_ft = models.resnet18(pretrained=use_pretrained)
self.conv1 = model_ft.conv1
... ...

(2) Define the model yourself and load the pretrained parameters into it directly

# The first method:
mymodelB = TheModelBClass(*args, **kwargs)
# strict=False: only parameters whose keys match are loaded; missing and unexpected keys are ignored
mymodelB.load_state_dict(model_zoo.load_url(model_urls['resnet18']), strict=False)

# The second method:
# Load the pretrained model
model_pretrained = models.resnet18(pretrained=use_pretrained)

# state_dict of mymodelB, with keys such as:
#     conv1.weight
#     conv1.bias
mymodelB_dict = mymodelB.state_dict()

# Compare the pretrained state_dict with the custom model's and drop the keys that do not match
pretrained_dict = {k: v for k, v in model_pretrained.state_dict().items() if k in mymodelB_dict}
# Update the existing model dict
mymodelB_dict.update(pretrained_dict)
# Load the state_dict we actually need
mymodelB.load_state_dict(mymodelB_dict)

# Method 2 may be more intuitive
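One practical wrinkle with partial loading is that a key can match while the tensor shape does not (a classifier layer with a different number of classes, for example), and load_state_dict rejects size mismatches even with strict=False. The sketch below extends the second method with a shape check. The two small classes are hypothetical stand-ins, so nothing needs to be downloaded.

```python
import torch
import torch.nn as nn


class Pretrained(nn.Module):      # hypothetical "trained" network
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)
        self.fc = nn.Linear(8, 1000)


class Mine(nn.Module):            # hypothetical custom network
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, 3)   # same name and shape: reused
        self.fc = nn.Linear(8, 10)        # same name, different shape: skipped
        self.extra = nn.Linear(10, 2)     # not in the pretrained model


pretrained_dict = Pretrained().state_dict()
mine = Mine()
mine_dict = mine.state_dict()

# Keep only entries whose key exists in our model AND whose shape matches.
compatible = {k: v for k, v in pretrained_dict.items()
              if k in mine_dict and v.shape == mine_dict[k].shape}

mine_dict.update(compatible)
mine.load_state_dict(mine_dict)

print(sorted(compatible))
```

Printing the keys that were actually transferred is a cheap sanity check, since a typo in a layer name otherwise fails silently and leaves that layer randomly initialized.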

Parameter initialization

The second topic is parameter initialization, which appears in a lot of code. Not every layer has pretrained parameters, so the ones that do not need to be initialized. Each parameter in PyTorch is a Tensor (in older versions, wrapped in a Variable) that exposes attributes such as data and grad, so values can be assigned through them directly. This is also how model parameters trained in other frameworks (caffe/tensorflow/mxnet/gluonCV, etc.) can be brought into PyTorch: it is just a direct assignment to data.
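As a small illustration of assigning through data, the sketch below copies NumPy arrays into a layer. The arrays are made-up stand-ins for weights exported from another framework. A real conversion would also have to handle layout differences between frameworks, which is not shown here.

```python
import numpy as np
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)

# Hypothetical weights exported from another framework as NumPy arrays.
w = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.zeros(2, dtype=np.float32)

# Copy the values into the existing parameters through .data,
# so the Parameter objects themselves stay registered on the layer.
layer.weight.data.copy_(torch.from_numpy(w))
layer.bias.data.copy_(torch.from_numpy(b))

print(layer.weight)
```

Using copy_ rather than rebinding data keeps the parameter's own storage, so later changes to the NumPy array do not leak into the model.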

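PyTorch provides methods to initialize parameters:

def weight_init(m):
    if isinstance(m, nn.Conv2d):
        # Normal distribution with std sqrt(2 / n), n = kernel area * output channels
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(m, nn.BatchNorm2d):
        # Assumed: the conventional BatchNorm initialization (scale 1, shift 0)
        m.weight.data.fill_(1)
        m.bias.data.zero_()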

However, if you have no special requirements for initialization and nothing is going wrong (or you are unsure whether it would affect performance), you can simply rely on PyTorch's default initialization.
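If a custom scheme is wanted, one common way to run a per-layer function over a whole network is Module.apply, which visits every submodule. The sketch below uses the torch.nn.init helpers. The function name init_weights and the tiny network are assumptions for illustration, and Kaiming-normal with fan_out corresponds to a normal distribution with std sqrt(2 / (kernel area * output channels)).

```python
import torch.nn as nn


def init_weights(m):
    # Hypothetical helper; model.apply() calls it once for every submodule.
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)


net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
net.apply(init_weights)
```

Layers that match neither branch (the ReLU here) are simply left as they are.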


Fine-tuning

The last topic is fine-tuning. In most experiments, at least the backbone is a pretrained model, used either as a fixed feature extractor or fine-tuned.

When the model is used for feature extraction, the feature-extraction parameters do not need to be learned. PyTorch provides the requires_grad attribute to control whether a parameter takes part in gradient computation, that is, whether it gets updated. Take MNIST as an example, with resnet18 as the feature extractor:

# Load the pretrained model
model = torchvision.models.resnet18(pretrained=True)

# Go through every parameter and freeze it, i.e. do not learn it
for param in model.parameters():
    param.requires_grad = False

# Replace the fully connected layer with one for the 10 MNIST classes.
# Note: the parameters of the new layer have requires_grad=True by default
model.fc = nn.Linear(512, 10)

# Optimize only the new layer
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
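It is easy to check that freezing behaves as expected. The sketch below swaps resnet18 for a tiny, hypothetical backbone (so nothing has to be downloaded), runs one optimization step, and lists which parameters are still trainable.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical small backbone standing in for resnet18.
backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad = False

head = nn.Linear(16, 10)          # new layer: requires_grad is True by default
model = nn.Sequential(backbone, head)

optimizer = optim.SGD(head.parameters(), lr=1e-2, momentum=0.9)

backbone_before = backbone[0].weight.detach().clone()
head_bias_before = head.bias.detach().clone()

# One training step on random data.
loss = model(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

After the step, the backbone weights are unchanged and have no gradient, while the head has moved.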

When fine-tuning the whole network, we generally want different learning rates for different layers: smaller for the pretrained layers and larger for the others. How is that done?

# Load the pretrained model
model = torchvision.models.resnet18(pretrained=True)
model.fc = nn.Linear(512, 10)

# Split the parameters into the new fc layer and everything else
ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())

# Set different learning rates for different parameter groups
params_list = [{'params': base_params, 'lr': 0.001},]
params_list.append({'params': model.fc.parameters(), 'lr': 0.01})

# Assumed arguments: lr is the default for groups without their own, momentum as in the previous example
optimizer = torch.optim.SGD(params_list, lr=0.001, momentum=0.9)
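To confirm that each group really gets its own learning rate, the optimizer's param_groups can be inspected. This sketch again uses a small, hypothetical model in place of resnet18, and the momentum value is an assumption.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 10))
head = model[2]                   # plays the role of model.fc

# Everything that is not part of the head counts as "pretrained" here.
head_ids = {id(p) for p in head.parameters()}
base_params = [p for p in model.parameters() if id(p) not in head_ids]

optimizer = torch.optim.SGD(
    [{'params': base_params, 'lr': 0.001},
     {'params': head.parameters(), 'lr': 0.01}],
    lr=0.001, momentum=0.9)

lrs = [group['lr'] for group in optimizer.param_groups]
sizes = [len(group['params']) for group in optimizer.param_groups]
print(lrs, sizes)
```

Options that a group does not set itself (momentum here) fall back to the keyword arguments given to the optimizer.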

Finally, here is a summary of the basic pretrained models currently available for PyTorch:


(1) torchvision already provides a range of pretrained models, which is generally enough.

pytorch/

It includes alexnet / densenet variants (densenet121 / densenet169 / densenet201 / densenet161) / inception_v3 / resnet variants ('resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152') / squeezenet variants ('squeezenet1_0', 'squeezenet1_1') / VGG variants ('vgg11', 'vgg11_bn', 'vgg13_bn', 'vgg16_bn', 'vgg19_bn', 'vgg19').

(2) Other pretrained models, such as SENet/NASNet, etc.


(3) gluonCV models converted to PyTorch, including classification and segmentation networks. The accuracy here is several percentage points higher than that of other frameworks.

zhanghang1989/

Tags: Python Pytorch Deep Learning

Posted on Fri, 19 Nov 2021 20:53:06 -0500 by sixdollarshirt