After using PyTorch for a while, I can't put it down. I hear there is already a C++ interface as well. In day-to-day use you inevitably run into finetuning, parameter initialization, model loading, and so on.
Model save / load
1. All model parameters
Training can be interrupted for all sorts of reasons, so it is important to save the model at each epoch (typically both the best model so far and the current one). The method generally recommended by PyTorch is shown below; it saves only the model's parameters (the state_dict).
# Save only the parameters (the state_dict) to checkpoint.pth.tar
# (model.module is used here because the model is wrapped in nn.DataParallel;
#  for a plain model, use model.state_dict() instead)
torch.save(model.module.state_dict(), 'checkpoint.pth.tar')
The corresponding loading code is shown below (the file only stores a parameter dictionary, so you must first build the model object and then call load_state_dict on it):
mymodel.load_state_dict(torch.load('checkpoint.pth.tar'))
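As a small side note (not from the original post), if the checkpoint was saved on a GPU machine and is loaded on a CPU-only one, torch.load accepts a map_location argument to remap the tensors:

# Remap all tensors onto the CPU while deserializing the checkpoint
state_dict = torch.load('checkpoint.pth.tar', map_location=lambda storage, loc: storage)
mymodel.load_state_dict(state_dict)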
Building on the save above, here is an example of what extra information to store and how to use it to resume training:
# Save the model state plus whatever else you want to reuse later
state = {
    'epoch': epoch + 1,                    # epoch reached so far
    'state_dict': mymodel.state_dict(),    # trained parameters
    'optimizer': optimizer.state_dict(),   # optimizer state, needed to resume
    'best_pred': best_pred,                # best accuracy so far
    # ... any other fields
}
# Save everything to checkpoint.pth.tar
torch.save(state, 'checkpoint.pth.tar')
# If this is the best model so far, keep a copy of it
if is_best:
    shutil.copyfile(filename, directory + 'model_best.pth.tar')

# Later: restore the state and resume
checkpoint = torch.load('model_best.pth.tar')
model.load_state_dict(checkpoint['state_dict'])     # model parameters
optimizer.load_state_dict(checkpoint['optimizer'])  # optimizer state
epoch = checkpoint['epoch']                          # epoch, useful for updating the learning rate, etc.
# With this in place you can keep training and stop worrying about interruptions.
# train/eval
# ...
The above is the approach recommended by PyTorch. There is also a second method that saves the entire model object; because it pickles the whole object, loading depends on the exact class definitions and file layout being available, so it is less flexible and not recommended.
# Save the whole model object
torch.save(mymodel, 'checkpoint.pth.tar')
# Load it back
mymodel = torch.load('checkpoint.pth.tar')
2. Part of the model parameters
Very often we load a trained model whose architecture is not exactly the same as the one we define ourselves, and we only want to reuse the parameters of the layers the two have in common.
There are several solutions:
(1) Build your own model directly on top of the trained one: load the trained model first, then reuse its layers when defining your own model (a fuller sketch follows the fragment below);
model_ft = models.resnet18(pretrained=use_pretrained)
self.conv1 = model_ft.conv1
self.bn1 = model_ft.bn1
...
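For context, a minimal sketch of how such a module might look, assuming we only want the first few layers of torchvision's resnet18 (the class name and layer choice are illustrative, not from the original post):

import torch.nn as nn
import torchvision.models as models

class MyBackbone(nn.Module):
    def __init__(self, use_pretrained=True):
        super(MyBackbone, self).__init__()
        model_ft = models.resnet18(pretrained=use_pretrained)
        # Reuse the already-trained layers directly
        self.conv1 = model_ft.conv1
        self.bn1 = model_ft.bn1
        self.relu = model_ft.relu
        self.maxpool = model_ft.maxpool
        self.layer1 = model_ft.layer1

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        return self.layer1(x)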
(2) Define the model yourself, then load only the parameters that match:
# Method 1: strict=False keeps only the parameters whose keys match; the rest are ignored
# (model_zoo comes from torch.utils.model_zoo, model_urls from torchvision.models.resnet)
mymodelB = TheModelBClass(*args, **kwargs)
mymodelB.load_state_dict(model_zoo.load_url(model_urls['resnet18']), strict=False)

# Method 2: filter the pretrained state_dict by hand
# Load the pretrained model
model_pretrained = models.resnet18(pretrained=use_pretrained)
# mymodelB's state_dict has keys such as conv1.weight, conv1.bias, ...
mymodelB_dict = mymodelB.state_dict()
# Compare the pretrained state_dict with our own model and drop the keys it does not have
pretrained_dict = {k: v for k, v in model_pretrained.state_dict().items() if k in mymodelB_dict}
# Update our model's state_dict with the matching entries
mymodelB_dict.update(pretrained_dict)
# Load the state_dict we actually want
mymodelB.load_state_dict(mymodelB_dict)
# Method 2 is probably the more intuitive of the two
Parameter initialization
The second topic is parameter initialization, which shows up in a lot of code; after all, not every layer comes with pretrained parameters, and the ones without them need to be initialized. In PyTorch each Tensor is really a wrapper around a Variable, which exposes interfaces such as data and grad, so you can assign to them directly. The same mechanism lets you copy parameters trained in other frameworks (Caffe/TensorFlow/MXNet/GluonCV, etc.) into PyTorch: you simply assign to data.
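A minimal sketch of that direct assignment, assuming npy_weight stands in for a NumPy array of weights exported from some other framework (here it is just random values of the right shape):

import numpy as np
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)
# Stand-in for weights exported from another framework; the shape must match conv.weight
npy_weight = np.random.randn(16, 3, 3, 3).astype(np.float32)
# Assign straight to .data (or use conv.weight.data.copy_(...) to copy in place)
conv.weight.data = torch.from_numpy(npy_weight)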
In PyTorch, an initialization function can be written like this:
import math
import torch.nn as nn

def weight_init(m):
    # He-style initialization for conv layers, constant init for batch norm
    if isinstance(m, nn.Conv2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()
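To run this over every layer, nn.Module.apply walks the model recursively; a short usage sketch (mymodel is a placeholder for whatever model you have built):

# Apply weight_init to mymodel and all of its submodules
mymodel.apply(weight_init)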
That said, if you have no particular requirements for initialization (and are not sure whether it affects performance), PyTorch layers already come with reasonable default initializations.
Finetune
The last topic is fine-tuning. In most experiments, the backbone at least starts from a pretrained model, used either as a fixed feature extractor or fine-tuned further.
When the backbone is used purely as a feature extractor, its parameters should not be learned. PyTorch provides the requires_grad attribute to decide whether a parameter takes part in gradient computation, i.e. whether it gets updated. Taking MNIST as an example and using resnet18 as the feature extractor:
# Load the pretrained model
model = torchvision.models.resnet18(pretrained=True)
# Freeze every parameter so it is not updated, i.e. not learned
for param in model.parameters():
    param.requires_grad = False
# Replace the fully connected layer with the 10 classes MNIST needs.
# Note: the new layer's parameters have requires_grad=True by default.
model.fc = nn.Linear(512, 10)
# Optimize only the new fully connected layer
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)
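As a related sketch (my addition, not from the original), when more than just the head is left trainable, a common pattern is to hand the optimizer only the parameters that still require gradients:

# Collect every parameter that was not frozen and optimize only those
trainable_params = filter(lambda p: p.requires_grad, model.parameters())
optimizer = optim.SGD(trainable_params, lr=1e-2, momentum=0.9)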
When fine-tuning the whole network, we usually want different learning rates for different layers: a smaller one for the pretrained layers and a larger one for the newly added layers. How is that done?
# Load the pretrained model and replace the head
model = torchvision.models.resnet18(pretrained=True)
model.fc = nn.Linear(512, 10)
# Reference: https://blog.csdn.net/u012759136/article/details/65634477
ignored_params = list(map(id, model.fc.parameters()))
base_params = filter(lambda p: id(p) not in ignored_params, model.parameters())
# Set different learning rates for the different parameter groups
params_list = [{'params': base_params, 'lr': 0.001}]
params_list.append({'params': model.fc.parameters(), 'lr': 0.01})
optimizer = torch.optim.SGD(params_list,
                            lr=0.001,
                            momentum=args.momentum,
                            weight_decay=args.weight_decay)
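To double-check that the two learning rates were registered, you can inspect the optimizer's param_groups (a small sketch, not in the original post):

# Expect two groups: the backbone at lr=0.001 and the new fc layer at lr=0.01
for group in optimizer.param_groups:
    print(len(group['params']), group['lr'])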
Finally, a roundup of the pretrained models currently available for PyTorch:
(1) torchvision
torchvision already provides a range of pretrained models, and it is usually enough.
pytorch/vision (github.com)
It includes alexnet, the DenseNet variants (densenet121/densenet169/densenet201/densenet161), inception_v3, the ResNet family ('resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152'), the SqueezeNet versions ('squeezenet1_0', 'squeezenet1_1'), and the VGG family ('vgg11', 'vgg11_bn', 'vgg13_bn', 'vgg16_bn', 'vgg19_bn', 'vgg19').
(2) Other pretrained models, such as SENet/NASNet, etc.
Cadene/pretrained-models.pytorch (github.com)
(3) gluonCV models converted to PyTorch, covering both classification and segmentation networks; their accuracy is a few percentage points higher than comparable models from other frameworks.
zhanghang1989/gluoncv-torch (github.com)