Cambricon Acceleration Platform (MLU200 Series) Fishing Guide - Model Porting - Segmentation Network Example

PS: If you want to reprint this article, please credit the source. I retain the copyright.

PS: This article is based solely on my own understanding. If it conflicts with your principles or ideas, please bear with me and don't flame.

Environment description
  • Ubuntu 18.04
  • One MLU270 accelerator card
  • The Cambricon PyTorch docker porting environment

Preface

  Before reading this article, please make sure you have read the following prerequisite articles:

  • Fishing Guide for the Cambricon Acceleration Platform (MLU200 Series) (I) - Basic Concepts and Introduction (https://blog.csdn.net/u011728480/article/details/121194076)
  • Fishing Guide for the Cambricon Acceleration Platform (MLU200 Series) (II) - Model Porting - Environment Setup (https://blog.csdn.net/u011728480/article/details/121320982)

  After the two previous articles, we have a basic understanding of the Cambricon acceleration platform. To deepen that understanding, I will use a segmentation network as an example to walk through the whole model porting and deployment process on the Cambricon platform.

  If any quoted content in this article infringes your rights, please contact me promptly and I will delete it.





Basic introduction to the example

  Here is a brief introduction to this simple segmentation network. It is trained on the CamVid dataset; the input is 1 * 3 * 480 * 480 and the output is 480 * 480.
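
  For reference, here is a hypothetical preprocessing sketch that produces the 1 * 3 * 480 * 480 input tensor. The actual normalization used in training is not covered in this article, so treat the steps below as placeholders:

# Placeholder preprocessing: resize to 480 * 480 and convert to a
# 1 * 3 * 480 * 480 float tensor; match the real training pipeline in practice.
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((480, 480)),
    transforms.ToTensor(),
])
img = Image.open('input.jpg').convert('RGB')
input_data = preprocess(img).unsqueeze(0)  # shape: (1, 3, 480, 480)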

  The goal is to segment the cars in the input image. The final test result of the network is shown in the figure below:

  At this point we also have a pth model file ready for porting and testing.





Basic steps of model porting

  Porting a PyTorch model is in fact relatively straightforward and can be done by following a fixed process. The basic process I have summarized is:

  • In docker, run the CPU version of the model inference code.
  • In docker, run the CPU version of the quantization code to generate the quantized model, and test the quantized model at the same time.
  • In docker, convert the quantized model into an offline model.


Running CPU inference code in Docker

  As of this writing, according to the official Cambricon documentation, the available docker environment ships PyTorch 1.3, which may differ from the PyTorch 1.7+ that mainstream models target. Therefore, to make the follow-up work go smoothly, we should not start with quantization; we should first make sure the model runs correctly under PyTorch 1.3.

  After training we get a pth file, and we write a script to test that pth file in the training environment to judge the model's accuracy. We should then put this same test script into the porting environment and run it again. Generally speaking, problems of one kind or another will show up.

  So far I have run into two types of problems. The first is that PyTorch 1.3 does not support certain operators; you can replace them with similar operators or implement the operator yourself (a sketch follows). The second is version issues: for example, starting with PyTorch 1.6 models are saved in zip format (see the notes of the torch.save API for details), so older versions need the model re-saved with _use_new_zipfile_serialization=False before they can load it.
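
  As a hypothetical illustration of the first type of problem: nn.SiLU, for example, only exists from PyTorch 1.7 onward, so under PyTorch 1.3 it has to be replaced with the equivalent expression x * sigmoid(x). The FallbackSiLU module and replace_silu helper below are my own names for this sketch:

import torch
import torch.nn as nn

# silu(x) = x * sigmoid(x); nn.SiLU itself is unavailable in pytorch 1.3
class FallbackSiLU(nn.Module):
    def forward(self, x):
        return x * torch.sigmoid(x)

# Recursively swap out every SiLU child module in an existing model
def replace_silu(module):
    for name, child in module.named_children():
        if child.__class__.__name__ == 'SiLU':
            setattr(module, name, FallbackSiLU())
        else:
            replace_silu(child)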

  Generally speaking, the model conversion code looks roughly like this:

# Assume a model file test.pth (zip format)
# Assume the model's network structure class: TestModel
import torch

model = TestModel()
state_dict = torch.load('test.pth', map_location=torch.device('cpu'))
model.load_state_dict(state_dict, strict=True)

# Re-save the weights in the old (non-zip) format so pytorch 1.3 can load them
torch.save(model.state_dict(), 'new_test.pth', _use_new_zipfile_serialization=False)
# We now have an old-format pth file that loads under pytorch versions before 1.6
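
  Before moving on to quantization, it is worth running a minimal CPU inference check under PyTorch 1.3. A sketch, assuming the TestModel class and the new_test.pth file from above (real test data should go through the same preprocessing as training):

# Minimal CPU sanity check under pytorch 1.3
import torch

model = TestModel()
state_dict = torch.load('new_test.pth', map_location=torch.device('cpu'))
model.load_state_dict(state_dict, strict=True)
model.eval()

input_data = torch.randn((1, 3, 480, 480))
with torch.no_grad():
    output = model(input_data)
print(output.shape)  # expect a 480 * 480 segmentation map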


Handling the quantized model in Docker

  There are two steps here: first use Cambricon's PyTorch interface to generate the quantized model, then test the quantized model. Note that two types of quantized model can be generated, INT8 and INT16; which one to choose depends on the actual situation. Generally speaking, classification and segmentation algorithms can try INT8 directly, while object detection should be tested before drawing a conclusion. The reduced precision of INT8 also brings higher inference speed. Unless otherwise specified, INT8 is used by default below.

  Also note that quantization generally targets parameter-heavy layers such as convolutions and fully connected layers; the rest of the model's parameters remain in FP16 or FP32.

  First, generate the quantized model:

# Assume the non-zip model file new_test.pth from the previous step
# Assume the model's network structure class: TestModel
import torch
import torch_mlu.core.mlu_quantize as mlu_quantize

model = TestModel()
state_dict = torch.load('new_test.pth', map_location=torch.device('cpu'))
model.load_state_dict(state_dict, strict=False)
mean = []
std = []
# Note: this call does not use the firstconv optimization. firstconv folds the
# mean/std normalization into the first conv layer to speed it up, but it does
# not suit every model's preprocessing. See the official Cambricon documentation.
net_quantization = mlu_quantize.quantize_dynamic_mlu(model, {'mean': mean, 'std': std, 'firstconv': False}, dtype='int8', gen_quant=True)
# Run a forward pass on representative data so that the quantization
# parameters can be collected (use real calibration images in practice)
net_quantization(torch.randn((1, 3, 480, 480)))
torch.save(net_quantization.state_dict(), 'test_quantization.pth')

# We now have the INT8 quantized model file test_quantization.pth
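
  For comparison, the INT16 variant only changes the dtype argument, and enabling firstconv only means filling in the normalization constants. A sketch under those assumptions (the mean/std values below are common ImageNet placeholders, not necessarily what this model was trained with):

# INT16 variant: only the dtype changes
net_quantization = mlu_quantize.quantize_dynamic_mlu(
    model, {'mean': mean, 'std': std, 'firstconv': False},
    dtype='int16', gen_quant=True)

# firstconv variant: fold per-channel mean/std normalization into the
# first convolution (placeholder values; use the training pipeline's own)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
net_quantization = mlu_quantize.quantize_dynamic_mlu(
    model, {'mean': mean, 'std': std, 'firstconv': True},
    dtype='int8', gen_quant=True)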

  Then test the quantized model. This step verifies that Cambricon's custom PyTorch quantization operators produce correct results after quantization:

# Assume the INT8 quantized model file test_quantization.pth
# Assume the model's network structure class: TestModel
import torch
import torch_mlu
import torch_mlu.core.mlu_model as ct
import torch_mlu.core.mlu_quantize as mlu_quantize

model = TestModel()

# step 1: wrap the model with Cambricon's quantization operators
net = mlu_quantize.quantize_dynamic_mlu(model)
# step 2: load the quantized weights
net.load_state_dict(torch.load('test_quantization.pth'))
# example input; real test data should go through the usual preprocessing
input_data = torch.randn((1, 3, 480, 480))
# step 3: move the model and input to the MLU device
net_mlu = net.to(ct.mlu_device())
input_mlu = input_data.to(ct.mlu_device())
# step 4: run inference on the MLU
output = net_mlu(input_mlu)
print(output.cpu())
# The shape of output is 480 * 480

  If the inference results after quantization are accurate, the model port has essentially succeeded. You can also see from this code that mlu plays the same role here that cuda plays in ordinary PyTorch code, which gives a rough idea of what the MLU is. A quick way to check accuracy is to compare the MLU output against the CPU output on the same input, as sketched below.
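
  A minimal sketch of such a comparison, assuming a freshly loaded FP32 copy of the model (model_cpu is my own name here) and the input_data / net_mlu / input_mlu variables from the test script above:

import torch

# Compare the FP32 CPU result with the quantized MLU result on the same
# input; a small max difference suggests the quantization is acceptable.
model_cpu = TestModel()
model_cpu.load_state_dict(torch.load('new_test.pth', map_location=torch.device('cpu')), strict=True)
model_cpu.eval()
with torch.no_grad():
    cpu_output = model_cpu(input_data)
mlu_output = net_mlu(input_mlu).cpu()
print('max abs diff:', (cpu_output - mlu_output).abs().max().item())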



Generating the offline model in Docker

  Building on the previous step, generating the offline model is quick and convenient, but there are again two kinds. Remember that the quantization above only quantizes the parameters of certain layers; the remaining layers still run in FP16 or FP32. The offline model therefore comes in an FP16 and an FP32 flavor. Generally speaking, an INT8-quantized FP16 offline model is the best choice.

  Generate the MLU220 offline model:

# Assume the INT8 quantized model file test_quantization.pth
# Assume the model's network structure class: TestModel
import torch
import torch_mlu
import torch_mlu.core.mlu_model as ct
import torch_mlu.core.mlu_quantize as mlu_quantize

model = TestModel()

# step 1: wrap the model with Cambricon's quantization operators
net = mlu_quantize.quantize_dynamic_mlu(model)
# step 2: load the quantized weights
net.load_state_dict(torch.load('test_quantization.pth'))
# example input with the network's input shape
input_data = torch.randn((1, 3, 480, 480))
# step 3: move the model and input to the MLU device
net_mlu = net.to(ct.mlu_device())
input_mlu = input_data.to(ct.mlu_device())

# See the documentation for details; 4 cores is typical
core_number = 4
ct.set_core_number(core_number)
ct.set_core_version('MLU220')
# torch_mlu.core.mlu_model.set_input_format(input_format)
ct.save_as_cambricon('test')

# Tracing the model and running the traced version triggers generation
# of the offline model
net_trace = torch.jit.trace(net_mlu, input_mlu, check_trace=False)
net_trace(input_mlu)

# Passing an empty name ends the offline-model dump
ct.save_as_cambricon("")

# Finally we get test.cambricon and test.cambricon_twins. test.cambricon_twins
# is the description file of the offline model, containing the input data format
# and channel layout as well as the corresponding output information.

  At this point we have obtained the offline model and completed the first half of the porting work.

  In addition, if you want an MLU270 offline model, simply change the set_core_version parameter to MLU270. If you call half() on both the model and the input tensor, you get an FP16 offline model. Refer to the official Cambricon documentation for details; a sketch of both variations follows.
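
  A sketch of those two variations, based on the same script as above (I have not verified every combination; consult the official documentation):

# MLU270 offline model: change only the core version
ct.set_core_version('MLU270')

# FP16 offline model: cast both the model and the example input to half
# before tracing (assumption: every remaining layer supports FP16)
net_mlu = net.to(ct.mlu_device()).half()
input_mlu = input_data.to(ct.mlu_device()).half()
net_trace = torch.jit.trace(net_mlu, input_mlu, check_trace=False)
net_trace(input_mlu)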

  The .cambricon_twins file plays an important role: it describes the offline model's input and output formats and channel layout. After all, when a ported network does not produce the right results, the cause is usually in the preprocessing or postprocessing rather than in the network itself. Below are two .cambricon_twins examples, one for INT8-FP32 and one for INT8-FP16.





Postscript

  The model porting process is basically fixed; once you are familiar with it, it does not change.

  Generally, six model files are produced in the end: two quantized models (INT8 and INT16) and four offline models (INT8-FP32, INT8-FP16, INT16-FP32, INT16-FP16). The variants trade speed against accuracy, so choose according to the actual situation of your model.

  Note that besides PyTorch, Cambricon also supports porting caffe and tensorflow models. If you need to convert such models, please consult the corresponding documentation.

References

  • Fishing Guide for the Cambricon Acceleration Platform (MLU200 Series) (I) - Basic Concepts and Introduction (https://blog.csdn.net/u011728480/article/details/121194076)
  • Fishing Guide for the Cambricon Acceleration Platform (MLU200 Series) (II) - Model Porting - Environment Setup (https://blog.csdn.net/u011728480/article/details/121320982)
  • https://www.cambricon.com/
  • https://www.cambricon.com/docs/pytorch/index.html
  • Other relevant non-public materials.


If you find this helpful, please like, subscribe, favorite, and follow the official account.

PS: Please respect original work; if you don't like it, please don't flame.

PS: If you want to reprint this article, please credit the source. I retain all rights.

PS: If you have any questions, please leave a message; I will reply as soon as I see it.
