PyTorch, ONNX and onnx-tensorrt 5.1 custom op registration

Preface

The ultimate purpose of registering the op in all three frameworks is to deploy a special (unsupported) layer in TRT.
The implementation mainly follows the official ONNX tutorial.
The specific steps are as follows:

  1. Add the custom operator implementation in C++ and register it with TorchScript
  2. Export the custom operator to ONNX, either as a combination of existing ONNX ops
    or as a custom ONNX operator
  3. Register the corresponding op in TRT and add its code implementation

Practical operation

Without further ado, this post uses the Greater op as an example.

Register Greater in PyTorch:

#include <torch/script.h>

// Register the custom op with TorchScript; `greater` is the dispatch
// function whose CPU implementation greater_cpu is shown below.
static auto registry =
  torch::jit::RegisterOperators()
    .op("myop::Greater", &greater);

Add the implementation (only the output shape is defined here; the Greater computation itself is not implemented. Add your own code if you need the full functionality):

at::Tensor greater_cpu(const at::Tensor& input, const float thres)
{
  AT_ASSERTM(!input.type().is_cuda(), "input must be a CPU tensor");
  // Only the output shape is produced here; the comparison is left unimplemented.
  auto output = at::zeros(input.sizes(), at::device(at::kCPU).dtype(at::kByte).requires_grad(false));
  return output;
}
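
To call the op from Python, the C++ file has to be compiled and loaded first. A minimal sketch using torch.utils.cpp_extension.load (the file name greater.cpp is my assumption; use whatever your source file is called):

import torch
from torch.utils import cpp_extension

# JIT-compile the C++ source and load the resulting shared library.
# is_python_module=False only loads the library, so the TorchScript
# registration in the file runs; no Python module is created.
cpp_extension.load(name="myop",
                   sources=["greater.cpp"],  # assumed file name
                   is_python_module=False,
                   verbose=True)

# The op is now visible under the namespace used in RegisterOperators.
probs = torch.rand(1, 4420)
mask = torch.ops.myop.Greater(probs, 0.69)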

Once the above is in place, you can replace PyTorch's .gt() call with the custom Greater op, as follows:

# mask = probs.gt(0.69)
mask = torch.ops.myop.Greater(probs, 0.69)

Now we have successfully registered our own op in PyTorch; next we need to make ONNX recognize it.

Register Greater in onnx

from torch.onnx import register_custom_op_symbolic

def register_custom_op():
    # Map myop::Greater to a node in the custom ONNX domain "myonnx_plugin".
    def greater(g, input, thres):
        return g.op("myonnx_plugin::Greater", input, thres)

    register_custom_op_symbolic("myop::Greater", greater, 9)

register_custom_op()

The code above registers the custom op's symbolic function with the ONNX exporter; after that, the ONNX model can be exported:

torch.onnx.export(net, images, '~/Project/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/models/onnx/customop-ULFGFD-320.onnx',
                  opset_version=9,
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
                  verbose=True)

The code to export onnx in the onnx tutorial is:

torch.onnx.export(CustomModel(), inputs, f,
                       opset_version=9,
                       example_outputs=None,
                       input_names=["X", "num_groups", "scale", "bias"], output_names=["Y"],
                       custom_opsets={"mydomain": 2})

Because I use torch 1.3, whose torch.onnx.export does not yet have the custom_opsets parameter, I use operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK instead.

Now the ONNX model can be exported; the verbose output shows the custom node:

graph(%input.1 : Float(1, 3, 240, 320),
  ...
  %377 : Double() = onnx::Constant[value={0.69}](), scope: SSD
  %378 : Float(1, 4420) = myonnx_plugin::Greater(%376, %377), scope: SSD # ~/Project/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/vision/ssd/ssd.py:106:0
  return (%378)
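
Besides reading the verbose graph, a quick sanity check with the onnx package confirms that the node really landed in the custom domain (adjust the path to your export location):

import onnx

model = onnx.load("customop-ULFGFD-320.onnx")  # path to the exported model

# The Greater node should carry the custom domain "myonnx_plugin".
for node in model.graph.node:
    if node.op_type == "Greater":
        print(node.domain, node.op_type, node.input, node.output)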

Register onnx op in TRT

The main steps of registration in TRT are:

  1. Add the registration code in onnx-tensorrt-5.1/builtin_op_importers.cpp:

    DEFINE_BUILTIN_OP_IMPORTER(Greater) {
      nvinfer1::ITensor& tensor = convertToTensor(inputs.at(0), ctx);
      nvinfer1::Dims dims = tensor.getDimensions();
      int nchan = dims.d[0];
      nvinfer1::Dims scalar_shape{1, {nchan}};
      // The threshold arrives as an initializer (weights), not a tensor.
      bool istensor = inputs.at(1).is_tensor();
      bool isweight = inputs.at(1).is_weights();
      ShapedWeights weights = inputs.at(1).weights();
      ASSERT(weights.count() == 1, ErrorCode::kINVALID_VALUE);
      // The exporter stores the scalar constant as double.
      float threshold = (static_cast<double const*>(weights.values))[0];
      RETURN_FIRST_OUTPUT(ctx->addPluginV2(new GreaterPlugin(threshold),
          {&convertToTensor(inputs.at(0), ctx)}));
    }

  2. Add the CUDA implementation of the Greater layer (the kernel behind GreaterPlugin); its intended semantics are sketched below.
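
The plugin class itself is standard TensorRT boilerplate and is omitted here; what matters is the computation its kernel must perform. Below is a minimal NumPy sketch of those semantics, an illustration rather than the actual CUDA code (note the float32 output, which becomes important in the dtype issue below):

import numpy as np

def greater_reference(inp, threshold):
    # Elementwise inp > threshold, returned as float32 (0.0 / 1.0).
    # This is what the GreaterPlugin CUDA kernel has to reproduce.
    return (inp > threshold).astype(np.float32)

probs = np.random.rand(1, 4420).astype(np.float32)
mask = greater_reference(probs, 0.69)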

With this in place we can try to build the TRT engine. However, when converting the model, the following error appeared:

Unsupported ONNX data type: UINT8 (2)
ERROR: /home/bokyliu/Work/TensorRT-5.1.5.0/onnx-tensorrt-5.1/ModelImporter.cpp:615 In function importModel:
[8] Assertion failed: convert_dtype( output.type().tensor_type().elem_type(), &output_trt_dtype)
Debugging has finished

Tracing the error leads to onnx2trt_utils.hpp:

inline bool convert_dtype(int32_t onnx_dtype,
                          nvinfer1::DataType* trt_dtype) {
  switch( onnx_dtype ) {
  case ::ONNX_NAMESPACE::TensorProto::FLOAT:   *trt_dtype = nvinfer1::DataType::kFLOAT; break;
  case ::ONNX_NAMESPACE::TensorProto::INT8:    *trt_dtype = nvinfer1::DataType::kINT8;  break;
  case ::ONNX_NAMESPACE::TensorProto::FLOAT16: *trt_dtype = nvinfer1::DataType::kHALF;  break;
#if NV_TENSORRT_MAJOR >= 4
  // See ShapedWeights.cpp for sanity check if all values can be safetly downcasted to INT32
  case ::ONNX_NAMESPACE::TensorProto::INT64:   *trt_dtype = nvinfer1::DataType::kINT32; break;
  case ::ONNX_NAMESPACE::TensorProto::INT32:   *trt_dtype = nvinfer1::DataType::kINT32; break;
#endif
  default:
    cerr << "Unsupported ONNX data type: " << get_dtype_name(onnx_dtype)
         << " (" << std::to_string(onnx_dtype) << ")" << endl;
    return false;
  }
  return true;
}

This shows that TensorRT supports float32, int8, float16, and int32 (with int64 downcast), but no unsigned types. Looking back at the error, the mistake was made when defining the ONNX layer: the output dtype should be float, not UINT8. So modify the custom op code. After the fix:

at::Tensor greater_cpu(const at::Tensor& input, const float thres)
{
  AT_ASSERTM(!input.type().is_cuda(), "input must be a CPU tensor");
  // kFloat instead of kByte: TensorRT has no unsigned integer type.
  auto output = at::zeros(input.sizes(), at::device(at::kCPU).dtype(at::kFloat).requires_grad(false));
  return output;
}
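
After re-exporting, it is worth verifying that the graph output is now FLOAT. A quick check with the onnx package (same model path as before):

import onnx
from onnx import TensorProto

model = onnx.load("customop-ULFGFD-320.onnx")  # path to the re-exported model

elem_type = model.graph.output[0].type.tensor_type.elem_type
assert elem_type == TensorProto.FLOAT, TensorProto.DataType.Name(elem_type)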

Finally

At last we can build our TRT engine, and onnx-tensorrt-5.1 runs successfully!

----------------------------------------------------------------
Input filename:   /home/bokyliu/Project/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/models/onnx/customop-ULFGFD-320.onnx
ONNX IR version:  0.0.4
Opset version:    9
Producer name:    pytorch
Producer version: 1.3
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.4) than this parser was built against (0.0.3).
Parsing model
Building TensorRT engine, FP16 available:0
    Max batch size:     32
    Max workspace size: 1024 MiB
Writing TensorRT engine to /home/bokyliu/Project/Ultra-Light-Fast-Generic-Face-Detector-1MB-master/models/onnx/customop-ULFGFD-320.trt
All done

Tags: Python

Posted on Sun, 19 Jan 2020 06:36:01 -0500 by Alien