Implementing ResNet18 gesture recognition with a nested model on AI-DSW
AI-DSW (Data Science Workshop) is a cloud-based deep learning development environment built for algorithm developers.
In DSW, only two kernels, KerasCode and KerasGraph, implement the FastNeuralNetwork feature:
- KerasCode: write the deep learning network code first, then turn the code into a graph
- KerasGraph: build the deep learning network directly on the canvas, then turn the graph into code
Next, as an example of working with AI-DSW, we implement ResNet18 for gesture recognition.
Our task: the sign-language alphabet dataset contains the 26 letters of the English alphabet expressed in sign language, and we build a ResNet18 model to recognize which letter each sample represents.
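As a rough illustration of the label format (an assumption on my part, since the article does not show the DSW dataset loader), the 26 letter labels can be one-hot encoded to match a 26-way softmax output:

```python
import numpy as np

# Hypothetical example: encode letter labels as one-hot vectors for a
# 26-class softmax output. The actual DSW dataset loader is not shown
# in the article, so this only sketches the assumed label format.
LETTERS = [chr(ord('A') + i) for i in range(26)]

def one_hot(label, num_classes=26):
    """Return a one-hot vector for a single letter label."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[LETTERS.index(label)] = 1.0
    return vec

y = one_hot('C')
print(y.argmax(), y.sum())  # class index 2, exactly one "hot" entry
```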
The official AI-DSW documentation recommends building models with the Sequential API, but nested encapsulation makes the structure clearer and lets parts of it be reused. Let's look at the code:
```python
from keras.layers import (Input, Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, ZeroPadding2D, GlobalAvgPool2D,
                          Dense, add)
from keras.models import Model

def Conv2d_BN(x, nb_filter, kernel_size, strides=(1, 1), padding='same'):
    # Convolution -> batch normalization -> ReLU: the basic reusable unit
    x = Conv2D(nb_filter, kernel_size, padding=padding, strides=strides)(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)
    return x
```
First, we encapsulate the most common CNN building block, consisting of convolution, batch normalization, and an activation function, so that it can be reused throughout the ResNet model.
```python
def identity_Block(inpt, nb_filter, kernel_size, strides=(1, 1), with_conv_shortcut=False):
    x = Conv2d_BN(inpt, nb_filter=nb_filter, kernel_size=kernel_size, strides=strides, padding='same')
    x = Conv2d_BN(x, nb_filter=nb_filter, kernel_size=kernel_size, padding='same')
    if with_conv_shortcut:
        # The shortcut connects the input layer to the block's output, as shown
        # in the figure above; a strided convolution makes the shapes match
        shortcut = Conv2d_BN(inpt, nb_filter=nb_filter, strides=strides, kernel_size=kernel_size)
        x = add([x, shortcut])
    else:
        # Shapes already match, so the identity shortcut adds the input directly
        x = add([x, inpt])
    return x
```
Next, we implement ResNet's core module, the Residual Block. Residual connections effectively improve network performance and the model's generalization ability, as shown in the figure:
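A toy numerical sketch (not part of the article's code) of why the shortcut helps: for a residual mapping y = F(x) + x, the derivative is dy/dx = F'(x) + 1, so the gradient through the block stays close to 1 even when the residual branch's own gradient F'(x) is tiny:

```python
# Toy sketch of why residual connections keep gradients flowing.
# For y = F(x) + x, dy/dx = F'(x) + 1, so even a "weak" branch with a
# near-zero gradient of its own lets gradients pass through the block.
def F(x):
    return 1e-6 * x  # residual branch whose own gradient is tiny (1e-6)

def residual(x):
    return F(x) + x  # identity shortcut

# Central finite differences as a numerical gradient check
eps = 1e-4
x0 = 2.0
grad_plain = (F(x0 + eps) - F(x0 - eps)) / (2 * eps)                   # ~1e-6
grad_residual = (residual(x0 + eps) - residual(x0 - eps)) / (2 * eps)  # ~1.0
print(grad_plain, grad_residual)
```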
With the core module in place, we can build the full model structure: input, convolution, residual blocks, pooling, the fully connected layer, and the output.
```python
def resnet_18(width, height, channel, classes):
    inpt = Input(shape=(width, height, channel))
    x = ZeroPadding2D((3, 3))(inpt)  # pad so the 7x7/2 'valid' conv halves the size
    # conv1
    x = Conv2d_BN(x, nb_filter=64, kernel_size=(7, 7), strides=(2, 2), padding='valid')
    x = MaxPooling2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    # conv2_x
    x = identity_Block(x, nb_filter=64, kernel_size=(3, 3))
    x = identity_Block(x, nb_filter=64, kernel_size=(3, 3))
    # conv3_x
    x = identity_Block(x, nb_filter=128, kernel_size=(3, 3), strides=(2, 2), with_conv_shortcut=True)
    x = identity_Block(x, nb_filter=128, kernel_size=(3, 3))
    # conv4_x
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3), strides=(2, 2), with_conv_shortcut=True)
    x = identity_Block(x, nb_filter=256, kernel_size=(3, 3))
    # conv5_x
    x = identity_Block(x, nb_filter=512, kernel_size=(3, 3), strides=(2, 2), with_conv_shortcut=True)
    x = identity_Block(x, nb_filter=512, kernel_size=(3, 3))
    x = GlobalAvgPool2D()(x)
    x = Dense(classes, activation='softmax')(x)
    model = Model(inputs=inpt, outputs=x)
    return model
```
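As a sanity check on the architecture, we can trace the spatial size of the feature maps through each stage by hand, assuming the classic 224×224 input (the actual input size used on DSW is not stated in the article):

```python
import math

# Trace the feature-map size through ResNet18 for an assumed 224x224
# input (the classic ImageNet setting; the article does not state the
# input size actually used on DSW).
def conv_out(size, kernel, stride, pad):
    """Output size of a 'valid' convolution after padding `pad` per side."""
    return (size + 2 * pad - kernel) // stride + 1

def same_out(size, stride):
    """Output size of a 'same'-padded strided layer."""
    return math.ceil(size / stride)

s = 224
s = conv_out(s, kernel=7, stride=2, pad=3)  # conv1: 7x7/2 with 3px padding -> 112
s = same_out(s, stride=2)                   # 3x3 max-pool /2 -> 56 (conv2_x keeps 56)
for _ in range(3):                          # conv3_x, conv4_x, conv5_x each halve once
    s = same_out(s, stride=2)               # 28, 14, 7
print(s)  # final 7x7 map before global average pooling
```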
The official DSW introduction at https://www.alibabacloud.com/help/zh/doc-detail/126303.htm presents models built with the Sequential API. Here we find that the nested strategy can also generate the model structure, as shown in the figure:
In the same way, following the official documentation, we can also visually edit the model, adjust its parameters, and so on.
With the model in hand, we define the loss function, train on the training set with a validation set to optimize the model, and finally obtain the result.
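The training step can be sketched as follows. This is a minimal stand-in with random data and a tiny model rather than the real sign-language dataset and resnet_18, since the article does not show the actual training code:

```python
import numpy as np
from tensorflow import keras

# Minimal training sketch. The article trains resnet_18 on the
# sign-language dataset; a tiny model and random data stand in here,
# since that code is not shown.
num_classes = 26
x_train = np.random.rand(8, 32, 32, 3).astype("float32")
y_train = keras.utils.to_categorical(
    np.random.randint(0, num_classes, 8), num_classes)

model = keras.Sequential([
    keras.layers.Conv2D(8, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(num_classes, activation="softmax"),
])

# Categorical cross-entropy matches the one-hot labels and softmax output
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(x_train, y_train, validation_split=0.25,
                    epochs=2, batch_size=4, verbose=0)
print(len(history.history["loss"]))  # one loss value per epoch
```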
To sum up, after trying KerasGraph, I feel it represents the latest evolution of AI development environments. Like a low-code editor, it lets us quickly build a model structure and verify its effect, improves the efficiency of implementing model structures, and frees us from tedious TensorFlow source code so we can focus on optimizing the model structure itself. Overall, it is good.
Of course, KerasGraph in its current form still has some problems:
- Pre-trained models, such as keras_bert and ResNet, are not supported yet. Once pre-trained models are supported, and especially once their last few layers can be edited, practicality will improve greatly
- The KerasGraph graphical front end consumes too much memory, sometimes causing the page to freeze
- The usability of KerasGraph for editing and defining per-layer parameters needs improvement; at present it is not much easier than consulting the documentation
Of course, none of this keeps KerasGraph from being a good model visualization tool, and I believe that, given time, KerasGraph will make real breakthroughs in model editing.