In essence, the attention mechanism in deep learning resembles the selective visual attention of humans: its core goal is to pick out, from a large amount of information, the information that matters most for the current task. Mechanically, attention is a form of weighting.
Generally speaking, an attention mechanism lets the network learn by itself which parts of an image or text sequence deserve attention.
Convolution is a linear operation. To add nonlinearity, pooling and activation layers are inserted after it. A conventional CNN is therefore a chain of matrix multiplications interleaved with element-wise nonlinear functions, and feature elements interact with one another only through addition.
The attention mechanism adds element-wise multiplication between features, which increases the nonlinearity of the features and can simplify other operations.
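To make "attention is a kind of weighting" concrete, here is a toy NumPy sketch. The feature values and scores are made-up assumptions for illustration, not taken from any model: raw scores are squashed by a sigmoid into weights in (0, 1), and the feature map is multiplied element-wise by those weights.

```python
import numpy as np

# Toy feature map: 2 spatial positions x 3 channels (illustrative values only).
features = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

# Hypothetical per-channel attention scores (logits before the sigmoid).
scores = np.array([2.0, 0.0, -2.0])

# Sigmoid turns scores into weights in (0, 1): about 0.881, 0.5 and 0.119.
weights = 1.0 / (1.0 + np.exp(-scores))

# Element-wise multiplication, broadcast over the positions:
# channel 0 keeps ~88% of its value, channel 2 is suppressed to ~12%.
weighted = features * weights
print(weighted)
```

The same pattern (score, sigmoid, multiply) underlies both the channel and the spatial modules described below; they differ only in which axis the weights cover.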
Attention mechanisms can be divided into:
- Channel attention mechanism: generates a mask over the channels and scores each channel; representative examples are SENet and the Channel Attention Module
- Spatial attention mechanism: generates a mask over spatial positions and scores each position; a representative example is the Spatial Attention Module
- Mixed-domain attention mechanism: scores channels and spatial positions together; representative examples are BAM and CBAM
(1) Spatial attention
The input feature map is average-pooled and max-pooled along the channel dimension, and the two results are concatenated into a 2-channel map. A convolution then reduces this to a 1-channel spatial attention map, which is passed through a sigmoid. Finally, the feature map is multiplied by the spatial attention map.
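The steps above can be sketched in plain NumPy for a single (H, W, C) feature map. This is a minimal, framework-free illustration, assuming a randomly initialised (untrained) 7×7 kernel, zero "same" padding and a sigmoid on the output; the helper names `conv2d_same` and `spatial_attention_np` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, w, b=0.0):
    """Naive stride-1 'same' convolution (cross-correlation, as in DL frameworks).

    x: (H, W, C_in) input, w: (k, k, C_in) kernel with odd k -> (H, W) output.
    """
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero padding keeps H and W
    h, wd, _ = x.shape
    out = np.empty((h, wd))
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(xp[i:i + k, j:j + k, :] * w) + b
    return out


def spatial_attention_np(feature, kernel, bias=0.0):
    avg = feature.mean(axis=-1, keepdims=True)          # channel-wise average -> (H, W, 1)
    mx = feature.max(axis=-1, keepdims=True)            # channel-wise max     -> (H, W, 1)
    stacked = np.concatenate([avg, mx], axis=-1)        # 2-channel map        -> (H, W, 2)
    attn = sigmoid(conv2d_same(stacked, kernel, bias))  # 1-channel map in (0, 1) -> (H, W)
    return feature * attn[..., None], attn              # same weight for every channel at a position


rng = np.random.default_rng(0)
feature = rng.standard_normal((8, 8, 16))
kernel = 0.1 * rng.standard_normal((7, 7, 2))  # hypothetical untrained 7x7 weights
out, attn = spatial_attention_np(feature, kernel)
print(out.shape, attn.shape)  # (8, 8, 16) (8, 8)
```

Note that `attn` has a single channel, so every channel at a given position is scaled by the same number; this is the "shared" behaviour discussed next.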
In theory, this module can be inserted after any layer. However, because all channels are multiplied by the same spatial map, it is most meaningful right after the input image: after convolution, each convolution kernel produces different information, so a single shared spatial map fits those channels less well.
(2) Channel attention
The feature map (H*W*C) goes through global average pooling (1*1*C) and global max pooling (1*1*C) in parallel. Both pooled vectors are fed through the same shared fully connected layers, and the two outputs are added (1*1*C). The sum then goes through a sigmoid activation to produce the channel weights (1*1*C). Finally, the weights are multiplied with the feature map (H*W*C).
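A matching NumPy sketch of channel attention, assuming a reduction ratio r = 8 and random (untrained) weights for the shared two-layer MLP; the helper `channel_attention_np` and its parameter names are hypothetical. As in CBAM, only the hidden layer uses ReLU, and the sigmoid is applied after the two branches are added.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention_np(feature, w1, b1, w2, b2):
    """feature: (H, W, C); w1: (C, C // r); w2: (C // r, C)."""
    avg = feature.mean(axis=(0, 1))  # global average pooling -> (C,)
    mx = feature.max(axis=(0, 1))    # global max pooling     -> (C,)

    def shared_mlp(v):
        # ReLU on the hidden layer only; the output layer stays linear
        return np.maximum(v @ w1 + b1, 0.0) @ w2 + b2

    weights = sigmoid(shared_mlp(avg) + shared_mlp(mx))  # add, then sigmoid -> (C,)
    return feature * weights, weights                    # broadcasts over H and W


rng = np.random.default_rng(1)
C, r = 16, 8
feature = rng.standard_normal((8, 8, C))
w1, b1 = 0.1 * rng.standard_normal((C, C // r)), np.zeros(C // r)
w2, b2 = 0.1 * rng.standard_normal((C // r, C)), np.zeros(C)
out, weights = channel_attention_np(feature, w1, b1, w2, b2)
print(out.shape, weights.shape)  # (8, 8, 16) (16,)
```

Sharing one MLP between the two pooled vectors keeps the parameter count at roughly 2*C*C/r, independent of the spatial size.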
(3) Mixed attention
CBAM stands for Convolutional Block Attention Module. Published at ECCV 2018, it is one of the representative works on attention mechanisms. In the paper, the authors study attention in network architectures: attention should not only tell the network where to focus, but also improve the representation of what it focuses on. The goal is to increase representational power through attention, emphasizing important features and suppressing unnecessary ones. To emphasize meaningful features along both the channel and spatial dimensions, the authors apply a channel attention module and a spatial attention module in sequence, learning "what" to attend to along the channel dimension and "where" to attend along the spatial dimension. Knowing which information to emphasize or suppress also helps information flow within the network.
The overall architecture is simple: it consists of a channel attention module and a spatial attention module, and CBAM applies them one after the other, channel attention first and spatial attention second.
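For reference, the CBAM paper summarizes this sequential composition as follows, where $F$ is the input feature map, $\otimes$ is element-wise multiplication (with broadcasting), $\sigma$ is the sigmoid, and $f^{7\times 7}$ is a convolution with a 7×7 kernel. In $M_c$ the pooling is over the spatial dimensions (giving a $C\times1\times1$ map), while in $M_s$ it is over the channel dimension (giving a $1\times H\times W$ map):

```latex
\begin{aligned}
F'     &= M_c(F) \otimes F, \\
F''    &= M_s(F') \otimes F', \\
M_c(F) &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
M_s(F) &= \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)])\big).
\end{aligned}
```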
```python
#!/usr/bin/env python
# -*- coding:utf-8 -*-
"""
@author: mengxie
@software: PyCharm
@file: UNet.py
@time: 2021/9/10
@desc:
"""
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import (Activation, Conv2D, Dense, GlobalAveragePooling2D,
                                     GlobalMaxPool2D, Multiply, Reshape, concatenate)


# Channel attention (feature, number of convolution kernels, name prefix for each layer)
def channel_attention(input_feature, unit, name, ratio=8):
    # `unit` must equal the number of channels of `input_feature`,
    # otherwise the final Multiply cannot broadcast the weights.
    channel = input_feature.shape[-1]
    assert channel == unit, f"unit ({unit}) must match input channels ({channel})"

    # Squeeze the spatial dimensions: (B, H, W, C) -> (B, C)
    avgpool = GlobalAveragePooling2D(name=name + 'channel_avgpool')(input_feature)
    maxpool = GlobalMaxPool2D(name=name + 'channel_maxpool')(input_feature)

    # Shared MLP: the same two Dense layers process both pooled vectors.
    # Only the hidden layer uses ReLU; the output layer is linear because
    # the sigmoid is applied after the two branches are summed (as in CBAM).
    dense_layer1 = Dense(unit // ratio, activation='relu', name=name + 'channel_fc1')
    dense_layer2 = Dense(unit, name=name + 'channel_fc2')
    avg_out = dense_layer2(dense_layer1(avgpool))
    max_out = dense_layer2(dense_layer1(maxpool))

    channel = layers.add([avg_out, max_out])
    channel = Activation('sigmoid', name=name + 'channel_sigmoid')(channel)
    # (B, C) -> (B, 1, 1, C) so the weights broadcast over H and W
    channel_scale = Reshape((1, 1, unit), name=name + 'channel_reshape')(channel)
    return Multiply()([input_feature, channel_scale])


# Spatial attention (feature, name prefix for each layer, convolution kernel size)
def spatial_attention(input_feature, name, kernel_size=7):
    # Average and max over the channel axis: (B, H, W, C) -> (B, H, W, 1) each
    avg_pool = layers.Lambda(lambda x: tf.reduce_mean(x, axis=3, keepdims=True),
                             name=name + 'spatial_avgpool')(input_feature)
    max_pool = layers.Lambda(lambda x: tf.reduce_max(x, axis=3, keepdims=True),
                             name=name + 'spatial_maxpool')(input_feature)
    spatial = concatenate([avg_pool, max_pool], axis=3)  # (B, H, W, 2)
    # One k x k convolution reduces the 2-channel map to a 1-channel attention map
    spatial = Conv2D(1, (kernel_size, kernel_size), strides=1, padding='same',
                     name=name + 'spatial_conv2d')(spatial)
    spatial_out = Activation('sigmoid', name=name + 'spatial_sigmoid')(spatial)
    # (B, H, W, 1) broadcasts over all C channels
    return Multiply()([input_feature, spatial_out])


# Mixed attention mechanism (CBAM) (feature, number of convolution kernels, name prefix)
def cbam_block(input_feature, unit, name, ratio=8):
    """Convolutional Block Attention Module (CBAM) block.
    As described in https://arxiv.org/abs/1807.06521.
    Channel attention is applied first, then spatial attention.
    """
    attention_feature = channel_attention(input_feature, unit, name, ratio)
    attention_feature = spatial_attention(attention_feature, name, kernel_size=7)
    return attention_feature
```
In summary, spatial attention learns a transformation over spatial positions (a weight map) that highlights where the target is; this map is applied to the features passed to the subsequent layers to improve training results.
Channel attention learns weights that increase the share of useful feature channels and suppress less useful ones.