[Super-Resolution] Embedded Block Residual Network: A Recursive Restoration Model for Single-Image Super-Resolution

ICCV 2019

Paper: https://ieeexplore.ieee.org/abstract/document/9010860

Code: https://github.com/alilajevardi/Embedded-Block-Residual-Network

Single-image super-resolution restores the structure and texture lost in low-resolution images, and has attracted extensive attention in the research community. The best-performing methods in this field are deep or wide convolutional neural networks, or recursive neural networks. However, these methods all force a single model to deal with every texture and structure: typically, each layer restores texture on top of what the previous layer restored, ignoring the characteristics of the texture itself. This paper argues that the low-frequency and high-frequency information in an image have different complexity and should be restored by models with different representational capacity. Inspired by this, the authors propose the embedded block residual network (EBRN), which restores texture incrementally for super-resolution. Specifically, different modules of the model recover information of different frequencies: low-frequency information is recovered by shallower modules of the network, while high-frequency information is recovered by deeper modules.

Low-frequency and high-frequency information differ in complexity, so the two parts should be recovered by models of different complexity, i.e. network structures of different depth. If the same network structure or model is used to restore both parts, the following situations occur:

[Figure omitted. Source: ICCV19 super-resolution Article 2, Zhihu]

Since feature distributions differ across frequency bands, low-frequency information consists of simpler structure and texture and requires a simpler recovery function, while high-frequency information consists of complex structure and texture and requires a more complex one. Existing deep-model-based methods do not distinguish image frequencies: the task of every layer in these models is to recover all information from the features of the previous layer. For shallow layers, the parameters may fit low-frequency information (simple textures) but not high-frequency information (complex textures). For deep layers, the parameters can fit the high-frequency information but overfit the low-frequency information.
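
To make the low-/high-frequency split concrete, here is a small illustration (not part of EBRN; just a plain Gaussian low-pass decomposition):

# Illustration only: split an image into the low- and high-frequency
# parts the paper refers to.
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.random.rand(64, 64).astype(np.float32)  # stand-in for a grayscale image
low = gaussian_filter(img, sigma=2.0)            # smooth structures: low frequency
high = img - low                                 # edges and fine texture: high frequency
# EBRN's premise: 'low' needs only a simple (shallow) restorer,
# while 'high' needs a more complex (deeper) one.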

This mismatch between model complexity and signal frequency is a key problem limiting the performance of deep-CNN-based methods. Although residual connections offer a way to split information into recovered and not-yet-recovered parts, the residual architecture itself has no correlation with this frequency-splitting principle; instead, residual connections merely pass shallow information to deeper layers in a dense and direct way.

As shown in Figure 3, the block residual module (BRM) is the basic building block of the proposed model. It divides the data flow into a super-resolution stream and a back-projection stream.

The former stream recovers most of the low-frequency structures and textures, while the latter computes the remaining high-frequency information to be recovered at a deeper level. In this way, each BRM is responsible for recovering lower-frequency information and passing higher-frequency information on to a deeper BRM. To fuse the outputs of all BRMs, a recursive fusion technique is also proposed, which stabilizes the feature flow and gradient flow during training and encourages faster convergence.

Main contributions:

  • The motivation that models of different complexity should restore information of different frequencies in an image: in the bad case, low-frequency information is over-recovered by a deeper model, while high-frequency information is insufficiently recovered by a shallower model.
  • A block residual module (BRM) that recovers part of the image structure and texture and passes the information it cannot recover on to deeper modules. This lets each BRM focus on information of a suitable frequency, which is important for keeping model complexity matched to image frequency.
  • A technique for embedding multiple BRMs, which effectively improves the final reconstruction quality based on the outputs of all modules. Experiments show that the proposed model outperforms state-of-the-art models.

Figure 2: overall network structure

 

Block Residual Module

The purpose of the block residual module (BRM) is to recover part of the HR information and pass the remaining signal to deeper modules for recovery. To this end, the module contains two data streams: a super-resolution stream and a back-projection stream.

As shown in Figure 3, the super-resolution stream (the upper branch in the figure) is a basic deconvolution network: it takes the LR feature map I_x as input and processes it with a deconvolution (upsampling) layer followed by three stacked convolution layers. The output of this stream is the super-resolved feature map O_x, where x is the index of the BRM in the model.

To compute the information not recovered by the super-resolution stream, the back-projection stream first downsamples the deconvolved feature map back to the LR spatial size, then takes the difference between the module's input LR feature map I_x and this downsampled map. The resulting residual carries the information that the super-resolution stream could not recover. It is then processed by a local residual learning stage to output a set of encoded features I_{x+1}, which form the input of the next BRM.

All convolution layers use 3 × 3 × 64 kernels. Except for the downsampling layer, every layer uses a stride of 1 × 1 and a padding of 1 × 1.

The local residual learning stage encourages fast convergence during training, as in other residual learning methods.

BRM code (TensorFlow):

 

    def BRModule(self, input_data, BRM_x, scale=4):
        """
        A single Block Residual Module (BRM)
        :param input_data: tf object
        :param BRM_x: index of BRM, x sub-index in the paper
        :param scale: magnifying scale factor
        :return: two tf objects for upper (super resolved) and lower (back-projected) flows
        """

        x1 = Conv2DTranspose(filters=64, kernel_size=scale, strides=scale, padding='valid', activation=PReLU(),
                             kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                                distribution="untruncated_normal"),
                             name='BRM{}_CT'.format(str(BRM_x)))(input_data)
        xup = x1
        for i in range(3):
            xup = Conv2D(filters=64, kernel_size=3, padding='same', activation=PReLU(),
                         kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                            distribution="untruncated_normal"),
                         name='BRM{}_C{}_u'.format(str(BRM_x), str(i + 1)))(xup)


        x2 = Conv2D(filters=64, kernel_size=scale, strides=scale, padding='valid', activation=PReLU(),
                    kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                       distribution="untruncated_normal"),
                    name='BRM{}_C{}_b'.format(str(BRM_x), str(1)))(x1)

        x2 = Subtract(name='BRM{}_S_b'.format(str(BRM_x)))([input_data, x2])
        xdn = x2

        for i in range(3):
            x2 = Conv2D(filters=64, kernel_size=3, padding='same', activation=PReLU(),
                        kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                           distribution="untruncated_normal"),
                        name='BRM{}_C{}_b'.format(str(BRM_x), str(i + 2)))(x2)

        xdn = Add(name='BRM{}_A_b'.format(str(BRM_x)))([xdn, x2])
        return xup, xdn  # xup: SR flow (upper branch); xdn: residual flow (lower branch)
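
Shape check (from the layer parameters above): for an input feature map of spatial size h × w with 64 channels, the Conv2DTranspose with kernel size and stride both equal to scale yields an (h·scale) × (w·scale) map, so xup lives in HR space; the strided Conv2D of the back-projection stream maps it back to h × w, so xdn matches the input resolution and can feed the next BRM.
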
Embedded Block Residual Network
The embedded block residual network (EBRN) is composed of multiple BRMs, as shown in Figure 2.

Before the first BRM, an initial feature extraction module shapes the feature maps. In this module, the first convolution layer produces 256-channel feature maps, followed by two stacked convolution layers, each outputting 64-channel feature maps. The kernel size of all these layers is 3 × 3.

    def FeatureExt(self, input_data):
        """
        The first part of EBR Network, extract features from input image
        :param input_data: input image batch
        :return: tf object
        """
        x = input_data
        f = 256
        for i in range(3):
            x = Conv2D(filters=f, kernel_size=3, padding='same', activation=PReLU(),
                       kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                          distribution="untruncated_normal"),
                       name='FE_C{}'.format(str(i + 1)))(x)
            f = 64

        return x

The BRMs are composed in an embedded way, not by simple stacking: the first BRM sits on the output of the initial feature extraction module, the second BRM is connected to the output of the back-projection stream of the first BRM, and so on. Each BRM is responsible for restoring the residual feature map produced by the back-projection stream of the BRM before it.

Note that the last BRM contains only the super-resolution stream; its back-projection stream is discarded. In this way, lower-frequency information passes only through shallow BRMs of lower model complexity, avoiding overfitting of that part of the signal. Higher-frequency information, on the other hand, flows to deeper BRMs of higher model complexity, which alleviates underfitting. Each deeper BRM therefore always tries to recover what the shallower BRMs did not.

To combine the outputs of all BRMs, the authors observe that information recovered by deeper modules can help improve the recovery of shallower modules. The paper therefore proposes a recursive fusion technique instead of simple summation: the outputs of the super-resolution streams of two adjacent BRMs are added, followed by a convolution layer, as shown in the red box in Figure 2.

Written out, the fusion is O'_x = f(O_x + O'_{x+1}), where f denotes a convolution layer and O'_n = O_n for the deepest BRM. Compared with simple summation, this technique processes the outputs smoothly and yields better reconstruction.
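
As a minimal sketch (assumptions: x1 is the list of super-resolution-stream outputs O_1 ... O_n and n_blocks is the number of BRMs, as in the EmbeddedBR method of the full code below), the recursive fusion loop looks like this:

from tensorflow.keras.layers import Add, Conv2D, PReLU

# Walk from the deepest BRM back to the shallowest,
# computing O'_x = f(O_x + O'_{x+1}); O'_n = O_n for the deepest BRM.
for i in range(n_blocks - 1, 0, -1):
    s = Add()([x1[i], x1[i - 1]])                    # O_i + O'_{i+1}
    x1[i - 1] = Conv2D(filters=64, kernel_size=3, padding='same',
                       activation=PReLU())(s)        # O'_i = f(...)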

In addition, to avoid vanishing gradients during training, the authors connect the output of every BRM directly to the image reconstruction module. As shown in Figure 2 (the purple part, concat), the outputs of all BRMs are combined through a concatenation layer and a reconstruction subnet. This design has two advantages:

  1. It shortens the error-propagation path to deep BRMs, speeding up convergence during training.
  2. The intermediate feature maps of the model are reused for reconstruction.

The reconstruction subnet uses 3 × 3 × 64 convolution kernels, and its last layer generates a 3-channel RGB image. (In the code excerpt below, this is reduced to a single 3-filter convolution.)

Reconstruction code:

    def Reconstruct(self, input_data):
        """
        The last part of network to reconstruct the final image
        :param input_data: tf object
        :return: batch of super resolution images
        """
        # reconstruction layer
        x = Conv2D(filters=3, kernel_size=3, padding='same', activation=PReLU(),
                   kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                      distribution="untruncated_normal"),
                   name='Rec_C')(input_data)
        return x

 

Full code:

from tensorflow.keras.layers import PReLU, Subtract, Add, Concatenate
from tensorflow.keras.layers import Input, Conv2D, Conv2DTranspose, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam, schedules
from tensorflow.keras.losses import mean_squared_error, mean_absolute_error
from tensorflow.keras.initializers import VarianceScaling


class EBRNClass:
    def __init__(self, leaning_rate_dict, fine_tuning=False):
        """
        Construct the model class.
        :param leaning_rate_dict: learning rates and counterpart steps to change during training process
        :param fine_tuning: Boolean. train a model for first time (False) or fine tuning (True)
        of an already trained model
        """
        self.learning_rate_change = leaning_rate_dict
        self.lr_schedule = schedules.PiecewiseConstantDecay(boundaries=self.learning_rate_change['epoch'],
                                                            values=self.learning_rate_change['lr'])
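        # Note: PiecewiseConstantDecay requires len(boundaries) == len(values) - 1,
        # i.e. leaning_rate_dict['lr'] must hold one more entry than leaning_rate_dict['epoch'].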
        self.fine_tuning = fine_tuning

    def FeatureExt(self, input_data):
        """
        The first part of EBR Network, extract features from input image
        :param input_data: input image batch
        :return: tf object
        """
        x = input_data
        f = 256
        for i in range(3):
            x = Conv2D(filters=f, kernel_size=3, padding='same', activation=PReLU(),
                       kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                          distribution="untruncated_normal"),
                       name='FE_C{}'.format(str(i + 1)))(x)
            f = 64

        return x

    def BRModule(self, input_data, BRM_x, scale=4):
        """
        A single Block Residual Module (BRM)
        :param input_data: tf object
        :param BRM_x: index of BRM, x sub-index in the paper
        :param scale: magnifying scale factor
        :return: two tf objects for upper (super resolved) and lower (back-projected) flows
        """

        x1 = Conv2DTranspose(filters=64, kernel_size=scale, strides=scale, padding='valid', activation=PReLU(),
                             kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                                distribution="untruncated_normal"),
                             name='BRM{}_CT'.format(str(BRM_x)))(input_data)
        xup = x1
        for i in range(3):
            xup = Conv2D(filters=64, kernel_size=3, padding='same', activation=PReLU(),
                         kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                            distribution="untruncated_normal"),
                         name='BRM{}_C{}_u'.format(str(BRM_x), str(i + 1)))(xup)


        x2 = Conv2D(filters=64, kernel_size=scale, strides=scale, padding='valid', activation=PReLU(),
                    kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                       distribution="untruncated_normal"),
                    name='BRM{}_C{}_b'.format(str(BRM_x), str(1)))(x1)

        x2 = Subtract(name='BRM{}_S_b'.format(str(BRM_x)))([input_data, x2])
        xdn = x2

        for i in range(3):
            x2 = Conv2D(filters=64, kernel_size=3, padding='same', activation=PReLU(),
                        kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                           distribution="untruncated_normal"),
                        name='BRM{}_C{}_b'.format(str(BRM_x), str(i + 2)))(x2)

        xdn = Add(name='BRM{}_A_b'.format(str(BRM_x)))([xdn, x2])
        return xup, xdn  # xup: SR flow (upper branch); xdn: residual flow (lower branch)


    def EmbeddedBR(self, input_data, n_blocks, scale):
        """
        Combination of n BRMs
        :param input_data: tf object
        :param n_blocks: number of BRM in network
        :param scale: magnifying scale factor
        :return: tf object
        """
        x1 = []
        x2 = []
        # for the first BRM data comes from feature extraction layer
        xdn = input_data

        # execute all block residual modules (BRMs) by passing xdn from one to next BRM
        for i in range(0, n_blocks):
            xup, xdn = self.BRModule(xdn, BRM_x=i + 1, scale=scale)
            x1.append(xup)
            x2.append(xdn)
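
        # Note: the xdn outputs collected in x2 are never read again; each BRM's
        # back-projection stream only feeds the next BRM, and the last BRM's xdn
        # is discarded, so the final BRM effectively keeps only its SR stream.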

        # Add output of one BRM with output of its upper BRM then apply Conv2D
        for i in range(n_blocks - 1, 0, -1):
            x = Add(name='BRM{}_A_BRM{}'.format(str(i + 1), str(i)))([x1[i], x1[i - 1]])
            x1[i - 1] = Conv2D(filters=64, kernel_size=3, padding='same', activation=PReLU(),
                               kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                                  distribution="untruncated_normal"),
                               name='BRM{}_C'.format(str(i)))(x)

        # Concatenate all outputs of BRMs
        xup = x1[n_blocks - 1]
        for i in range(n_blocks - 2, -1, -1):
            xup = Concatenate(axis=-1, name='BRM{}_BRM{}_Co'.format(str(i + 2), str(i + 1)))([x1[i], xup])

        return xup

    def Reconstruct(self, input_data):
        """
        The last part of network to reconstruct the final image
        :param input_data: tf object
        :return: batch of super resolution images
        """
        # reconstruction layer
        x = Conv2D(filters=3, kernel_size=3, padding='same', activation=PReLU(),
                   kernel_initializer=VarianceScaling(scale=2.0, mode="fan_in",
                                                      distribution="untruncated_normal"),
                   name='Rec_C')(input_data)
        return x

    @staticmethod
    def normalize_01(img):
        """
        Normalise pixel values to the range of 0 to 1 (from 0 to 255)
        :param img: image array
        :return: normalised image array
        """
        return img / 255.0

    @staticmethod
    def denormalize_0255(img):
        """
        Denormalised pixel values to the range of 0 to 255
        :param img:
        :return:
        """
        return img * 255

    def create_model(self, number_of_blocks, scale_factor, LR_img_size, channel=3):
        """
        Compile the complete network as a keras model
        :param number_of_blocks: number of BRM units
        :param scale_factor: magnifying scale factor
        :param LR_img_size: size of input low res image normally 64
        :param channel: number of image channels, PNG image in RGB mode has 3 channels
        :return: keras model
        """
        input_LR = Input(shape=(LR_img_size, LR_img_size, channel), name='input_LR')
        x = Lambda(self.normalize_01)(input_LR)
        x = self.FeatureExt(x)
        x = self.EmbeddedBR(x, number_of_blocks, scale=scale_factor)
        x = self.Reconstruct(x)
        output_HR = Lambda(self.denormalize_0255, name='output_img')(x)

        model = Model(inputs=input_LR, outputs=output_HR, name='EBR_Net')

        if not self.fine_tuning:
            model.compile(optimizer=Adam(learning_rate=self.lr_schedule, epsilon=1e-08),
                          loss=mean_absolute_error,
                          metrics={'output_img': ['mse', 'accuracy']})
        else:
            model.compile(optimizer=Adam(learning_rate=self.lr_schedule, epsilon=1e-08),
                          loss=mean_squared_error,
                          metrics={'output_img': ['mae', 'accuracy']})

        return model
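
A minimal usage sketch (assumed for illustration, not taken from the repository): the learning-rate dict feeds PiecewiseConstantDecay, so 'lr' needs one more value than 'epoch'; 10 BRMs and ×4 upscaling are used here just as plausible settings:

lr_dict = {'epoch': [200000, 400000],      # step boundaries for the LR schedule
           'lr': [1e-4, 5e-5, 2.5e-5]}     # one more value than boundaries

ebrn = EBRNClass(leaning_rate_dict=lr_dict)
model = ebrn.create_model(number_of_blocks=10, scale_factor=4, LR_img_size=64)
model.summary()
# model.fit(lr_batch, hr_batch, ...) with (64, 64, 3) LR inputs and (256, 256, 3) HR targets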

Experiments: [result figures and tables omitted from this post]

Tags: neural networks, deep learning

Posted on Mon, 04 Oct 2021 17:32:52 -0400 by ramrod737