Image augmentation in deep learning

1 overfitting and its solutions

In machine learning, the information a model can draw on generally comes from two sources:

  • Information implicit in the training data;
  • Prior information supplied during model training.

When training data is scarce, the information it carries is not enough on its own, so more prior information is needed to keep the model effective. Prior information can act on the model, for example by giving it a specific internal structure, imposing conditional assumptions, or adding constraints. It can also act on the data set: the training data is adjusted, transformed, or expanded according to specific prior assumptions so that it exposes more useful information and makes subsequent training easier.

In image classification, a lack of training data easily leads to overfitting. An overfitted model classifies the training data very well but performs poorly on test samples.

There are two ways to address overfitting: the first works on the model, and the second augments the data to enlarge the training set.

Model-based approaches:

  • Simplify the model to reduce its expressive capacity;
  • Add a regularization term: an L2 term pushes the weights toward uniformly small values, while an L1 term yields a relatively sparse solution;
  • Use dropout;
  • Use ensemble learning.
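
The difference between the L2 and L1 terms can be seen in a small worked example. The sketch below assumes an orthonormal design matrix, a special case in which both penalised least-squares problems have closed-form solutions; the weights `w_ols` and the strength `lam` are made-up illustrative values, and the function names are hypothetical.

```python
import numpy as np

def ridge_shrink(w_ols, lam):
    """L2 penalty (lam/2 * ||w||^2): every weight is scaled by 1 / (1 + lam)."""
    return w_ols / (1.0 + lam)

def lasso_shrink(w_ols, lam):
    """L1 penalty (lam * ||w||_1): soft-thresholding zeroes out small weights."""
    return np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

# Illustrative unregularised least-squares weights (assumed values)
w_ols = np.array([3.0, -0.4, 0.2, -2.0])
lam = 0.5

w_l2 = ridge_shrink(w_ols, lam)  # all weights shrink, none becomes exactly 0
w_l1 = lasso_shrink(w_ols, lam)  # weights with |w| <= lam become exactly 0

print("L2:", w_l2)
print("L1:", w_l1)
```

Under these assumptions the L2 solution keeps all four weights non-zero but smaller, while the L1 solution keeps only the two large weights.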

Data-based approach (data augmentation):

  • Apply geometric transformations to the image, such as random rotation, scaling, translation, cropping, and mirroring, which is equivalent to observing the same object from different angles;
  • Add noise, such as salt-and-pepper noise or Gaussian white noise;
  • Use PCA jittering to transform the colors of the image;
  • Change the brightness, contrast, saturation, and sharpness of the image;
  • Generate simulated training images with a generative model.

2 image augmentation

2.1 geometric transformation

This part of the code is implemented with PyTorch (torchvision). For more transforms, please refer to the torchvision.transforms documentation.

2.1.1 random rotation

    from PIL import Image
    import torchvision

    Input = Image.open('timg.jpg')
    # Rotate by a random angle in [-90, 90] degrees
    rotater = torchvision.transforms.RandomRotation(90)
    Image_rotate = rotater(Input)

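2.1.2 random cropping

    from PIL import Image
    import torchvision

    Input = Image.open('timg.jpg')
    # Cut a 200 x 200 window at a random position
    cropper = torchvision.transforms.RandomCrop((200,200))
    Image_crop = cropper(Input)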

2.1.3 random scaling

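    from PIL import Image
    import torchvision

    Input = Image.open('timg.jpg')
    width, height = Input.size  # PIL reports (width, height)
    # Resize expects (height, width); this example doubles both sides
    resizer = torchvision.transforms.Resize((height * 2, width * 2))
    Image_resize = resizer(Input)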

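2.1.4 random flipping

    from PIL import Image
    import torchvision

    Input = Image.open('timg.jpg')
    # Mirror the image horizontally with probability 0.5 (the default)
    flipper = torchvision.transforms.RandomHorizontalFlip()
    Image_flip = flipper(Input)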

2.2 add noise

    import cv2
    from skimage import util

    # Add salt noise; random_noise returns a floating-point image
    InputImage = cv2.imread("timg.jpg")
    noiserImage = util.random_noise(InputImage, mode='salt', clip=False)

The random_noise function supports the following types of noise:

    - 'gaussian'  Gaussian-distributed additive noise.
    - 'localvar'  Gaussian-distributed additive noise, with specified
                  local variance at each point of `image`.
    - 'poisson'   Poisson-distributed noise generated from the data.
    - 'salt'      Replaces random pixels with 1.
    - 'pepper'    Replaces random pixels with 0 (for unsigned images) or
                  -1 (for signed images).
    - 's&p'       Replaces random pixels with either 1 or `low_val`, where
                  `low_val` is 0 for unsigned images or -1 for signed
                  images.
    - 'speckle'   Multiplicative noise using out = image + n*image, where
                  n is Gaussian noise with specified mean & variance.
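
To make two of these modes concrete, here is a minimal NumPy sketch of Gaussian and salt-and-pepper noise for float images in [0, 1]. It is an illustration of the idea, not skimage's implementation; the function names and the default values of `sigma`, `amount`, and `salt_vs_pepper` are assumptions.

```python
import numpy as np

def add_gaussian_noise(img, sigma=0.05, rng=None):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_and_pepper(img, amount=0.05, salt_vs_pepper=0.5, rng=None):
    """Set a random fraction `amount` of the values to 1 (salt) or 0 (pepper)."""
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    corrupt = rng.random(img.shape) < amount       # which values to replace
    salt = rng.random(img.shape) < salt_vs_pepper  # salt or pepper for each
    out[corrupt & salt] = 1.0
    out[corrupt & ~salt] = 0.0
    return out

rng = np.random.default_rng(0)
img = np.full((64, 64, 3), 0.5)  # flat grey test image
noisy_sp = add_salt_and_pepper(img, amount=0.1, rng=rng)
noisy_gauss = add_gaussian_noise(img, sigma=0.05, rng=rng)
print(np.mean(noisy_sp != 0.5), noisy_gauss.std())
```

Passing a seeded generator makes the noise reproducible, which helps when comparing augmentation settings.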

2.3 PCA Jittering

Implementation principle: PCA jittering transforms the colors of an image by adding a value to its RGB channels. The added value is derived from a PCA of the image's pixel values. Each pixel is treated as a sample and each channel as a one-dimensional feature, which gives a 3 × 3 covariance matrix. Its eigenvalues $\lambda_1, \lambda_2, \lambda_3$ and eigenvectors $p_1, p_2, p_3$ are computed, and the increment $[p_1, p_2, p_3] \, [\alpha_1\lambda_1, \alpha_2\lambda_2, \alpha_3\lambda_3]^T$ is added to the three channels of every pixel, where $\alpha_1, \alpha_2, \alpha_3$ are random values drawn from a Gaussian distribution with zero mean and small variance.

Code reference from:

    import os
    import random

    import cv2
    import numpy as np

    def PCA_Jittering(path):
        img_list = os.listdir(path)
        img_num = len(img_list)

        for i in range(img_num):
            # Skip anything that is not a JPEG file
            if not img_list[i].endswith('.jpg'):
                continue

            img_path = os.path.join(path, img_list[i])
            img = cv2.imread(img_path)  # uint8 array of shape (H, W, 3)

            img = img / 255.0  # scale to [0, 1]
            img_size = int(img.size / 3)
            img1 = img.reshape(img_size, 3)  # one row per pixel
            img1 = np.transpose(img1)  # one row per channel
            img_cov = np.cov([img1[0], img1[1], img1[2]])  # 3 x 3 covariance matrix
            # Eigenvalues and eigenvectors; the eigenvectors are the columns of p,
            # so p is used without transposing
            lamda, p = np.linalg.eig(img_cov)

            alpha1 = random.normalvariate(0, 1)  # normally distributed random numbers
            alpha2 = random.normalvariate(0, 0.5)
            alpha3 = random.normalvariate(0, 2)

            # Perturbation [p1, p2, p3] . [a1*l1, a2*l2, a3*l3]^T
            v = np.array([alpha1*lamda[0], alpha2*lamda[1], alpha3*lamda[2]])
            add_num = np.dot(p, v)

            # Add the same per-channel offset to every pixel and keep values in [0, 1]
            img2 = np.clip(img + add_num, 0.0, 1.0)

            save_name = 'pre' + str(i) + '.png'
            save_path = os.path.join(path, save_name)
            # Write the result as an 8-bit image
            cv2.imwrite(save_path, (img2 * 255).astype(np.uint8))
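
For use inside a training loop it can be convenient to apply the same idea to an array in memory rather than to files in a folder. The sketch below is one possible variant and makes several assumptions: the function name `pca_jitter` is hypothetical, the input is a float array of shape (H, W, 3) in [0, 1], all three $\alpha$ values share one standard deviation `sigma` (0.1 is an assumed default), and `np.linalg.eigh` is swapped in for `np.linalg.eig` because the covariance matrix is symmetric.

```python
import numpy as np

def pca_jitter(img, sigma=0.1, rng=None):
    """Shift all pixels of a float (H, W, 3) image along its principal color axes."""
    rng = np.random.default_rng() if rng is None else rng
    pixels = img.reshape(-1, 3)              # one row per pixel, one column per channel
    cov = np.cov(pixels, rowvar=False)       # 3 x 3 channel covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvectors are the columns of eigvecs
    alphas = rng.normal(0.0, sigma, size=3)  # one random scale per eigenvector
    delta = eigvecs @ (alphas * eigvals)     # [p1, p2, p3] . [a1*l1, a2*l2, a3*l3]^T
    return np.clip(img + delta, 0.0, 1.0)    # same offset for every pixel

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))  # random stand-in for a real image
jittered = pca_jitter(img, sigma=0.1, rng=rng)
print(jittered.shape, float(jittered.min()), float(jittered.max()))
```

With `sigma=0` the offset vanishes and the image is returned unchanged, which is a quick sanity check for the implementation.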

2.4 change the brightness, contrast, sharpness and saturation of the image

Adjust saturation:

    import numpy as np
    from PIL import Image, ImageEnhance

    Input = Image.open('timg.jpg')
    random_factor = np.random.randint(0, 31) / 10.  # Random factor in [0.0, 3.0]
    color_image = ImageEnhance.Color(Input).enhance(random_factor)  # Adjust image saturation

Adjust brightness:

    import numpy as np
    from PIL import Image, ImageEnhance

    Input = Image.open('timg.jpg')
    random_factor = np.random.randint(10, 21) / 10.  # Random factor in [1.0, 2.0]
    brightness_image = ImageEnhance.Brightness(Input).enhance(random_factor)  # Adjust the brightness of the image

Adjust contrast:

    import numpy as np
    from PIL import Image, ImageEnhance

    Input = Image.open('timg.jpg')
    random_factor = np.random.randint(10, 21) / 10.  # Random factor in [1.0, 2.0]
    contrast_image = ImageEnhance.Contrast(Input).enhance(random_factor)  # Adjust image contrast

Adjust sharpness:

    import numpy as np
    from PIL import Image, ImageEnhance

    Input = Image.open('timg.jpg')
    random_factor = np.random.randint(0, 31) / 10.  # Random factor in [0.0, 3.0]
    sharp_image = ImageEnhance.Sharpness(Input).enhance(random_factor)  # Adjust image sharpness
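
The four adjustments can also be applied in sequence by one helper. The following is a minimal sketch: the helper name `random_color_jitter` is hypothetical, the factor ranges mirror those used in the snippets above, and a synthetic image stands in for 'timg.jpg' so that the example runs without a file.

```python
import random
from PIL import Image, ImageEnhance

# (enhancer class, lowest factor, highest factor)
ENHANCERS = [
    (ImageEnhance.Color, 0.0, 3.0),       # saturation
    (ImageEnhance.Brightness, 1.0, 2.0),  # brightness
    (ImageEnhance.Contrast, 1.0, 2.0),    # contrast
    (ImageEnhance.Sharpness, 0.0, 3.0),   # sharpness
]

def random_color_jitter(image, rng=random):
    """Apply each enhancer once with a factor drawn uniformly from its range."""
    for enhancer, low, high in ENHANCERS:
        factor = rng.uniform(low, high)
        image = enhancer(image).enhance(factor)
    return image

# Synthetic stand-in for 'timg.jpg'
image = Image.new("RGB", (64, 48), color=(200, 120, 40))
jittered = random_color_jitter(image)
print(jittered.size, jittered.mode)
```

A factor of 1.0 leaves the image unchanged for every enhancer, so narrowing a range around 1.0 gives a milder augmentation.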

Posted on Fri, 26 Jun 2020 21:34:05 -0400 by Caesar