torchvision: calling the various transforms, with shared source code for Gaussian noise, standardization, etc.



torchvision wraps a variety of image data augmentation operations that are very convenient to call. Today, let's learn how to use the transforms below.

1, Data augmentation

Data augmentation is a very effective way to improve model accuracy. By having the model learn features from flipped, rotated, noisy and otherwise altered copies of the images, the model becomes more robust.
However, piling on every available transform without thinking is not always appropriate. First, if the training set is large enough and its features are simple enough, transforms are sometimes unnecessary. Second, the author believes more attention should be paid to the test set: the purpose of the transforms is to bring the training samples closer to what the test set looks like.

2, Transform

1. Basic usage example

Define the transform at module level, alongside the function definitions. ToTensor already maps the pixel values from 0-255 to 0-1, so do not divide by 255 again.

from torchvision import transforms

loadfold = transforms.Compose([
    transforms.ToTensor(),  # PIL image (H x W x C, 0-255) -> float tensor (C x H x W, 0-1)
])

The following goes in the main function. Images are read from a folder by a class that inherits Dataset. loadfold is the transform defined above, Y_train_orig holds the labels of the dataset, and root (imgtrainfold here) is the relative path of the image folder. Note that myData_loadfold is a file under util, a folder at the same level as main; the myData_loadfold file is also given below.

from util.myData_loadfold import MyData_loadfold
from torch.utils.data import DataLoader

train_dataset = MyData_loadfold(transform=loadfold, Y=Y_train_orig, root=imgtrainfold)
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True)

import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image
import os
import numpy as np
import re

class MyData_loadfold(Dataset):  # Inherit Dataset
    def __init__(self, transform=None, Y=0, root='', state='Train', k=0):  # __init__ stores the basic parameters of the dataset
        self.transform = transform  # Transformation
        imgs = os.listdir(root)
        imgs.sort(key=lambda x: int(re.match(r'(\d+)\.jpg', x).group(1)))  # sort numerically: 1.jpg, 2.jpg, ..., 10.jpg
        self.imgs = [os.path.join(root, i) for i in imgs]
        self.transforms = transform
        self.root = root
        self.state = state
        self.k = k
        self.Y = Y
        self.size = tuple([len(imgs),1])

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, index):
        img_path = self.imgs[index]
        pil_img = Image.open(img_path).convert('RGB')  # force 3 channels
        if self.transforms:
            data = self.transforms(pil_img)
        else:
            pil_img = np.asarray(pil_img)
            data = torch.from_numpy(pil_img)
        return data, self.Y[index]
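os.listdir returns names in arbitrary order, and a plain string sort would put 10.jpg before 2.jpg. The sort key in __init__ orders files by the number in their name, so the image order matches the label order in Y, assuming the labels were stored in file-number order. A standalone check with made-up file names:

```python
import re

# Hypothetical file names in the arbitrary order os.listdir might return them.
names = ["10.jpg", "2.jpg", "1.jpg", "21.jpg", "3.jpg"]

# Plain string sort compares character by character, so "10.jpg" < "2.jpg".
lexicographic = sorted(names)

# Same key as in MyData_loadfold: the integer captured from "<digits>.jpg".
numeric = sorted(names, key=lambda x: int(re.match(r"(\d+)\.jpg", x).group(1)))

print(lexicographic)  # ['1.jpg', '10.jpg', '2.jpg', '21.jpg', '3.jpg']
print(numeric)        # ['1.jpg', '2.jpg', '3.jpg', '10.jpg', '21.jpg']
```

re.match returns None for any file not named `<digits>.jpg` (a stray .png or Thumbs.db, for example), and `.group(1)` then raises AttributeError, so the folder should contain only the numbered images.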

2. Standardization

Apart from tree-based models, input data should generally be standardized, and in the author's experience standardization works better than plain normalization. ToTensor has already scaled the RGB values to 0-1, but it works better to standardize them further so that each channel is centred at 0 with a standard deviation of 1. To standardize, we first need the mean and standard deviation of the samples; for images, that means one mean and one standard deviation per channel (R, G, B). ImageNet-pretrained networks come with their own mean and standard deviation, but these do not match our training data, so we compute the statistics of our own training set.

Note that the test set is, by default, also standardized with the training-set coefficients. This rests on a basic assumption of machine learning and deep learning: the test set follows the same distribution as the training set. The code below computes the standardization coefficients of your own training set, using the train_loader defined in Section 1 above.

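import torch

imgs = []
for i, data in enumerate(train_loader, 1):
    img, label = data
    imgs.append(img)                  # img: 1 x C x H x W (batch_size=1)
traindata = torch.cat(imgs, dim=0)    # N x C x H x W; all images must share the same H and W
stdRGB = [0, 0, 0]
avgRGB = [0, 0, 0]
for i in range(3):
    avgRGB[i] = traindata[:, i, :, :].mean().item()
    stdRGB[i] = traindata[:, i, :, :].std().item()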

3. Other transforms

This section introduces other transforms. Only the common ones are covered here.

# imgsize is the side length of the model input, defined elsewhere.

# Random resized crop: randomly zoom in or out, then crop to imgsize*imgsize.
# A new random crop is drawn every epoch.
RRC = transforms.Compose([
    transforms.RandomResizedCrop(imgsize),
    transforms.ToTensor(),
])
# Random crop: cut a random imgsize*imgsize patch (the image must be at least that large).
# A new random crop is drawn every epoch.
CC = transforms.Compose([
    transforms.RandomCrop(imgsize),
    transforms.ToTensor(),
])

# Standardization. The first line uses the ImageNet statistics, the second the author's own
# training-set statistics. Don't reuse the author's values; compute your own with the code
# in Section 2 and fill them in.
Normal_transform = transforms.Compose([
    transforms.ToTensor(),  # Normalize works on tensors, so convert first
    # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # Imagenet
    # transforms.Normalize(mean=[0.4772, 0.5595, 0.3851], std=[0.2871, 0.2960, 0.2952])  # all train
])

# Color jitter (brightness, contrast, saturation, hue). These values hardly need changing:
# tuning them made no difference for the author, but adding the jitter itself did help.
CJ_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
    transforms.ToTensor(),
])
# Random rotation by up to 30 degrees either way; the other parameters can be left as they are.
# The old resample argument was deprecated and later removed from torchvision, so it is dropped here.
RR_transform = transforms.Compose([
    transforms.RandomRotation(30, expand=False, fill=0),
    transforms.ToTensor(),
])

# Horizontal (left-right) flip; p is the flip probability (default 0.5)
RHF_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
# Vertical flip: turns the picture upside down; p is the flip probability (default 0.5)
RVF_transform = transforms.Compose([
    # transforms.RandomResizedCrop(imgsize),
    transforms.RandomVerticalFlip(),
    transforms.ToTensor(),
])
# Gaussian noise. This needs a separate class, given below.
from util import AddGaussianNoise
GN_transform = transforms.Compose([
    # transforms.RandomResizedCrop(imgsize),
    AddGaussianNoise.AddGaussianNoise(mean=1, variance=1, amplitude=10),
    transforms.ToTensor(),
])

# util/AddGaussianNoise.py
import PIL.Image
import numpy as np

class AddGaussianNoise(object):
    """Add Gaussian noise to a PIL image (one noise value per pixel, shared by all channels)."""

    def __init__(self, mean=0.0, variance=1.0, amplitude=1.0):
        self.mean = mean
        self.variance = variance
        self.amplitude = amplitude

    def __call__(self, img):
        img = np.array(img)  # H x W x C; expects an RGB image
        h, w, c = img.shape
        # np.random.normal takes the standard deviation, so pass sqrt(variance)
        N = self.amplitude * np.random.normal(loc=self.mean, scale=np.sqrt(self.variance), size=(h, w, 1))
        N = np.repeat(N, c, axis=2)
        img = N + img
        img = np.clip(img, 0, 255)  # clip both ends so the uint8 cast cannot wrap around
        img = PIL.Image.fromarray(img.astype('uint8')).convert('RGB')
        return img
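The class above draws one noise value per pixel and repeats it over the three channels, so it shifts brightness without changing colour. If colour noise is wanted instead, a small NumPy variant can draw independently per channel. The function name gaussian_noise and the per_channel flag are hypothetical, not part of torchvision; both paths clip before casting to uint8.

```python
import numpy as np


def gaussian_noise(img, mean=0.0, std=1.0, amplitude=1.0, per_channel=False, rng=None):
    """Return img + amplitude * N(mean, std**2) as uint8, clipped to [0, 255].

    img is an H x W x C uint8 array. per_channel=False mirrors the class
    above (one draw per pixel, repeated over channels); per_channel=True
    draws an independent value for every channel.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = img.shape
    size = (h, w, c) if per_channel else (h, w, 1)
    noise = amplitude * rng.normal(loc=mean, scale=std, size=size)
    noisy = img.astype(np.float64) + noise        # (h, w, 1) broadcasts over channels
    return np.clip(noisy, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
flat = np.full((4, 4, 3), 128, dtype=np.uint8)    # hypothetical flat grey image

shared = gaussian_noise(flat, amplitude=10, rng=rng)
independent = gaussian_noise(flat, amplitude=10, per_channel=True, rng=rng)

# Shared noise keeps every pixel grey (R == G == B); independent noise adds colour speckle.
print(np.array_equal(shared[..., 0], shared[..., 1]))
print(np.array_equal(independent[..., 0], independent[..., 1]))
```

Note that amplitude also scales the mean: in both the class and this variant, mean=1 with amplitude=10 (the author's setting) brightens every pixel by about 10 on average.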


The author and some classmates took part in several image-classification competitions. At first we assumed it was fine to add transforms directly to the training set, but the resulting accuracy was lower than without them. The author concluded that it matters more to analyze the state of the images in the training and test sets. In fact, most test images were normal, and so were the training images: no flips, no noise, and so on. If the transforms are applied directly to the source samples, then in each epoch the model never sees some of the normal, unaltered samples. The author therefore accumulated transformed copies using lists and similar operations, expanding the 400 images to 3600: 3 copies of the original images, 2 randomly cropped copies, and one copy each with color jitter, a left-right flip, a top-bottom flip, and a 30-degree rotation. Each epoch takes longer to train, but the model converges faster and, more importantly, performs better. Finally: don't apply transforms blindly; keep experimenting.


Posted on Mon, 27 Sep 2021 23:25:58 -0400 by Digitry Designs