torchvision packages a variety of image data augmentation operations that are very convenient to call. In this post, let's learn how to use these transforms.
1, Data augmentation
Data augmentation is a very effective way to improve model accuracy. By making the model learn features under transformations such as flips and added noise, it gains better robustness.
Note, however, that blindly stacking every available transform is not always appropriate. First, if the training set is large enough and the features are simple enough, a transform is sometimes unnecessary. Moreover, in the author's view, more attention should be paid to the data in the test set, and transforms should be used to make the training samples resemble the test set.
1. Basic usage example
Define a transform; it should sit at module level, alongside the function definitions. The ToTensor method already maps pixel values from 0-255 to 0-1, so do not divide by 255 again.
from torchvision import transforms as transforms

loadfold = transforms.Compose([
    transforms.ToTensor(),
])
The following goes in the main function. The images are read from a folder by a class that inherits from Dataset. loadfold is the transform defined above, Y_train_orig is the label array of the dataset, and root is the relative path to the dataset. Note that MyData_loadfold is a file under util, and util is a folder at the same level as main; the MyData_loadfold file itself is provided below.
from util.myData_loadfold import MyData_loadfold
from torch.utils.data import DataLoader

train_dataset = MyData_loadfold(transform=loadfold, Y=Y_train_orig, root=imgtrainfold)
train_loader = DataLoader(train_dataset, batch_size=1, shuffle=True)
import torch
from torch.utils.data import Dataset
from PIL import Image
import os
import numpy as np
import re

class MyData_loadfold(Dataset):  # Inherits from Dataset
    def __init__(self, transform=None, Y=0, root='', state='Train', k=0):
        # __init__ sets up the basic parameters of the class
        self.transform = transform  # the transform to apply
        imgs = os.listdir(root)
        # Sort the files numerically: 0.jpg, 1.jpg, 2.jpg, ...
        imgs.sort(key=lambda x: int(re.match(r'(\d+)\.jpg', x).group(1)))
        self.imgs = [os.path.join(root, i) for i in imgs]
        self.root = root
        self.state = state
        self.k = k
        self.Y = Y
        self.size = (len(imgs), 1)

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, index):
        img_path = self.imgs[index]
        pil_img = Image.open(img_path)
        if self.transform:
            data = self.transform(pil_img)
        else:
            data = torch.from_numpy(np.asarray(pil_img))
        return data, self.Y[index]
2. Standardization

Except for tree models, all data should be standardized, and standardization generally works better than plain normalization. Although ToTensor already maps the RGB values of an image to 0-1, standardizing them to roughly (-1, 1) is better. Standardization first requires the mean and variance of the samples; for images, that means the mean and variance of each of the three channels. The ImageNet pretrained networks come with a mean and variance, but those do not match our own training data, so we compute the mean and variance of our training set ourselves. Note that by default the test set is also standardized with the training set's coefficients: a basic premise of machine learning and deep learning is that the test set follows the distribution of the training set. The code below computes the standardization coefficients of your own training set, where train_loader is the one defined in Section 1 above.
imglist = []
labellist = []
for i, data in enumerate(train_loader, 1):
    img, label = data
    labellist.append(label)
    imglist.append(img)

traindata = torch.cat(imglist, dim=0)
stdRGB = [0, 0, 0]
avgRGB = [0, 0, 0]
for i in range(3):
    avgRGB[i] = traindata[:, i, :, :].mean()
    stdRGB[i] = traindata[:, i, :, :].std()
3. Other transforms
This section introduces some other commonly used transforms.
# Random resized crop: first randomly scale the image, then crop an
# imgsize*imgsize region. A new random crop is taken every epoch.
RRC = transforms.Compose([
    transforms.RandomResizedCrop(imgsize),
    transforms.ToTensor(),
])

# Center crop: crops an imgsize*imgsize region from the center of the image.
CC = transforms.Compose([
    transforms.CenterCrop(imgsize),
    transforms.ToTensor(),
])

# Standardization. The first line uses the ImageNet parameters, the second the
# author's own; don't use these yourself -- go back to Section 2 and compute
# your own coefficients.
Normal_transform = transforms.Compose([
    # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),  # ImageNet
    # transforms.Normalize(mean=[0.4772, 0.5595, 0.3851], std=[0.2871, 0.2960, 0.2952]),  # all train
])

# Color jitter (exposure). These parameters rarely need much tuning -- changing
# them did little for the author, but adding the jitter itself helped.
CJ_transform = transforms.Compose([
    ResizeImage(imgsize),
    transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5),
    transforms.ToTensor(),
])

# Random rotation by up to 30 degrees; the other parameters can be left alone.
RR_transform = transforms.Compose([
    ResizeImage(imgsize),
    transforms.RandomRotation(30, resample=False, expand=False, fill=0),
    transforms.ToTensor(),
])

# Horizontal flip; p is the probability.
RHF_transform = transforms.Compose([
    ResizeImage(imgsize),
    transforms.RandomHorizontalFlip(p=1),
    transforms.ToTensor(),
])

# Vertical flip, i.e. flipping the image top to bottom; p is the probability.
RVF_transform = transforms.Compose([
    ResizeImage(imgsize),
    # transforms.RandomResizedCrop(imgsize),
    transforms.RandomVerticalFlip(p=1),
    transforms.ToTensor(),
])

# Adding Gaussian noise requires a separate class, which is provided below.
from util import AddGaussianNoise

GN_transform = transforms.Compose([
    ResizeImage(imgsize),
    # transforms.RandomResizedCrop(imgsize),
    AddGaussianNoise.AddGaussianNoise(mean=1, variance=1, amplitude=10),
    transforms.ToTensor(),
])
import PIL.Image
import numpy as np

class AddGaussianNoise(object):
    def __init__(self, mean=0.0, variance=1.0, amplitude=1.0):
        self.mean = mean
        self.variance = variance
        self.amplitude = amplitude

    def __call__(self, img):
        img = np.array(img)
        h, w, c = img.shape
        # One noise plane, repeated across the channels.
        N = self.amplitude * np.random.normal(loc=self.mean, scale=self.variance, size=(h, w, 1))
        N = np.repeat(N, c, axis=2)
        img = N + img
        # Clip to [0, 255] so values don't wrap around when cast back to uint8.
        img = np.clip(img, 0, 255)
        img = PIL.Image.fromarray(img.astype('uint8')).convert('RGB')
        return img
The author and some classmates took part in several image-classification competitions and initially thought it was fine to apply transforms directly to the training set, but the resulting accuracy was lower than without them. What matters more, in the author's view, is analyzing the state of the images in the training and test sets. In fact, most of the test images were normal -- like the training set, they had no flips, no noise, and so on. If a transform is applied directly to the source samples of the training set, some normal source samples are lost in every epoch. The author therefore accumulated transforms through lists and similar operations, finally expanding 400 images to 3600: three copies of the source samples, two random crops, plus one each of exposure, horizontal flip, vertical flip, and 30-degree rotation. Each epoch trains more slowly this way, but convergence is faster and, more importantly, the results are better. The final takeaway is that transforms cannot be applied mechanically; keep experimenting.