PyTorch: Training a Classifier

Data

Generally, when you have to work with image, text, audio, or video data, you can use standard Python packages that load the data into a NumPy array. You can then convert the array into a torch.*Tensor.

  • For images, packages such as Pillow and OpenCV are useful
  • For audio, packages such as SciPy and librosa are useful
  • For text, either raw Python or Cython-based loading, or NLTK and SpaCy, are useful
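Whichever package does the loading, the hand-off to PyTorch is the same. As a minimal sketch (the array here is a stand-in for an image loaded by Pillow or OpenCV), `torch.from_numpy` converts a NumPy array into a Tensor that shares memory with the array:

```python
import numpy as np
import torch

# Stand-in for an image loaded by Pillow/OpenCV: an H x W x C float array.
arr = np.zeros((32, 32, 3), dtype=np.float32)

# Convert to a Tensor; no copy is made, the Tensor shares the array's memory.
t = torch.from_numpy(arr)
print(t.shape, t.dtype)  # torch.Size([32, 32, 3]) torch.float32
```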

Specifically for vision, there is a package called torchvision, which has data loaders for common datasets (such as ImageNet, CIFAR10, MNIST, etc.) and data transformers for images, viz., torchvision.datasets and torch.utils.data.DataLoader.

This provides great convenience and avoids writing boilerplate code.

In this tutorial, we will use the CIFAR10 dataset. It has the classes: "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship", "truck". The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels.


Training an image classifier

We will perform the following steps in order:

  1. Load and normalize the CIFAR10 training and test datasets using torchvision
  2. Define a convolutional neural network
  3. Define a loss function
  4. Train the network on the training data
  5. Test the network on the test data

1. Load and normalize CIFAR10

Using torchvision, it is very easy to load CIFAR10.

import torch
import torchvision
import torchvision.transforms as transforms

The output of torchvision datasets are PILImage images of range [0, 1]. We transform them into Tensors of normalized range [-1, 1].

Note: if running on Windows and you get a BrokenPipeError, try setting the num_workers of torch.utils.data.DataLoader() to 0.
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Out:

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified

Let's show some of the training images, for fun.

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image

def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

Out:

dog truck  frog horse

2. Define a convolutional neural network

Copy the neural network from the earlier Neural Networks section and modify it to take 3-channel images (instead of the 1-channel images it was defined for).

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
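If you are wondering where the 16 * 5 * 5 in fc1 comes from, you can trace the spatial size of a 32x32 input through the layers: a 5x5 convolution shrinks each side by 4, and each 2x2 max-pool halves it, giving 32 -> 28 -> 14 -> 10 -> 5. A small standalone check (using fresh, untrained layers of the same shapes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Trace shapes through conv/pool layers matching the network above.
x = torch.randn(1, 3, 32, 32)
x = F.max_pool2d(F.relu(nn.Conv2d(3, 6, 5)(x)), 2)   # -> (1, 6, 14, 14)
x = F.max_pool2d(F.relu(nn.Conv2d(6, 16, 5)(x)), 2)  # -> (1, 16, 5, 5)
print(x.shape)  # torch.Size([1, 16, 5, 5]), hence nn.Linear(16 * 5 * 5, 120)
```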

3. Define a loss function and optimizer

Let's use a classification cross-entropy loss and SGD with momentum.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
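One detail worth knowing about CrossEntropyLoss: it expects raw, unnormalized scores ("logits") and integer class indices, and applies log-softmax internally, which is why the network above has no softmax at the end. A small made-up example:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Made-up logits for a batch of 2 samples over 10 classes, plus the
# true class indices. No softmax is applied before the loss.
logits = torch.randn(2, 10)
targets = torch.tensor([3, 7])

loss = criterion(logits, targets)
print(loss.item())  # a single positive scalar
```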

4. Train the network

This is when things start to get interesting. We simply have to loop over our data iterator, feed the inputs to the network, and optimize.

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Out:

[1,  2000] loss: 2.196
[1,  4000] loss: 1.849
[1,  6000] loss: 1.671
[1,  8000] loss: 1.589
[1, 10000] loss: 1.547
[1, 12000] loss: 1.462
[2,  2000] loss: 1.382
[2,  4000] loss: 1.389
[2,  6000] loss: 1.369
[2,  8000] loss: 1.332
[2, 10000] loss: 1.304
[2, 12000] loss: 1.288
Finished Training

Let's quickly save our trained model:

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)

5. Test the network on the test data

We have trained the network for 2 passes over the training dataset. But we need to check whether the network has learned anything at all.

We will check this by predicting the class label that the neural network outputs, and checking it against the ground truth. If the prediction is correct, we add the sample to the list of correct predictions.

Okay, first step. Let's display some images from the test set to get familiar.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

Out:

GroundTruth:    cat  ship  ship plane

Next, let's load back in our saved model (note: saving and re-loading the model wasn't necessary here; we only do it to illustrate how):

net = Net()
net.load_state_dict(torch.load(PATH))

Okay, now let's see what the neural network thinks these examples above are:

outputs = net(images)

The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks the image is of that particular class. So, let's get the index of the highest energy:

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

Out:

Predicted:    cat  ship  ship plane

The results seem to be good.

Let's look at the performance of the network on the whole data set.

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))

Out:

Accuracy of the network on the 10000 test images: 53 %

That looks way better than chance, which would be 10% accuracy (randomly picking one class out of 10). It seems the network learned something.

So, which classes performed well, and which did not:

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

Out:

Accuracy of plane : 50 %
Accuracy of   car : 62 %
Accuracy of  bird : 51 %
Accuracy of   cat : 32 %
Accuracy of  deer : 31 %
Accuracy of   dog : 35 %
Accuracy of  frog : 77 %
Accuracy of horse : 70 %
Accuracy of  ship : 71 %
Accuracy of truck : 52 %

OK, what's the next step?

How do we run these neural networks on GPU?

Training on GPU

Just like you transfer a Tensor onto the GPU, you transfer the neural net onto the GPU.

If CUDA is available, let's first define our device as the first visible CUDA device:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Assuming that we are on a CUDA machine, this should print a CUDA device:

print(device)

Out:

cuda:0

The rest of this section assumes that the device is a CUDA device.

This method will then recursively go over all modules and convert their parameters and buffers to CUDA tensors:

net.to(device)

Remember that you will have to send the inputs and targets at every step to the GPU too:

inputs, labels = data[0].to(device), data[1].to(device)
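Putting those two moves together, one step of the training loop from step 4 looks like the sketch below. A tiny linear model and a random batch stand in for Net() and a batch from trainloader (both made up here so the snippet is self-contained); the device-handling pattern is the same, and it falls back to the CPU when CUDA is unavailable:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model: move its parameters and buffers to the device once.
model = nn.Linear(3 * 32 * 32, 10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Stand-in batch: inputs and targets must be sent to the device each step.
inputs = torch.randn(4, 3 * 32 * 32)
labels = torch.tensor([0, 1, 2, 3])
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()
print(loss.item())
```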

 

Posted on Tue, 23 Nov 2021 03:48:27 -0500 by mwalsh