About torch.nn
When building neural networks with PyTorch, the main tools live in the torch.nn package.
nn relies on autograd to define models and compute their gradients automatically.
A typical workflow for building a neural network:
· Define a neural network with learnable parameters
· Iterate over the training dataset
· Pass the input data through the network
· Compute the loss
· Backpropagate gradients to the network's parameters
· Update the network's weights according to a given rule
First, define a neural network implemented in PyTorch:
# Import the necessary toolkits
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a simple network class
class Net(nn.Module):
    # The initialization function
    def __init__(self):
        super(Net, self).__init__()
        # First convolutional layer: input channels = 1, output channels = 6, kernel size 3*3
        self.conv1 = nn.Conv2d(1, 6, 3)
        # Second convolutional layer: input channels = 6, output channels = 16, kernel size 3*3
        self.conv2 = nn.Conv2d(6, 16, 3)
        # Three fully connected layers
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    # Flatten the dimensions
    def num_flat_features(self, x):
        # Compute the size, excluding the batch dimension (dimension 0)
        size = x.size()[1:]
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
Output results
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
Attention
· All trainable parameters in the model can be obtained by net.parameters()
# Wrap the parameters in a list
params = list(net.parameters())
print(len(params))
print(params[0].size())
Output results
10
torch.Size([6, 1, 3, 3])
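These 10 entries are the weight and bias tensors of the five layers defined above (conv1, conv2, fc1, fc2, fc3). As a small hedged sketch (not part of the original example), each parameter can be listed together with its shape via named_parameters():

# Print every learnable parameter name and its shape
for name, p in net.named_parameters():
    print(name, tuple(p.size()))

This should print one weight/bias pair per layer, which accounts for the count of 10 shown above.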
Assume the input image size is 32*32:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
Output results
tensor([[ 0.1065, 0.0852, 0.0484, 0.0806, -0.0398, -0.0307, -0.1036, -0.0510, -0.1005, 0.0150]], grad_fn=<AddmmBackward>)
With the output tensor, you can zero the gradients and run backpropagation.
net.zero_grad()
out.backward(torch.randn(1, 10))
Attention
Neural networks built with torch.nn only accept mini-batches as input; they do not accept a single sample.
For example, nn.Conv2d expects a 4D Tensor of shape (nSamples, nChannels, Height, Width). If you only have a single sample, call input.unsqueeze(0) to extend the 3D Tensor into a 4D one, as in the sketch below.
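A minimal hedged sketch of this (the variable names are illustrative, not from the original text):

# A single 32*32 grayscale image: shape (1, 32, 32) = (channels, height, width)
single_sample = torch.randn(1, 32, 32)
# unsqueeze(0) adds a fake batch dimension at position 0 -> shape (1, 1, 32, 32)
batched_sample = single_sample.unsqueeze(0)
out = net(batched_sample)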
Loss function
A loss function takes a pair of inputs (output, target) and computes a value that estimates how far the output is from the target.
torch.nn provides several different loss functions; nn.MSELoss, for example, measures the difference between the input and the target by computing the mean squared error.
An example of computing a loss with nn.MSELoss:
output = net(input)
target = torch.randn(10)
# Reshape the target into a two-dimensional tensor to match the shape of output
target = target.view(1, -1)
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
Output results
tensor(0.8401, grad_fn=<MseLossBackward>)
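Other criteria in torch.nn follow the same pattern. For instance, for a 10-class classification target, a hedged sketch with nn.CrossEntropyLoss might look like this (the class index used here is an assumption for illustration only):

# nn.CrossEntropyLoss expects raw logits of shape (N, C) and class indices of shape (N,)
ce_criterion = nn.CrossEntropyLoss()
class_target = torch.tensor([3])  # assumed class index for the single sample
ce_loss = ce_criterion(output, class_target)
print(ce_loss)

The rest of this section continues with the nn.MSELoss result computed above.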
The backpropagation chain: if we follow loss backwards through the graph by printing its grad_fn attribute, we see the complete chain of computations:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d -> view -> linear -> relu -> linear -> relu -> linear -> MSELoss -> loss
When loss.backward() is called, the whole computation graph is differentiated with respect to the loss; every Tensor with requires_grad=True takes part in the gradient computation and accumulates the result into its .grad attribute.
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU
Output results
<MseLossBackward object at 0x000001BDFF5C7E80>
<AddmmBackward object at 0x000001BDFFB42710>
<AccumulateGrad object at 0x000001BDFFB42710>
Backpropagation
Backpropagation in PyTorch is very simple; the whole operation is just loss.backward().
Before running backpropagation, the gradients must be zeroed, otherwise they will accumulate across batches of data.
A small backpropagation example:
# Zero the gradients in PyTorch
net.zero_grad()

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

# Run backpropagation in PyTorch
loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Output results
conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0007,  0.0024,  0.0136,  0.0216,  0.0032,  0.0132])
Update network parameters
The simplest update rule is SGD (stochastic gradient descent).
Its formula is: weight = weight - learning_rate * gradient
First, SGD implemented in plain Python code:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
Then the standard code officially recommended by PyTorch:
# First import the optimizer package; optim contains commonly used optimization algorithms such as SGD, Adam, etc.
import torch.optim as optim

# Create an optimizer object through optim
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Zero the gradients held by the optimizer
optimizer.zero_grad()
output = net(input)
loss = criterion(output, target)

# Backpropagate the loss
loss.backward()

# The parameter update is performed by a single standard line of code
optimizer.step()
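Putting the workflow steps from the top of this section together, a hedged sketch of a minimal training loop could look like the following (train_loader and num_epochs are assumptions for illustration, not part of the original example):

# Assumed: train_loader yields (images, targets) batches of shape (N, 1, 32, 32) and (N, 10)
num_epochs = 2  # assumed number of passes over the dataset

for epoch in range(num_epochs):
    for images, targets in train_loader:
        optimizer.zero_grad()               # clear accumulated gradients
        outputs = net(images)               # forward pass
        loss = criterion(outputs, targets)  # compute the loss
        loss.backward()                     # backpropagate gradients
        optimizer.step()                    # update the weights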
Summary
A typical process for building a neural network
· Define a neural network with learnable parameters
· Iterate over the dataset
· Pass the input data through the network
· Compute the loss
· Backpropagate gradients to the network's parameters
· Update the network's weights according to a given rule
Definition of loss function
· Compute the mean squared error with torch.nn.MSELoss()
· When backpropagation is run with loss.backward(), the whole computation graph is differentiated with respect to the loss; all Tensors with requires_grad=True take part in the gradient computation and accumulate the result into their .grad attribute.
How to run backpropagation
· Backpropagation in PyTorch is very simple; the whole operation is loss.backward().
· Before running backpropagation, the gradients must be zeroed, otherwise they will accumulate across batches
· net.zero_grad()
· loss.backward()
How to update the parameters
· Define an optimizer to perform the parameter updates
· optimizer = optim.SGD(net.parameters(), lr=0.01)
· Perform the actual parameter update through the optimizer
· optimizer.step()