Summary of problems in learning the PyTorch framework

1: Linear Regression, Logistic Regression, Softmax Classifier

1. Model inheritance and construction

import torch

# data (3 x 1)
x_data = torch.tensor([[1.0], [2.0], [3.0]])
y_data = torch.tensor([[2.0], [4.0], [6.0]])
 
# model class
class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate one nn.Linear module.
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # one input feature, one output feature
 
    def forward(self, x):
        """
        In the forward function we accept a tensor of input data
        and must return a tensor of output data. We can use modules
        defined in the constructor as well as arbitrary operations on tensors.
        """
        y_pred = self.linear(x)
        return y_pred
 
# our model
model = Model()
 
# Construct our loss function and an optimizer.
# model.parameters() passed to the SGD constructor contains the learnable
# parameters of the nn.Linear module that is a member of the model.
criterion = torch.nn.MSELoss(reduction='sum')  # size_average=False is deprecated; reduction='sum' is equivalent
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
 
# training loop
for epoch in range(500):
    # forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)
 
    # compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())
 
    # zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
 
# after training -- test
hour_val = torch.tensor([[4.0]])
print("predict (after training)", 4, model(hour_val).item())  # call the model directly instead of model.forward()

2: Model building with nn.Module

1. Building a model with nn.Module


1.1 nn.Module

• parameters: stores and manages nn.Parameter objects
• modules: stores and manages nn.Module objects (sub-modules)
• buffers: stores and manages buffer attributes, such as running_mean in a BN layer
• ***_hooks: stores and manages hook functions

1.2 nn.Module summary

• a module can contain multiple sub-modules
• a module is equivalent to an operation and must implement the forward() function
• each module has 8 ordered dictionaries to manage its attributes (inspected in the sketch below)
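
A minimal sketch of inspecting three of those dictionaries on a single layer (the underscore-prefixed names are internal attributes of nn.Module):

import torch.nn as nn

net = nn.Linear(3, 2)
print(net._parameters.keys())  # odict_keys(['weight', 'bias'])
print(net._buffers.keys())     # empty for Linear; a BN layer stores running_mean here
print(net._modules.keys())     # sub-modules; empty for a leaf layer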

2. Model Containers

2.1 nn.Sequential

nn.Sequential is a container of nn.Module, used to wrap a set of network layers in order (see the sketch after this list)
• sequencing: the network layers are executed in the strict order in which they were constructed
• built-in forward(): the built-in forward() runs the forward propagation through the layers successively with a for loop
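
A minimal sketch of the pattern (layer sizes are arbitrary):

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)
out = net(torch.randn(4, 16))  # layers execute in the order they were added
print(out.shape)               # torch.Size([4, 2])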

2.2 nn.ModuleList

nn.ModuleList is a container of nn.Module, used to wrap a set of network layers and call them iteratively (see the sketch after this list)
Main methods:
• append(): append a network layer to the end of the ModuleList
• extend(): concatenate two ModuleLists
• insert(): insert a network layer at a specified position in the ModuleList
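
A minimal sketch of iterative construction and calling (sizes and depth are arbitrary):

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        # build 5 identical Linear layers with a list comprehension
        self.linears = nn.ModuleList([nn.Linear(10, 10) for _ in range(5)])

    def forward(self, x):
        for linear in self.linears:   # call the wrapped layers iteratively
            x = torch.relu(linear(x))
        return x

net = MLP()
print(net(torch.randn(2, 10)).shape)  # torch.Size([2, 10])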

2.3 nn.ModuleDict

nn.ModuleDict is a container of nn.Module, used to wrap a set of network layers and call them by key (see the sketch after this list)
Main methods:
• clear(): clear the ModuleDict
• items(): returns iterable key-value pairs
• keys(): returns the dictionary's keys
• values(): returns the dictionary's values
• pop(): returns a key-value pair and deletes it from the dictionary
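
A minimal sketch of key-based selection (layer choices are arbitrary):

import torch
import torch.nn as nn

class ChoiceNet(nn.Module):
    def __init__(self):
        super(ChoiceNet, self).__init__()
        self.choices = nn.ModuleDict({
            'conv': nn.Conv2d(3, 16, 3),
            'pool': nn.MaxPool2d(3),
        })
        self.activations = nn.ModuleDict({
            'relu': nn.ReLU(),
            'prelu': nn.PReLU(),
        })

    def forward(self, x, choice, act):
        x = self.choices[choice](x)   # select the layer by key
        x = self.activations[act](x)
        return x

net = ChoiceNet()
img = torch.randn(1, 3, 32, 32)
print(net(img, 'conv', 'relu').shape)  # torch.Size([1, 16, 30, 30])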

2.4 Container summary

• nn.Sequential: ordered. Each network layer is executed in strict order; often used to build network blocks
• nn.ModuleList: iterative. Often used to build a large number of repeated layers, with the repetition realized through a for loop
• nn.ModuleDict: indexed. Commonly used for optional network layers

3: Weight initialization, loss function, optimizer

1. Weight initialization
1.1 Vanishing and exploding gradients
1.2 Xavier initialization

Variance consistency: keep the scale of the data in an appropriate range, usually with variance 1. Activation functions: saturating functions such as Sigmoid and Tanh

# The activation function is Tanh; this runs inside the weight-initialization
# loop of the course code, where m is an nn.Linear layer and self.neural_num
# is both fan_in and fan_out.
# Xavier uniform bound: a = gain * sqrt(6 / (fan_in + fan_out))
a = np.sqrt(6 / (self.neural_num + self.neural_num))
tanh_gain = nn.init.calculate_gain('tanh')
a *= tanh_gain
nn.init.uniform_(m.weight.data, -a, a)
##########################################
nn.init.xavier_uniform_(m.weight.data, gain=tanh_gain)   # PyTorch's built-in Xavier initialization gives the same result as the manual calculation above
1.3 Kaiming initialization

Variance consistency: keep the scale of the data in an appropriate range, usually with variance 1. Activation functions: ReLU and its variants

# manual equivalent: nn.init.normal_(m.weight.data, std=np.sqrt(2 / self.neural_num))
nn.init.kaiming_normal_(m.weight.data)  # PyTorch's built-in Kaiming initialization is equivalent to the commented statement above
1.4 nn.init.calculate_gain

nn.init.calculate_gain(nonlinearity, param=None)
Main function: calculates the scale by which an activation function changes the variance of the data
Main parameters:
• nonlinearity: name of the activation function
• param: parameter of the activation function, such as negative_slope of Leaky ReLU

x = torch.randn(10000)
out = torch.tanh(x)

gain = x.std() / out.std()
print('gain:{}'.format(gain))

tanh_gain = nn.init.calculate_gain('tanh')
print('tanh_gain in PyTorch:', tanh_gain)
2. Loss function

2.1 nn.CrossEntropyLoss

Function: combines nn.LogSoftmax() and nn.NLLLoss() to compute the cross entropy.
For a sample with logits x and target class, loss(x, class) = -x[class] + log(sum_j exp(x[j])), scaled by weight[class] when weight is given.
Main parameters:
• weight: sets a loss weight for each category
• ignore_index: ignore a category
• reduction: calculation mode, one of none/sum/mean
  none: compute the loss element by element
  sum: sum all elements and return a scalar
  mean: weighted average, return a scalar

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# fake data
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)

# ----------------------------------- CrossEntropy loss: reduction -----------------------------------
flag = 0
# flag = 1
if flag:
    # def loss function
    loss_f_none = nn.CrossEntropyLoss(weight=None, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=None, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=None, reduction='mean')  # By default

    # forward
    loss_none = loss_f_none(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("Cross Entropy Loss:\n ", loss_none, loss_sum, loss_mean)

# --------------------------------- compute by hand
flag = 0
# flag = 1
if flag:

    idx = 0

    input_1 = inputs.detach().numpy()[idx]      # [1, 2]
    target_1 = target.numpy()[idx]              # 0

    # first term: x[class]
    x_class = input_1[target_1]

    # second term: log(sum(exp(x_j)))
    sigma_exp_x = np.sum(list(map(np.exp, input_1)))
    log_sigma_exp_x = np.log(sigma_exp_x)

    # loss of the first sample
    loss_1 = -x_class + log_sigma_exp_x

    print("First sample loss computed by hand: ", loss_1)
# ----------------------------------- weight -----------------------------------
flag = 0
# flag = 1
if flag:
    # def loss function
    weights = torch.tensor([1, 2], dtype=torch.float)
    # weights = torch.tensor([0.7, 0.3], dtype=torch.float)

    loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
    loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
    loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

    # forward
    loss_none_w = loss_f_none_w(inputs, target)
    loss_sum = loss_f_sum(inputs, target)
    loss_mean = loss_f_mean(inputs, target)

    # view
    print("\nweights: ", weights)
    print(loss_none_w, loss_sum, loss_mean)

D:\Anaconda3\envs\pytorch\python.exe D:/PythonProject/Eye of depth pytorch/04-02-code-loss function (one)/lesson-15/loss_function_1.py
Cross Entropy Loss:
  tensor([1.3133, 0.1269, 0.1269]) tensor(1.5671) tensor(0.5224)

weights:  tensor([1., 2.])
tensor([1.3133, 0.2539, 0.2539]) tensor(1.8210) tensor(0.3642)

Process finished with exit code 0

Note that with weight set, 'mean' divides by the total weight of the targets (1 + 2 + 2 = 5) rather than by the number of samples: 1.8210 / 5 = 0.3642.

3. Optimizer
3.1 Concept

PyTorch optimizer: manages and updates the values of the learnable parameters in the model so that the model output gets closer to the true labels

class Optimizer(object):
    def __init__(self, params, defaults):
        self.defaults = defaults
        self.state = defaultdict(dict)
        self.param_groups = []
        ......
        param_groups = [{'params': param_groups}]

Basic properties
• defaults: the optimizer's hyperparameters (learning rate, etc.)
• state: cache of parameter state, such as momentum buffers
• param_groups: the managed parameter groups
• _step_count: records the number of update steps, used in learning rate scheduling

Basic methods
• zero_grad(): clear the gradients of the managed parameters (in PyTorch, tensor gradients are not automatically cleared)
• step(): perform a one-step update
• add_param_group(): add a parameter group
• state_dict(): get the optimizer's current state information dictionary
• load_state_dict(): load a state information dictionary

# -*- coding: utf-8 -*-

import os
import torch
import torch.optim as optim
from tools.common_tools import set_seed  # helper from the course code (original import path garbled in translation)

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
set_seed(1)  # Set random seed

weight = torch.randn((2, 2), requires_grad=True)
weight.grad = torch.ones((2, 2))

optimizer = optim.SGD([weight], lr=0.1)

# ----------------------------------- step -----------------------------------
flag = 0
# flag = 1
if flag:
    print("weight before step:{}".format(weight.data))
    optimizer.step()        # one step changes each weight by lr * grad = 0.1; change lr to 1 to observe the difference
    print("weight after step:{}".format(weight.data))


# ----------------------------------- zero_grad -----------------------------------
flag = 0
# flag = 1
if flag:

    print("weight before step:{}".format(weight.data))
    optimizer.step()        # one step changes each weight by lr * grad = 0.1; change lr to 1 to observe the difference
    print("weight after step:{}".format(weight.data))

    print("weight in optimizer:{}\nweight in weight:{}\n".format(id(optimizer.param_groups[0]['params'][0]), id(weight)))

    print("weight.grad is {}\n".format(weight.grad))
    optimizer.zero_grad()
    print("after optimizer.zero_grad(), weight.grad is\n{}".format(weight.grad))


# ----------------------------------- add_param_group -----------------------------------
# flag = 0
flag = 1
if flag:
    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))

    w2 = torch.randn((3, 3), requires_grad=True)

    optimizer.add_param_group({"params": w2, 'lr': 0.0001})

    print("optimizer.param_groups is\n{}".format(optimizer.param_groups))

# ----------------------------------- state_dict -----------------------------------
flag = 0
# flag = 1
if flag:

    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    opt_state_dict = optimizer.state_dict()

    print("state_dict before step:\n", opt_state_dict)

    for i in range(10):
        optimizer.step()

    print("state_dict after step:\n", optimizer.state_dict())

    torch.save(optimizer.state_dict(), os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))

# -----------------------------------load state_dict -----------------------------------
flag = 0
# flag = 1
if flag:

    optimizer = optim.SGD([weight], lr=0.1, momentum=0.9)
    state_dict = torch.load(os.path.join(BASE_DIR, "optimizer_state_dict.pkl"))

    print("state_dict before load state:\n", optimizer.state_dict())
    optimizer.load_state_dict(state_dict)
    print("state_dict after load state:\n", optimizer.state_dict())
4: TensorBoard usage

If the event files are in a runs subfolder of the current directory: tensorboard --logdir=./runs

If you are already inside the runs folder: tensorboard --logdir=./
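
A minimal logging sketch (the tag and values are illustrative); SummaryWriter writes event files under ./runs by default:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter()                          # creates ./runs/<timestamp>/ by default
for step in range(100):
    writer.add_scalar('y=x^2', step ** 2, step)   # tag, scalar value, global step
writer.close()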
