DataWhale Team Learning Check-in

Preface

This post records the second check-in of the DataWhale team learning of "Dive into Deep Learning".

Daily check-in

Linear regression code implementation (in Python)

Theoretical review

For the theory behind linear regression, see the previous blog post.

Implementing a linear regression model from scratch

Running the code in Jupyter makes it easy to show the output of each step clearly.
1. Import basic module
In [ ]:

# import packages and modules
%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
from mpl_toolkits import mplot3d as p3d
import numpy as np
import random

print(torch.__version__)

2. Generate data set
We use a linear model to generate a data set of 1000 samples. The following linear relationship is used to generate the data:
$price = w_{area} \cdot area + w_{age} \cdot age + b$
In [ ]:

# set input feature number 
num_inputs = 2
# set example number
num_examples = 1000

# set true weight and bias in order to generate corresponded label
true_w = [2, -3.4]
true_b = 4.2

features = torch.randn(num_examples, num_inputs,
                      dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)

The randomly generated features is a 1000 × 2 tensor: the first column holds the area attribute and the second column the age attribute. labels, corresponding to price in the formula above, is a tensor of length 1000. At the end of the code, a small random perturbation is added to the labels computed from the formula to make the data more realistic (obviously, price is not determined by area and age alone; adding random noise stands in for the other, unknown factors).
Of course, this is only a simple example; no great effort is made to simulate each attribute realistically. (For example, in real life area and age must be positive, and area and age differ by orders of magnitude.)
This post focuses on helping readers understand the process of linear regression from the code-implementation perspective. In a real project, the data naturally has its own source.
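As a quick sanity check (a minimal sketch), we can print the shapes and the first sample to confirm the data matches the description above:
In [ ]:

# features is a 1000 x 2 tensor; labels is a 1-D tensor of length 1000
print(features.shape, labels.shape)
print(features[0], labels[0])  # first sample and its noisy label
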
3. Visualize the generated data
In [ ]: show a two-dimensional scatter plot of the relationship between one attribute column and the labels

plt.scatter(features[:, 1].numpy(), labels.numpy(), 1);

In [ ]: show a three-dimensional scatter plot of the relationship between the two attribute columns and the labels

fig = plt.figure()
ax = p3d.Axes3D(fig)
X = features[:, 0].numpy()
Y = features[:, 1].numpy()
Z = labels.numpy()
ax.scatter3D(X, Y, Z);

4. Read data set
In [ ]:

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # shuffle indices so samples are read in random order
    for i in range(0, num_examples, batch_size):
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])  # the last batch may be smaller than batch_size
        yield  features.index_select(0, j), labels.index_select(0, j)

In [ ]: read one batch of 10 samples to view

batch_size = 10

for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break

5. Initialize model parameters
In [ ]:

w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)

w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)

The parameters to be learned by the model are initialized here, and gradient tracking is enabled for them (during optimization, both parameters are updated iteratively from their gradients). $w$ is a $2 \times 1$ tensor, equivalent to $\begin{bmatrix} w_{area} \\ w_{age} \end{bmatrix}$; $b$ is a scalar.
6. Define the model
Define the model used to train the parameters:
$price = w_{area} \cdot area + w_{age} \cdot age + b$
In [ ]:

def linreg(X, w, b):
    return torch.mm(X, w) + b
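
As a shape sketch (with an illustrative input, not taken from the data set): torch.mm is a matrix product, so a batch X of shape (10, 2) times w of shape (2, 1) yields (10, 1), and the scalar b is broadcast during the addition:
In [ ]:

X_demo = torch.randn(10, 2, dtype=torch.float32)  # an illustrative mini batch
print(linreg(X_demo, w, b).shape)  # torch.Size([10, 1]); scalar b is broadcast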

7. Define loss function
Use the mean square error loss function:
$l^{(i)}(\mathbf{w}, b) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$
In [ ]:

def squared_loss(y_hat, y): 
    return (y_hat - y.view(y_hat.size())) ** 2 / 2

Note: y.view() is equivalent to y.reshape()
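A minimal illustration of this note (with a demo tensor):
In [ ]:

y_demo = torch.arange(6)
print(y_demo.view(2, 3))     # view reinterprets the same underlying storage
print(y_demo.reshape(2, 3))  # same result; reshape may copy if the tensor is non-contiguous
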
8. Define optimization function
Use mini-batch stochastic gradient descent:
$(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)} l^{(i)}(\mathbf{w}, b)$
In [ ]:

def sgd(params, lr, batch_size): 
    for param in params:
        param.data -= lr * param.grad / batch_size # use .data to update param without gradient tracking
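
An equivalent update can be written with torch.no_grad(), which also suspends gradient tracking; a sketch of the same optimizer in that style:
In [ ]:

def sgd_no_grad(params, lr, batch_size):
    # same update as above; torch.no_grad() suspends gradient tracking
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size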

9. Training
Once the data set, model, loss function, and optimization function are defined, we are ready to train the model.
In [ ]:

# initialize hyperparameters
lr = 0.03
num_epochs = 5

net = linreg
loss = squared_loss

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, all the samples in dataset will be used once
    
    # X is the feature and y is the label of a batch sample
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()  
        # calculate the gradient of batch sample loss 
        l.backward()  
        # use mini-batch stochastic gradient descent to update model parameters
        sgd([w, b], lr, batch_size)  
        # reset parameter gradient
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))

In [ ]: compare the parameters learned by training with the true parameters

w, true_w, b, true_b

This completes the implementation of linear regression from scratch.

Of course, we can also use PyTorch's built-in modules to implement the linear regression model concisely.

1. Import basic module
In [ ]:

import torch
from torch import nn
import numpy as np
torch.manual_seed(1)

print(torch.__version__)
torch.set_default_tensor_type('torch.FloatTensor')

2. Generate data set
In [ ]:

num_inputs = 2
num_examples = 1000

true_w = [2, -3.4]
true_b = 4.2

features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)

3. Read data set
In [ ]:

import torch.utils.data as Data

batch_size = 10

# combine features and labels of dataset
dataset = Data.TensorDataset(features, labels)

# put dataset into DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,            # torch TensorDataset format
    batch_size=batch_size,      # mini batch size
    shuffle=True,               # whether shuffle the data or not
    num_workers=2,              # number of worker processes for loading data
)
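
Note: multi-process loading can hang on Windows or in some Jupyter environments; a safe fallback (a minimal sketch) is to read the data in the main process:
In [ ]:

# fallback: num_workers=0 reads data in the main process
data_iter = Data.DataLoader(dataset=dataset, batch_size=batch_size,
                            shuffle=True, num_workers=0)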

In [ ]: read one batch of 10 samples to view

for X, y in data_iter:
    print(X, '\n', y)
    break

4. Define the model
In [ ]:

class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()      # call the parent class constructor
        self.linear = nn.Linear(n_feature, 1)  # function prototype: `torch.nn.Linear(in_features, out_features, bias=True)`

    def forward(self, x):
        y = self.linear(x)
        return y
    
net = LinearNet(num_inputs)
print(net)
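
The same one-layer model can also be written with nn.Sequential, which additionally supports indexing its layers, so the cells below can refer to the linear layer as net[0] (with the class-based LinearNet, you would use net.linear instead). A sketch of this alternative:
In [ ]:

# alternative: the same network as an nn.Sequential container
net = nn.Sequential(
    nn.Linear(num_inputs, 1)
    # more layers could be appended here
)
print(net)
print(net[0])  # Sequential supports indexing into its layers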

5. Initialize model parameters
In [ ]:

from torch.nn import init

init.normal_(net[0].weight, mean=0.0, std=0.01)
init.constant_(net[0].bias, val=0.0)  # or you can use `net[0].bias.data.fill_(0)` to modify it directly

In [ ]: view the network parameters

for param in net.parameters():
    print(param)

6. Define loss function
In [ ]:

loss = nn.MSELoss()    # nn built-in squared loss function
                       # function prototype: `torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')`
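
Note that nn.MSELoss with the default reduction='mean' averages the squared error over the batch, while the squared_loss defined in the from-scratch section halves it per sample, so the two differ by a constant factor. A small sketch (assuming squared_loss from the earlier section is still in scope):
In [ ]:

y_hat_demo = torch.tensor([[2.0], [4.0]])
y_demo = torch.tensor([[1.0], [2.0]])
print(loss(y_hat_demo, y_demo))                     # mean((y_hat - y)^2) = 2.5
print(squared_loss(y_hat_demo, y_demo).mean() * 2)  # also 2.5: a factor of 2 apart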

7. Define optimization function
In [ ]:

import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)   # built-in stochastic gradient descent optimizer
print(optimizer)  # function prototype: `torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)`
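
optim.SGD also accepts per-parameter-group options, e.g. different learning rates for different parameters; a purely illustrative sketch (the group split here is an assumption, not part of the original training):
In [ ]:

# illustrative: give the bias its own learning rate via parameter groups
optimizer_demo = optim.SGD([
    {'params': net[0].weight},
    {'params': net[0].bias, 'lr': 0.01},  # overrides the default lr below
], lr=0.03)
print(optimizer_demo)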

8. Training
In [ ]:

num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad() # reset gradients; equivalent to net.zero_grad()
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))
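
Note that l.item() above is only the loss of the last mini batch of the epoch; to mirror the from-scratch version, the loss can also be evaluated over the full data set (a minimal sketch):
In [ ]:

# evaluate the loss on the whole data set without building a computation graph
with torch.no_grad():
    train_l = loss(net(features), labels.view(-1, 1))
print('full-dataset loss: %f' % train_l.item())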

In [ ]: compare the parameters learned by training with the true parameters

# result comparison
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)

Comparison of the two implementations

  1. Implementation from scratch (recommended for learning)
     Gives a better understanding of the underlying principles of the model and of neural networks
  2. Concise implementation using PyTorch
     Allows the model to be designed and implemented more quickly