# Linear regression

1. Basic elements of linear regression
2. Implementation of linear regression model from scratch
3. Simple implementation of linear regression model using PyTorch

## Basic elements of linear regression

### Model

For simplicity, we assume here that the price of a house depends on only two factors: its area (in m²) and its age (in years). We want to determine the specific relationship between the price and these two factors. Linear regression assumes a linear relationship between the output and each input:

$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$
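For instance, the prediction is just a weighted sum of the inputs plus a bias. The parameter values below are made up purely to illustrate the formula:

```python
# hypothetical parameter values, for illustration only
w_area, w_age, b = 120.0, -10.0, 5000.0

# a house of 80 m^2 that is 10 years old
area, age = 80.0, 10.0

price = w_area * area + w_age * age + b
print(price)  # 120*80 - 10*10 + 5000 = 14500.0
```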

### Data set

We usually collect a series of real data, such as the actual selling prices of many houses together with their corresponding areas and ages. We want to find model parameters on this data that minimize the error between the predicted prices and the real prices. In machine learning terminology, this data set is called the training data set or training set; each house is a sample, its actual selling price is its label, and the two factors used to predict the label are its features. Features describe the characteristics of a sample.

### Loss function

In model training, we need to measure the error between the predicted value and the true value. We usually choose a non-negative number as the error, with smaller values indicating smaller error. A common choice is the squared error. For the sample with index $i$, it is

$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2,$$

and averaging over all $n$ training samples gives the loss on the training set:

$$L(\mathbf{w}, b) = \frac{1}{n}\sum_{i=1}^n l^{(i)}(\mathbf{w}, b) = \frac{1}{n} \sum_{i=1}^n \frac{1}{2}\left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right)^2.$$

### Optimization function: mini-batch stochastic gradient descent

When the model and loss function are simple enough, the error-minimization problem above can be solved in closed form. Such solutions are called analytical solutions; the linear regression with squared error used in this section falls into this category. However, most deep learning models have no analytical solution, so we can only reduce the value of the loss function as much as possible by iteratively optimizing the model parameters. Such solutions are called numerical solutions.

Among numerical optimization algorithms, mini-batch stochastic gradient descent is widely used in deep learning. The algorithm is simple: first select initial values for the model parameters, e.g. at random; then update the parameters over many iterations, so that each iteration may reduce the value of the loss function. In each iteration, a mini-batch $\mathcal{B}$ consisting of a fixed number of training samples is drawn uniformly at random, and the derivative (gradient) of the average loss on the mini-batch with respect to the model parameters is computed. Finally, the product of this gradient and a preset positive number (the learning rate) is subtracted from the model parameters:

$$(\mathbf{w},b) \leftarrow (\mathbf{w},b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w},b)} l^{(i)}(\mathbf{w},b)$$

• Learning rate: $\eta$ is the step size of each optimization step.
• Batch size: $|\mathcal{B}|$ is the number of samples in each mini-batch.

To summarize, optimization consists of two steps:

• (i) initialize the model parameters, usually at random;
• (ii) iterate over the data several times, updating each parameter by moving it in the direction of the negative gradient.
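As a minimal sketch of these two steps, here is a hand-rolled gradient descent update on a one-dimensional quadratic loss $l(w) = (w - 3)^2 / 2$, whose gradient is $w - 3$; the target value 3, the learning rate, and the iteration count are all made up for illustration:

```python
import random

# minimize l(w) = (w - 3)**2 / 2, whose gradient is (w - 3)
w = random.uniform(-1.0, 1.0)  # step (i): random initialization
eta = 0.1                      # learning rate

for _ in range(200):           # step (ii): iterate, moving against the gradient
    grad = w - 3.0
    w -= eta * grad

print(round(w, 4))  # converges close to the minimizer 3.0
```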

## Vectorized computation

In model training or prediction, we often process multiple data samples at once, which calls for vectorized computation. Before introducing the vectorized expression of linear regression, let's consider two ways of adding two vectors:

1. Add them element by element with a loop, one scalar addition at a time.
2. Add the two vectors directly with a single vectorized operation.
```python
import torch
import time

# initialize variables a and b as 1000-dimensional vectors
n = 1000
a = torch.ones(n)
b = torch.ones(n)


# define a timer class to record time
class Timer(object):
    """Record multiple running times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        # start the timer
        self.start_time = time.time()

    def stop(self):
        # stop the timer and record the elapsed time in a list
        self.times.append(time.time() - self.start_time)
        return self.times[-1]

    def avg(self):
        # return the average recorded time
        return sum(self.times) / len(self.times)

    def sum(self):
        # return the sum of recorded times
        return sum(self.times)
```


Now we can test it. First, we use a for loop to perform scalar addition element by element.

```python
timer = Timer()
c = torch.zeros(n)
for i in range(n):
    c[i] = a[i] + b[i]
'%.5f sec' % timer.stop()
```

'0.01162 sec'

Then we add the two vectors directly with a single operation:

```python
timer.start()
d = a + b
'%.5f sec' % timer.stop()
```

'0.00027 sec'


The results show that the latter is much faster than the former. Therefore, we should use vectorized computation wherever possible to improve efficiency.

## Implementation of linear regression model from scratch

```python
# import packages and modules
%matplotlib inline
import torch
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import random

print(torch.__version__)
```

1.3.0


### Generate data set

We use a linear model to generate a data set of 1000 samples. The linear relationship used to generate the data is:

$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$

```python
# set input feature number
num_inputs = 2
# set example number
num_examples = 1000

# set true weight and bias in order to generate the corresponding labels
true_w = [2, -3.4]
true_b = 4.2

features = torch.randn(num_examples, num_inputs,
                       dtype=torch.float32)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
# real data is never exactly linear, so add Gaussian noise to the labels
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()),
                       dtype=torch.float32)
```


### Use images to show the generated data

```python
plt.scatter(features[:, 1].numpy(), labels.numpy(), 1)
```

### Read the data set

```python
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # read the samples in random order
    for i in range(0, num_examples, batch_size):
        # the last batch may be smaller than batch_size
        j = torch.LongTensor(indices[i: min(i + batch_size, num_examples)])
        yield features.index_select(0, j), labels.index_select(0, j)  # yield one batch at a time
```

```python
batch_size = 10

for X, y in data_iter(batch_size, features, labels):
    print(X, '\n', y)
    break
```

```
tensor([[ 0.6802, -0.1529],
        [-1.1139, -1.0506],
        [ 1.1355,  0.6313],
        [ 0.9057,  1.4527],
        [-0.7477, -0.4656],
        [-0.4012, -0.1810],
        [ 0.0038, -1.3701],
        [-0.9438, -1.6571],
        [ 0.4016,  0.4233],
        [ 0.8824, -1.0067]])
tensor([6.0793, 5.5343, 4.3112, 1.0657, 4.2856, 4.0040, 8.8709, 7.9642, 3.5696,
        9.3887])
```


### Initialize model parameters

```python
w = torch.tensor(np.random.normal(0, 0.01, (num_inputs, 1)), dtype=torch.float32)
b = torch.zeros(1, dtype=torch.float32)

# gradients will be computed with respect to w and b during training
w.requires_grad_(requires_grad=True)
b.requires_grad_(requires_grad=True)
```

tensor([0.], requires_grad=True)


### Define the model

Define the model used to train the parameters:

$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b$$

```python
def linreg(X, w, b):
    # matrix product of the features and the weight vector, plus the bias
    return torch.mm(X, w) + b
```


### Define loss function

We use the mean squared error loss function:

$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2,$$

```python
def squared_loss(y_hat, y):
    # view reshapes y to the same shape as y_hat over the same underlying data
    return (y_hat - y.view(y_hat.size())) ** 2 / 2
```

```python
# a small numeric check of the loss on three samples
y_hat = torch.tensor([[2.33],
                      [1.07],
                      [1.23]])
y = torch.tensor([3.14, 0.98, 1.32])
s = squared_loss(y_hat, y)
print(s.mean())
```

tensor(0.1121)
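Incidentally, the `y.view(y_hat.size())` call matters: subtracting a shape-(3,) tensor from a shape-(3, 1) tensor does not raise an error but silently broadcasts to a (3, 3) matrix. A quick check (not from the original text):

```python
import torch

y_hat = torch.tensor([[2.33], [1.07], [1.23]])  # shape (3, 1)
y = torch.tensor([3.14, 0.98, 1.32])            # shape (3,)

# without .view, broadcasting silently produces a (3, 3) matrix
print((y_hat - y).shape)                      # torch.Size([3, 3])
# with .view, the shapes match and we get one error per sample
print((y_hat - y.view(y_hat.size())).shape)   # torch.Size([3, 1])
```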


### Define optimization function

Here we use mini-batch stochastic gradient descent as the optimization function:

$$(\mathbf{w},b) \leftarrow (\mathbf{w},b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w},b)} l^{(i)}(\mathbf{w},b)$$

```python
def sgd(params, lr, batch_size):
    # params: [w, b]
    for param in params:
        # step each parameter in the negative gradient direction (toward the minimum)
        param.data -= lr * param.grad / batch_size  # use .data to update param without gradient tracking
```


### Train

With the data set, model, loss function and optimization function defined, we are ready to train the model.

```python
# hyperparameter initialization
lr = 0.03  # learning rate
num_epochs = 5

net = linreg
loss = squared_loss

# training
for epoch in range(num_epochs):  # training repeats num_epochs times
    # in each epoch, every sample in the data set is used once

    # X is the feature and y is the label of a mini-batch of samples
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y).sum()
        # calculate the gradient of the mini-batch loss
        l.backward()
        # use mini-batch stochastic gradient descent to update the model parameters
        sgd([w, b], lr, batch_size)
        # clear the accumulated gradients after each update
        w.grad.data.zero_()
        b.grad.data.zero_()
    train_l = loss(net(features, w, b), labels)
    print('epoch %d, loss %f' % (epoch + 1, train_l.mean().item()))
```


## Simple implementation of linear regression model using PyTorch

```python
import torch
from torch import nn
import numpy as np
torch.manual_seed(1)   # fixing the random seed makes the results reproducible

print(torch.__version__)
torch.set_default_tensor_type('torch.FloatTensor')
```

1.3.0


### Generate data set

Generating the data set here is exactly the same as in the from-scratch implementation.

```python
num_inputs = 2
num_examples = 1000

true_w = [2, -3.4]
true_b = 4.2

features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)
labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b
labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)
```


### Read the data set

```python
import torch.utils.data as Data

batch_size = 10

# combine the features and labels of the data set
dataset = Data.TensorDataset(features, labels)

# put the dataset into a DataLoader
data_iter = Data.DataLoader(
    dataset=dataset,            # torch TensorDataset format
    batch_size=batch_size,      # mini-batch size
    shuffle=True,               # whether to shuffle the data or not
)

for X, y in data_iter:
    print(X, '\n', y)
    break
```

```
tensor([[ 0.2134,  0.7981],
        [-0.3899, -0.4544],
        [ 1.4472, -1.2160],
        [-0.7354,  1.2216],
        [-1.3233,  0.6937],
        [-0.2810,  0.5505],
        [-1.6620, -0.1457],
        [-0.7635,  0.7058],
        [ 0.6079, -0.7497],
        [ 0.4924, -0.1376]])
tensor([ 1.9045,  4.9770, 11.2503, -1.4180, -0.8159,  1.7737,  1.3910,  0.2721,
         7.9648,  5.6408])
```


### Define the model

```python
class LinearNet(nn.Module):
    def __init__(self, n_feature):
        super(LinearNet, self).__init__()      # call the parent constructor
        self.linear = nn.Linear(n_feature, 1)  # function prototype: torch.nn.Linear(in_features, out_features, bias=True)

    def forward(self, x):
        y = self.linear(x)
        return y

net = LinearNet(num_inputs)
print(net)
```

```
LinearNet(
  (linear): Linear(in_features=2, out_features=1, bias=True)
)
```

```python
# ways to initialize a multilayer network
# method one
net = nn.Sequential(   # a sequential container; Modules are added in the order they are passed in
    nn.Linear(num_inputs, 1)
    # other layers can be added here
)

# method two
net = nn.Sequential()
net.add_module('linear', nn.Linear(num_inputs, 1))
# net.add_module ......

# method three
from collections import OrderedDict
net = nn.Sequential(OrderedDict([
    ('linear', nn.Linear(num_inputs, 1))
    # ......
]))

print(net[0].weight)
print(net[0].bias)
# print(net)
```

```
Parameter containing:
Parameter containing:
```


### Initialize model parameters

```python
from torch.nn import init

init.normal_(net[0].weight, mean=0.0, std=0.01)  # reinitialize the weights
init.constant_(net[0].bias, val=0.0)  # initialize the bias to 0

print(net[0].weight)
print(net[0].bias)
```

```
Parameter containing:
Parameter containing:
```

```python
for param in net.parameters():
    print(param)  # param is the weight and bias above
```

```
Parameter containing:
Parameter containing:
```


### Define loss function

```python
loss = nn.MSELoss()    # nn built-in squared loss function
# function prototype: torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')
```
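One detail worth noting (a quick check, not from the original text): with the default `reduction='mean'`, `nn.MSELoss` averages the plain squared errors and does not include the 1/2 factor used in `squared_loss` earlier, so its value is exactly twice the mean of `squared_loss`:

```python
import torch
from torch import nn

loss = nn.MSELoss()
y_hat = torch.tensor([2.33, 1.07, 1.23])
y = torch.tensor([3.14, 0.98, 1.32])

mse = loss(y_hat, y)                      # mean((y_hat - y)^2)
half_mse = ((y_hat - y) ** 2 / 2).mean()  # the squared_loss convention

print(mse.item(), half_mse.item())  # mse is exactly twice half_mse
```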


### Define optimization function

```python
import torch.optim as optim

optimizer = optim.SGD(net.parameters(), lr=0.03)   # built-in stochastic gradient descent function
print(optimizer)  # function prototype: torch.optim.SGD(params, lr=, momentum=0, dampening=0, weight_decay=0, nesterov=False)
```

```
SGD (
Parameter Group 0
    dampening: 0
    lr: 0.03
    momentum: 0
    nesterov: False
    weight_decay: 0
)
```


### Train

```python
num_epochs = 3
for epoch in range(1, num_epochs + 1):
    for X, y in data_iter:
        output = net(X)
        l = loss(output, y.view(-1, 1))
        optimizer.zero_grad()  # clear the gradients before the backward pass
        l.backward()
        optimizer.step()
    print('epoch %d, loss: %f' % (epoch, l.item()))
```

```
epoch 1, loss: 0.000299
epoch 2, loss: 0.000058
epoch 3, loss: 0.000051
```

```python
# result comparison
dense = net[0]
print(true_w, dense.weight.data)
print(true_b, dense.bias.data)
```

```
[2, -3.4] tensor([[ 2.0007, -3.3992]])
4.2 tensor([4.1989])
```


Posted on Fri, 14 Feb 2020 09:08:06 -0500 by WDPEjoe