Univariate linear regression: TensorFlow practice

Introduction

MOOC: Deep Learning Application Development with TensorFlow in Practice
Chapter 5: Univariate linear regression with TensorFlow
For the theoretical part of this lecture, see https://blog.csdn.net/tangkcc/article/details/120614863. The TensorFlow version used here is 2.3.

What we want to implement is a univariate linear model, which can be written as y = w*x + b. In this example we generate an artificial dataset of points scattered around that line, with w = 2.0 and b = 1.0, and add Gaussian noise with a standard deviation of 0.4.

Import library and generate dataset

Import library

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# In Jupyter, matplotlib must be set to inline mode, otherwise plots will not appear in the notebook
%matplotlib inline
tf.__version__

output:

2.3.0

Generate dataset

First we generate the input data: we construct x and y values that satisfy the target function y = 2x + 1, then add noise so that the points no longer lie exactly on the line.

# Use np.linspace to generate 100 evenly spaced points in [-1, 1];
# float32 so the data matches the dtype of the TensorFlow variables created below
x_data = np.linspace(-1, 1, 100, dtype=np.float32)
np.random.seed(5)  # set the random seed for reproducibility
# Generate y = 2x + 1 and add Gaussian noise with standard deviation 0.4
y_data = (2 * x_data + 1.0 + np.random.randn(*x_data.shape) * 0.4).astype(np.float32)

np.random.randn draws one or more samples from the standard normal distribution. Since x_data.shape is (100,), np.random.randn(*x_data.shape) in the code above has the same effect as np.random.randn(100).
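As a quick illustration of how np.random.randn is called (these are standard NumPy calls, shown only for reference):

np.random.randn()  # a single sample from N(0, 1)
np.random.randn(3)  # an array of 3 samples, shape (3,)
np.random.randn(2, 4)  # an array of samples with shape (2, 4)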

Now let's plot the generated x and y values:

plt.scatter(x_data,y_data)
plt.xlabel('x')
plt.ylabel('y')
plt.title("Training Data")
# Draw the target line y = 2x + 1
plt.plot(x_data,1.0+2*x_data,'r',linewidth=3)

Build the model

Define the model

This model is quite simple:

def model(x,w,b):
    return tf.multiply(x,w)+b
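As a quick sanity check (an illustrative call, not part of the original notes), plugging in the true parameters should reproduce the target line:

model(0.5, 2.0, 1.0)  # tf.Tensor: 2.0, since 2.0 * 0.5 + 1.0 = 2.0
model(x_data, 2.0, 1.0)  # a tensor of shape (100,) lying exactly on y = 2x + 1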

Create variables to be optimized

w = tf.Variable(np.random.randn(), dtype=tf.float32)  # slope
b = tf.Variable(0.0, dtype=tf.float32)  # intercept

Note that dtype must be passed as a keyword argument: the second positional parameter of tf.Variable is trainable, not dtype.

Define the loss function

The loss function measures the error between the predicted and true values and guides the direction in which the model converges. Common choices are mean squared error (MSE) and cross-entropy; here we use mean squared error.
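Concretely, for n training samples the mean squared error averages the squared residuals:

MSE = (1/n) * sum_i (w * x_i + b - y_i)^2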

# Define the mean squared error loss function
def loss(x, y, w, b):
    err = model(x, w, b) - y  # difference between the predicted and true values
    squared_err = tf.square(err)  # square the errors
    return tf.reduce_mean(squared_err)  # average them to get the MSE

Train the model

Set the training hyperparameters

epochs = 10  # number of training epochs
lr = 0.01  # learning rate

Define the gradient computation function

def grad(x, y, w, b):
    with tf.GradientTape() as tape:
        loss_ = loss(x, y, w, b)
    return tape.gradient(loss_, [w, b])  # return the gradients of the loss w.r.t. w and b

This is one of the differences between TF1 and TF2: in TensorFlow 2, tf.GradientTape() is used as a context manager that records the computation to be differentiated, and its gradient() method then computes the derivatives.
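As a minimal standalone illustration (the variable x and the function y = x^2 here are made up for this example, not part of the regression code):

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x  # record y = x^2 on the tape
dy_dx = tape.gradient(y, x)  # dy/dx = 2x, i.e. 6.0 at x = 3.0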

Run the training (SGD)

step = 0  # count the training steps
loss_list = []  # record the loss values
display_step = 10  # how often to print progress; a display setting, not a hyperparameter
for epoch in range(epochs):
    for xs, ys in zip(x_data, y_data):
        loss_ = loss(xs, ys, w, b)  # compute the loss on this sample
        loss_list.append(loss_)
        delta_w, delta_b = grad(xs, ys, w, b)  # compute the gradients
        change_w = delta_w * lr  # amount by which to adjust w
        change_b = delta_b * lr  # amount by which to adjust b
        w.assign_sub(change_w)  # w <- w - change_w
        b.assign_sub(change_b)  # b <- b - change_b
        step = step + 1
        if step % display_step == 0:  # show training progress
            print(f'Training Epoch:{epoch+1}  Step: {step}  Loss:{loss_}')
    plt.plot(x_data, w.numpy() * x_data + b.numpy())  # plot the fitted line after each epoch


Part of the training output is shown above.
The model fitted in this example is simple and is close to convergence after about 5 epochs of training; more complex models need more training to converge.

Display and visualize training results

print(f'w:{w.numpy()}, b:{b.numpy()}')

plt.scatter(x_data, y_data, label='Original data')
plt.plot(x_data, x_data * 2.0 + 1.0, label='Target line', color='g', linewidth=3)
plt.plot(x_data, x_data * w.numpy() + b.numpy(), label='Fitted line', color='r', linewidth=3)
plt.legend(loc=2)  # set the legend location

View loss changes

plt.plot(loss_list)

Make predictions

x_test=3.21
predict=model(x_test,w.numpy(),b.numpy())
target=2*x_test+1.0
print(f'Predicted value:{predict}, Target value:{target}')

output

Predicted value: 7.426405906677246, Target value: 7.42

Model training with batch gradient descent (BGD)

Stochastic gradient descent (SGD) uses only one sample per iteration (a batch size of 1). Given enough iterations, SGD does the job; the term "stochastic" indicates that the single sample making up each batch is chosen at random.
In gradient descent, the batch is the total number of samples used to compute the gradient in a single iteration.
Suppose the batch is the entire dataset. Datasets often contain huge numbers of samples (tens of thousands or even hundreds of billions) and many features, so a full batch can be very large, and a single iteration may then take a long time to compute.
Mini-batch stochastic gradient descent (mini-batch SGD) is a compromise between full-batch iteration and SGD. A mini-batch usually contains 10 to 1000 randomly chosen samples; mini-batch SGD reduces the noise present in SGD while remaining more efficient than full batch.
Next comes the implementation; we only need to modify part of the code above.

Modify the training hyperparameters

epochs = 100  # number of training epochs
lr = 0.05  # learning rate

The number of epochs and the learning rate both need to be adjusted. The epoch count is set to 100 for now, meaning all samples participate in 100 rounds of training. The learning rate is set to 0.05, larger than in the SGD version.

Modify the model training process

loss_list = []  # record the loss values
for epoch in range(epochs):
    loss_ = loss(x_data, y_data, w, b)  # compute the loss on the full batch
    loss_list.append(loss_)
    delta_w, delta_b = grad(x_data, y_data, w, b)  # compute the gradients
    change_w = delta_w * lr  # amount by which to adjust w
    change_b = delta_b * lr  # amount by which to adjust b
    w.assign_sub(change_w)  # w <- w - change_w
    b.assign_sub(change_b)  # b <- b - change_b
    print(f'Training Epoch:{epoch+1}  Loss={loss_}')
    plt.plot(x_data, w.numpy() * x_data + b.numpy())  # plot the fitted line after each epoch


Display and visualize training results

print(f'w:{w.numpy()}, b:{b.numpy()}')
plt.scatter(x_data, y_data, label='Original data')
plt.plot(x_data, x_data * 2.0 + 1.0, label='Target line', color='g', linewidth=3)
plt.plot(x_data, x_data * w.numpy() + b.numpy(), label='Fitted line', color='r', linewidth=3)
plt.legend(loc=2)  # set the legend location

Loss value visualization

plt.plot(loss_list)

Complete code

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# In Jupyter, matplotlib must be set to inline mode, otherwise plots will not appear in the notebook
%matplotlib inline

# Use np.linspace to generate 100 evenly spaced points in [-1, 1];
# float32 so the data matches the dtype of the TensorFlow variables below
x_data = np.linspace(-1, 1, 100, dtype=np.float32)
np.random.seed(5)  # set the random seed for reproducibility
# Generate y = 2x + 1 and add Gaussian noise with standard deviation 0.4
y_data = (2 * x_data + 1.0 + np.random.randn(*x_data.shape) * 0.4).astype(np.float32)

# Demo: np.random.randn draws samples from the standard normal distribution
np.random.randn(10)

plt.scatter(x_data, y_data)
plt.xlabel('x')
plt.ylabel('y')
plt.title("Training Data")
# Draw the target line y = 2x + 1
plt.plot(x_data, 1.0 + 2 * x_data, 'r', linewidth=3)

def model(x, w, b):
    return tf.multiply(x, w) + b

w = tf.Variable(np.random.randn(), dtype=tf.float32)  # slope
b = tf.Variable(0.0, dtype=tf.float32)  # intercept

# Define the mean squared error loss function
def loss(x, y, w, b):
    err = model(x, w, b) - y  # difference between the predicted and true values
    squared_err = tf.square(err)  # square the errors
    return tf.reduce_mean(squared_err)  # average them to get the MSE

def grad(x, y, w, b):
    with tf.GradientTape() as tape:
        loss_ = loss(x, y, w, b)
    return tape.gradient(loss_, [w, b])  # gradients of the loss w.r.t. w and b

epochs = 100  # number of training epochs
lr = 0.05  # learning rate

loss_list = []  # record the loss values
for epoch in range(epochs):
    loss_ = loss(x_data, y_data, w, b)  # compute the loss on the full batch
    loss_list.append(loss_)
    delta_w, delta_b = grad(x_data, y_data, w, b)  # compute the gradients
    change_w = delta_w * lr  # amount by which to adjust w
    change_b = delta_b * lr  # amount by which to adjust b
    w.assign_sub(change_w)  # w <- w - change_w
    b.assign_sub(change_b)  # b <- b - change_b
    print(f'Training Epoch:{epoch+1}  Loss={loss_}')
    plt.plot(x_data, w.numpy() * x_data + b.numpy())  # plot the fitted line after each epoch

print(f'w:{w.numpy()}, b:{b.numpy()}')
plt.scatter(x_data, y_data, label='Original data')
plt.plot(x_data, x_data * 2.0 + 1.0, label='Target line', color='g', linewidth=3)
plt.plot(x_data, x_data * w.numpy() + b.numpy(), label='Fitted line', color='r', linewidth=3)
plt.legend(loc=2)  # set the legend location
plt.plot(loss_list)

Summary of gradient descent algorithms

Batch gradient descent uses all samples in each iteration and optimizes globally, but it is computationally expensive; when the training set is very large, having every sample participate in every iteration becomes infeasible.
Stochastic gradient descent uses only one sample per iteration. Because training on a single sample introduces a lot of noise, SGD does not move toward the global optimum on every iteration; it may converge quickly early in training but become very slow after a while.
Between SGD and BGD sits a method that combines the advantages of both: mini-batch gradient descent (MBGD). In each iteration a small batch is randomly drawn from the training samples, and the size of this batch is itself a hyperparameter. A sketch of this is given below.
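As a sketch of what MBGD could look like on top of the model, loss, and grad functions defined above (batch_size = 20 and the per-epoch reshuffling are illustrative choices, not fixed by the method):

batch_size = 20  # mini-batch size, itself a hyperparameter
epochs = 100
lr = 0.05
n = len(x_data)
for epoch in range(epochs):
    indices = np.random.permutation(n)  # reshuffle the samples every epoch
    for start in range(0, n, batch_size):
        batch_idx = indices[start:start + batch_size]
        xs, ys = x_data[batch_idx], y_data[batch_idx]  # draw one mini-batch
        delta_w, delta_b = grad(xs, ys, w, b)  # gradients on the mini-batch
        w.assign_sub(delta_w * lr)  # w <- w - lr * gradient
        b.assign_sub(delta_b * lr)  # b <- b - lr * gradient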

Summary

This article used a simple example to introduce the idea of implementing machine learning with TensorFlow, focusing on the following steps:

  1. Generating an artificial dataset and visualizing it
  2. Building a linear model
  3. Defining a loss function
  4. Optimizing with gradient descent
  5. Visualizing the training results
  6. Using the learned model to make predictions

These study notes are for reference only; if there are errors, corrections are welcome!

Tags: Python TensorFlow Deep Learning
