Introduction
MOOC: Deep Learning Application Development with TensorFlow in Practice
Chapter 5: Univariate Linear Regression with TensorFlow
For the theoretical part of this lecture, see https://blog.csdn.net/tangkcc/article/details/120614863 . TensorFlow version: 2.3.
What we want to fit is a univariate linear equation, which can be written as y = w*x + b. In this example we generate an artificial data set that samples approximately around the line with w = 2.0 and b = 1.0, adding Gaussian noise scaled by 0.4.
Import libraries and generate the dataset
Import libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# To display matplotlib figures in Jupyter, inline mode must be enabled, otherwise the images will not appear in the page
%matplotlib inline

tf.__version__
output:
2.3.0
Generate dataset
First, generate the input data: construct x and y values that satisfy the target function, then add some noise that deviates from the equation.
# Use np.linspace to directly generate an arithmetic sequence of 100 points in [-1, 1]
x_data = np.linspace(-1, 1, 100)
np.random.seed(5)  # Set the random seed
# Generate y = 2x + 1 and add noise at the same time
y_data = 2 * x_data + 1.0 + np.random.randn(*x_data.shape) * 0.4
np.random.randn returns one or more samples drawn from the standard normal distribution.
In the code above, np.random.randn(*x_data.shape) has the same effect as np.random.randn(100), because x_data.shape is (100,).
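As a quick check (assuming x_data from the code above is in scope):

print(np.random.randn())                       # a single sample from N(0, 1)
print(np.random.randn(3))                      # an array of 3 samples
print(np.random.randn(*x_data.shape).shape)    # (100,), same as np.random.randn(100)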
Now let's plot the generated x and y data.
plt.scatter(x_data, y_data)
plt.xlabel('x')
plt.ylabel('y')
plt.title("Training Data")
# Draw the target function we want, y = 2x + 1
plt.plot(x_data, 1.0 + 2 * x_data, 'r', linewidth=3)
Build model
Define the model
This model is relatively simple
def model(x, w, b):
    return tf.multiply(x, w) + b
Create variables to be optimized
w = tf.Variable(np.random.randn(), dtype=tf.float32)  # Slope
b = tf.Variable(0.0, dtype=tf.float32)                # Intercept
Define loss function
The loss function describes the error between the predicted value and the true value and guides the direction in which the model converges. Commonly used losses are mean squared error (MSE) and cross-entropy.
Here we use mean squared error.
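In symbols, assuming there are $n$ training samples with predictions $\hat{y}_i$ and true values $y_i$:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$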
# Define the mean squared error loss function
def loss(x, y, w, b):
    err = model(x, w, b) - y            # Difference between the predicted and true values
    squared_err = tf.square(err)        # Square the errors
    return tf.reduce_mean(squared_err)  # Average them to get the mean squared error
Train the model
Set training parameters
epochs = 10  # Number of training epochs
lr = 0.01    # Learning rate
Define the gradient computation function
def grad(x, y, w, b):
    with tf.GradientTape() as tape:
        loss_ = loss(x, y, w, b)
    return tape.gradient(loss_, [w, b])  # Return the gradient vector [dL/dw, dL/db]
This is one of the differences between TF1 and TF2: in TensorFlow 2, tf.GradientTape() is used as a context manager to wrap the computation steps that need differentiation, and its gradient() method is then called to compute the derivatives.
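As a standalone illustration of this mechanism (a minimal sketch, not part of this example's pipeline), the tape below records y = x squared and gradient() returns dy/dx = 2x:

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = tf.square(x)            # y = x ** 2 is recorded on the tape
dy_dx = tape.gradient(y, x)     # dy/dx = 2 * x
print(dy_dx.numpy())            # 6.0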
Run the training (SGD)
step = 0            # Record the number of training steps
loss_list = []      # List for saving loss values
display_step = 10   # Controls how often progress is displayed during training; not a hyperparameter

for epoch in range(epochs):
    for xs, ys in zip(x_data, y_data):
        loss_ = loss(xs, ys, w, b)               # Compute the loss
        loss_list.append(loss_)
        delta_w, delta_b = grad(xs, ys, w, b)    # Compute the gradients
        change_w = delta_w * lr                  # Amount by which to adjust w
        change_b = delta_b * lr                  # Amount by which to adjust b
        w.assign_sub(change_w)                   # Subtract the corresponding changes from w and b
        b.assign_sub(change_b)
        step = step + 1
        if step % display_step == 0:             # Display training progress
            print(f'Training Epoch:{epoch+1} Step: {step} Loss:{loss_}')
    plt.plot(x_data, w.numpy() * x_data + b.numpy())
Here are some of the running results
The model fitted in this example is relatively simple and is close to convergence after about 5 epochs of training; more complex models need more training before they converge.
Display and visualize training results
print(f'w:{w.numpy()}, b:{b.numpy()}')
plt.scatter(x_data, y_data, label='Original data')
plt.plot(x_data, x_data * 2.0 + 1.0, label="Object line", color='g', linewidth=3)
plt.plot(x_data, x_data * w.numpy() + b.numpy(), label="Fitted line", color='r', linewidth=3)
plt.legend(loc=2)  # Set the legend location
View loss changes
plt.plot(loss_list)
Make predictions
x_test = 3.21
predict = model(x_test, w.numpy(), b.numpy())
target = 2 * x_test + 1.0
print(f'Predicted value:{predict}, Target value:{target}')
output:
Predicted value: 7.426405906677246, Target value: 7.42
Model training with batch gradient descent (BGD)
Stochastic gradient descent (SGD) uses only one sample per iteration (a batch size of 1). Given enough iterations, SGD also works; the term "stochastic" indicates that the single sample making up each batch is chosen at random.
In gradient descent, the batch is the total number of samples used to compute the gradient in a single iteration.
So far we have assumed that the batch is the entire data set. In practice, data sets often contain very many samples (tens of thousands or even hundreds of billions) and many features, so a batch can be huge, and a single iteration over a very large batch can take a long time to compute.
Mini-batch stochastic gradient descent (mini-batch SGD) is a compromise between full-batch iteration and SGD. A mini-batch usually contains 10 to 1000 randomly selected samples. Mini-batch SGD reduces the noise present in SGD while still being more efficient than full batch.
Next comes the implementation; we only need to modify part of the code above.
Modify the training hyperparameters
epochs = 100  # Number of training epochs
lr = 0.05     # Learning rate
The number of epochs and the learning rate need to be adjusted. The number of epochs is set to 100 for now, meaning all samples take part in training 100 times. The learning rate is set to 0.05, larger than in the SGD version.
Modify model training process
loss_list = []  # List for saving loss values
for epoch in range(epochs):
    loss_ = loss(x_data, y_data, w, b)             # Compute the loss over the whole data set
    loss_list.append(loss_)
    delta_w, delta_b = grad(x_data, y_data, w, b)  # Compute the gradients
    change_w = delta_w * lr                        # Amount by which to adjust w
    change_b = delta_b * lr                        # Amount by which to adjust b
    w.assign_sub(change_w)                         # Subtract the corresponding changes from w and b
    b.assign_sub(change_b)
    print(f'Training Epoch:{epoch+1} Loss={loss_}')
    plt.plot(x_data, w.numpy() * x_data + b.numpy())
Display and visualize training results
print(f'w:{w.numpy()}, b:{b.numpy()}')

plt.scatter(x_data, y_data, label='Original data')
plt.plot(x_data, x_data * 2.0 + 1.0, label="Object line", color='g', linewidth=3)
plt.plot(x_data, x_data * w.numpy() + b.numpy(), label="Fitted line", color='r', linewidth=3)
plt.legend(loc=2)  # Set the legend location
Loss value visualization
plt.plot(loss_list)
Complete code
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# To display matplotlib figures in Jupyter, inline mode must be enabled, otherwise the images will not appear in the page
%matplotlib inline

# Use np.linspace to directly generate an arithmetic sequence of 100 points in [-1, 1]
x_data = np.linspace(-1, 1, 100)
np.random.seed(5)  # Set the random seed
# Generate y = 2x + 1 and add noise at the same time
# (np.random.randn returns samples from the standard normal distribution, e.g. np.random.randn(10))
y_data = 2 * x_data + 1.0 + np.random.randn(*x_data.shape) * 0.4

plt.scatter(x_data, y_data)
plt.xlabel('x')
plt.ylabel('y')
plt.title("Training Data")
# Draw the target function we want, y = 2x + 1
plt.plot(x_data, 1.0 + 2 * x_data, 'r', linewidth=3)

def model(x, w, b):
    return tf.multiply(x, w) + b

w = tf.Variable(np.random.randn(), dtype=tf.float32)  # Slope
b = tf.Variable(0.0, dtype=tf.float32)                # Intercept

# Define the mean squared error loss function
def loss(x, y, w, b):
    err = model(x, w, b) - y            # Difference between the predicted and true values
    squared_err = tf.square(err)        # Square the errors
    return tf.reduce_mean(squared_err)  # Average them to get the mean squared error

def grad(x, y, w, b):
    with tf.GradientTape() as tape:
        loss_ = loss(x, y, w, b)
    return tape.gradient(loss_, [w, b])  # Return the gradient vector [dL/dw, dL/db]

epochs = 100  # Number of training epochs
lr = 0.05     # Learning rate

loss_list = []  # List for saving loss values
for epoch in range(epochs):
    loss_ = loss(x_data, y_data, w, b)             # Compute the loss
    loss_list.append(loss_)
    delta_w, delta_b = grad(x_data, y_data, w, b)  # Compute the gradients
    change_w = delta_w * lr                        # Amount by which to adjust w
    change_b = delta_b * lr                        # Amount by which to adjust b
    w.assign_sub(change_w)                         # Subtract the corresponding changes from w and b
    b.assign_sub(change_b)
    print(f'Training Epoch:{epoch+1} Loss={loss_}')
    plt.plot(x_data, w.numpy() * x_data + b.numpy())

print(f'w:{w.numpy()}, b:{b.numpy()}')

plt.scatter(x_data, y_data, label='Original data')
plt.plot(x_data, x_data * 2.0 + 1.0, label="Object line", color='g', linewidth=3)
plt.plot(x_data, x_data * w.numpy() + b.numpy(), label="Fitted line", color='r', linewidth=3)
plt.legend(loc=2)  # Set the legend location

plt.plot(loss_list)
Summary of gradient descent algorithms
Batch gradient descent considers all samples in each iteration and optimizes globally, but it is computationally expensive; when the training data are very large it becomes impractical for all samples to take part in every iteration.
Stochastic gradient descent uses only one sample per iteration. Because training on a single sample introduces a lot of noise, each SGD step does not necessarily move in the overall optimization direction, so it may converge quickly at the start of training but become very slow after a while.
Between SGD and BGD there is another method that combines the advantages of both: MBGD (mini-batch gradient descent). At each iteration a small batch is randomly drawn from the training samples, and the size of this mini-batch is itself a hyperparameter. A possible implementation is sketched below.
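The following is only a sketch, reusing the model, loss, and grad functions and the x_data, y_data, w, b defined above; the batch_size value and the shuffling step are assumptions, not part of the original lecture:

batch_size = 10   # Hypothetical mini-batch size; itself a hyperparameter
epochs = 100
lr = 0.05
loss_list = []

for epoch in range(epochs):
    # Shuffle the sample indices so each mini-batch is randomly composed
    indices = np.random.permutation(len(x_data))
    for start in range(0, len(x_data), batch_size):
        batch_idx = indices[start:start + batch_size]
        xs, ys = x_data[batch_idx], y_data[batch_idx]
        loss_list.append(loss(xs, ys, w, b))           # Loss on this mini-batch
        delta_w, delta_b = grad(xs, ys, w, b)          # Gradients on this mini-batch
        w.assign_sub(delta_w * lr)
        b.assign_sub(delta_b * lr)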
Summary
This article introduces the idea of using TensorFlow for machine learning through a simple example, focusing on the following steps:
- Generating an artificial data set and visualizing it
- Building a linear model
- Defining a loss function
- Optimizing with gradient descent
- Visualizing the training results
- Using the trained model for prediction
Study notes are for reference only. If there are errors, please correct them!