TensorFlow, NumPy, and Keras: detailed learning notes (with sample code and an exercise program for each function)

These notes systematically summarize what I learned about TensorFlow, combining the Peking University TensorFlow online course with some personal understanding. They run from basic tensor creation through to building BP, CNN, and RNN network models with tf.keras, with my own adjustments and implementations based on the samples from that course.
These notes can serve as a reference for beginners like myself, or for readers who need to pick up the framework quickly. For detailed network construction and mathematical derivations (CNN, RNN, LSTM, DHNN, and so on), I recommend the book Neural Networks and Deep Learning by Qiu Xipeng (China Machine Press). Only a brief introduction to network construction and some light derivation is given here. Readers interested in how neural networks are structured and how they work can refer to the e-book Neuronal Dynamics: From single neurons to networks and models of cognition. The original e-book address is as follows:
The original address of the Peking University TensorFlow online course is as follows, for readers who want to look it up.
These notes contain no rigorous mathematical derivation, only application.
To follow them, you need to know the basic structure and computation of a neural network. If you do not, see the description of the BP network in Zhou Zhihua's Machine Learning, and the working principles of RNN and CNN in the two books above.
The BP part is written so far. I will keep adding content on RNN and CNN, and I hope for your support!
Now to the main topic.

1. Tensor creation and tensor conversion

Tensorflow's tensor is defined as follows:

1. tf.constant(tensor content, dtype=data type)
Usage and explanation:

(1) Common dtype examples are as follows:

Sequence number   1        2          3            4            5
Type              tf.int   tf.float   tf.float32   tf.float64   tf.int32

Here tf.int and tf.float stand for the integer and floating-point families; in code, use a concrete sized type such as tf.int32, tf.int64, tf.float32, or tf.float64. The table is not exhaustive: types such as tf.string and tf.bool also exist, but they are less common and are not listed here. Readers who need them can look them up.
(2) A tensor can be a zero-dimensional scalar (a single number), a one-dimensional vector, a two-dimensional matrix, or an n-dimensional array.

tf.constant(35,dtype=tf.int64)#This is the definition of number
tf.constant([1.1,2.2,3.3],dtype=tf.float32)#This is the definition of a vector
tf.constant([[1.1,2.2],[3.3,4.4]],dtype=tf.float32)#This is the definition of the matrix

By analogy, you can define a tensor with as many dimensions as you need.
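The number of dimensions of each tensor defined above can be read from its shape. The following is a minimal sketch, assuming TensorFlow 2.x with eager execution; the names scalar, vector, and matrix are illustrative and not from the course.

```python
import tensorflow as tf

# Illustrative names; the values follow the tf.constant examples above.
scalar = tf.constant(35, dtype=tf.int64)
vector = tf.constant([1.1, 2.2, 3.3], dtype=tf.float32)
matrix = tf.constant([[1.1, 2.2], [3.3, 4.4]], dtype=tf.float32)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    # shape lists the size of every dimension; its length is the number of dimensions
    print(name, t.dtype.name, t.shape, len(t.shape))
```

A single number has an empty shape, a vector has one entry in its shape, and a matrix has two.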
TensorFlow's tensor conversion function is as follows:

2. tf.convert_to_tensor(name of the data to convert, dtype=data type)

Data in Python is often held as NumPy arrays, so conversion between NumPy and TensorFlow types is needed.

import numpy as np
import tensorflow as tf
data=np.arange(0,5,dtype=np.int64)#Example NumPy array (illustrative values)
n_data=tf.convert_to_tensor(data,dtype=tf.int64)#Convert the NumPy data to a TensorFlow tensor
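The conversion also works in the other direction. The sketch below assumes TensorFlow 2.x in eager mode, where a tensor exposes a numpy() method; the array values and the names t and back are illustrative.

```python
import numpy as np
import tensorflow as tf

data = np.arange(0, 5, dtype=np.int64)           # illustrative NumPy array
t = tf.convert_to_tensor(data, dtype=tf.int64)   # NumPy -> TensorFlow tensor
back = t.numpy()                                 # TensorFlow tensor -> NumPy (eager mode only)

print(t)
print(back)
```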

2. Common Tensorflow functions

3. tf.zeros(dimension), tf.ones(dimension), tf.fill(dimension, fill value)
tf.zeros([2,2])#Generates a 2 x 2 matrix of zeros.
tf.ones(7)#Generates a 1-D vector of seven ones.
tf.fill([4,4],9)#Generates a 4 x 4 matrix filled with 9.
4. Normal distribution (truncated and non-truncated) and uniform distribution
(1) tf.random.normal(dimension, mean=mean, stddev=standard deviation)
(2) tf.random.truncated_normal(dimension, mean=mean, stddev=standard deviation)
(3) tf.random.uniform(dimension, minval=minimum, maxval=maximum)
Usage and explanation:

The first function generates normally distributed random numbers. The second generates random numbers from a truncated normal distribution: any value more than two standard deviations from the mean is discarded and drawn again, which cuts off the tails on both sides. The third generates uniformly distributed data in the interval [minval, maxval).

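tf.random.normal([2,2],mean=5,stddev=1)#2 x 2 random numbers from a normal distribution with mean 5 and standard deviation 1
tf.random.truncated_normal([2,2],mean=5,stddev=1)#Truncated normal distribution, otherwise as above.
tf.random.uniform([2,2],minval=-1,maxval=6)#2 x 2 matrix of data uniformly distributed on [-1,6).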
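The difference between the three generators is easiest to see on a larger sample. This is a minimal sketch assuming TensorFlow 2.x with eager execution; the sample size of 1000 and the variable names are illustrative choices.

```python
import tensorflow as tf

mean, stddev = 5.0, 1.0
normal = tf.random.normal([1000], mean=mean, stddev=stddev)
truncated = tf.random.truncated_normal([1000], mean=mean, stddev=stddev)
uniform = tf.random.uniform([1000], minval=-1.0, maxval=6.0)

# Truncated samples never fall more than two standard deviations from the mean.
print(float(tf.reduce_min(truncated)), float(tf.reduce_max(truncated)))
# Plain normal samples have no such bound, so their range is usually wider.
print(float(tf.reduce_min(normal)), float(tf.reduce_max(normal)))
# Uniform samples stay inside the requested interval.
print(float(tf.reduce_min(uniform)), float(tf.reduce_max(uniform)))
```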
5. Other common operations (addition, subtraction, multiplication, division, etc.)
(1) tf.cast(tensor name, dtype=data type) - casts a tensor to another type.
(2) tf.reduce_min(tensor name, axis=0 or 1) - minimum. axis=0 works column by column (minimum of each column); axis=1 works row by row (minimum of each row).
(3) tf.reduce_max(tensor name, axis=0 or 1) - maximum. axis=0 works column by column (maximum of each column); axis=1 works row by row (maximum of each row).
(4) tf.reduce_mean(tensor name, axis=0 or 1) - mean. axis=0 works column by column (mean of each column); axis=1 works row by row (mean of each row).
(5) tf.reduce_sum(tensor name, axis=0 or 1) - sum. axis=0 works column by column (sum of each column); axis=1 works row by row (sum of each row).
(6) Element-wise arithmetic: tf.add(tensor 1, tensor 2) (tensor 1 + tensor 2), tf.subtract(tensor 1, tensor 2) (tensor 1 - tensor 2), tf.multiply(tensor 1, tensor 2) (tensor 1 * tensor 2), tf.divide(tensor 1, tensor 2) (tensor 1 / tensor 2), tf.square(tensor 1) (square of tensor 1), tf.pow(tensor 1, tensor 2) (tensor 1 raised to the power of tensor 2), tf.sqrt(tensor 1) (square root of tensor 1). tf.matmul(tensor 1, tensor 2) is the matrix product of tensor 1 and tensor 2. tf.greater(tensor 1, tensor 2) returns a bool tensor that is True wherever tensor 1 is greater than tensor 2.
(7) tf.where(condition, A, B) - takes the element from A where the condition is true, otherwise the element from B.
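import tensorflow as tf
tf.compat.v1.disable_eager_execution()#Session does not work with TensorFlow 2.x eager execution, so eager execution is disabled
#The inputs below are inferred from the results quoted in the comments
data=tf.constant([[1.1,2.2],[3.3,4.4],[5.5,6.6]])
data1=tf.constant(1.0)
data2=tf.constant(5.0)
data3=tf.constant([[1,2,3],[4,5,6],[7,8,9]])
tf.cast(data,dtype=tf.float64)#Cast data to another type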
tf.reduce_min(data,axis=1)#Minimum of each row, result [1.1,3.3,5.5]
tf.reduce_min(data,axis=0)#Minimum of each column, result [1.1,2.2]
tf.reduce_max(data,axis=1)#Maximum of each row, result [2.2, 4.4, 6.6]
tf.reduce_max(data,axis=0)#Maximum of each column, result [5.5, 6.6]
tf.reduce_mean(data,axis=1)#Mean of each row, result [1.6500001, 3.85, 6.05]
tf.reduce_mean(data,axis=0)#Mean of each column, result [3.3 4.4]
tf.reduce_sum(data,axis=1)#Sum of each row, result [3.3000002,7.7,12.1]
tf.reduce_sum(data,axis=0)#Sum of each column, result [9.9,13.200001]
tf.add(data1,data2)#Addition, result 6
tf.subtract(data1,data2)#Subtraction, result -4
tf.multiply(data1,data2)#Multiplication, result 5
tf.divide(data1,data2)#Division, result 0.2
tf.square(data1)#Square, result 1
tf.pow(data1,data2)#Power, result 1
tf.sqrt(data1)#Square root, result 1
tf.matmul(data3,data3)#Matrix multiplication, with the following result:
#[[ 30  36  42], [ 66  81  96], [102 126 150]]
sess=tf.compat.v1.Session()#Session needs graph mode, which is why eager execution was disabled above.
A=tf.where(tf.greater(data1,data2),1,0)#Is data1 greater than data2? If yes 1, otherwise 0
print(sess.run(A))#data1 > data2 is false, so the second value is taken and A is 0.
#Output is 0
B=tf.where(tf.greater(y1,y_1),1,0)#Element by element: is y1 greater than y_1? If yes 1, otherwise 0
print(sess.run(B))#Output is [0 0 0 1]
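The same operations can also be run without a Session. The sketch below assumes TensorFlow 2.x with eager execution left enabled, so it should be run as a separate script from the Session code above. The 3 x 2 matrix matches the results quoted above, while the vectors a and b are illustrative values.

```python
import tensorflow as tf

data = tf.constant([[1.1, 2.2], [3.3, 4.4], [5.5, 6.6]])  # 3 rows, 2 columns

col_min = tf.reduce_min(data, axis=0)   # one value per column
row_min = tf.reduce_min(data, axis=1)   # one value per row
col_sum = tf.reduce_sum(data, axis=0)   # one sum per column
print(col_min.numpy(), row_min.numpy(), col_sum.numpy())

a = tf.constant([1, 2, 3, 4])           # illustrative values
b = tf.constant([4, 3, 2, 1])           # illustrative values
picked = tf.where(tf.greater(a, b), a, b)   # element-wise: a where a > b, otherwise b
print(picked.numpy())
```

Reducing along axis=0 leaves one value per column, and reducing along axis=1 leaves one value per row.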

3. Usage of Tensorflow Network Framework Fundamental Functions

6. Initializing network parameters as trainable variables: tf.Variable(tensor)
Usage and explanation:

tf.Variable(tensor) marks a tensor as a trainable parameter, so that it can be updated during training and backpropagation in a neural network.

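w1=tf.Variable(tf.random.normal([2,35]),dtype=tf.float32)#2*35 Layer 1 weights
b1=tf.Variable(tf.constant(0.01,shape=[35]))#1*35 Layer 1 bias
w2=tf.Variable(tf.random.normal([35,6]),dtype=tf.float32)#35*6 Layer 2 weights
b2=tf.Variable(tf.constant(0.01,shape=[6]))#1*6 Layer 2 bias
w3=tf.Variable(tf.random.normal([6,1]),dtype=tf.float32)#6*1 Layer 3 weights
b3=tf.Variable(tf.constant(0.01,shape=[1]))#1*1 Layer 3 bias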

Here I define the parameters of a three-layer neural network. Assume the input is a two-dimensional vector of shape 1 x 2. The first layer then has 2 x 35 weights and a 1 x 35 bias, and produces a 1 x 35 vector. The second layer has 35 x 6 weights and a 1 x 6 bias, and produces a 1 x 6 vector. Assuming the output is a single value, the third layer has 6 x 1 weights and a 1 x 1 bias. In total, six trainable variables are defined here; they are used later in backpropagation and other operations.
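The shapes described above can be checked with a quick walk-through. This is only a sketch in NumPy, assuming the usual matrix product plus bias for each layer and leaving out the activation functions; the random input and the names h1 and h2 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 2))            # one sample with two features

w1, b1 = rng.normal(size=(2, 35)), np.full(35, 0.01)
w2, b2 = rng.normal(size=(35, 6)), np.full(6, 0.01)
w3, b3 = rng.normal(size=(6, 1)), np.full(1, 0.01)

h1 = x @ w1 + b1    # (1, 2) @ (2, 35) -> (1, 35)
h2 = h1 @ w2 + b2   # (1, 35) @ (35, 6) -> (1, 6)
y = h2 @ w3 + b3    # (1, 6) @ (6, 1) -> (1, 1)
print(h1.shape, h2.shape, y.shape)
```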

7. Loss Function-Activation Function

First, the softmax function is introduced:

tf.nn.softmax(X)#Converts the input N-dimensional vector X into a probability distribution; the output p_i is computed as follows:

$$p_i=\frac{e^{x_i}}{\sum_{j=1}^{N}e^{x_j}}$$
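The formula can be checked by hand with a few lines of plain Python. This is a sketch of the formula itself, not of how tf.nn.softmax is implemented; the input vector is illustrative, and subtracting the maximum is a common numerical safeguard added here.

```python
import math

def softmax(x):
    # Subtracting the maximum leaves the result unchanged and avoids overflow.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

p = softmax([1.0, 2.0, 3.0])   # illustrative input
print(p)
print(sum(p))
```

Every output lies between 0 and 1, the outputs sum to 1, and larger inputs receive larger probabilities.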

I. Loss function loss(y_1,y1). We assume the true value (label) is y_1 and the network output is y1.
(1) Square loss function: tf.reduce_mean(tf.square(y_1-y1))
(2) Cross-entropy loss function: tf.losses.categorical_crossentropy(y_1,y1)

Key point for the cross-entropy loss function: both y_1 and y1 passed to it must be probability distributions, that is, non-negative values that sum to 1. The first argument is treated as the true distribution and the second as the predicted one.
The network output is therefore usually converted into a probability distribution by softmax before the cross-entropy is computed. The cross-entropy is calculated as follows: H(y_1,y1) = -∑ (y_1) log(y1)

(3) A cross-entropy loss function that can be applied directly, without a separate softmax step; it is equivalent to softmax followed by cross-entropy: tf.nn.softmax_cross_entropy_with_logits(y_1,y)
tf.losses.categorical_crossentropy(y_1,y1)#Cross-entropy loss, result 0.938
tf.reduce_mean(tf.square(y_1-y1))#Square loss, result 0.020000001
tf.nn.softmax_cross_entropy_with_logits(y_2,y2)#Cross-entropy computed directly from the raw outputs (no separate softmax required), result 22.82478

(4) Custom loss functions: for example, the following piecewise functions:

The definition here is different, and the code is as follows:

example8: custom loss function:
print(sess.run(loss))#The result is 0.16, please calculate it yourself to verify the correctness
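The square loss and the cross-entropy loss from items (1) and (2) can also be worked out by hand. The sketch below uses plain Python with the natural logarithm; the label and the two predictions are illustrative values, not the ones behind the results quoted above.

```python
import math

def cross_entropy(y_true, y_pred):
    # H(y_true, y_pred) = -sum(y_true * ln(y_pred))
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

label = [1.0, 0.0]                    # illustrative one-hot label
close, far = [0.8, 0.2], [0.6, 0.4]   # two illustrative predictions

print(cross_entropy(label, close))
print(cross_entropy(label, far))
print(mean_squared_error(label, close))
```

The prediction closer to the label gives the smaller cross-entropy, which is what makes it usable as a loss.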
II. Activation Function

Generally speaking, if the input to one layer of the network is $x$, then without any activation the output $y$ of this layer looks like this, where $w$ is the weight matrix and $b$ is the bias vector:
$$y=-x^Tw+b$$
This is a linear equation, but a neural network aims to model relationships that are mostly non-linear, so we need to add a non-linear factor. How? This is where the activation function comes in.
With an activation function, the output is no longer linear in the input:
$$y=f(-x^Tw+b)$$
Here $f$ is called the activation function, and the formula above is also the key to forward propagation. The main kinds of activation function are:

① sigmoid: $f(x)=\frac{1}{1+e^{-x}}$. Prone to vanishing gradients.
② tanh: $f(x)=\frac{1-e^{-2x}}{1+e^{-2x}}$. Also prone to vanishing gradients.
③ ReLU: $f(x)=0\ (x<0),\ x\ (x\ge 0)$. Avoids vanishing gradients for inputs greater than zero but not for inputs less than zero, where the gradient is exactly zero; this can slow convergence.
④ Leaky ReLU: $f(x)=\max(ax,x)$, where $a$ is a hyperparameter that we usually set ourselves. This addresses the vanishing gradient for inputs less than zero, but the function is rarely used.
⑤ Other custom activation functions
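The four activation functions listed above can be written out directly. This is a sketch in plain Python for single numbers, using the standard logistic form of sigmoid; the slope a=0.01 for Leaky ReLU is an illustrative choice, and max(ax, x) only behaves as intended for 0 < a < 1.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

def relu(x):
    return x if x >= 0 else 0.0

def leaky_relu(x, a=0.01):   # a is a hyperparameter; 0.01 is an illustrative value
    return max(a * x, x)

for v in (-2.0, 0.0, 2.0):
    print(v, sigmoid(v), tanh(v), relu(v), leaky_relu(v))
```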

The choice of activation function has a large effect on how well the network performs: a better-suited activation function generally gives better results and accuracy. Conversely, if training is going poorly, the activation function is one place to start, along with the network optimization covered later.

8. Computing gradients for a classic BP network with tf.GradientTape()

The MOOC computes gradients using the with...as structure. It looks somewhat like a simplified try...except statement: the block is entered, the operations inside it are recorded, and the block is closed automatically. The details of with are not covered here.

Usage and explanation:
with tf.GradientTape() as tp:
     #Write the forward propagation formula of the network here.
     #After the forward propagation formula, write the definition of the network's loss function loss
grades=tp.gradient(loss,parameters)#Compute the gradient of loss with respect to each variable in the list parameters

In a traditional BP network, the classic step is backpropagation, which uses the learning rate and the gradients to update the weight matrix $w$ and the bias $b$ that we set up earlier. Given a learning rate LR, the gradient update rule for $w$ and $b$ is:
$$w^*=w-LR\times grades_w \quad (grades_w \text{ is the gradient with respect to } w)$$
$$b^*=b-LR\times grades_b \quad (grades_b \text{ is the gradient with respect to } b)$$
Some readers may be confused by this structure at first and not know what it is or how to use it. That is fine. For now, just remember this form and the idea of backpropagation, and understand that this function computes gradients. A complete BP network model is given later, and I will explain it there.
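To make the structure concrete before the full network, here is a minimal sketch on a single parameter, assuming TensorFlow 2.x with eager execution. The toy loss (w+1)^2, the starting value 5.0, the learning rate 0.2, and the 40 steps are all illustrative choices.

```python
import tensorflow as tf

w = tf.Variable(5.0)      # illustrative starting value
LR = 0.2                  # illustrative learning rate

for step in range(40):
    with tf.GradientTape() as tp:
        loss = tf.square(w + 1.0)      # toy loss with its minimum at w = -1
    grades_w = tp.gradient(loss, w)    # d(loss)/dw = 2 * (w + 1)
    w.assign_sub(LR * grades_w)        # w* = w - LR * grades_w

print(w.numpy())
```

Each pass records the forward computation inside the with block, asks the tape for the gradient, and applies the update rule, so w moves toward the minimum of the loss at -1.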

9. Data feature processing and parameter selection before network training

The following two principles are important to remember. Assume the number of features to be trained on, that is, the number of features fed into the network, is N.

(1) The initial network parameters $\alpha$ should follow this normal distribution:

$$\alpha\sim N\left(0,\sqrt{\frac{2}{N}}\right)$$

(2) The input features should be standardized so that they follow the normal distribution:

$$N(0,1)$$
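Both principles can be sketched in a few lines of NumPy. The sketch assumes that the second parameter of the normal distribution in principle (1) is the standard deviation; the raw feature data, the layer width of 35, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2                                              # number of input features
features = rng.uniform(0.0, 10.0, size=(500, N))   # illustrative raw data

# (1) Draw initial weights from a normal distribution with standard deviation sqrt(2 / N).
weights = rng.normal(loc=0.0, scale=np.sqrt(2.0 / N), size=(N, 35))

# (2) Standardize every feature column to zero mean and unit standard deviation.
standardized = (features - features.mean(axis=0)) / features.std(axis=0)
print(standardized.mean(axis=0), standardized.std(axis=0))
```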

10. Batching function for pairing features with labels and feeding them to the network
tf.data.Dataset.from_tensor_slices((input features, output labels)).batch($2^n$)
Usage and explanation:

The batch is the basic unit of training data fed into a neural network at one time. Its size is usually a power of 2, $2^n$, where we choose $n$ ourselves. For example, with $n=5$ the batch size is 32, and the samples are fed to the network in groups of 32. This function pairs each feature row with its label and so prepares the data for the later training calls. In other words, it is a merging function.
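The effect of batching can be seen by iterating over a small dataset. This is a minimal sketch assuming TensorFlow 2.x with eager execution; the 100 random samples, the zero labels, and the names x_batch and y_batch are illustrative.

```python
import tensorflow as tf

features = tf.random.normal([100, 2])   # 100 illustrative samples with 2 features each
labels = tf.zeros([100, 1])             # illustrative labels

data = tf.data.Dataset.from_tensor_slices((features, labels)).batch(32)

for i, (x_batch, y_batch) in enumerate(data):
    # Each step yields one batch of features together with the matching labels.
    print(i, x_batch.shape, y_batch.shape)
```

With 100 samples and a batch size of 32, the last batch is smaller than the others because only four samples remain.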

11. Other important functions:
(1) enumerate: the enumeration function, used in a for loop. It returns both the value of each element and its position, doing two things at once instead of a more tedious loop over indices:
list1=[34,35,36]
for location,value in enumerate(list1):
    print(location,value)#location is the position, value is the element at that position
#The output is 0 34, 1 35, 2 36.
(2) Variable.assign_sub: subtracts a value from a variable in place
a=tf.Variable(1)
a.assign_sub(2)#a becomes 1-2=-1, that is, a=a-2
a=tf.Variable(1)
a.assign_sub(-2)#Starting again from 1, a becomes 1-(-2)=3, that is, a=a-(-2)

4. Building the most basic neural network with TensorFlow

With what has been covered above, we can start building our first neural network. This example uses the dot.csv file provided by the MOOC. I explain it step by step below.

step1: import the required libraries

We need pandas to read the csv file, numpy to preprocess the data, tensorflow to build the network, and matplotlib to plot the results, so these four libraries are imported.

import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
step2: Read the features and labels and set up the initial network configuration.

The dataset used here has two features, x1 and x2, and one output, y_c. Following principle (1) of item 9 with N=2, the network parameters should therefore be initialized from the normal distribution $N(0,\sqrt{2/2})=N(0,1)$.

file=pd.read_csv('C:/Users/DELL/Desktop/dot.csv')#Read file information
x_data=np.array(file[['x1','x2']])#Pick out the two feature columns x1 and x2
y_data=np.array(file['y_c'])#Pick out the label column y_c
x_train=np.vstack(x_data).reshape(-1,2)#Stack vertically and reshape into two columns
y_train=np.vstack(y_data).reshape(-1,1)#Stack vertically and reshape into one column
#np.vstack stacks arrays vertically
#reshape(-1,2) converts to two columns (because there are two features)
x_train=tf.cast(x_train,tf.float32)#Convert to a float32 tensor
y_train=tf.cast(y_train,tf.float32)#Convert to a float32 tensor
data=tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(32)#Pair features with labels and feed them to the network in batches of 32
Ir=0.005#Learning rate of the network
epoch=3000#Number of training rounds
step3: Build the network framework

Here I chose a three-layer neural network. After reading the earlier notes you should understand how it is set up; if not, see item 6.

w1=tf.Variable(tf.random.normal([2,35]),dtype=tf.float32)#2*35 Layer 1
b1=tf.Variable(tf.constant(0.01,shape=[35]))#1*35 Layer 1 Output
w2=tf.Variable(tf.random.normal([35,7]),dtype=tf.float32)#35*7 Layer 2
b2=tf.Variable(tf.constant(0.01,shape=[7]))#1*7 Layer 2 Output
w3=tf.Variable(tf.random.normal([7,1]),dtype=tf.float32)#7*1 Layer 3
b3=tf.Variable(tf.constant(0.01,shape=[1]))#1*1 Layer 3 Output
step4: Forward propagation and backpropagation to update the parameters

You should remember the GradientTape structure by now, so it should be easy to see what this code does:

for epoch in range(epoch):#Train for 3000 rounds
    for i,(x_train,y_train) in enumerate(data):#Iterate over the batches in data
        with tf.GradientTape() as tape:#Forward propagation follows
            #ReLU is assumed here as the hidden-layer activation
            h1=tf.nn.relu(tf.matmul(x_train,w1)+b1)#Layer 1
            h2=tf.nn.relu(tf.matmul(h1,w2)+b2)#Layer 2
            y=tf.matmul(h2,w3)+b3#Layer 3, the network output
            #End of forward propagation
            loss=tf.reduce_mean(tf.square(y_train-y))#Define loss function
        variables=[w1,b1,w2,b2,w3,b3]#The parameters whose gradients are needed
        grads=tape.gradient(loss,variables)#Gradient of loss with respect to each variable
        w1.assign_sub(Ir*grads[0])#Parameter updates follow (backpropagation)
        b1.assign_sub(Ir*grads[1])
        w2.assign_sub(Ir*grads[2])
        b2.assign_sub(Ir*grads[3])
        w3.assign_sub(Ir*grads[4])
        b3.assign_sub(Ir*grads[5])#Six parameters updated, end
    if(epoch%20==0):#Print the loss every 20 rounds
        print('epoch:',epoch,'loss:',float(loss))
step5: Verify and test the network

The network is now built, but we do not yet know whether it is any good. First we draw a scatter plot of the training data, using two colors:

Y_c=[['red' if y else 'blue'] for y in y_data]#Turn the 0/1 labels into a list of colors (y_data is used because y_train was reassigned inside the training loop)
#If the label is 1, assign 'red', otherwise assign 'blue'
x1=x_data[:,0]#First coordinate
x2=x_data[:,1]#Second coordinate
plt.scatter(x1,x2,color=np.squeeze(Y_c))#Draw the scatter plot

The result is as follows:
Now let's generate some x1 and x2 values as test data to see how the network behaves.

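#Set grid points with step 0.1 over x1 and x2 (using the range of the data is an assumption)
xx,yy=np.mgrid[x1.min():x1.max():0.1,x2.min():x2.max():0.1]
#ravel is a flattening function; it flattens xx and yy into 1-D arrays
#np.c_ combines them into grid coordinates, with xx.ravel() as the horizontal coordinate and yy.ravel() as the vertical coordinate
grid=np.c_[xx.ravel(),yy.ravel()]
grid=tf.cast(grid,tf.float32)#Data conversion for easy calculation
probs=[]#Store output y
for x_test in grid:#Feed each grid point through the trained network
    h1=tf.nn.relu(tf.matmul([x_test],w1)+b1)#Same forward pass as in training (ReLU assumed)
    h2=tf.nn.relu(tf.matmul(h1,w2)+b2)
    y=tf.matmul(h2,w3)+b3
    probs.append(y)#Add to the list
probs=np.array(probs).reshape(xx.shape)#Convert to the shape of xx for easy drawing
plt.scatter(x1,x2,color=np.squeeze(Y_c))#Draw the scatter plot
plt.contour(xx,yy,probs,levels=[0.5])#Draw the boundary where the output equals 0.5 (the level is an assumption)
plt.show()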

The result is as follows:

Overfitting is clearly visible: the boundary curve is not smooth. This is the model optimization problem we will discuss next.
Combining the code above gives the complete network structure and source code:

import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
file=pd.read_csv('C:/Users/DELL/Desktop/dot.csv')#Read the data stored as a csv file
x_data=np.array(file[['x1','x2']])#Pick out the two feature columns x1 and x2
y_data=np.array(file['y_c'])#Pick out the label column y_c
x_train=np.vstack(x_data).reshape(-1,2)#Reshape into two columns
y_train=np.vstack(y_data).reshape(-1,1)#Reshape into one column
Y_c=[['red' if y else 'blue'] for y in y_train]#Change 0-1 to a list of colors
x_train=tf.cast(x_train,tf.float32)#Convert to float32
y_train=tf.cast(y_train,tf.float32)#Convert to float32
data=tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(32)#Pair features with labels and feed them to the network in batches of 32
w1=tf.Variable(tf.random.normal([2,35]),dtype=tf.float32)#2*35 Layer 1
b1=tf.Variable(tf.constant(0.01,shape=[35]))#1*35 Layer 1 Output
w2=tf.Variable(tf.random.normal([35,7]),dtype=tf.float32)#35*7 Layer 2
b2=tf.Variable(tf.constant(0.01,shape=[7]))#1*7 Layer 2 Output
w3=tf.Variable(tf.random.normal([7,1]),dtype=tf.float32)#7*1 Layer 3
b3=tf.Variable(tf.constant(0.01,shape=[1]))#1*1 Layer 3 Output
Ir=0.005#Learning rate
epoch=2000#Number of training rounds
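for epoch in range(epoch):
    for i,(x_train,y_train) in enumerate(data):
        with tf.GradientTape() as tape:
            ...#Forward propagation and loss, as in step4
        ...#Gradient computation and parameter updates, as in step4
for x_test in grid:#grid is built as in step5
    ...#Predict each grid point and plot, as in step5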

(To be continued) I hope for your support! Upcoming updates will cover network optimization, the standard "eight-part" template for building networks, CNN, and RNN.

Tags: Python neural networks TensorFlow keras

Posted on Thu, 07 Oct 2021 15:29:54 -0400 by spiceweasel