# TensorFlow, NumPy and Keras: detailed learning notes (with sample code and an exercise program for each function)

These notes systematically summarize what the author learned about TensorFlow, combining the TensorFlow web course from Peking University with some personal understanding. They run from basic tensor creation through to building BP, CNN and RNN network models with TensorFlow and tf.keras, and they adapt and re-implement the samples from the Peking University course.
The notes can serve as a reference for beginners working with the same networks as the author, or for researchers who need to pick up the framework quickly. For detailed network construction and mathematical derivations (CNN, RNN, LSTM, DHNN), the author recommends a good book: Neural Networks and Deep Learning by Qiu Xipeng (China Machine Press). Only a brief introduction to network construction and some basic derivations is given here. Readers who want to know more about the construction and working principles of neural networks can refer to the e-book Neuronal Dynamics: From single neurons to networks and models of cognition. The e-book address is as follows:
https://neuronaldynamics.epfl.ch/index.html
The address of the TensorFlow web course at Peking University is as follows, for readers who want to check it themselves:
https://www.icourse163.org/learn/PKU-1002536002?tid=1462067447#/learn/announce
To read these notes you need to know the basic structure and computation of a neural network. If you do not, see how the BP network works in Zhou Zhihua's Machine Learning, and the working principles of RNN and CNN in the two books above.
The BP part is written so far; the author will keep adding RNN and CNN content. I hope you will support it!
Let's get started.

# 1. Tensor creation and tensor conversion

A TensorFlow tensor is created as follows:

##### 1. tf.constant(tensor content, dtype=data type)
###### Usage notes:

(1) Common dtype examples are as follows:

| Sequence number | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- |
| Type | tf.int | tf.float | tf.float32 | tf.float64 | tf.int32 |

Here tf.int and tf.float name the integer and floating-point families; in code you pass a concrete dtype with a bit width, such as tf.int32, tf.int64, tf.float32 or tf.float64. The table is not exhaustive: tf.string and tf.bool also exist, but they are used less often and are not listed here. Readers who need them can look them up.
(2) A tensor can be a zero-dimensional scalar, a one-dimensional vector, a two-dimensional matrix, or an n-dimensional array.

##### example1:
tf.constant(35,dtype=tf.int64)#A single number (0-D tensor)
tf.constant([1.1,2.2,3.3],dtype=tf.float32)#A vector (1-D tensor)
tf.constant([[1.1,2.2],[3.3,4.4]],dtype=tf.float32)#A matrix (2-D tensor)


By analogy, you can define a tensor with as many dimensions as you need.
TensorFlow's tensor conversion function is as follows:

##### 2. tf.convert_to_tensor(data to be converted, dtype=data type)

In practice, data is often loaded in Python as NumPy (np) arrays, so conversion between np data and tf data is required.

##### example2:
import numpy as np
import tensorflow as tf
data=np.arange(0,10,1)
n_data=tf.convert_to_tensor(data,dtype=tf.int64)#Data conversion to tf type data
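
As a small sketch of the round trip (the names `arr`, `t` and `back` are mine, not from the course), the snippet below converts a NumPy array to a tensor and then back again with the tensor's `.numpy()` method. It assumes TensorFlow 2.x with eager execution left at its default.

```python
import numpy as np
import tensorflow as tf

arr = np.arange(0, 10, 1, dtype=np.int64)        # NumPy array holding 0..9
t = tf.convert_to_tensor(arr, dtype=tf.int64)    # NumPy -> TensorFlow tensor
back = t.numpy()                                 # TensorFlow -> NumPy (eager mode only)

print(t.dtype, t.shape)
print(type(back), back)
```

The `.numpy()` call is only available while eager execution is on, so it will not work after `tf.compat.v1.disable_eager_execution()` has been called in the same process.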


# 2. Common Tensorflow functions

##### example3:
tf.zeros([2,2])#Generate a 2 x 2 matrix of zeros.
tf.ones(7)#Generate a 1-D vector of seven ones.
tf.fill([4,4],9)#Generate a 4 x 4 matrix filled with 9.

##### (3) tf.random.uniform(dimension, minval=minimum, maxval=maximum)
###### Usage notes:

Three random generators are used here. tf.random.normal generates normally distributed random numbers. tf.random.truncated_normal generates truncated normal random numbers: values more than two standard deviations away from the mean are discarded and re-drawn, which is equivalent to cutting off the tails on both sides. tf.random.uniform generates uniformly distributed data in the interval [minval, maxval).

##### example4:
tf.random.normal([2,2],mean=5,stddev=1)
#2 x 2 normally distributed random numbers with a mean of 5 and a standard deviation of 1
tf.random.truncated_normal([2,2],mean=5,stddev=1)
#Truncated normal distribution, otherwise as above.
tf.random.uniform([2,2],minval=-1,maxval=6)
#Generate a 2 x 2 matrix of data uniformly distributed over [-1,6).
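
To make the difference between the three generators visible, here is a minimal sketch that draws many samples from each and prints their extremes. The sample size and the seed are my own choices, used only so the run is repeatable.

```python
import tensorflow as tf

tf.random.set_seed(0)          # fixed seed, only for repeatability
mean, stddev = 5.0, 1.0

normal = tf.random.normal([10000], mean=mean, stddev=stddev)
truncated = tf.random.truncated_normal([10000], mean=mean, stddev=stddev)
uniform = tf.random.uniform([10000], minval=-1, maxval=6)

# Plain normal samples can land far out in the tails;
# truncated samples stay within mean +/- 2 * stddev.
print(float(tf.reduce_min(normal)), float(tf.reduce_max(normal)))
print(float(tf.reduce_min(truncated)), float(tf.reduce_max(truncated)))
print(float(tf.reduce_min(uniform)), float(tf.reduce_max(uniform)))
```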

##### example5:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()#TF 2.x runs eagerly by default; turn that off so the Session below can be used
data=tf.constant([[1.1,2.2],[3.3,4.4],[5.5,6.6]],dtype=tf.float32)
#In graph mode the calls below only build operations; pass them to sess.run(...) to get the values in the comments
tf.cast(data,dtype=tf.float64)#Cast data to float64
tf.reduce_min(data,axis=1)#Minimum of each row (reduces across columns), result [1.1,3.3,5.5]
tf.reduce_min(data,axis=0)#Minimum of each column (reduces across rows), result [1.1,2.2]
tf.reduce_max(data,axis=1)#Maximum of each row, result [2.2, 4.4, 6.6]
tf.reduce_max(data,axis=0)#Maximum of each column, result [5.5, 6.6]
tf.reduce_mean(data,axis=1)#Average of each row, result [1.6500001, 3.85, 6.05]
tf.reduce_mean(data,axis=0)#Average of each column, result [3.3 4.4]
tf.reduce_sum(data,axis=1)#Sum of each row, result [3.3000002,7.7,12.1]
tf.reduce_sum(data,axis=0)#Sum of each column, result [9.9,13.200001]
data1=tf.constant(1,dtype=tf.float64)
data2=tf.constant(5,dtype=tf.float64)
data3=tf.constant([[1,2,3],[4,5,6],[7,8,9]],dtype=tf.int32)
tf.subtract(data1,data2)#Subtraction, the result is -4
tf.multiply(data1,data2)#Multiplication, the result is 5
tf.divide(data1,data2)#Division, the result is 0.2
tf.square(data1)#Square, the result is 1
tf.pow(data1,data2)#Power (1 to the 5th), the result is 1
tf.sqrt(data1)#Square root, the result is 1
tf.matmul(data3,data3)#Matrix multiplication with the following result:
#[[ 30  36  42], [ 66  81  96], [102 126 150]]
sess=tf.compat.v1.Session()#Session is a TF 1.x API, reached through tf.compat.v1 in version 2.x
A=tf.where(tf.greater(data1,data2),1,0)#Is data1 larger than data2? If yes 1, otherwise 0
print(sess.run(A))#data1>data2 is false, so the second value is taken and A is 0.
#Output is 0
y1=tf.constant([0.1,0.6,0.3],dtype=tf.float32)
y_1=tf.constant([0.2,0.7,0.1],dtype=tf.float32)
B=tf.where(tf.greater(y1,y_1),1,0)#Element-wise: is y1 larger than y_1? If yes 1, otherwise 0
print(sess.run(B))#Output is [0 0 1]
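
The `axis` argument is easy to get backwards, so here is a minimal sketch of what it does to shapes. It assumes a fresh Python process with TensorFlow 2.x in its default eager mode (no `disable_eager_execution()` call), and the threshold of 3.0 is an arbitrary value chosen for illustration.

```python
import tensorflow as tf

data = tf.constant([[1.1, 2.2], [3.3, 4.4], [5.5, 6.6]], dtype=tf.float32)  # shape (3, 2)

row_sums = tf.reduce_sum(data, axis=1)   # collapses the 2 columns: one value per row
col_sums = tf.reduce_sum(data, axis=0)   # collapses the 3 rows: one value per column
print(row_sums.shape, col_sums.shape)

mask = tf.where(tf.greater(data, 3.0), 1, 0)  # element-wise: 1 where data > 3, else 0
print(mask.numpy())
```

In other words, `axis` names the dimension that disappears: a (3, 2) tensor reduced along `axis=1` leaves three values, and along `axis=0` leaves two.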


# 3. Usage of Tensorflow Network Framework Fundamental Functions

##### 6. Initialization of network parameters and the trainable function tf.Variable(tensor)
###### Usage notes:

tf.Variable(tensor) marks a tensor as a trainable parameter, so that it can be updated during training by backpropagation in a neural network.

##### example6:
w1=tf.Variable(tf.random.truncated_normal([2,35],mean=0,stddev=1))#2*35 weights, layer 1
b1=tf.Variable(tf.constant(0.01,shape=[35]))#1*35 offset, layer 1
w2=tf.Variable(tf.random.truncated_normal([35,6],mean=0,stddev=1))#35*6 weights, layer 2
b2=tf.Variable(tf.constant(0.01,shape=[6]))#1*6 offset, layer 2
w3=tf.Variable(tf.random.truncated_normal([6,1],mean=0,stddev=1))#6*1 weights, layer 3
b3=tf.Variable(tf.constant(0.01,shape=[1]))#1*1 offset, layer 3


Here the author defines the parameters of a three-layer neural network. Assume the input is a two-dimensional vector of shape 1 x 2. The first layer has 2 x 35 parameters and a 1 x 35 offset, so its output is a 1 x 35 vector. The second layer has 35 x 6 parameters and a 1 x 6 offset, so its output is a 1 x 6 vector. We assume the output is a single value, so the third layer has 6 x 1 parameters and a 1 x 1 offset. Six trainable variables are defined here, and they can be used in backpropagation and other operations later.
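
The shape bookkeeping above can be checked with a short NumPy sketch. It only tracks shapes: activations are left out, the random values are placeholders, and the parameter count at the end is my own addition.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 2))                     # one sample with two features

w1, b1 = rng.normal(size=(2, 35)), np.full(35, 0.01)
w2, b2 = rng.normal(size=(35, 6)), np.full(6, 0.01)
w3, b3 = rng.normal(size=(6, 1)), np.full(1, 0.01)

h1 = x @ w1 + b1      # (1, 2) @ (2, 35)  -> (1, 35)
h2 = h1 @ w2 + b2     # (1, 35) @ (35, 6) -> (1, 6)
y = h2 @ w3 + b3      # (1, 6) @ (6, 1)   -> (1, 1)
print(h1.shape, h2.shape, y.shape)

# Total number of trainable values across the six variables
n_params = sum(a.size for a in (w1, b1, w2, b2, w3, b3))
print(n_params)
```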

##### 7. Loss Function-Activation Function

First, the softmax function is introduced:

##### tf.nn.softmax(X)#Converts the input N-dimensional vector X into a probability distribution; the output $p_i$ is as follows:

$$p_i=\frac{e^{x_i}}{\sum_{j=1}^{N}e^{x_j}}$$
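
To see the softmax formula at work without TensorFlow, here is a minimal NumPy sketch. The helper name `softmax` and the input values are illustrative; subtracting the maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(x):
    """Softmax of a 1-D array; the max is subtracted first to avoid overflow."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([1.0, 6.0, 3.0])
p = softmax(x)
print(p, p.sum())   # every p_i lies in (0, 1) and they sum to 1
```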

##### (2) Cross-entropy loss function tf.losses.categorical_crossentropy(y_1,y1)

Key point when using the cross-entropy loss function: both y_1 and y1 passed to it must be probability distributions.
So the network output is generally converted into a probability distribution by softmax before the cross-entropy is calculated. The cross-entropy is calculated as follows:
$$H(\text{y\_1},\text{y1})=-\sum \text{y\_1}\,\log(\text{y1})$$

##### example7:
y1=tf.constant([0.1,0.6,0.3],dtype=tf.float32)
y_1=tf.constant([0.2,0.7,0.1],dtype=tf.float32)
y2=tf.constant([1,6,3],dtype=tf.float32)
y_2=tf.constant([2,9,4],dtype=tf.float32)
tf.losses.categorical_crossentropy(y_1,y1)#Cross-entropy loss with y_1 as the target and y1 as the prediction, result 0.938
tf.reduce_mean(tf.square(y_1-y1))#Mean squared error loss, result 0.020000001
tf.nn.softmax_cross_entropy_with_logits(y_2,y2)#Applies softmax to y2 internally and then computes the cross-entropy, result 22.82478 (no separate softmax call required)
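
The three numbers in example7 can be reproduced by hand. The sketch below uses plain NumPy with the same values; the names `y_true`, `y_pred`, `labels` and `logits` are mine and simply mirror `y_1`, `y1`, `y_2` and `y2`.

```python
import numpy as np

y_true = np.array([0.2, 0.7, 0.1])   # plays the role of y_1
y_pred = np.array([0.1, 0.6, 0.3])   # plays the role of y1

cross_entropy = -np.sum(y_true * np.log(y_pred))   # H = -sum(y_1 * log(y1))
mse = np.mean((y_true - y_pred) ** 2)              # mean squared error
print(cross_entropy, mse)

labels = np.array([2.0, 9.0, 4.0])   # plays the role of y_2
logits = np.array([1.0, 6.0, 3.0])   # plays the role of y2

# log(softmax(logits)) computed directly, then the same cross-entropy sum
log_softmax = logits - np.log(np.sum(np.exp(logits)))
loss_from_logits = -np.sum(labels * log_softmax)
print(loss_from_logits)
```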


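##### (4) Custom loss functions: for example, the following piecewise function:

$$loss(\text{y1},\text{y\_1})=\begin{cases}0.3\,(\text{y\_1}-\text{y1}), & \text{y\_1}>\text{y1}\\ 0.5\,(\text{y1}-\text{y\_1}), & \text{y\_1}<\text{y1}\end{cases}$$
The two branches weight the error differently, and the code is as follows: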

##### example8: custom loss function:
loss=tf.reduce_sum(tf.where(tf.greater(y_1,y1),0.3*(y_1-y1),0.5*(y1-y_1)))
print(sess.run(loss))#The result is 0.16; please calculate it yourself to verify the correctness
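
For readers who want to check the 0.16 by hand, here is a NumPy sketch of the same piecewise loss. The function name `custom_loss` is hypothetical, and the two single-element calls at the end are my own addition to show that the two error directions are charged differently.

```python
import numpy as np

def custom_loss(y_1, y1):
    """Sum of 0.3*(y_1 - y1) where y_1 > y1, and 0.5*(y1 - y_1) elsewhere."""
    return np.sum(np.where(y_1 > y1, 0.3 * (y_1 - y1), 0.5 * (y1 - y_1)))

y_1 = np.array([0.2, 0.7, 0.1])
y1 = np.array([0.1, 0.6, 0.3])
total = custom_loss(y_1, y1)        # 0.3*0.1 + 0.3*0.1 + 0.5*0.2
print(total)

# The same absolute error of 1 costs 0.3 in one direction and 0.5 in the other
low = custom_loss(np.array([1.0]), np.array([0.0]))
high = custom_loss(np.array([0.0]), np.array([1.0]))
print(low, high)
```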

##### II. Activation Function

Generally speaking, if the input to one layer of the network is $x$, then without any activation the output $y$ of this layer looks like this, where $w$ is the weight matrix and $b$ is the offset vector:
$$y=-x^Tw+b$$
But this is a linear equation, and our neural network aims to model mostly non-linear relationships, so we need to add a non-linear factor. How do we add non-linearity? This is where the activation function comes in.
With an activation function, the output is no longer linear, because a non-linear factor is applied:
$$y=f(-x^Tw+b)$$
Here $f$ is called the activation function, and the formula above is also the key to forward propagation. There are several main categories of activation functions:

##### Other custom activation functions

The choice of activation function has a large influence on how well the network performs. A better-suited activation function generally gives better results and higher accuracy. Conversely, if your network is not training well, the activation function is one place to start looking, and network optimization (covered later) is another.
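
As a quick way to compare candidates, the sketch below evaluates three widely used activation functions on the same inputs. The selection (sigmoid, tanh, ReLU) and the sample values are my own choice for illustration; the notes above do not say which functions they had in mind.

```python
import numpy as np

def sigmoid(z):
    """Squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Keeps positive values and replaces negative ones with 0."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, 0.0, 2.0])   # example pre-activation values
print("sigmoid:", sigmoid(z))
print("tanh:   ", np.tanh(z))    # squashes into (-1, 1)
print("relu:   ", relu(z))
```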

Here the MOOC gives a method of calculating the gradient using the with...as structure. In form it looks a little like a simplified try...except statement. The detailed usage is not described here.

###### Usage notes:
with tf.GradientTape() as tp:
    #Write the forward propagation formula of the network here.
    #After the forward propagation formula, write the definition of the network's loss function loss.


We know that the most classic part of a traditional BP network is backpropagation, which uses a learning rate and the gradient to update the weight matrix $w$ and the offset $b$ that we set up earlier. That is, given a learning rate $LR$, the gradient update rule for $w$ and $b$ is:
$$w^*=w-LR\times grads_w$$
$$b^*=b-LR\times grads_b$$
where $grads_w$ and $grads_b$ are the gradients of the loss with respect to $w$ and $b$.
Some readers may be confused by this structure and not know what it is or how to use it. That's okay. For now, just remember this form together with backpropagation, and understand that its purpose is to compute gradients. A complete BP network model comes later, and it will be explained there.
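
Before the full network, here is the smallest example of the structure I could think of: one parameter, one loss, one update. The loss `w^2`, the starting value 3.0 and the learning rate 0.1 are made up for illustration, and the sketch assumes TensorFlow 2.x in eager mode.

```python
import tensorflow as tf

w = tf.Variable(3.0)      # a single trainable parameter
LR = 0.1                  # learning rate

with tf.GradientTape() as tape:
    loss = tf.square(w)   # "forward propagation": loss = w^2

grad_w = tape.gradient(loss, w)   # d(loss)/dw = 2w, which is 6 at w = 3
w.assign_sub(LR * grad_w)         # w* = w - LR * grad_w = 3 - 0.1 * 6
print(float(grad_w), float(w.numpy()))
```

Everything computed inside the `with` block is recorded by the tape, which is why the loss has to be defined there; the gradient and the update happen after the block ends.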

##### 9. Data feature processing and parameter selection before network training.

The following two principles are important to remember. We assume that the number of features to be input into the network is N.

##### (1) The parameters $\alpha$ used to initialize the network must follow this normal distribution:

$$\alpha\sim N\left(0,\sqrt{\frac{2}{N}}\right)$$

##### (2) The input features need to be standardized so that they follow the normal distribution:

$$N(0,1)$$
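
A minimal NumPy sketch of both principles follows. Several things here are assumptions of mine: the raw feature values are made up, the layer width of 35 is arbitrary, and the second argument of the distribution in (1) is read as a standard deviation. Note also that standardizing gives each feature zero mean and unit standard deviation; it does not by itself make the data normally distributed.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                            # number of input features (hypothetical)
features = rng.uniform(0, 100, size=(1000, N))   # made-up raw features on an arbitrary scale

# (2) centre and scale each feature column to mean 0 and standard deviation 1
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

# (1) draw initial weights with mean 0 and standard deviation sqrt(2 / N)
hidden = 35                                      # layer width, chosen only for illustration
w = rng.normal(loc=0.0, scale=np.sqrt(2.0 / N), size=(N, hidden))

print(standardized.mean(axis=0).round(6), standardized.std(axis=0).round(6))
print(w.std(), np.sqrt(2.0 / N))                 # sample std should be close to the target
```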

##### tf.data.Dataset.from_tensor_slices((input features, output labels)).batch($2^n$)
###### Usage notes:

The batch is the basic unit of training data fed into the neural network at one time. Its size is usually a power of 2, with n chosen by you: for example, n=5 gives 2^5=32, so the data is fed to the network in groups of 32 samples. This function pairs the features with their labels and prepares for the later function calls of the neural network. In other words, it is a packaging function.
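
Here is a small sketch of what the batched dataset yields when iterated. The ten samples and the batch size of 4 are made up so that the last, smaller batch is visible; the sketch assumes TensorFlow 2.x in eager mode.

```python
import numpy as np
import tensorflow as tf

features = np.arange(20, dtype=np.float32).reshape(10, 2)   # 10 made-up samples, 2 features each
labels = np.arange(10, dtype=np.float32).reshape(10, 1)     # 10 made-up labels

dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

batch_sizes = []
for step, (x_batch, y_batch) in enumerate(dataset):
    print(step, x_batch.shape, y_batch.shape)    # each element is a (features, labels) pair
    batch_sizes.append(int(x_batch.shape[0]))
```

When the number of samples is not a multiple of the batch size, the final batch is simply smaller, so code inside the loop should not hard-code the batch size.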

##### example9:
list1=[34,35,36]
for location,value in enumerate(list1):
    print(location,value)#location is the index, value is the element at that index
#The output is 0 34, 1 35, 2 36.

##### example10:
import tensorflow as tf
a=tf.Variable(1)#assign_sub is a method of tf.Variable, not of a plain Python number
a.assign_sub(2)#Equivalent to a=a-2, so a becomes 1-2=-1
a.assign_sub(-2)#Equivalent to a=a-(-2), so a becomes -1-(-2)=1


# 4. Building the most basic neural network in TensorFlow

With the knowledge described above, we can begin to build our first neural network. This example uses the dot.csv file provided by the MOOC. I will explain it step by step below.

##### step1: import the related libraries

We need pandas to read the csv file, numpy to preprocess the data, tensorflow to build the network, and matplotlib to draw the results, so we import these four libraries.

import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

##### step2: Read the relevant feature labels and initially set up the network content.

Here is the dataset I'm using. It has two features, x1 and x2, and one output, y_c. Following point 9 above, the network parameters are drawn from the normal distribution $N(0,1)$ when they are initialized.

file=pd.read_csv('C:/Users/DELL/Desktop/dot.csv')#Read the file (change the path to wherever your dot.csv is)
x_data=np.array(file[['x1','x2']])#Pick out the two feature columns x1 and x2
y_data=np.array(file['y_c'])#Pick out the label column y_c
x_train=np.vstack(x_data).reshape(-1,2)#Stack vertically and reshape into two columns
y_train=np.vstack(y_data).reshape(-1,1)#Stack vertically and reshape into one column
#np.vstack stacks arrays vertically
#reshape(-1,2) converts to two columns (because there are two features)
x_train=tf.cast(x_train,tf.float32)#Convert to a tf.float32 tensor
y_train=tf.cast(y_train,tf.float32)#Convert to a tf.float32 tensor
data=tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(32)
#Pair features with labels and feed them to the network in batches of 32
Ir=0.005#Set the learning rate of the network
epoch=3000#Set the number of training rounds for the network

##### step3: Build Network Framework

The author chose a three-layer neural network here. After reading the previous notes you should have a good understanding of how it is set up; if not, see example6.

w1=tf.Variable(tf.random.normal([2,35]),dtype=tf.float32)#2*35 Layer 1
b1=tf.Variable(tf.constant(0.01,shape=[35]))#1*35 Layer 1 Output
w2=tf.Variable(tf.random.normal([35,7]),dtype=tf.float32)#35*7 Layer 2
b2=tf.Variable(tf.constant(0.01,shape=[7]))#1*7 Layer 2 Output
w3=tf.Variable(tf.random.normal([7,1]),dtype=tf.float32)#7*1 Layer 3
b3=tf.Variable(tf.constant(0.01,shape=[1]))#1*1 Layer 3 Output

##### step4: forward and reverse propagation update parameters

You should remember the with...as structure from earlier, so it should be easy to see what the following code is doing:

for epoch in range(epoch):#Train for 3000 rounds
    for i,(x_train,y_train) in enumerate(data):#Iterate over the batches in data
        with tf.GradientTape() as tape:#Forward propagation, recorded by the tape
            h1=tf.matmul(x_train,w1)+b1#wx+b
            h1=tf.nn.relu(h1)#activation
            h2=tf.matmul(h1,w2)+b2#wx+b
            h2=tf.nn.relu(h2)#activation
            y=tf.matmul(h2,w3)+b3#wx+b
            #End of forward propagation
            loss=tf.reduce_mean(tf.square(y_train-y))#Define the loss function
        variables=[w1,b1,w2,b2,w3,b3]#The parameters whose gradients are needed
        grads=tape.gradient(loss,variables)#Gradient of loss with respect to each parameter
        w1.assign_sub(Ir*grads[0])#Parameter update process, i.e. backpropagation
        b1.assign_sub(Ir*grads[1])
        w2.assign_sub(Ir*grads[2])
        b2.assign_sub(Ir*grads[3])
        w3.assign_sub(Ir*grads[4])
        b3.assign_sub(Ir*grads[5])
    if(epoch%20==0):#Print the loss every 20 rounds
        print(epoch,float(loss))

##### step5: verify and test the network

So we have built the network, but we don't yet know whether it is good or bad. First we draw a scatter plot of the training data, using two colors for the two classes:

Y_c=[['red' if y else 'blue'] for y in y_data]#Turn the 0/1 labels into a list of colors (y_data holds every label; the training loop rebinds y_train to the last batch)
#A label of 1 gives 'red', otherwise 'blue'
x1=x_data[:,0]#First coordinate
x2=x_data[:,1]#Second coordinate
plt.scatter(x1,x2,color=np.squeeze(Y_c))#Draw the scatter plot


That draws the training scatter plot. Next, let's create some x1 and x2 values as test data to see how the network behaves:

xx,yy=np.mgrid[-3:3:0.1,-3:3:0.1]
#Set up grid points with step 0.1, corresponding to x1 and x2
grid=np.c_[xx.ravel(),yy.ravel()]
#ravel flattens an array into one dimension.
#np.c_ pairs them up into grid coordinate points, with xx.ravel() as the horizontal coordinate and yy.ravel() as the vertical coordinate
grid=tf.cast(grid,tf.float32)#Data conversion for easy calculation
probs=[]#Store output y
for x_test in grid:#Feed each grid point through the trained network
    h1=tf.matmul([x_test],w1)+b1
    h1=tf.nn.relu(h1)
    h2=tf.matmul(h1,w2)+b2
    h2=tf.nn.relu(h2)
    y=tf.matmul(h2,w3)+b3
    probs.append(y)#Keep the prediction for this grid point
probs=np.array(probs).reshape(xx.shape)#Convert to the shape of xx for easy drawing
plt.contour(xx,yy,probs,levels=[0.5])#Draw the contour line where the output equals 0.5
plt.scatter(x1,x2,color=np.squeeze(Y_c))#Draw the scatter plot
plt.show()


In the resulting plot there is obvious overfitting: the boundary curve is not smooth. This is the model optimization problem we are going to talk about next.
Combining the code above, the complete network and source code are as follows:

import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
file=pd.read_csv('C:/Users/DELL/Desktop/dot.csv')#Read the file (change the path to wherever your dot.csv is)
x_data=np.array(file[['x1','x2']])#Pick out the two feature columns x1 and x2
y_data=np.array(file['y_c'])#Pick out the label column y_c
x_train=np.vstack(x_data).reshape(-1,2)#Reshape into two columns
y_train=np.vstack(y_data).reshape(-1,1)#Reshape into one column
Y_c=[['red' if y else 'blue'] for y in y_train]#Turn the 0/1 labels into a list of colors
x_train=tf.cast(x_train,tf.float32)#Convert to a tf.float32 tensor
y_train=tf.cast(y_train,tf.float32)#Convert to a tf.float32 tensor
data=tf.data.Dataset.from_tensor_slices((x_train,y_train)).batch(32)#Pair features with labels, in batches of 32
w1=tf.Variable(tf.random.normal([2,35]),dtype=tf.float32)#2*35 Layer 1
b1=tf.Variable(tf.constant(0.01,shape=[35]))#1*35 Layer 1 Output
w2=tf.Variable(tf.random.normal([35,7]),dtype=tf.float32)#35*7 Layer 2
b2=tf.Variable(tf.constant(0.01,shape=[7]))#1*7 Layer 2 Output
w3=tf.Variable(tf.random.normal([7,1]),dtype=tf.float32)#7*1 Layer 3
b3=tf.Variable(tf.constant(0.01,shape=[1]))#1*1 Layer 3 Output
Ir=0.005#Learning rate
epoch=2000#Number of training rounds
for epoch in range(epoch):
    for i,(x_train,y_train) in enumerate(data):
        with tf.GradientTape() as tape:#Forward propagation, recorded by the tape
            h1=tf.matmul(x_train,w1)+b1
            h1=tf.nn.relu(h1)
            h2=tf.matmul(h1,w2)+b2
            h2=tf.nn.relu(h2)
            y=tf.matmul(h2,w3)+b3
            loss=tf.reduce_mean(tf.square(y_train-y))
        variables=[w1,b1,w2,b2,w3,b3]
        grads=tape.gradient(loss,variables)#Gradient of loss with respect to each parameter
        w1.assign_sub(Ir*grads[0])#Parameter update, i.e. backpropagation
        b1.assign_sub(Ir*grads[1])
        w2.assign_sub(Ir*grads[2])
        b2.assign_sub(Ir*grads[3])
        w3.assign_sub(Ir*grads[4])
        b3.assign_sub(Ir*grads[5])
    if(epoch%20==0):#Print the loss every 20 rounds
        print(epoch,float(loss))
xx,yy=np.mgrid[-3:3:0.1,-3:3:0.1]
print(xx,yy)
grid=np.c_[xx.ravel(),yy.ravel()]
grid=tf.cast(grid,tf.float32)
probs=[]
for x_test in grid:#Feed each grid point through the trained network
    h1=tf.matmul([x_test],w1)+b1
    h1=tf.nn.relu(h1)
    h2=tf.matmul(h1,w2)+b2
    h2=tf.nn.relu(h2)
    y=tf.matmul(h2,w3)+b3
    probs.append(y)
x1=x_data[:,0]
x2=x_data[:,1]
probs=np.array(probs).reshape(xx.shape)
plt.scatter(x1,x2,color=np.squeeze(Y_c))
plt.contour(xx,yy,probs,levels=[0.5])
plt.show()


(To be continued.) I hope you will support these notes! Upcoming updates will cover network optimization, octagon, CNN and RNN.

Posted on Thu, 07 Oct 2021 15:29:54 -0400 by spiceweasel