# Implementation 2: BP algorithm practice | machine learning

Experiment content: implement a BP (backpropagation) network class and test it on two data sets.

## BP algorithm (step by step)

The BP algorithm consists of two phases: forward propagation and backward propagation.

First define the BPNetWork class.

The initialization function `__init__` creates four basic variables (two weight matrices and two offset vectors):

```python
class BPNetWork(object):

    # Initialization function init
    def __init__(self):
        '''
        w1, w2: weights from the input layer to the hidden layer and from the
        hidden layer to the output layer, respectively.
        b1, b2: offsets of the hidden layer and of the output layer, respectively.
        All four are initialized to None and set later in fit.
        '''
        self.w1 = None
        self.b1 = None
        self.w2 = None
        self.b2 = None
```

Next is the activation function of the BP algorithm. Here the sigmoid function (S-shaped function) is used:

```python
    # Activation function sigmoid
    def sigmoid(self, X):
        '''
        input:
            X(mat): values before activation
        output:
            return(mat): values after activation, element by element
        '''
        return 1.0 / (1 + np.exp(-X))
```

The derivative of the sigmoid activation function is implemented as dsigmoid:

```python
    # Derivative of the activation function sigmoid
    def dsigmoid(self, X):
        '''
        input:
            X(mat): values before activation
        output:
            return(mat): sigmoid(X) * (1 - sigmoid(X)), element by element
        '''
        s = self.sigmoid(X)
        # np.multiply is element-wise; the * operator on np.matrix is a matrix product
        return np.multiply(s, 1 - s)
```
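As a quick sanity check, the derivative formula can be compared with a central finite difference. This is a minimal sketch on plain NumPy arrays; the standalone functions and the test points are assumptions for illustration, not part of the BPNetWork class.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    """Analytic derivative of the logistic function: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Compare the analytic derivative with a central finite difference.
x = np.linspace(-5.0, 5.0, 11)   # hypothetical test points
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
max_error = np.max(np.abs(numeric - dsigmoid(x)))
print("largest difference:", max_error)
```

The largest difference should be tiny, which suggests that the derivative formula matches the sigmoid used in the forward pass.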

Next comes forward propagation, which has two stages: the calculation from the input layer to the hidden layer, and the calculation from the hidden layer to the output layer.

The calculation from the input layer to the hidden layer is divided into two steps. The first step multiplies the input matrix by the weight matrix and adds the offset of that layer. The second step applies the sigmoid activation function:

```python
    # Hidden layer hidden_in function
    def hidden_in(self, X):
        '''
        input:
            X(mat): input matrix of the input layer
        output:
            return(mat): result before activation
        '''
        # np.asmatrix is equivalent to np.mat, an alias removed in NumPy 2.0
        X = np.asmatrix(X)
        # b1 has shape (1, hiddens) and is broadcast to every row
        hidden_input = X * self.w1 + self.b1
        return hidden_input

    # Calculate the hidden_out result
    def hidden_out(self, hidden_input):
        '''
        input:
            hidden_input(mat): matrix before activation
        output:
            return(mat): matrix after activation
        '''
        hidden_output = self.sigmoid(hidden_input)
        return hidden_output
```

The calculation from the hidden layer to the output layer is also divided into two steps. The first step multiplies the output matrix of the hidden layer by the weight matrix and adds the offset of that layer. The second step applies the sigmoid activation function:

```python
    # Output layer output_in function
    def output_in(self, hidden_output):
        '''
        input:
            hidden_output(mat): output matrix of the hidden layer
        output:
            return(mat): input matrix of the output layer
        '''
        # np.asmatrix is equivalent to np.mat, an alias removed in NumPy 2.0
        hidden_output = np.asmatrix(hidden_output)
        # b2 has shape (1, outputs) and is broadcast to every row
        output_in = hidden_output * self.w2 + self.b2
        return output_in

    # Calculate the output_out result
    def output_out(self, output_in):
        '''
        input:
            output_in(mat): input matrix of the output layer
        output:
            return(mat): output matrix of the output layer
        '''
        output_out = self.sigmoid(output_in)
        return output_out
```

The complete forward propagation function, forward, is:

```python
    # Forward propagation function forward
    def forward(self, X):
        hidden_input = self.hidden_in(X)
        hidden_output = self.hidden_out(hidden_input)
        predict_in = self.output_in(hidden_output)
        predict_out = self.output_out(predict_in)
        return hidden_input, hidden_output, predict_in, predict_out
```
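To make the matrix shapes in the forward pass concrete, here is a minimal sketch with plain NumPy arrays. The sizes (5 samples, 4 features, 3 hidden nodes, 2 output nodes) and the random seed are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
m, n, hiddens, outputs = 5, 4, 3, 2      # hypothetical sizes

X = rng.random((m, n))                   # one sample per row
w1 = rng.random((n, hiddens)) * 2 - 1    # input layer -> hidden layer
b1 = np.zeros((1, hiddens))
w2 = rng.random((hiddens, outputs)) * 2 - 1  # hidden layer -> output layer
b2 = np.zeros((1, outputs))

hidden_input = X @ w1 + b1               # shape (m, hiddens); b1 is broadcast over rows
hidden_output = sigmoid(hidden_input)
predict_in = hidden_output @ w2 + b2     # shape (m, outputs)
predict_out = sigmoid(predict_in)

print(hidden_input.shape, predict_out.shape)
```

Each row of predict_out holds one value per output node, all strictly between 0 and 1 because of the sigmoid.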

The next step is the fit function, which uses back propagation to adjust the weights and offsets. The weights are initialized with random values in [-1, 1): np.random.random returns values in [0, 1), which are multiplied by 2 and shifted by -1. The offsets start at zero. Back propagation then optimizes all of them:

```python
    # Adjusting weights and offsets: fit function
    def fit(self, X, y, hiddens, outputs, lr=0.01, epochs=100):
        # np.asmatrix is equivalent to np.mat, an alias removed in NumPy 2.0
        X = np.asmatrix(X)
        y = np.asmatrix(y)
        m, n = np.shape(X)

        self.w1 = np.asmatrix(np.random.random((n, hiddens)) * 2 - 1)
        self.b1 = np.asmatrix(np.zeros((1, hiddens)))
        self.w2 = np.asmatrix(np.random.random((hiddens, outputs)) * 2 - 1)
        self.b2 = np.asmatrix(np.zeros((1, outputs)))

        for epoch in range(epochs):
            # print("training rounds:", epoch)
            hidden_input, hidden_output, predict_in, predict_out = self.forward(X)

            # Back propagation
            # Residual of the output layer
            delta_output = -np.multiply((y - predict_out), self.dsigmoid(predict_in))
            # Residual of the hidden layer
            delta_hidden = np.multiply((delta_output * self.w2.T), self.dsigmoid(hidden_input))
            # Update weights and offsets, i.e. gradient descent.
            # Weight gradients are summed over the samples; offset gradients are averaged.
            self.w2 = self.w2 - lr * (hidden_output.T * delta_output)  # Update w2
            self.b2 = self.b2 - lr * np.sum(delta_output, axis=0) * (1.0 / m)  # Update b2
            self.w1 = self.w1 - lr * (X.T * delta_hidden)  # Update w1
            self.b1 = self.b1 - lr * np.sum(delta_hidden, axis=0) * (1.0 / m)  # Update b1
```
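The residual formulas in fit correspond to the gradient of the summed squared error 0.5 * sum((y - predict_out)^2). The sketch below checks this numerically for one entry of each weight matrix. The loss function, the helper numeric_partial, the array sizes and the seed are assumptions introduced for this check; they do not appear in the class above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(X, y, w1, b1, w2, b2):
    """Summed squared error of the two-layer network (assumed objective)."""
    hidden_output = sigmoid(X @ w1 + b1)
    predict_out = sigmoid(hidden_output @ w2 + b2)
    return 0.5 * np.sum((y - predict_out) ** 2)

rng = np.random.default_rng(0)
X = rng.random((6, 4))                           # 6 samples, 4 features (hypothetical)
y = np.eye(3)[rng.integers(0, 3, size=6)]        # one-hot targets for 3 classes
params = {
    "w1": rng.random((4, 5)) * 2 - 1, "b1": np.zeros((1, 5)),
    "w2": rng.random((5, 3)) * 2 - 1, "b2": np.zeros((1, 3)),
}

# Analytic gradients, using the same residual formulas as fit.
hidden_output = sigmoid(X @ params["w1"] + params["b1"])
predict_out = sigmoid(hidden_output @ params["w2"] + params["b2"])
delta_output = -(y - predict_out) * predict_out * (1 - predict_out)
delta_hidden = (delta_output @ params["w2"].T) * hidden_output * (1 - hidden_output)
grad_w2 = hidden_output.T @ delta_output
grad_w1 = X.T @ delta_hidden

def numeric_partial(name, idx, eps=1e-6):
    """Central finite difference of the loss with respect to one parameter entry."""
    plus = {k: v.copy() for k, v in params.items()}
    minus = {k: v.copy() for k, v in params.items()}
    plus[name][idx] += eps
    minus[name][idx] -= eps
    return (loss(X, y, **plus) - loss(X, y, **minus)) / (2 * eps)

err_w1 = abs(numeric_partial("w1", (0, 0)) - grad_w1[0, 0])
err_w2 = abs(numeric_partial("w2", (0, 0)) - grad_w2[0, 0])
print(err_w1, err_w2)
```

Both differences should be close to zero. Note that fit applies the weight gradients as sums over the samples, while the offset gradients are additionally divided by the number of samples m.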

Finally, the prediction function predict:

```python
    # Prediction function predict
    def predict(self, x_test):
        x_test = np.asmatrix(x_test)
        hidden_input, hidden_output, predict_in, predict_out = self.forward(x_test)
        # predict_out has already passed through sigmoid, so it is not applied again
        output = np.array(predict_out)
        return output
```

### Source code (all)

```python
import numpy as np


class BPNetWork(object):

    # Initialization function init
    def __init__(self):
        '''
        w1, w2: weights from the input layer to the hidden layer and from the
        hidden layer to the output layer, respectively.
        b1, b2: offsets of the hidden layer and of the output layer, respectively.
        All four are initialized to None and set later in fit.
        '''
        self.w1 = None
        self.b1 = None
        self.w2 = None
        self.b2 = None

    # Activation function sigmoid
    def sigmoid(self, X):
        '''
        input:
            X(mat): values before activation
        output:
            return(mat): values after activation, element by element
        '''
        return 1.0 / (1 + np.exp(-X))

    # Derivative of the activation function sigmoid
    def dsigmoid(self, X):
        '''
        input:
            X(mat): values before activation
        output:
            return(mat): sigmoid(X) * (1 - sigmoid(X)), element by element
        '''
        s = self.sigmoid(X)
        # np.multiply is element-wise; the * operator on np.matrix is a matrix product
        return np.multiply(s, 1 - s)

    # Input function of hidden layer
    def hidden_in(self, X):
        # np.asmatrix is equivalent to np.mat, an alias removed in NumPy 2.0
        X = np.asmatrix(X)
        # b1 has shape (1, hiddens) and is broadcast to every row
        hidden_input = X * self.w1 + self.b1
        return hidden_input

    # Output function of hidden layer
    def hidden_out(self, hidden_input):
        hidden_output = self.sigmoid(hidden_input)
        return hidden_output

    # Input function of output layer
    def output_in(self, hidden_output):
        hidden_output = np.asmatrix(hidden_output)
        # b2 has shape (1, outputs) and is broadcast to every row
        output_in = hidden_output * self.w2 + self.b2
        return output_in

    # Output function of output layer
    def output_out(self, output_in):
        output_out = self.sigmoid(output_in)
        return output_out

    # Forward propagation function forward
    def forward(self, X):
        hidden_input = self.hidden_in(X)
        hidden_output = self.hidden_out(hidden_input)
        predict_in = self.output_in(hidden_output)
        predict_out = self.output_out(predict_in)
        return hidden_input, hidden_output, predict_in, predict_out

    # Adjusting weights and offsets: fit function
    def fit(self, X, y, hiddens, outputs, lr=0.01, epochs=100):
        X = np.asmatrix(X)
        y = np.asmatrix(y)
        m, n = np.shape(X)

        # Weights start as random values in [-1, 1); offsets start at zero
        self.w1 = np.asmatrix(np.random.random((n, hiddens)) * 2 - 1)
        self.b1 = np.asmatrix(np.zeros((1, hiddens)))
        self.w2 = np.asmatrix(np.random.random((hiddens, outputs)) * 2 - 1)
        self.b2 = np.asmatrix(np.zeros((1, outputs)))

        for epoch in range(epochs):
            # print("training rounds:", epoch)
            hidden_input, hidden_output, predict_in, predict_out = self.forward(X)

            # Back propagation
            # Residual of the output layer
            delta_output = -np.multiply((y - predict_out), self.dsigmoid(predict_in))
            # Residual of the hidden layer
            delta_hidden = np.multiply((delta_output * self.w2.T), self.dsigmoid(hidden_input))
            # Update weights and offsets, i.e. gradient descent.
            # Weight gradients are summed over the samples; offset gradients are averaged.
            self.w2 = self.w2 - lr * (hidden_output.T * delta_output)  # Update w2
            self.b2 = self.b2 - lr * np.sum(delta_output, axis=0) * (1.0 / m)  # Update b2
            self.w1 = self.w1 - lr * (X.T * delta_hidden)  # Update w1
            self.b1 = self.b1 - lr * np.sum(delta_hidden, axis=0) * (1.0 / m)  # Update b1

    # Prediction function predict
    def predict(self, x_test):
        x_test = np.asmatrix(x_test)
        hidden_input, hidden_output, predict_in, predict_out = self.forward(x_test)
        # predict_out has already passed through sigmoid, so it is not applied again
        output = np.array(predict_out)
        return output
```

## Test data set I (iris set)

First, we import the iris dataset:

```python
import pandas

# Import dataset iris
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']  # Column names
dataset = pandas.read_csv(url, names=names)  # Read the data into a DataFrame
```

The dataset is shown in the figure below. It contains 150 samples. The first four columns are the feature attributes of each iris, and the last column is its class label. There are three classes: Iris setosa, Iris versicolor and Iris virginica. To simplify later processing, we map them to 0, 1 and 2 respectively:

```python
X = dataset.iloc[:, :-1].values  # Extract feature attributes
y = dataset.iloc[:, -1].values  # Extract the label values

# Convert labels to 0, 1, 2
for i in range(len(y)):
    if y[i] == 'Iris-setosa':
        y[i] = 0
    elif y[i] == 'Iris-versicolor':
        y[i] = 1
    elif y[i] == 'Iris-virginica':
        y[i] = 2
```

Note, however, that the label y extracted at this point is a one-dimensional array of length 150. For the calculation, we need to reshape it into a 150 × 1 column vector. Since the label values are the discrete values 0, 1 and 2, we one-hot encode y so that every class gets its own output column, which the network can approximate with continuous values. The result is converted to ndarray format:

```python
y = y.reshape(len(y), 1)  # Reshape y into a column vector
ohe = OneHotEncoder()  # Create the one-hot encoder
y = ohe.fit_transform(y).toarray()  # One-hot encode the labels and convert to ndarray format
```
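To see what one-hot encoding produces, here is a minimal sketch that builds the same kind of matrix with plain NumPy. The four labels are made up for illustration, and np.eye is used as a stand-in for OneHotEncoder.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])          # hypothetical class labels
num_classes = 3

# Row i of the identity matrix is the one-hot code of class i.
one_hot = np.eye(num_classes)[labels]
print(one_hot)
# rows: [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]

# np.argmax along each row recovers the original labels.
decoded = np.argmax(one_hot, axis=1)
print(decoded)
```

The argmax step is the same decoding that is applied later to the network output and to y_test.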

Next, we divide the data set into a training set and a test set. The train_test_split method is used directly here, with a ratio of 4:1 (i.e. 120 training samples and 30 test samples):

```python
# Partition training set and test set; test_size is set to 0.2 and random_state to 666
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)
```
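The 4:1 arithmetic can be illustrated without scikit-learn. The helper split_indices below is hypothetical and only a rough stand-in: it does not reproduce the exact shuffling of train_test_split, so the selected samples will differ.

```python
import numpy as np

def split_indices(n_samples, test_size=0.2, seed=666):
    """Shuffle sample indices and cut them into train and test parts (illustrative helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    return order[n_test:], order[:n_test]

train_idx, test_idx = split_indices(150)
print(len(train_idx), len(test_idx))  # 120 30
```

Every index lands in exactly one of the two parts, so no test sample is seen during training.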

The next step is the most important one. We create an instance bpNet of the BPNetWork class and train it with 100 hidden layer nodes, three output layer nodes, a learning rate of 0.01 and 200 training iterations. After training, we pass the features of the test set to the predict function to obtain the results. Note that the one-hot codes must be converted back to class labels before the predictions are compared with the true test labels to obtain the accuracy on the test set:

```python
# Create the model instance and train it
bpNet = BPNetWork()
bpNet.fit(x_train, y_train, 100, 3, 0.01, 200)

ypredict = bpNet.predict(x_test)
y_test = np.argmax(y_test, axis=1)  # Convert one-hot codes back to class labels
print("Real results", y_test)  # Output real results
p = np.argmax(ypredict, axis=1)  # Find the position with the highest output value
print("Prediction results", p)  # Output prediction results
acc = np.mean(p == y_test)
print('The accuracy is %.4f' % acc)
```
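The decoding and accuracy calculation can be followed by hand on a small example. The network outputs and one-hot labels below are made-up values for four samples, used only to illustrate the steps.

```python
import numpy as np

# Hypothetical network outputs for four samples and three classes.
ypredict = np.array([[0.9, 0.2, 0.1],
                     [0.1, 0.3, 0.8],
                     [0.2, 0.7, 0.3],
                     [0.6, 0.5, 0.1]])
# Hypothetical one-hot test labels for the same samples.
y_test_onehot = np.array([[1, 0, 0],
                          [0, 0, 1],
                          [0, 1, 0],
                          [0, 1, 0]])

p = np.argmax(ypredict, axis=1)             # predicted classes: [0 2 1 0]
y_true = np.argmax(y_test_onehot, axis=1)   # true classes:      [0 2 1 1]
acc = np.mean(p == y_true)                  # 3 of 4 match
print('The accuracy is %.4f' % acc)         # The accuracy is 0.7500
```

The comparison p == y_true yields a boolean array, and its mean is the fraction of correct predictions.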

The results are shown in the figure below. Fewer iterations reduce the accuracy; you can increase the number of training iterations yourself.

### Source code (all)

```python
import numpy as np
import pandas
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Import dataset iris
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)  # Read the data into a DataFrame
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Convert labels to 0, 1, 2
for i in range(len(y)):
    if y[i] == 'Iris-setosa':
        y[i] = 0
    elif y[i] == 'Iris-versicolor':
        y[i] = 1
    elif y[i] == 'Iris-virginica':
        y[i] = 2

y = y.reshape(len(y), 1)  # Reshape y into a column vector
ohe = OneHotEncoder()  # Create the one-hot encoder
y = ohe.fit_transform(y).toarray()  # One-hot encode the labels and convert to ndarray format

# Partition training set and test set; test_size is set to 0.2 and random_state to 666
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)

# BPNetWork is the class defined in the first part
bpNet = BPNetWork()
bpNet.fit(x_train, y_train, 100, 3, 0.01, 200)

ypredict = bpNet.predict(x_test)

y_test = np.argmax(y_test, axis=1)  # Convert one-hot codes back to class labels
print("Real results", y_test)
p = np.argmax(ypredict, axis=1)  # Find the position with the highest output value
print("Prediction results", p)
acc = np.mean(p == y_test)
print('The accuracy is %.4f' % acc)
```

## Test data set II (handwritten numeral set)

The training process for the handwritten digits data set is basically the same as for the iris set. Only the differences are described here.

Loading the handwritten digits data set:

```python
from sklearn import datasets

# Load handwritten numeral dataset
digits = datasets.load_digits()
# Create characteristic matrix
feature = digits.data
# Create target vector
target = digits.target
```

Each sample in this data set has many feature attributes. The task is to determine which digit a sample represents, so the label value is a digit from 0 to 9.

Due to the large amount of data in this data set, we only take the first 200 items for processing, and the ratio of training set to test set is still 4:1.

The training results are shown in the figure below:

### Source code (all)

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Load handwritten numeral dataset
digits = datasets.load_digits()

# Create characteristic matrix
feature = digits.data

# Create target vector
target = digits.target

# Only the first 200 samples are used
X = feature[:200, :]
y = target[:200]
y = y.reshape(len(y), 1)  # Reshape y into a column vector
ohe = OneHotEncoder()  # Create the one-hot encoder
y = ohe.fit_transform(y).toarray()  # One-hot encode the labels and convert to ndarray format

# Partition training set and test set; test_size is set to 0.2 and random_state to 666
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)

# BPNetWork is the class defined in the first part
bpNet = BPNetWork()
bpNet.fit(x_train, y_train, 100, 10, 0.01, 200)

ypredict = bpNet.predict(x_test)
y_test = np.argmax(y_test, axis=1)  # Convert one-hot codes back to class labels
print("Real data", y_test)
p = np.argmax(ypredict, axis=1)  # Find the position with the highest output value
print("Forecast data", p)
acc = np.mean(y_test == p)
print('The accuracy is %.4f' % acc)
```

Posted on Tue, 09 Nov 2021 05:39:22 -0500 by Stryks