Experiment content: implement a BP (backpropagation) network class and test it on two data sets.
BP algorithm (step by step)
The BP algorithm consists mainly of a forward pass and a backward pass.
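For reference, the forward pass that the class below implements can be summarized in matrix form (this compact notation is a summary I add here, not taken from the original code; σ denotes the sigmoid activation introduced below):

$$H = \sigma(X W_1 + b_1), \qquad \hat{Y} = \sigma(H W_2 + b_2)$$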
First define the BPNetWork class:
The initialization function __init__ creates the four basic variables (two weight matrices plus two bias vectors):
'''Initialization function __init__'''
def __init__(self):
    '''
    w1, w2: weights from the input layer to the hidden layer and from
            the hidden layer to the output layer, respectively;
    b1, b2: biases of the hidden layer and the output layer, respectively;
    all initialized to None
    '''
    self.w1 = None
    self.b1 = None
    self.w2 = None
    self.b2 = None
Next comes the activation function of the BP algorithm; here we use the sigmoid (S-shaped) function:
'''Activation function sigmoid'''
def sigmoid(self, X):
    '''
    input:  X(mat): value before activation
    output: return(mat): activated value
    '''
    return 1.0 / (1 + np.exp(-X))
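Once the class is fully assembled, a quick sanity check (illustrative values) confirms that sigmoid maps any real input into (0, 1) and equals 0.5 at zero:

import numpy as np

net = BPNetWork()
print(net.sigmoid(np.mat([[0.0, 2.0, -2.0]])))
# [[0.5        0.88079708 0.11920292]]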
The derivative of the sigmoid activation function, the dsigmoid function:
'''Derivative of the sigmoid activation function'''
def dsigmoid(self, X):
    '''
    input:  X(mat): value before conversion
    output: return(mat): elementwise derivative sigmoid'(X)
    '''
    m, n = np.shape(X)
    out = np.mat(np.zeros((m, n)))
    for i in range(m):
        for j in range(n):
            out[i, j] = self.sigmoid(X[i, j]) * (1 - self.sigmoid(X[i, j]))
    return out
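The double loop above computes the derivative element by element. Since np.exp already operates elementwise, an equivalent vectorized version (my alternative sketch, not part of the original class) gives the same result and is much faster on large matrices:

'''Vectorized alternative to dsigmoid (equivalent result)'''
def dsigmoid_vectorized(self, X):
    s = self.sigmoid(X)           # elementwise sigmoid
    return np.multiply(s, 1 - s)  # elementwise s * (1 - s)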
The forward pass then has two stages: the computation from the input layer to the hidden layer, and the computation from the hidden layer to the output layer.
The computation from the input layer to the hidden layer takes two steps: first, multiply the input matrix by the weight matrix and add the layer's bias; second, apply the sigmoid activation:
'''Hidden layer input: hidden_in function'''
def hidden_in(self, X):
    '''
    input:  X(mat): input matrix of the input layer
    output: return(mat): result before activation
    '''
    X = np.mat(X)
    m = np.shape(X)[0]
    hidden_input = X * self.w1
    for i in range(m):
        hidden_input[i, ] += self.b1
    return hidden_input

'''Hidden layer output: hidden_out function'''
def hidden_out(self, hidden_input):
    '''
    input:  hidden_input(mat): matrix before activation
    output: return(mat): matrix after activation
    '''
    hidden_output = self.sigmoid(hidden_input)
    return hidden_output
The computation from the hidden layer to the output layer likewise takes two steps: first, multiply the hidden layer's output matrix by the weight matrix and add the layer's bias; second, apply the sigmoid activation:
'''Output layer input: output_in function'''
def output_in(self, hidden_output):
    '''
    input:  hidden_output(mat): output matrix of the hidden layer
    output: return(mat): input matrix of the output layer
    '''
    hidden_output = np.mat(hidden_output)
    m = np.shape(hidden_output)[0]
    output_in = hidden_output * self.w2
    for i in range(m):
        output_in[i, ] += self.b2
    return output_in

'''Output layer output: output_out function'''
def output_out(self, output_in):
    '''
    input:  output_in(mat): input matrix of the output layer
    output: return(mat): output matrix of the output layer
    '''
    output_out = self.sigmoid(output_in)
    return output_out
The complete forward propagation function forward is:
'''Forward propagation function forward'''
def forward(self, X):
    hidden_input = self.hidden_in(X)
    hidden_output = self.hidden_out(hidden_input)
    predict_in = self.output_in(hidden_output)
    predict_out = self.output_out(predict_in)
    return hidden_input, hidden_output, predict_in, predict_out
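Once the weights are set (which fit does below), a forward call returns four matrices whose shapes follow from the layer sizes. A hypothetical illustration with 5 samples, 4 features, 3 hidden units and 2 outputs (the dimensions are arbitrary choices for this sketch):

net = BPNetWork()
net.w1 = np.mat(np.random.random((4, 3)) * 2 - 1)  # 4 inputs -> 3 hidden units
net.b1 = np.mat(np.zeros((1, 3)))
net.w2 = np.mat(np.random.random((3, 2)) * 2 - 1)  # 3 hidden units -> 2 outputs
net.b2 = np.mat(np.zeros((1, 2)))

X = np.mat(np.random.random((5, 4)))               # 5 samples, 4 features
h_in, h_out, p_in, p_out = net.forward(X)
print(np.shape(h_in), np.shape(p_out))             # (5, 3) (5, 2)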
The next step is the fit function for the backward pass, which adjusts the weights and biases. (Note that since the weights should lie in [-1, 1], the code below first generates random numbers in [-1, 1] and then optimizes the weights through backpropagation.)
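The updates that fit performs correspond to the standard backpropagation formulas below (a summary in the notation of the forward-pass equations above; ⊙ is elementwise multiplication, η the learning rate). With $Z_1 = X W_1 + b_1$ and $Z_2 = H W_2 + b_2$:

$$\delta_2 = -(Y - \hat{Y}) \odot \sigma'(Z_2), \qquad \delta_1 = \big(\delta_2 W_2^{\top}\big) \odot \sigma'(Z_1)$$

$$W_2 \leftarrow W_2 - \eta\, H^{\top} \delta_2, \qquad W_1 \leftarrow W_1 - \eta\, X^{\top} \delta_1$$

Each bias is reduced by η times the column sums of its layer's residual, scaled by 1/m.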
'''fit function: adjust weights and biases'''
def fit(self, X, y, hiddens, outputs, lr=0.01, epochs=100):
    m, n = np.shape(X)
    # random initialization in [-1, 1]
    self.w1 = np.mat(np.random.random((n, hiddens)) * 2 - 1)
    self.b1 = np.mat(np.zeros((1, hiddens)))
    self.w2 = np.mat(np.random.random((hiddens, outputs)) * 2 - 1)
    self.b2 = np.mat(np.zeros((1, outputs)))
    for epoch in range(epochs):
        # print("training round:", epoch)
        hidden_input, hidden_output, predict_in, predict_out = self.forward(X)
        # Back propagation
        # Residual of the output layer
        delta_output = -np.multiply((y - predict_out), self.dsigmoid(predict_in))
        # Residual of the hidden layer
        delta_hidden = np.multiply((delta_output * self.w2.T), self.dsigmoid(hidden_input))
        # Update weights and biases, i.e. gradient descent
        self.w2 = self.w2 - lr * (hidden_output.T * delta_output)              # update w2
        self.b2 = self.b2 - lr * np.sum(delta_output, axis=0) * (1.0 / m)      # update b2
        self.w1 = self.w1 - lr * (X.T * delta_hidden)                          # update w1
        self.b1 = self.b1 - lr * np.sum(delta_hidden, axis=0) * (1.0 / m)      # update b1
Finally, the prediction function predict:
'''Prediction function predict'''
def predict(self, x_test):
    x_test = np.mat(x_test)
    hidden_input, hidden_output, predict_in, predict_out = self.forward(x_test)
    # predict_out has already passed through sigmoid inside forward, so it
    # is returned directly (the original applied sigmoid a second time,
    # which is redundant and does not change the argmax)
    output = np.array(predict_out)
    return output
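Assuming a trained instance bpNet and test features x_test (as in the experiments below), the predicted class of each sample is the column with the largest output score:

scores = bpNet.predict(x_test)      # one score per output unit and sample
labels = np.argmax(scores, axis=1)  # predicted class indices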
Source code (all)
import numpy as np

class BPNetWork(object):
    '''Initialization function __init__'''
    def __init__(self):
        '''
        w1, w2: weights from the input layer to the hidden layer and from
                the hidden layer to the output layer, respectively;
        b1, b2: biases of the hidden layer and the output layer;
        all initialized to None
        '''
        self.w1 = None
        self.b1 = None
        self.w2 = None
        self.b2 = None

    '''Activation function sigmoid'''
    def sigmoid(self, X):
        return 1.0 / (1 + np.exp(-X))

    '''Derivative of the sigmoid activation function'''
    def dsigmoid(self, X):
        m, n = np.shape(X)
        out = np.mat(np.zeros((m, n)))
        for i in range(m):
            for j in range(n):
                out[i, j] = self.sigmoid(X[i, j]) * (1 - self.sigmoid(X[i, j]))
        return out

    '''Input function of the hidden layer'''
    def hidden_in(self, X):
        X = np.mat(X)
        m = np.shape(X)[0]
        hidden_input = X * self.w1
        for i in range(m):
            hidden_input[i, ] += self.b1
        return hidden_input

    '''Output function of the hidden layer'''
    def hidden_out(self, hidden_input):
        return self.sigmoid(hidden_input)

    '''Input function of the output layer'''
    def output_in(self, hidden_output):
        hidden_output = np.mat(hidden_output)
        m = np.shape(hidden_output)[0]
        output_in = hidden_output * self.w2
        for i in range(m):
            output_in[i, ] += self.b2
        return output_in

    '''Output function of the output layer'''
    def output_out(self, output_in):
        return self.sigmoid(output_in)

    '''Forward propagation function forward'''
    def forward(self, X):
        hidden_input = self.hidden_in(X)
        hidden_output = self.hidden_out(hidden_input)
        predict_in = self.output_in(hidden_output)
        predict_out = self.output_out(predict_in)
        return hidden_input, hidden_output, predict_in, predict_out

    '''fit function: adjust weights and biases'''
    def fit(self, X, y, hiddens, outputs, lr=0.01, epochs=100):
        m, n = np.shape(X)
        self.w1 = np.mat(np.random.random((n, hiddens)) * 2 - 1)
        self.b1 = np.mat(np.zeros((1, hiddens)))
        self.w2 = np.mat(np.random.random((hiddens, outputs)) * 2 - 1)
        self.b2 = np.mat(np.zeros((1, outputs)))
        for epoch in range(epochs):
            hidden_input, hidden_output, predict_in, predict_out = self.forward(X)
            # Back propagation: residuals of the output and hidden layers
            delta_output = -np.multiply((y - predict_out), self.dsigmoid(predict_in))
            delta_hidden = np.multiply((delta_output * self.w2.T), self.dsigmoid(hidden_input))
            # Gradient-descent updates of weights and biases
            self.w2 = self.w2 - lr * (hidden_output.T * delta_output)              # update w2
            self.b2 = self.b2 - lr * np.sum(delta_output, axis=0) * (1.0 / m)      # update b2
            self.w1 = self.w1 - lr * (X.T * delta_hidden)                          # update w1
            self.b1 = self.b1 - lr * np.sum(delta_hidden, axis=0) * (1.0 / m)      # update b1

    '''Prediction function predict'''
    def predict(self, x_test):
        x_test = np.mat(x_test)
        hidden_input, hidden_output, predict_in, predict_out = self.forward(x_test)
        return np.array(predict_out)  # forward already applies sigmoid
Test data set I (iris set)
First, we import the iris dataset:
# Import the iris data set
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']  # feature attributes
dataset = pandas.read_csv(url, names=names)  # read the csv data
We can see the dataset as shown in the figure below:
The dataset has 150 samples. The first four columns are the feature attributes of the iris, and the last column is its label. There are three classes: Iris-setosa, Iris-versicolor and Iris-virginica. To simplify the next processing step, we map them to 0, 1 and 2 respectively:
X = dataset.iloc[:, :-1].values  # extract the feature attributes
y = dataset.iloc[:, -1].values   # extract the label values
# Convert the labels to 0, 1, 2
for i in range(len(y)):
    if y[i] == 'Iris-setosa':
        y[i] = 0
    elif y[i] == 'Iris-versicolor':
        y[i] = 1
    elif y[i] == 'Iris-virginica':
        y[i] = 2
Note, however, that the label vector y extracted here is 1 × 150; for the computation we need to convert it to a 150 × 1 column vector. Since the label values are the discrete classes 0, 1 and 2, we one-hot encode y so that each label becomes a 0/1 vector the sigmoid outputs can be compared against, and convert the result to an ndarray:
y = y.reshape(len(y), 1)            # change the shape of y
ohe = OneHotEncoder()               # build the one-hot encoding model
y = ohe.fit_transform(y).toarray()  # one-hot encode y and convert to ndarray
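As a quick illustration of what this produces (a standalone snippet, not part of the experiment), the three class labels map to the rows of a 3 × 3 identity pattern:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

demo = np.array([[0], [1], [2]])
print(OneHotEncoder().fit_transform(demo).toarray())
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]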
Next, we need to divide the data into a training set and a test set. Here we simply use the train_test_split method with a ratio of 4:1 (i.e. 120 training samples and 30 test samples):
# Split into training and test sets; test_size is 0.2 and random_state is 666
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)
The next step is the most important one. We create an instance bpNet of the BPNetWork class and train it with 100 hidden-layer nodes and 3 output-layer nodes, a learning rate of 0.01, and a maximum of 200 iterations. After training, we feed the test-set features to the predict function. Note that the one-hot codes must be converted back to class indices (via argmax) before comparing the predictions against the true test labels to obtain the model's accuracy:
# Instantiate and train the model
bpNet = BPNetWork()
bpNet.fit(x_train, y_train, 100, 3, 0.01, 200)
ypredict = bpNet.predict(x_test)
y_test = np.argmax(y_test, axis=1)
print("Real results", y_test)        # print the true labels
p = np.argmax(ypredict, axis=1)      # position of the highest score
print("Prediction results", p)       # print the predicted labels
acc = np.mean(p == y_test)
print('The accuracy is %.4f' % acc)
The results are shown in the figure below:
Too few iterations will hurt the accuracy; you can increase the number of training epochs yourself!
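For example (a sketch; the epoch count 2000 is an arbitrary choice of mine), pass a larger value as the last argument of fit and re-evaluate:

bpNet = BPNetWork()
bpNet.fit(x_train, y_train, 100, 3, 0.01, 2000)  # 2000 epochs instead of 200
p = np.argmax(bpNet.predict(x_test), axis=1)
print('Accuracy with more epochs: %.4f' % np.mean(p == y_test))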
Source code (all)
import numpy as np
import pandas
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Import the iris data set
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)  # read the csv data
dataset.head()

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
for i in range(len(y)):
    if y[i] == 'Iris-setosa':
        y[i] = 0
    elif y[i] == 'Iris-versicolor':
        y[i] = 1
    elif y[i] == 'Iris-virginica':
        y[i] = 2

y = y.reshape(len(y), 1)            # change the shape of y
ohe = OneHotEncoder()               # build the one-hot encoding model
y = ohe.fit_transform(y).toarray()  # one-hot encode y and convert to ndarray

# Split into training and test sets; test_size is 0.2 and random_state is 666
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)

bpNet = BPNetWork()
bpNet.fit(x_train, y_train, 100, 3, 0.01, 200)
ypredict = bpNet.predict(x_test)
y_test = np.argmax(y_test, axis=1)
print("Real results", y_test)
p = np.argmax(ypredict, axis=1)  # position of the highest score
print("Prediction results", p)
acc = np.mean(p == y_test)
print('The accuracy is %.4f' % acc)
Test data set II (handwritten numeral set)
The training process on the handwritten digit data set is essentially the same as on the iris set; below are the points where they differ.
Loading the handwritten digit data set:
# Load the handwritten digit data set
digits = datasets.load_digits()
# Create the feature matrix
feature = digits.data
# Create the target vector
target = digits.target
This data set has many feature attributes; the feature attributes of the first sample are shown in the figure below.
The task is to determine which digit each sample represents, so the label values are the numbers 0 through 9.
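To inspect this yourself: scikit-learn's digits set stores 8 × 8 grayscale images flattened into 64 values per sample, so the following (illustrative) lines show the shape, the first feature row and its label:

print(feature.shape)  # (1797, 64): 1797 samples, 8x8 images flattened to 64 features
print(feature[0])     # the 64 grayscale values (0-16) of the first digit image
print(target[0])      # its label: 0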
Because this data set is large, we take only the first 200 samples, and the training/test ratio remains 4:1.
The training results are shown in the figure below:
Source code (all)
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Load the handwritten digit data set
digits = datasets.load_digits()
# Create the feature matrix
feature = digits.data
# Create the target vector
target = digits.target

X = feature[:200, :]
y = target[:200]
y = y.reshape(len(y), 1)            # change the shape of y
ohe = OneHotEncoder()               # build the one-hot encoding model
y = ohe.fit_transform(y).toarray()  # one-hot encode y and convert to ndarray

# Split into training and test sets; test_size is 0.2 and random_state is 666
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)

bpNet = BPNetWork()
bpNet.fit(x_train, y_train, 100, 10, 0.01, 200)
ypredict = bpNet.predict(x_test)
y_test = np.argmax(y_test, axis=1)
print("Real data", y_test)
p = np.argmax(ypredict, axis=1)  # position of the highest score
print("Forecast data", p)
acc = np.mean(y_test == p)
print('The accuracy is %.4f' % acc)