## 1 Algorithm Introduction

Note: Section 1.1 summarizes the principle of a BP network whose inputs are influencing factors, i.e., a general explanation of how a BP model is trained (readers who already know this can skip it). Section 1.2 introduces the BP network prediction model based on the influence of historical values.

When a BP network is used for prediction, there are two main types of model, distinguished by the input indicators they consider:

### 1.1 Principle of the BP network algorithm with related influencing factors

As shown in Figure 1, when a BP network is trained with MATLAB's `newff` function, in most cases it has three layers: an input layer, a hidden layer and an output layer. An analogy with the human body helps to understand how the network works:

1) Input layer: equivalent to the human five senses. The senses obtain information from the outside world, just as the input layer of the neural network receives the input data.

2) Hidden layer: corresponds to the human brain, which analyzes and reasons about the data passed on by the senses. The hidden layer of the neural network maps the data x coming from the input layer, which can be understood simply as the formula `hiddenLayer_output = F(w*x + b)`. Here w and b are the weight and threshold (bias) parameters, F() is the mapping rule, also called the activation function, and hiddenLayer_output is the value produced by the hidden layer's mapping of the incoming data. In other words, the hidden layer maps the input influencing-factor data x to produce mapped values.

3) Output layer: after the brain has processed the information from the senses (the hidden-layer mapping), it can control the limbs to act, i.e., respond to the outside world. Similarly, the output layer of the BP network maps hiddenLayer_output again: `outputLayer_output = w*hiddenLayer_output + b`, where w and b are again weight and threshold parameters, and outputLayer_output is the output of the BP network (also called the simulated value or predicted value). It can be understood as the brain's outward action, such as a baby tapping the table.

4) Gradient descent algorithm: the deviation (error) between outputLayer_output and the target value y supplied to the network is calculated, and the gradient descent algorithm adjusts the weights and thresholds accordingly to reduce it. This is like a baby trying to tap the table: when the hand misses, the baby adjusts its body according to how far off it was, so that the next swing of the arm comes closer to the table and eventually hits it.
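As a minimal sketch of steps 1) to 4), assuming plain MATLAB without the Neural Network Toolbox, the script below runs one forward pass through a tiny 2-3-1 network for a single sample, back-propagates the error, and applies one gradient-descent update. The log-sigmoid hidden layer and linear output layer mirror the `{'logsig','purelin'}` choice in the code section; the weights, the sample, the learning rate `lr` and the squared-error loss `0.5*(y_hat - y)^2` are made-up assumptions for illustration, not values from the original.

```matlab
% One forward pass and one gradient-descent step for a tiny 2-3-1 BP network
% (all numbers below are illustrative assumptions)
sigmoid = @(z) 1 ./ (1 + exp(-z));   % log-sigmoid activation F()
x  = [0.5; -0.2];                    % one input sample (influencing factors)
y  = 0.8;                            % its target value
W1 = [0.1 0.4; -0.3 0.2; 0.5 -0.1];  % hidden-layer weights (3 x 2)
b1 = [0.0; 0.1; -0.1];               % hidden-layer thresholds
W2 = [0.3 -0.2 0.6];                 % output-layer weights (1 x 3)
b2 = 0.05;                           % output-layer threshold
lr = 0.1;                            % learning rate

% Forward pass
h     = sigmoid(W1 * x + b1);        % hiddenLayer_output = F(w*x + b)
y_hat = W2 * h + b2;                 % outputLayer_output = w*hiddenLayer_output + b
loss_before = 0.5 * (y_hat - y)^2;

% Backward pass: propagate the error from the output layer to the hidden layer
delta2 = y_hat - y;                        % error at the linear output layer
delta1 = (W2' * delta2) .* h .* (1 - h);   % error at the hidden layer (uses the old W2)

% Gradient-descent update of weights and thresholds
W2 = W2 - lr * delta2 * h';
b2 = b2 - lr * delta2;
W1 = W1 - lr * delta1 * x';
b1 = b1 - lr * delta1;

% Forward pass again with the updated parameters
h     = sigmoid(W1 * x + b1);
y_hat = W2 * h + b2;
loss_after = 0.5 * (y_hat - y)^2;
fprintf('loss before = %.4f, loss after = %.4f\n', loss_before, loss_after);
```

After the update, `loss_after` is smaller than `loss_before`: the arm has moved closer to the table. Repeating such updates over all samples for many epochs is roughly what the toolbox's `train` automates (`traingdm` additionally uses a momentum term).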

Another example helps to understand this further:

The BP network in Figure 1 has an input layer, a hidden layer and an output layer. How does BP use these three layers so that the output-layer value outputLayer_output keeps approaching the given y value, and thereby train an accurate model?

The connections in the diagram suggest the following picture: imagine the diagram as a subway line. One day Mr. Wang takes the subway home. He boards at the starting station (input), passes through many intermediate stations (hiddenLayer), and then finds that he has gone past his stop (the output layer corresponds to his current location). Based on the distance (the error, Error) between his current location and home (Target), he goes back to an intermediate station (hiddenLayer) and takes the subway again (error back-propagation, in which w and b are updated with the gradient descent algorithm). If Mr. Wang misses his stop again, the same adjustment is repeated.

The examples of the baby tapping the table and Mr. Wang taking the subway show what complete BP training involves: the data are first passed to the input layer and mapped by the hidden layer, and the output layer produces the BP simulated value; the parameters are then adjusted according to the error between the simulated value and the target value, so that the simulated value keeps approaching the target. In the examples: (1) the external factors acting on the baby are the input x; (2) Mr. Wang's boarding station is the input x, going past his stop is the prediction (predict), and returning to an intermediate station to adjust his position until he gets home corresponds to approaching the target y (Target).

Two kinds of data are involved in these steps: the influencing-factor data x and the target data y (Target). The BP algorithm uses x and y to find the rule relating x to y, mapping x so that it approximates y; this is what the BP network algorithm does. The process described above is BP model training, and it yields a model that fits the training data. But is the rule it found (the trained BP network) accurate and reliable? To check, new input data x1 are given to the trained BP network to obtain the corresponding BP output (predicted value) predict1. How close predict1 is to the actual values y1 can be judged by plotting both and by calculating indicators such as MSE, MAPE and R², which show whether the model predicts accurately. This is the testing process of the BP model: it produces predictions and verifies their accuracy against the actual values.
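The test-stage indicators can be computed directly. The sketch below assumes made-up values `y_true` (the actual values y1) and `y_pred` (the predictions predict1) and uses the textbook definitions of MSE, MAPE and R²; MAPE divides by the actual values, so it assumes none of them is zero.

```matlab
% Test-stage error metrics (toy numbers, for illustration only)
y_true = [1 2 3 4];          % actual values y1
y_pred = [1.1 1.9 3.2 3.8];  % model outputs predict1

err      = y_pred - y_true;
mse_val  = mean(err.^2);                                    % mean squared error
mape_val = mean(abs(err) ./ abs(y_true)) * 100;             % mean absolute percentage error, in %
r2_val   = 1 - sum(err.^2) / sum((y_true - mean(y_true)).^2);  % coefficient of determination R^2

fprintf('MSE = %.4f, MAPE = %.4f%%, R^2 = %.4f\n', mse_val, mape_val, r2_val);
```

For these numbers the script prints `MSE = 0.0250, MAPE = 6.6667%, R^2 = 0.9800`. MSE and R² are independent of scale conventions, while MAPE is undefined when an actual value is zero.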

Figure 1. Structure of a three-layer BP network

### 1.2 BP network based on the influence of historical values

Taking power load forecasting as an example, this section distinguishes between the two models.

One way predicts the load value at time t from the factors at that time, such as air humidity x1, temperature x2 and holiday x3. This is the model described in Section 1.1.

The other way assumes that the power load changes in a time-dependent manner, for example that the load values at times t-1, t-2 and t-3 are related to the load value at time t, i.e., y(t) = F(y(t-1), y(t-2), y(t-3)). When the BP model is trained, the influencing factors fed to the network are the historical load values y(t-1), y(t-2), y(t-3), and the target output is y(t). Here 3 is called the autoregressive order, or delay.
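A minimal sketch of how the lagged inputs and targets could be arranged for this autoregressive model, assuming a made-up load series `y` and order `p = 3`. Following the MATLAB toolbox convention used in the code section, each column is one training sample.

```matlab
% Build autoregressive training data y(t) = F(y(t-1), y(t-2), y(t-3))
y = [10 20 30 40 50 60];     % toy load series (made-up numbers)
p = 3;                       % autoregressive order (delay)
n = numel(y);

P = zeros(p, n - p);         % inputs: one column per sample
T = zeros(1, n - p);         % targets
for t = p+1:n
    k = t - p;                      % column index of this sample
    P(:, k) = y(t-1:-1:t-p)';       % [y(t-1); y(t-2); y(t-3)]
    T(k) = y(t);                    % target y(t)
end
disp(P); disp(T);
```

Here `P` is `[30 40 50; 20 30 40; 10 20 30]` and `T` is `[40 50 60]`; the first p values of the series appear only as inputs, never as targets.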

### 1.3 Differential Evolution Algorithm

Differential Evolution (DE) was proposed by Storn et al. in 1995. Like other evolutionary algorithms, DE is a stochastic model that simulates biological evolution: through repeated iteration it preserves the individuals that are best adapted to the environment. Compared with other evolutionary algorithms, however, DE keeps the population-based global search strategy while reducing the complexity of the genetic operations through real-number encoding, a simple difference-based mutation and a one-to-one competitive survival strategy. In addition, DE's memory of the current population lets it track the state of the search dynamically and adjust its search strategy accordingly. It has strong global convergence and robustness, and it does not need problem-specific characteristic information, so it is suitable for complex optimization problems that conventional mathematical programming methods cannot solve. It has been applied in fields such as neural networks, chemical engineering, electric power, mechanical design, robotics, signal processing, bioinformatics, economics, modern agriculture, food safety, environmental protection and operations research.

The DE algorithm is mainly used to solve global optimization problems over continuous variables. Its main steps are basically the same as those of other evolutionary algorithms: mutation, crossover and selection. The basic idea is as follows. The difference vector of two individuals randomly selected from the population is scaled by a factor F and added to a third individual, producing a mutant individual; this is called mutation. The mutant individual then exchanges parameters with a predetermined target individual to produce a trial individual; this is called crossover. If the trial individual's fitness is better than that of the target individual, the trial individual replaces the target in the next generation; otherwise the target individual is kept; this is called selection. In each generation, every individual vector serves as the target individual once. Through continual iteration, the algorithm keeps good individuals, eliminates poor ones, and guides the search toward the global optimum.
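The three steps above can be sketched as the classic DE/rand/1/bin scheme. This is an illustration on a toy objective (the sphere function `sum(x.^2)`), not the author's implementation; the bounds `lb`/`ub`, the population size, `F`, `CR` and the number of generations are assumed values. Each target individual `i` receives a mutant built from three other random individuals, binomial crossover produces the trial individual `u`, and greedy one-to-one selection keeps whichever of `u` and the target has the lower objective value.

```matlab
% DE/rand/1/bin minimizing a toy objective (assumed for illustration)
fitness = @(x) sum(x.^2);   % objective to minimize (sphere function)
D  = 5;                     % problem dimension
Np = 20;                    % population size
F  = 0.5;                   % scaling factor
CR = 0.9;                   % crossover rate
Gm = 300;                   % maximum number of generations
lb = -5; ub = 5;            % search bounds

pop = lb + (ub - lb) * rand(Np, D);   % random initial population
fit = zeros(Np, 1);
for i = 1:Np
    fit(i) = fitness(pop(i, :));
end

for g = 1:Gm
    for i = 1:Np
        % Mutation: three distinct individuals, all different from the target i
        idx = randperm(Np);
        idx(idx == i) = [];
        r = idx(1:3);
        v = pop(r(1), :) + F * (pop(r(2), :) - pop(r(3), :));
        v = min(max(v, lb), ub);        % keep the mutant inside the bounds
        % Binomial crossover: at least one gene is taken from the mutant
        mask = rand(1, D) < CR;
        mask(randi(D)) = true;
        u = pop(i, :);
        u(mask) = v(mask);
        % Selection: the trial replaces the target only if it is not worse
        fu = fitness(u);
        if fu <= fit(i)
            pop(i, :) = u;
            fit(i) = fu;
        end
    end
end
[best_f, ib] = min(fit);
best_x = pop(ib, :);
fprintf('best objective value: %g\n', best_f);
```

On this 5-dimensional problem `best_f` should end up very close to zero. In a DE-BP setting, the objective would instead be the network error obtained with a candidate vector of weights and thresholds.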


## 2 Code

```matlab
%% Differential evolution (DE) to optimize the initial weights and thresholds of a BP neural network
%% Clear environment variables
clear all; clc; warning off
load v357; load y357;            % v: input samples, y: targets (data files not included)
Pn_train = v; Tn_train = y;
Pn_test  = v; Tn_test  = y;      % note: the test set is identical to the training set
P_train  = v; T_train  = y;

% Unprocessed data (kept for reference):
% P_train = [0 25.27 44 62.72 81.4 100.2;
%            290.5 268.8 247.2 224.5 206 184.4;
%            0 16.12 33.25 50.42 67.62 84.73;
%            542.5 517.8 493 465.3 435.6 410.8;
%            0 11.1 28.1 44.93 61.38 78.57;
%            826.1 800.2 769.1 740.0 706.2 669.3];
% T_train = [0 1 2 3 4 5];
% P_test  = [0 25.25 43 62.75 81.6 100.7;
%            290.3 268.4 247.5 224.6 206 184.2;
%            0 16.14 33.26 50.47 67.68 84.79;
%            542.7 517.9 495 465.8 435.6 410.9;
%            0 11.4 28.6 44.94 61.36 78.59;
%            826.3 800.7 769.8 740.5 706.7 669.3];
% T_test  = [0 1 2 3 4 5];
% Normalized data (Tn is the target vector):
% Pn_train = [0 0.252 0.439 0.626 0.813 1 0 0.19 0.392 0.595 0.798 1 0 0.141 0.358 0.572 0.781 1;
%             1 0.795 0.592 0.378 0.204 0 1 0.815 0.626 0.415 0.189 0 1 0.835 0.637 0.451 0.235 0];
% Tn_train = [0.05,0.23,0.41,0.59,0.77,0.95,0.05,0.23,0.41,0.59,0.77,0.95,0.05,0.23,0.41,0.59,0.77,0.95];
% Pn_test  = [0 0.17 0.39 0.595 0.798 1 0 0.141 0.358 0.572 0.781 1 0 0.258 0.439 0.626 0.813 1;
%             1 0.815 0.625 0.415 0.189 0 1 0.835 0.635 0.451 0.235 0 1 0.795 0.599 0.378 0.204 0];
% Tn_test  = [0.05,0.23,0.41,0.59,0.77,0.95,0.05,0.23,0.41,0.59,0.77,0.95,0.05,0.23,0.41,0.59,0.77,0.95];

%% Parameter settings
S1 = size(Pn_train,1);           % number of input-layer neurons
S2 = 6;                          % number of hidden-layer neurons
S3 = size(Tn_train,1);           % number of output-layer neurons
Gm = 10;                         % maximum number of generations
F0 = 0.5;                        % F: scaling (zoom) factor
Np = 5;                          % population size
CR = 0.5;                        % crossover rate
G  = 1;                          % generation counter (initial generation)
N  = S1*S2 + S2*S3 + S2 + S3;    % problem dimension: number of weights and thresholds

% The DE search that produces the best W1 (S2 x S1), B1 (S2 x 1),
% W2 (S3 x S2) and B2 (S3 x 1) is not included in the original listing.

%% BP network with DE-optimized initial weights and thresholds
% Assumed: same structure as the unoptimized network below
net_optimized = newff(minmax(Pn_train),[S2,S3],{'logsig','purelin'},'traingdm');
net_optimized.IW{1,1} = W1;
net_optimized.LW{2,1} = W2;
net_optimized.b{1} = B1;
net_optimized.b{2} = B2;
% Training parameters
net_optimized.trainParam.epochs = 3000;
net_optimized.trainParam.show = 100;
net_optimized.trainParam.goal = 0.001;
net_optimized.trainParam.lr = 0.1;
% Train starting from the optimized weights and thresholds
net_optimized = train(net_optimized,Pn_train,Tn_train);
%% Simulation test
Tn_sim_optimized = sim(net_optimized,Pn_test);
% Result comparison
result_optimized = [Tn_test' Tn_sim_optimized'];
% Mean squared error and mean absolute percentage error (relative to the actual values)
E_optimized = mse(Tn_sim_optimized - Tn_test)
MAPE_optimized = mean(abs(Tn_sim_optimized - Tn_test)./Tn_test)*100

% figure(1)
% plot(T_train,P_train(1,:),'r'); hold on
% plot(T_train,P_train(3,:),'y'); hold on
% plot(T_train,P_train(5,:),'b'); hold on
% grid on
% xlabel('Conventional true value of standard device (10 kPa)');
% ylabel('Output of pressure sensor (mV)');
% title('Working curve of pressure sensor');
% legend('t=22','t=44','t=70');
figure(2)
plot(Tn_train(1:6),Pn_train(1,1:6),'r'); hold on
plot(Tn_train(7:12),Pn_train(1,7:12),'y'); hold on
plot(Tn_train(13:18),Pn_train(1,13:18),'b'); hold on
grid on
xlabel('Conventional true value of standard device (10 kPa)');
ylabel('Output of pressure sensor (mV)');
title('Working curve of pressure sensor for normalized training samples');
legend('t=22','t=44','t=70');
figure(3)
plot(Tn_test(1:6),Pn_test(1,1:6),'r'); hold on
plot(Tn_test(7:12),Pn_test(1,7:12),'y'); hold on
plot(Tn_test(13:18),Pn_test(1,13:18),'b'); hold on
grid on
xlabel('Conventional true value of standard device (10 kPa)');
ylabel('Output of pressure sensor (mV)');
title('Working curve of pressure sensor for normalized test samples');
legend('t=22','t=44','t=70');
figure(4)
plot(Tn_test(1:6),Tn_sim_optimized(1:6),'r'); hold on   % DE-BP simulation results
plot(Tn_test(7:12),Tn_sim_optimized(7:12),'y'); hold on
plot(Tn_test(13:18),Tn_sim_optimized(13:18),'b'); hold on
xlabel('Conventional true value (10 kPa)');
ylabel('Output of pressure sensor (mV)');
title('DE-BP working curve of pressure sensor');
legend('t=22','t=44','t=70');
grid on

%% Unoptimized BP neural network
% net = newff(Pn_train,Tn_train,S2);
% Log-sigmoid hidden layer, linear output layer, gradient descent with momentum
net = newff(minmax(Pn_train),[6,1],{'logsig','purelin'},'traingdm');
% Training parameters
net.trainParam.epochs = 3000;
net.trainParam.show = 100;
net.trainParam.goal = 0.001;
net.trainParam.lr = 0.1;
net = init(net);
inputWeights = net.IW{1,1};      % current input-layer weights
inputbias = net.b{1};            % current input-layer thresholds
layerWeights = net.LW{2,1};      % current output-layer weights
layerbias = net.b{2}             % current output-layer thresholds
% Train starting from the randomly initialized weights and thresholds
net = train(net,Pn_train,Tn_train);
%% Simulation test
Tn_sim = sim(net,Pn_test);
%% Result comparison
result = [Tn_test' Tn_sim'];
% Mean squared error and mean absolute percentage error (relative to the actual values)
E1 = mse(Tn_sim - Tn_test)
MAPE1 = mean(abs(Tn_sim - Tn_test)./Tn_test)*100
figure(5)
plot(Tn_test(1:6),Tn_sim(1:6),'r'); hold on   % BP simulation results
plot(Tn_test(7:12),Tn_sim(7:12),'y'); hold on
plot(Tn_test(13:18),Tn_sim(13:18),'b'); hold on
xlabel('Conventional true value (10 kPa)');
ylabel('Output of pressure sensor (mV)');
title('BP working curve of pressure sensor');
legend('t=22','t=44','t=70');
grid on
```
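In the code section, `N` counts all weights and thresholds, but the DE loop that turns the best individual into `W1`, `B1`, `W2` and `B2` is not shown. The sketch below illustrates one hypothetical way to decode an individual of length `N` and score it by the training MSE of a forward pass. The segment ordering, the toy data `Pn`/`Tn`, the individual `x`, and `S1 = 2` (the row count of the commented-out normalized `Pn_train`) are all assumptions; a DE search would then minimize `err` over candidate individuals.

```matlab
% Hypothetical decoding of one DE individual into BP weights and thresholds
S1 = 2; S2 = 6; S3 = 1;                 % layer sizes (S1 = 2 is an assumption)
N  = S1*S2 + S2*S3 + S2 + S3;           % length of one individual (25 here)
x  = linspace(-1, 1, N);                % a toy individual

k = 0;                                  % read position inside x
W1 = reshape(x(k+1:k+S2*S1), S2, S1); k = k + S2*S1;   % like net.IW{1,1}
B1 = reshape(x(k+1:k+S2),    S2, 1);  k = k + S2;      % like net.b{1}
W2 = reshape(x(k+1:k+S3*S2), S3, S2); k = k + S3*S2;   % like net.LW{2,1}
B2 = reshape(x(k+1:k+S3),    S3, 1);  k = k + S3;      % like net.b{2}

% Fitness of this individual: training MSE of a logsig/purelin forward pass
sigmoid = @(z) 1 ./ (1 + exp(-z));
Pn = [0 0.5 1; 1 0.5 0];                % toy inputs, one column per sample
Tn = [0.05 0.5 0.95];                   % toy targets
Q  = size(Pn, 2);                       % number of samples
H  = sigmoid(W1 * Pn + B1 * ones(1, Q));   % hidden-layer output (S2 x Q)
Y  = W2 * H + B2 * ones(1, Q);             % linear output layer (S3 x Q)
err = mean((Y - Tn).^2);
fprintf('N = %d, training MSE of this individual = %.4f\n', N, err);
```

Keeping the decoding in one place means the same order can be reused when the best individual is written into `net_optimized.IW{1,1}`, `net_optimized.LW{2,1}`, `net_optimized.b{1}` and `net_optimized.b{2}`.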

## 3 Simulation results
