# Definition

A neural network is not a very complicated thing. The simplest network involves nothing more than a simple generalized linear regression and gradient descent.

Application: Neural networks are applied broadly and simply, mainly to data prediction, and, because of the way the computation is wired together, they can also be used for classification. For example, if we extract features from a picture and then predict and classify on those features, we get simple image recognition (which in practice calls for convolutional neural networks and the like).

This post describes the simplest neural network algorithm and implements a neural network by hand (in Python).

# Fundamentals

### Basic Transition

In fact, the simplest way to describe how a neural network works is "brute-force solving".

Take a simple example (univariate linear regression with the gradient descent algorithm). Suppose we have a set of inputs

x = [1 2 3 4 5]

and a set of outputs

y = [1 2 3 4 5]

Let's assume the relationship y = w*x + b.

Now all we need to do is find the values of w and b.

The easiest way, as you might expect, is to generate random values for w and b and then correct them based on the error. To do that we need a way to compute the error and to adjust w and b accordingly. For example, we can measure the distance D(y actual, y predicted) with the squared error.

Suppose that at this point x = 1 (so the actual y is 1) and the initial values are w = 2, b = 1. The calculation then gives (in practice, of course, we would use the mean over all x and y):

y predicted = 3, squared error = (3 - 1)^2 = 4

Here we can apply the gradient descent algorithm.

This squared error is called the loss function:

$$loss = (w x + b - y)^2$$

Take the partial derivatives with respect to w and b,

$$\frac{\partial\, loss}{\partial w} = 2x\,(w x + b - y), \qquad \frac{\partial\, loss}{\partial b} = 2\,(w x + b - y)$$

and then update

$$w' = w - step \cdot \frac{\partial\, loss}{\partial w}$$

$$b' = b - step \cdot \frac{\partial\, loss}{\partial b}$$

Then we repeat these updates in a loop and end up with w = 1, b = 0.
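The loop above can be sketched in a few lines of plain Python. This is only a minimal sketch and not code from the demo used later in this post: the function name `fit_line`, the step size 0.05 and the iteration count are my own assumptions, and the gradient is averaged over all five points instead of being taken at a single x.

```python
def fit_line(xs, ys, w=2.0, b=1.0, step=0.05, iterations=5000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    n = len(xs)
    for _ in range(iterations):
        # Prediction error for every sample with the current w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        # Move against the gradient
        w -= step * grad_w
        b -= step * grad_b
    return w, b


xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
w, b = fit_line(xs, ys)
print(w, b)  # w ends up very close to 1 and b very close to 0
```

The starting point w = 2, b = 1 is the same one used in the hand calculation above. A noticeably larger step size would make the updates overshoot and diverge on this data, which is why the step is kept small.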

Looking at the whole process, it may seem that I have only shown you how to use gradient descent. But if you draw it as a flowchart, you will see that it is in fact equivalent to a "neural network" with only one node in the middle (the hidden layer).

### Basic Neural Network Structure

At this point we can introduce the concept of a neural network. We saw earlier that computing w and b starts from a random initial value (it can also default to 0) and passes through a single node. For complex fitting and prediction, different initial values of w and b may lead to different results, and a single node does not seem rigorous enough. Borrowing from biology (or, as the saying goes, three cobblers with their wits combined beat one Zhuge Liang), we can set up several nodes, distribute the value of x among them according to weights, compute inside each node, and then combine the node outputs according to weights again. In the end we obtain the w and b values of every node and put everything back together.

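So the structure becomes more complicated, with several nodes working side by side.

Then you find that even this does not seem to be enough, and online you will see plenty of intimidating diagrams with far more nodes and layers.

Those are real, but the general principle is the same as before.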

# Manual implementation

Instead of writing one from scratch, I use a simple demo implemented by someone else. After all, hand-written code is no match for a ready-made framework, and there are many details to consider, but, as I said, the basic principle is the same.

## Target

The neural network we are going to simulate this time has two inputs, one hidden layer with two nodes (h1 and h2), and a single output node (o1).

As the previous examples suggest, the first thing to do is to choose an appropriate (or guessed) fitting equation for the hidden layer.

Here it is called the activation function.

We choose the most commonly used one, the sigmoid:

    import numpy

    def sigmoid(x):
        # Squashes any real input into the interval (0, 1)
        return 1 / (1 + numpy.exp(-x))

This is also the activation function I used directly in my mathematical modeling work (although back then I used the MATLAB toolbox).
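As a quick illustration of what the sigmoid does, the sketch below evaluates it at a few points. The sample inputs are arbitrary values I picked for illustration. The thing to notice is that every output lies strictly between 0 and 1, and that an input of 0 maps to exactly 0.5.

```python
import numpy


def sigmoid(x):
    return 1 / (1 + numpy.exp(-x))


# Arbitrary sample inputs: a large negative value, zero, a large positive value
values = sigmoid(numpy.array([-10.0, 0.0, 10.0]))
print(values)  # close to 0, exactly 0.5, close to 1
```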

## Use activation function

Now that we have chosen an activation function, the next step is to weight the values of x, feed the weighted sum into the activation function, and then weight the results again to produce the output. Of course, we do not know the weights in advance. Training is precisely what determines them, layer by layer, and in the end we can write the network out as one very complex equation. (That is exactly what I did in my mathematical modeling work: because I did not know how to do multi-objective optimization, I fitted the weight relationship between the objectives directly with a neural network, obtained a very complex equation, and ran a genetic algorithm on it. http://www.nnetinfo.com/text/show/4 )

The code mirrors that structure exactly:

    def feedforward(self, x):
        # Hidden layer: weighted sum of the two inputs, then the activation
        h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
        h1f = sigmoid(h1)
        h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
        h2f = sigmoid(h2)
        # Output layer: weighted sum of the hidden activations, then the activation
        o1 = h1f*self.w5 + h2f*self.w6 + self.b3
        of = sigmoid(o1)
        return h1, h1f, h2, h2f, o1, of

For initialization we first assign random values and correct them later during training:

    class nerualnetwo():
        def __init__(self):
            # Six weights and three biases, all drawn from a standard normal distribution
            self.w1 = numpy.random.normal()
            self.w2 = numpy.random.normal()
            self.w3 = numpy.random.normal()
            self.w4 = numpy.random.normal()
            self.w5 = numpy.random.normal()
            self.w6 = numpy.random.normal()
            self.b1 = numpy.random.normal()
            self.b2 = numpy.random.normal()
            self.b3 = numpy.random.normal()
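The same forward pass can also be written with matrix products, which is how it is usually written once a network has more than a handful of nodes. The sketch below is my own illustration and not part of the demo: the function `feedforward_matrix`, the array names and all weight values are hypothetical, chosen only so that the result can be compared with the term-by-term version.

```python
import numpy


def sigmoid(x):
    return 1 / (1 + numpy.exp(-x))


def feedforward_matrix(x, W_hidden, b_hidden, w_out, b_out):
    """Forward pass of the 2-2-1 network written with matrix products."""
    hidden = sigmoid(W_hidden @ x + b_hidden)  # [h1f, h2f]
    return sigmoid(w_out @ hidden + b_out)     # of


# Hypothetical weights, laid out to match w1..w6 and b1..b3
W_hidden = numpy.array([[0.1, 0.2],    # w1, w2 (into h1)
                        [0.3, 0.4]])   # w3, w4 (into h2)
b_hidden = numpy.array([0.5, -0.5])    # b1, b2
w_out = numpy.array([0.6, -0.7])       # w5, w6
b_out = 0.1                            # b3

x = numpy.array([-2.0, -1.0])
y_matrix = feedforward_matrix(x, W_hidden, b_hidden, w_out, b_out)

# The same computation written out term by term, as in feedforward
h1f = sigmoid(x[0] * 0.1 + x[1] * 0.2 + 0.5)
h2f = sigmoid(x[0] * 0.3 + x[1] * 0.4 - 0.5)
y_scalar = sigmoid(h1f * 0.6 + h2f * -0.7 + 0.1)

print(y_matrix, y_scalar)  # the two values agree up to rounding
```

Writing out nine separate scalars, as the demo does, is easier to follow for a network this small. The matrix form only starts to pay off when the number of nodes grows.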

### Loss Function

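This corresponds to the earlier example: how do we know whether w and b are any good or need to be corrected? Again we use the squared error, here averaged over all samples (the mean squared error):

    def mse_loss(y_tr, y_pre):
        # Mean of the squared differences between true and predicted values
        return ((y_tr - y_pre) ** 2).mean()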
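To get a feel for the numbers, here is a tiny worked example. The true labels are the ones used in the full code at the end of this post. The two prediction vectors are made up for illustration: one that predicts 0 everywhere and one that is perfect.

```python
import numpy


def mse_loss(y_tr, y_pre):
    return ((y_tr - y_pre) ** 2).mean()


y_true = numpy.array([1, 0, 0, 1])
all_zero = numpy.array([0, 0, 0, 0])   # hypothetical prediction: always 0

print(mse_loss(y_true, all_zero))  # 0.5, two of the four samples are off by 1
print(mse_loss(y_true, y_true))    # 0.0, a perfect prediction
```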

### Feedback Optimization (Gradient Descent)

In fact, optimizing the weights of each node does not have to be done with gradient descent, but gradient descent is the simplest and easiest approach to understand (and one I can actually write).

Gradient descent needs derivatives, but things are a little different here.

We have two stages of gradient descent:

- input to hidden layer
- hidden layer to output

So this part needs the derivatives of two functions:

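    # Derivative of the activation function
    def der_sigmoid(x):
        return sigmoid(x) * (1 - sigmoid(x))

    # Derivative of the loss (y_tr - y_pre)**2 with respect to y_pre
    der_L_y_pre = -2 * (y_tr - y_pre)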

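Note that -2*(y_tr-y_pre) is the derivative of the loss with respect to y_pre, that is, with respect to the predicted y.

It is needed because two gradient descent stages are chained one after the other, so their derivatives are multiplied together (the chain rule).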
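Written out for w1, the chain of derivatives looks as follows. This is my own restatement of what the code computes, with σ standing for the sigmoid and the subscripted symbols standing for the variables of the same name in the code (an assumed notation, not one used by the demo).

```latex
\begin{aligned}
L &= (y_{tr} - y_{pre})^2, \qquad \sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr) \\
\frac{\partial L}{\partial w_1}
  &= \frac{\partial L}{\partial y_{pre}}
     \cdot \frac{\partial y_{pre}}{\partial h_{1f}}
     \cdot \frac{\partial h_{1f}}{\partial w_1} \\
  &= -2\,(y_{tr} - y_{pre}) \;\cdot\; \sigma'(o_1)\, w_5 \;\cdot\; \sigma'(h_1)\, x_0
\end{aligned}
```

The three factors correspond to `der_L_y_pre`, `der_y_pre_h1` and `der_h1_w1` in the code. For b1 the last factor is simply σ'(h1), because h1 depends on b1 with coefficient 1.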

For w1 and b1, for example, the updates are:

    self.w1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w1
    self.b1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_b1

where

    der_h1_b1 = der_sigmoid(valcell[0])
    der_h1_w1 = der_sigmoid(valcell[0]) * x[0]

That is, the partial derivatives with respect to b and w.

As for valcell, it holds the values h1, h1f, h2, h2f, o1, of returned by feedforward.

All that remains is to substitute them in.
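A handy way to convince yourself that such a hand-derived gradient is right is to compare it with a finite-difference estimate. The sketch below does that for w1 on a single sample. It is my own check and not part of the demo: the helper names (`loss`, `analytic_grad_w1`, `numeric_grad`), the parameter dictionary and every weight value in it are hypothetical.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def der_sigmoid(z):
    return sigmoid(z) * (1 - sigmoid(z))


def loss(p, x, y_tr):
    """Squared error of the 2-2-1 network on one sample."""
    h1f = sigmoid(x[0] * p["w1"] + x[1] * p["w2"] + p["b1"])
    h2f = sigmoid(x[0] * p["w3"] + x[1] * p["w4"] + p["b2"])
    y_pre = sigmoid(h1f * p["w5"] + h2f * p["w6"] + p["b3"])
    return (y_tr - y_pre) ** 2


def analytic_grad_w1(p, x, y_tr):
    """dL/dw1 from the chain rule, using the same three factors as the post."""
    h1 = x[0] * p["w1"] + x[1] * p["w2"] + p["b1"]
    h2 = x[0] * p["w3"] + x[1] * p["w4"] + p["b2"]
    o1 = sigmoid(h1) * p["w5"] + sigmoid(h2) * p["w6"] + p["b3"]
    y_pre = sigmoid(o1)
    der_L_y_pre = -2 * (y_tr - y_pre)
    der_y_pre_h1 = der_sigmoid(o1) * p["w5"]
    der_h1_w1 = der_sigmoid(h1) * x[0]
    return der_L_y_pre * der_y_pre_h1 * der_h1_w1


def numeric_grad(p, name, x, y_tr, eps=1e-6):
    """Central finite-difference estimate of dL/d(name)."""
    up, down = dict(p), dict(p)
    up[name] += eps
    down[name] -= eps
    return (loss(up, x, y_tr) - loss(down, x, y_tr)) / (2 * eps)


# Hypothetical parameter values, chosen only for this check
params = {"w1": 0.5, "w2": -0.3, "w3": 0.8, "w4": 0.2, "w5": 1.2,
          "w6": -0.7, "b1": 0.1, "b2": -0.4, "b3": 0.05}
x, y_tr = [-2.0, -1.0], 1.0

print(analytic_grad_w1(params, x, y_tr))
print(numeric_grad(params, "w1", x, y_tr))  # should agree to several decimals
```

If the two numbers disagree noticeably, one of the factors in the chain is wrong. The same check works for any of the other eight parameters by changing the name passed to `numeric_grad` and writing the matching analytic expression.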

### Gradient correction

That's it. In principle the whole thing runs in a loop until it converges, but here we simply iterate a fixed 1000 times.

This is also the core of the whole computation:

    def train(self, data, all_y_tr):
        epochs = 1000
        learn_rate = 0.1
        for i in range(epochs):
            for x, y_tr in zip(data, all_y_tr):
                # Forward pass; valcell = (h1, h1f, h2, h2f, o1, of)
                valcell = self.feedforward(x)
                y_pre = valcell[5]
                # Derivative of the loss with respect to the prediction
                der_L_y_pre = -2*(y_tr - y_pre)
                # Output layer derivatives
                der_y_pre_h1 = der_sigmoid(valcell[4])*self.w5
                der_y_pre_h2 = der_sigmoid(valcell[4])*self.w6
                # Hidden layer derivatives
                der_h1_w1 = der_sigmoid(valcell[0])*x[0]
                der_h1_w2 = der_sigmoid(valcell[0])*x[1]
                der_h2_w3 = der_sigmoid(valcell[2])*x[0]
                der_h2_w4 = der_sigmoid(valcell[2])*x[1]
                der_y_pre_w5 = der_sigmoid(valcell[4])*valcell[1]
                der_y_pre_w6 = der_sigmoid(valcell[4])*valcell[3]
                der_y_pre_b3 = der_sigmoid(valcell[4])
                der_h1_b1 = der_sigmoid(valcell[0])
                der_h2_b2 = der_sigmoid(valcell[2])
                # Update the weights and biases
                self.w1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w1
                self.w2 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w2
                self.w3 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w3
                self.w4 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w4
                self.w5 -= learn_rate * der_L_y_pre * der_y_pre_w5
                self.w6 -= learn_rate * der_L_y_pre * der_y_pre_w6
                self.b1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_b1
                self.b2 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_b2
                self.b3 -= learn_rate * der_L_y_pre * der_y_pre_b3
            # Print the current loss every 10 epochs
            if i % 10 == 0:
                y_pred = numpy.apply_along_axis(self.simulate, 1, data)
                loss = mse_loss(all_y_tr, y_pred)
                print(i, loss)

Once training is done, we know the weights of each layer and can use them in the computation.

### Operational Functions

    def simulate(self, x):
        # Same computation as feedforward, but only the final output is returned
        h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
        h1f = sigmoid(h1)
        h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
        h2f = sigmoid(h2)
        o1 = h1f*self.w5 + h2f*self.w6 + self.b3
        of = sigmoid(o1)
        return of

This is the actual computation model. Once the weights have been trained, all you have to do is plug them into the equation.
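To make "plugging the weights into the equation" concrete, here is a small standalone sketch. The weights below are not the result of any training run. They are hypothetical values I picked by hand so that the network separates the four sample points used in the full code, and the 0.5 threshold for turning the output into a label is likewise my own assumption.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


# Hypothetical, hand-picked weights (NOT the output of training)
weights = {"w1": -1.0, "w2": -1.0, "w3": -1.0, "w4": -1.0,
           "w5": 4.0, "w6": 4.0, "b1": 0.0, "b2": 0.0, "b3": -4.0}


def simulate(x, p):
    """Evaluate the 2-2-1 network for one input with fixed weights."""
    h1f = sigmoid(x[0] * p["w1"] + x[1] * p["w2"] + p["b1"])
    h2f = sigmoid(x[0] * p["w3"] + x[1] * p["w4"] + p["b2"])
    return sigmoid(h1f * p["w5"] + h2f * p["w6"] + p["b3"])


data = [[-2, -1], [25, 6], [17, 4], [-15, -6]]
# Threshold the network output at 0.5 to get a class label
labels = [1 if simulate(x, weights) > 0.5 else 0 for x in data]
print(labels)  # [1, 0, 0, 1]
```

With trained weights the procedure is identical: only the numbers in `weights` change.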

## Overall Code

    import numpy


    def sigmoid(x):
        return 1 / (1 + numpy.exp(-x))


    def der_sigmoid(x):
        return sigmoid(x) * (1 - sigmoid(x))


    def mse_loss(y_tr, y_pre):
        return ((y_tr - y_pre) ** 2).mean()


    class nerualnetwo():
        def __init__(self):
            self.w1 = numpy.random.normal()
            self.w2 = numpy.random.normal()
            self.w3 = numpy.random.normal()
            self.w4 = numpy.random.normal()
            self.w5 = numpy.random.normal()
            self.w6 = numpy.random.normal()
            self.b1 = numpy.random.normal()
            self.b2 = numpy.random.normal()
            self.b3 = numpy.random.normal()

        def feedforward(self, x):
            h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
            h1f = sigmoid(h1)
            h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
            h2f = sigmoid(h2)
            o1 = h1f*self.w5 + h2f*self.w6 + self.b3
            of = sigmoid(o1)
            return h1, h1f, h2, h2f, o1, of

        def simulate(self, x):
            h1 = x[0]*self.w1 + x[1]*self.w2 + self.b1
            h1f = sigmoid(h1)
            h2 = x[0]*self.w3 + x[1]*self.w4 + self.b2
            h2f = sigmoid(h2)
            o1 = h1f*self.w5 + h2f*self.w6 + self.b3
            of = sigmoid(o1)
            return of

        def train(self, data, all_y_tr):
            epochs = 1000
            learn_rate = 0.1
            for i in range(epochs):
                for x, y_tr in zip(data, all_y_tr):
                    valcell = self.feedforward(x)
                    y_pre = valcell[5]
                    der_L_y_pre = -2*(y_tr - y_pre)
                    der_y_pre_h1 = der_sigmoid(valcell[4])*self.w5
                    der_y_pre_h2 = der_sigmoid(valcell[4])*self.w6
                    der_h1_w1 = der_sigmoid(valcell[0])*x[0]
                    der_h1_w2 = der_sigmoid(valcell[0])*x[1]
                    der_h2_w3 = der_sigmoid(valcell[2])*x[0]
                    der_h2_w4 = der_sigmoid(valcell[2])*x[1]
                    der_y_pre_w5 = der_sigmoid(valcell[4])*valcell[1]
                    der_y_pre_w6 = der_sigmoid(valcell[4])*valcell[3]
                    der_y_pre_b3 = der_sigmoid(valcell[4])
                    der_h1_b1 = der_sigmoid(valcell[0])
                    der_h2_b2 = der_sigmoid(valcell[2])
                    self.w1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w1
                    self.w2 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_w2
                    self.w3 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w3
                    self.w4 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_w4
                    self.w5 -= learn_rate * der_L_y_pre * der_y_pre_w5
                    self.w6 -= learn_rate * der_L_y_pre * der_y_pre_w6
                    self.b1 -= learn_rate * der_L_y_pre * der_y_pre_h1 * der_h1_b1
                    self.b2 -= learn_rate * der_L_y_pre * der_y_pre_h2 * der_h2_b2
                    self.b3 -= learn_rate * der_L_y_pre * der_y_pre_b3
                if i % 10 == 0:
                    y_pred = numpy.apply_along_axis(self.simulate, 1, data)
                    loss = mse_loss(all_y_tr, y_pred)
                    print(i, loss)


    if __name__ == "__main__":
        data = numpy.array([[-2, -1], [25, 6], [17, 4], [-15, -6]])
        all_y_trues = numpy.array([1, 0, 0, 1])
        ner = nerualnetwo()
        ner.train(data, all_y_trues)

## Summary

In fact, the most basic network is just this. There are of course many details, but the core comes down to two things:

- the activation function
- automatic weight correction

Choose an appropriate activation function, apply it, and then let the self-correction minimize the loss function. As the number of layers in the network grows, it becomes more complex. Accuracy, however, is not necessarily proportional to the number of layers, as I have found in my own tests.

## References

https://zhuanlan.zhihu.com/p/58964140