06 2-Layer Neural Network Exercise-Neural Network and Deep Learning Specialization Series

This article contains my learning notes for the Shallow Neural Network exercise from Course 1 of the Deep Learning Specialization, Neural Networks and Deep Learning.

The last article was 05 Two-layer Neural Network-Neural Network and Deep Learning Specialization Series. This time we will work through an exercise, using a classification problem as the example.

1. Importing the Libraries

First, this exercise uses a new Python library for machine learning: Scikit-learn, which provides simple and efficient tools for data mining and data analysis. The Scikit-learn website also offers many introductory examples.

The libraries needed for this experiment include:

import numpy as np
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import sigmoid, load_planar_dataset

The planar_utils module contains the data-loading and sigmoid functions needed for this exercise. The data-loading function generates a two-class dataset shaped like a flower.

def load_planar_dataset():
    np.random.seed(1)
    m = 400 # number of examples
    N = int(m/2) # number of points per class
    D = 2 # dimensionality
    X = np.zeros((m,D)) # data matrix where each row is a single example
    Y = np.zeros((m,1), dtype='uint8') # labels vector (0 for red, 1 for blue)
    a = 4 # maximum ray of the flower

    for j in range(2):
        ix = range(N*j,N*(j+1))
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2 # theta
        r = a*np.sin(4*t) + np.random.randn(N)*0.2 # radius
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
        Y[ix] = j
        
    X = X.T
    Y = Y.T

    return X, Y

Data loading requires only calling this function: X, Y = load_planar_dataset()
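To get a feel for the data, here is a minimal visualization sketch (matplotlib is an extra dependency, not listed in the imports above). Each column of X is one 2-D point, and Y holds the corresponding 0/1 labels:

import matplotlib.pyplot as plt

X, Y = load_planar_dataset()
# Each column of X is a point; color the points by their class label
plt.scatter(X[0, :], X[1, :], c=Y.ravel(), s=40, cmap=plt.cm.Spectral)
plt.show()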

2. Neural Network Model

This exercise uses a two-layer neural network with one hidden layer. The input layer has two nodes (since we are classifying two-dimensional data), the hidden layer has four nodes and uses the tanh activation function, and the output layer uses the sigmoid function.
The specific network model is shown in the following figure:

The general methods to construct this neural network are:

  1. Define the structure of the neural network, including the number of nodes in the input, hidden, and output layers
  2. Initialize the model parameters
  3. Forward propagation
  4. Compute the loss (cost) function
  5. Backpropagation
  6. Update the parameters by gradient descent
  7. Build the complete model (loop)

Here are the steps.

2.1 Structure of the Neural Network

This step is very simple: based on the input data and the chosen number of hidden-layer nodes, it defines three sizes: the input layer size n_x, the hidden layer size n_h, and the output layer size n_y.

def layer_sizes(X, Y):
    n_x = X.shape[0] # size of input layer
    n_h = 4 # Custom size, you can change
    n_y = Y.shape[0] # size of output layer
    return (n_x, n_h, n_y)
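As a quick check on the flower dataset loaded above (a minimal sketch): X has shape (2, 400) and Y has shape (1, 400), so the sizes come out as 2, 4, and 1.

n_x, n_h, n_y = layer_sizes(X, Y)
print(n_x, n_h, n_y)  # 2 4 1 for the flower dataset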

2.2 Parameter Initialization

The parameter-initialization section of the last article, 05 Two-layer Neural Network-Neural Network and Deep Learning Specialization Series, made the following point:

In a neural network with a hidden layer, we cannot initialize the weights W to zero (the biases b may be zero).

Therefore, when the network is initialized, we use NumPy's random number generator to define the weight parameters W. The dimension of each parameter was briefly explained in the previous article, based on the computation formula for each layer:

  • $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
  • $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$

From these formulas we can see that:

  • the dimension of $W^{[1]}$ is (hidden layer size n_h, input layer size n_x)
  • the dimension of $W^{[2]}$ is (output layer size n_y, hidden layer size n_h)

The code to initialize the parameters is:

def initialize_parameters(n_x, n_h, n_y):
    # Weights: small random values (to break symmetry); biases: zeros
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    parameters = {
        "W1": W1,
        "b1": b1,
        "W2": W2,
        "b2": b2
    }

    return parameters
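A quick sanity check (a sketch, using the layer sizes computed above) confirms the dimensions discussed in the previous paragraph:

parameters = initialize_parameters(n_x, n_h, n_y)  # n_x=2, n_h=4, n_y=1 for this dataset
print(parameters["W1"].shape, parameters["b1"].shape)  # (4, 2) (4, 1)
print(parameters["W2"].shape, parameters["b2"].shape)  # (1, 4) (1, 1)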

2.3 Forward Propagation

Forward propagation starts from the input layer and the initialized parameters, computes the hidden layer's linear output and its activation, and then computes the output layer, using the following formulas:

  • $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$
  • $a^{[1](i)} = \tanh(z^{[1](i)})$
  • $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$
  • $\hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$

The inputs to forward propagation are the input data X and the parameters. Besides the output A2, the intermediate values of the computation are returned in a cache so they can be reused in the subsequent backpropagation step:

def forward_propagation(X, parameters):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Implement forward propagation
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    cache = {
        "Z1": Z1,
        "A1": A1,
        "Z2": Z2,
        "A2": A2
    }

    return A2, cache
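As a quick check (a sketch continuing from the calls above), A2 holds one predicted probability per example, so every value lies strictly between 0 and 1:

A2, cache = forward_propagation(X, parameters)
print(A2.shape)            # (1, 400): one probability per example
print(A2.min(), A2.max())  # both strictly between 0 and 1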

2.4 Loss function

A new loss function, the cross-entropy error, is used here. For this binary classification problem, the cost averaged over the m training examples is:

$J = -\frac{1}{m} \sum\limits_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right)$

def compute_cost(A2, Y):
    m = Y.shape[1]  # number of examples

    # Cross-entropy cost: both the y and (1 - y) terms are needed
    logprobs = np.multiply(Y, np.log(A2)) + np.multiply(1 - Y, np.log(1 - A2))
    cost = -np.sum(logprobs) / m

    cost = float(np.squeeze(cost))

    return cost

The np.squeeze() function removes single-dimensional entries from the shape of an array, for example turning a shape of (1, 10) into (10,); see Numpy Library Learning - squeeze() Function for details.
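As a rough sanity check (a sketch using the values computed above): with the small random initialization, A2 is close to 0.5 everywhere, so the initial cost should be close to $-\log(0.5) \approx 0.693$.

cost = compute_cost(A2, Y)
print(cost)  # roughly 0.69 before any training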

2.5 Backpropagation

The calculation of $dZ^{[1]}$ involves $g^{[1]\prime}(Z^{[1]})$, the derivative of the hidden layer's activation function. Since the tanh activation is used here, $g^{[1]\prime}(z) = 1 - a^2$, where $a = \tanh(z)$.
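For reference, the full set of gradients computed below (the standard formulas for this two-layer network, where $*$ denotes the element-wise product and the factor $\frac{1}{m}$ averages over the m examples) is:

  • $dZ^{[2]} = A^{[2]} - Y$
  • $dW^{[2]} = \frac{1}{m} \, dZ^{[2]} A^{[1]T}$
  • $db^{[2]} = \frac{1}{m} \sum\limits_{i} dZ^{[2](i)}$
  • $dZ^{[1]} = W^{[2]T} dZ^{[2]} * \left(1 - (A^{[1]})^{2}\right)$
  • $dW^{[1]} = \frac{1}{m} \, dZ^{[1]} X^{T}$
  • $db^{[1]} = \frac{1}{m} \sum\limits_{i} dZ^{[1](i)}$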

def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]

    W1 = parameters["W1"]
    W2 = parameters["W2"]

    A1 = cache["A1"]
    A2 = cache["A2"]

    # Backward propagation: calculate dW1, db1, dW2, db2
    dZ2 = A2 - Y
    dW2 = (1/m) * np.dot(dZ2, A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = (1/m) * np.dot(dZ1, X.T)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    
    return grads

2.6 Gradient Descent

The learning rate learning_rate is introduced to update the parameters by gradient descent, gradually driving them toward values that minimize the cost function. The choice of learning rate is very important; the following chart compares a good learning rate with a bad one:


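In formula form, each parameter $\theta \in \{W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}\}$ is updated using its gradient from backpropagation and the learning rate $\alpha$:

$\theta := \theta - \alpha \, d\theta$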
The code for this section is as follows:

def update_parameters(parameters, grads, learning_rate = 1.2):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {
        "W1": W1,
        "b1": b1,
        "W2": W2,
        "b2": b2
    }

    return parameters

2.7 Model Construction

The last step is to combine all the steps above into a complete neural network. When using gradient descent, the computation is repeated for a given number of iterations (num_iterations) in order to obtain parameters that minimize the cost function.

def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    
    # Initialize parameters
    parameters = initialize_parameters(n_x, n_h, n_y)
    
    # Loop (gradient descent)
    for i in range(0, num_iterations):
        # Forward propagation.
        A2, cache = forward_propagation(X, parameters)
        
        # Cost function.
        cost = compute_cost(A2, Y)
 
        # Backpropagation.
        grads = backward_propagation(parameters, cache, X, Y)
 
        # Gradient descent parameter update.
        parameters = update_parameters(parameters, grads)
        
        # Print the cost every 1000 iterations
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters

3. Prediction

Finally, after the optimum parameters are calculated, the existing data can be predicted by forward propagation.

def predict(parameters, X):
    # Computes probabilities using forward propagation
    A2, cache = forward_propagation(X, parameters)
    predictions = np.round(A2) #  Classifies to 0/1 using 0.5 as the threshold
    
    return predictions    

Here, the np.round() function is used to round the output to achieve a classification with a prediction threshold of 0.5:
$\hat{y}^{(i)}_{prediction} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$

Finally, the model is trained on the flower dataset loaded earlier (still using a hidden layer with four nodes) and the prediction accuracy is calculated:

parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)

predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
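Optionally, the learned decision boundary can be visualized. This is a sketch that assumes your planar_utils file also contains the plot_decision_boundary helper from the original course exercise (it is not imported above) and that matplotlib is installed:

import matplotlib.pyplot as plt
from planar_utils import plot_decision_boundary  # assumption: helper from the original exercise

# predict expects X with shape (2, m), so transpose the grid points passed in by the helper
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision boundary for hidden layer size 4")
plt.show()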