Hands-on: MNIST handwritten digit recognition

Data set introduction

The MNIST dataset is a classic dataset in the field of machine learning. It consists of 60,000 training samples and 10,000 test samples; each sample is a 28 * 28 pixel grayscale image of a handwritten digit, and the dataset ships with Keras. This article uses the Keras neural network API under TensorFlow (see the Keras Chinese documentation) to build the network.

Before starting, recall the general workflow of machine learning (✅ = used in this article, ❌ = not used in this article):

  1. Define the problem and collect the dataset (✅)
  2. Choose a metric of success (✅)
  3. Decide on an evaluation protocol (✅)
  4. Prepare the data (✅)
  5. Develop a model that beats the baseline (❌)
  6. Scale up the model (❌)
  7. Regularize the model and tune hyperparameters (❌)

On choosing the last-layer activation function and loss function

Problem type | Last-layer activation function | Loss function
Binary classification | sigmoid | binary_crossentropy
Multi-class, single-label | softmax | categorical_crossentropy
Multi-class, multi-label | sigmoid | binary_crossentropy
Regression to arbitrary values | none | mse
Regression to values between 0 and 1 | sigmoid | mse or binary_crossentropy
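To make the table concrete, here is a minimal sketch (not this article's model; the 20-feature input and the 16-unit hidden layer are made up for illustration) of how the last-layer activation and the loss pair up in Keras:

from tensorflow.keras import models, layers

# Binary classification: one sigmoid unit + binary_crossentropy
binary_model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(20,)),
    layers.Dense(1, activation='sigmoid')])
binary_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Multi-class, single-label (e.g. the 10 MNIST digits): softmax + categorical_crossentropy
multi_model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(20,)),
    layers.Dense(10, activation='softmax')])
multi_model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])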

Now, on to the main content ~

1. Data preprocessing

First, load the data with the mnist.load_data() function.
Let's take a look at its source code:

def load_data(path='mnist.npz'):
  """Loads the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

  This is a dataset of 60,000 28x28 grayscale images of the 10 digits,
  along with a test set of 10,000 images.
  More info can be found at the
  [MNIST homepage](http://yann.lecun.com/exdb/mnist/).


  Arguments:
      path: path where to cache the dataset locally
          (relative to `~/.keras/datasets`).

  Returns:
      Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.
      **x_train, x_test**: uint8 arrays of grayscale image data with shapes
        (num_samples, 28, 28).

      **y_train, y_test**: uint8 arrays of digit labels (integers in range 0-9)
        with shapes (num_samples,).
  """

As you can see, the docstring contains the dataset's download link and a description of its size and data types, and the function returns two tuples made up of four NumPy arrays.

Next, import the dataset, reshape it into the desired shape, and then normalize it.
The to_categorical() function built into Keras performs one-hot encoding: each label is represented as an all-zero vector with a 1 only at the position of the label's index.

e.g. col = 10

[0, 1, 9]  -->  [[1,0,0,0,0,0,0,0,0,0],
                 [0,1,0,0,0,0,0,0,0,0],
                 [0,0,0,0,0,0,0,0,0,1]]

We can implement it manually:

def one_hot(labels, col):
    # Create a (num_samples, col) matrix of zeros, then set a 1 at each label's index
    results = np.zeros((len(labels), col))
    for i, label in enumerate(labels):
        results[i, label] = 1
    return results
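For example, a quick self-contained check that the manual version matches Keras' built-in to_categorical:

import numpy as np
from tensorflow.keras.utils import to_categorical

digits = [0, 1, 9]
print(one_hot(digits, 10))           # manual one-hot encoding
print(to_categorical(digits, 10))    # Keras built-in; produces the same 3 x 10 matrix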

The following is the preprocessing process

def data_preprocess():
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    # Add a channel dimension and scale pixel values from [0, 255] to [0, 1]
    train_images = train_images.reshape((60000, 28, 28, 1))
    train_images = train_images.astype('float32') / 255
    #print(train_images[0])
    test_images = test_images.reshape((10000, 28, 28, 1))
    test_images = test_images.astype('float32') / 255

    # One-hot encode the labels
    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)
    return train_images,train_labels,test_images,test_labels
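A quick sanity check on the returned shapes (optional; the expected values follow directly from the reshape and to_categorical calls above):

train_images, train_labels, test_images, test_labels = data_preprocess()
print(train_images.shape, train_labels.shape)   # (60000, 28, 28, 1) (60000, 10)
print(test_images.shape, test_labels.shape)     # (10000, 28, 28, 1) (10000, 10)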

2. Network construction

Here we build a convolutional neural network: a simple linear stack of convolution, pooling and fully connected layers. We know that stacking multiple linear layers still yields a linear operation, so adding layers does not expand the hypothesis space (the set of all possible transformations from input data to output data). Therefore we need to add nonlinearity, i.e. activation functions. relu is the most commonly used activation function; prelu and elu can also be used.

def build_module():
    model = models.Sequential()
    #The first layer is a convolution layer; the first layer must specify input_shape
    model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)))
    #The second layer is a max-pooling layer
    model.add(layers.MaxPooling2D((2,2)))
    #The third layer is a convolution layer
    model.add(layers.Conv2D(64, (3,3), activation='relu'))
    #The fourth layer is a max-pooling layer
    model.add(layers.MaxPooling2D((2,2)))
    #The fifth layer is a convolution layer
    model.add(layers.Conv2D(64, (3,3), activation='relu'))
    #The sixth layer is a Flatten layer, which flattens the 3D tensor into a vector
    model.add(layers.Flatten())
    #The seventh layer is a fully connected layer
    model.add(layers.Dense(64, activation='relu'))
    #The eighth layer is the softmax layer used for classification
    model.add(layers.Dense(10, activation='softmax'))
    return model

Use model.summary() to view the built network structure:
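For example (a minimal usage sketch; the output shapes in the comments follow from the layer definitions above):

model = build_module()
model.summary()
# Output shape per layer:
# Conv2D (26, 26, 32) -> MaxPooling2D (13, 13, 32) -> Conv2D (11, 11, 64)
# -> MaxPooling2D (5, 5, 64) -> Conv2D (3, 3, 64) -> Flatten (576)
# -> Dense (64) -> Dense (10)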

3. Network configuration

After the network is built, a key step is to configure it. For example:
the optimizer -- the specific method by which the network updates its parameters via gradient descent;
the loss function -- measures the distance between the generated values and the target values;
the evaluation metrics.
All of these can be configured through the parameters of model.compile().
Let's take a look at the source code of model.compile():

  def compile(self,
              optimizer='rmsprop',
              loss=None,
              metrics=None,
              loss_weights=None,
              weighted_metrics=None,
              run_eagerly=None,
              steps_per_execution=None,
              **kwargs):
    """Configures the model for training.
parametermeaning
optimizerOptimizer: SGD, Adagrad, Adadelta, Adam, Adamax, Nadam
lossLoss function: mean_squared_error, mean_absolute_error, mean_absolute_percentage_error, mean_squared_logarithmic_error, squared_hinge, hinge, categorical_hinge, logcosh
metricsEvaluate the performance indicators of the model during training and testing. Typical usage: metrics = ['accuracy']
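For this article's model, the corresponding compile call (the same one used in the complete code below) is:

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])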

About the optimizer

The optimizer can be given as a string (the optimizer's name) or as an optimizer instance.
The string form uses the optimizer's default parameters; passing an instance lets you set the parameters yourself:

rmsprop = keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)  # instance with explicit parameters
model.compile(optimizer=rmsprop, loss='mean_squared_error')
model.compile(optimizer='rmsprop', loss='mean_squared_error')                   # or simply the name as a string

It is recommended to use the optimizer's default parameters (except the learning rate lr, which can be tuned freely; see the note after the parameter list).
Parameters:

lr: float >= 0. Learning rate.
rho: float >= 0. Decay rate of the moving average of squared gradients in RMSProp.
epsilon: float >= 0. Fuzz factor. If None, defaults to K.epsilon().
decay: float >= 0. Learning rate decay applied after each parameter update.
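Note that in recent versions of tensorflow.keras the argument is named learning_rate rather than lr; a small sketch of passing an instance with a custom learning rate:

from tensorflow.keras import optimizers

opt = optimizers.RMSprop(learning_rate=0.001)   # newer API spelling; older versions accepted lr=
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])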

Similarly, there are many other optimizers, such as SGD, Adagrad, Adadelta, Adam, Adamax, Nadam, etc.
With so many optimizers, how do you choose? See: Summary and comparison of various optimizers.

About the loss function

The choice depends on the specific task; generally speaking, the loss function should describe the task well. For example:

  1. Regression problems
    We want the network's output values to be as close as possible to the ground truth, so a loss that describes distance, such as L1 loss or MSE loss, is usually the more appropriate choice.
  2. Classification problems
    We want the predicted category to match the ground-truth category, so a loss that describes the difference between category distributions, such as cross-entropy, is usually the more appropriate choice.
    For common concrete choices, see the loss function selection table at the beginning of the article, and the sketch after this list.
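One detail worth noting (a sketch, not from the original article): Keras offers two variants of the classification cross-entropy, depending on the label format.

# With one-hot labels (as produced by to_categorical), use categorical_crossentropy:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# With raw integer labels (0-9, as returned by mnist.load_data), sparse_categorical_crossentropy
# computes the same loss without one-hot encoding the labels first:
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])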

About metrics

For everyday use, the list above is enough. There is also the option of a user-defined evaluation function: pass it in when compiling. The function must take (y_true, y_pred) as arguments and return a single tensor as its result.

import keras.backend as K
def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

4. Network training and testing

  1. Training (fitting)
    Use model.fit(), which accepts many parameters:
def fit(self,
          x=None,
          y=None,
          batch_size=None,
          epochs=1,
          verbose=1,
          callbacks=None,
          validation_split=0.,
          validation_data=None,
          shuffle=True,
          class_weight=None,
          sample_weight=None,
          initial_epoch=0,
          steps_per_epoch=None,
          validation_steps=None,
          validation_batch_size=None,
          validation_freq=1,
          max_queue_size=10,
          workers=1,
          use_multiprocessing=False):

The full source of this function runs to more than 300 lines; a detailed interpretation will be left for next time.
We feed the training data to the network in mini-batches of 64 samples and iterate over all of the data 5 times (5 epochs):

model.fit(train_images, train_labels, epochs = 5, batch_size=64)
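If you also want validation curves (not done in this article), model.fit() can hold out part of the training data via validation_split, for example:

history = model.fit(train_images, train_labels,
                    epochs=5, batch_size=64,
                    validation_split=0.2)   # hold out 20% of the training data for validation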

5. Plot how loss and accuracy change with the epochs

model.fit() returns a History object; its history attribute records all the data from the training process.
We use matplotlib.pyplot for plotting; see the complete code below for details.

Returns:
        A `History` object. Its `History.history` attribute is
        a record of training loss values and metrics values
        at successive epochs, as well as validation loss values
        and validation metrics values (if applicable).

Reference: Matplotlib rookie tutorial (Runoob)

def draw_loss(history):
    loss=history.history['loss']
    epochs=range(1,len(loss)+1)
    plt.subplot(1,2,1)#First picture
    plt.plot(epochs,loss,'bo',label='Training loss')
    plt.title("Training loss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    plt.subplot(1,2,2)#Second picture
    accuracy=history.history['accuracy']
    plt.plot(epochs,accuracy,'bo',label='Training accuracy')
    plt.title("Training accuracy")
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.suptitle("Train data")
    plt.legend()
    plt.show()

6. Complete code

from tensorflow.keras.datasets import mnist
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
import numpy as np
def data_preprocess():
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    train_images = train_images.reshape((60000, 28, 28, 1))
    train_images = train_images.astype('float32') / 255
    #print(train_images[0])
    test_images = test_images.reshape((10000, 28, 28, 1))
    test_images = test_images.astype('float32') / 255

    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)
    return train_images,train_labels,test_images,test_labels

#Build network
def build_module():
    model = models.Sequential()
    #The first layer is a convolution layer; it must specify input_shape
    model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)))
    #The second layer is a max-pooling layer
    model.add(layers.MaxPooling2D((2,2)))
    #The third layer is a convolution layer
    model.add(layers.Conv2D(64, (3,3), activation='relu'))
    #The fourth layer is a max-pooling layer
    model.add(layers.MaxPooling2D((2,2)))
    #The fifth layer is a convolution layer
    model.add(layers.Conv2D(64, (3,3), activation='relu'))
    #The sixth layer is a Flatten layer, which flattens the 3D tensor into a vector
    model.add(layers.Flatten())
    #The seventh layer is a fully connected layer
    model.add(layers.Dense(64, activation='relu'))
    #The eighth layer is the softmax layer used for classification
    model.add(layers.Dense(10, activation='softmax'))
    return model
def draw_loss(history):
    loss=history.history['loss']
    epochs=range(1,len(loss)+1)
    plt.subplot(1,2,1)#First picture
    plt.plot(epochs,loss,'bo',label='Training loss')
    plt.title("Training loss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    plt.subplot(1,2,2)#Second picture
    accuracy=history.history['accuracy']
    plt.plot(epochs,accuracy,'bo',label='Training accuracy')
    plt.title("Training accuracy")
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.suptitle("Train data")
    plt.legend()
    plt.show()
if __name__=='__main__':
    train_images,train_labels,test_images,test_labels=data_preprocess()
    model=build_module()
    print(model.summary())
    model.compile(optimizer='rmsprop', loss = 'categorical_crossentropy', metrics=['accuracy'])
    history=model.fit(train_images, train_labels, epochs = 5, batch_size=64)
    draw_loss(history)
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print('test_loss=',test_loss,'  test_acc = ', test_acc)
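As a small follow-up sketch (not part of the original script), you can also run the trained model on a single test image and recover the predicted digit with argmax:

probs = model.predict(test_images[:1])             # shape (1, 10): softmax probabilities
print('predicted digit:', np.argmax(probs[0]))
print('true digit:     ', np.argmax(test_labels[0]))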


[Figure: changes of loss and accuracy during iterative training]


Because the dataset is relatively simple, even this fairly casually designed network reaches 99.2% accuracy on the test set.
END!

Tags: Python Machine Learning Deep Learning
