Data set introduction
The MNIST dataset is a classic dataset in machine learning. It consists of 60,000 training samples and 10,000 test samples; each sample is a 28 × 28 pixel grayscale image of a handwritten digit, and the dataset ships with Keras. This article uses the Keras neural network API under TensorFlow (see the Keras Chinese documentation) to build the network.
Before starting, recall the general machine learning workflow (✅ marks a step used in this article, ❌ a step not used):
- Define the problem and collect the dataset (✅)
- Choose a metric to measure success (✅)
- Decide on an evaluation method (✅)
- Prepare the data (✅)
- Develop a model that does better than a baseline (❌)
- Scale up the model (❌)
- Regularize the model and tune its hyperparameters (❌)
Choosing the activation function and loss function for the last layer
Problem type | Last-layer activation function | Loss function |
---|---|---|
Binary classification | sigmoid | binary_crossentropy |
Multi-class, single-label | softmax | categorical_crossentropy |
Multi-class, multi-label | sigmoid | binary_crossentropy |
Regression to arbitrary values | none | mse |
Regression to values in [0, 1] | sigmoid | mse or binary_crossentropy |
Now, on to the main content!
1. Data preprocessing
First, import the data using the mnist.load_data() function.
Let's take a look at its source:
```python
def load_data(path='mnist.npz'):
    """Loads the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).

    This is a dataset of 60,000 28x28 grayscale images of the 10 digits,
    along with a test set of 10,000 images.
    More info can be found at the
    [MNIST homepage](http://yann.lecun.com/exdb/mnist/).

    Arguments:
        path: path where to cache the dataset locally
            (relative to `~/.keras/datasets`).

    Returns:
        Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.

        **x_train, x_test**: uint8 arrays of grayscale image data with shapes
            (num_samples, 28, 28).

        **y_train, y_test**: uint8 arrays of digit labels (integers in range 0-9)
            with shapes (num_samples,).
    """
```
As you can see, it contains the download link for the dataset and a description of the dataset's size, shape, and data type; the function returns two tuples made up of four NumPy arrays.
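A quick check of the returned arrays (a minimal sketch added here for illustration; the shapes and dtype follow from the docstring above):

```python
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(train_images.shape, train_images.dtype)  # (60000, 28, 28) uint8
print(test_images.shape)                       # (10000, 28, 28)
print(train_labels.shape)                      # (60000,)
```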
Import the dataset, reshape it into the desired shape, and then normalize it.
The to_categorical() function built into Keras performs one-hot encoding: each label is represented as an all-zero vector in which only the element at the label's index is 1.
For example, with `col = 10`:

```
[0, 1, 9] --> [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
               [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]]
```
We can implement it manually:
```python
def one_hot(sequences, col):
    results = np.zeros((len(sequences), col))
    for i, label in enumerate(sequences):
        results[i, label] = 1   # set the element at the label's index to 1
    # For multi-label samples (each element of sequences is itself a list of labels):
    # for i in range(len(sequences)):
    #     for j in range(len(sequences[i])):
    #         results[i, sequences[i][j]] = 1
    return results
```
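As a quick sanity check (an added sketch, not part of the original code), the manual version should agree with Keras's built-in to_categorical on the example above:

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

labels = [0, 1, 9]
print(to_categorical(labels, num_classes=10))  # built-in one-hot encoding
print(one_hot(labels, col=10))                 # manual version defined above
# Both print the same 3 x 10 matrix of zeros and ones.
```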
The following is the preprocessing process
```python
def data_preprocess():
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    train_images = train_images.reshape((60000, 28, 28, 1))
    train_images = train_images.astype('float32') / 255
    #print(train_images[0])
    test_images = test_images.reshape((10000, 28, 28, 1))
    test_images = test_images.astype('float32') / 255
    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)
    return train_images, train_labels, test_images, test_labels
```
2. Network construction
Here we build a convolutional neural network: a simple sequential stack of convolution, pooling, and fully connected layers. Recall that stacking multiple linear layers still implements a linear operation, so adding layers does not expand the hypothesis space (the set of all possible linear transformations from input data to output data). We therefore need to add nonlinearities, i.e. activation functions; relu is the most commonly used, and prelu or elu can also be used.
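A tiny NumPy illustration of this point (an added sketch, not from the original code): without an activation, two stacked linear layers collapse into a single linear map, while relu in between breaks that equivalence.

```python
import numpy as np

# Two stacked linear layers (no activation) are equivalent to one linear map:
W1 = np.random.randn(4, 3)
W2 = np.random.randn(2, 4)
x = np.random.randn(3)
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)  # same hypothesis space as one layer

# Inserting relu between the layers breaks this equivalence:
relu = lambda z: np.maximum(z, 0)
y = W2 @ relu(W1 @ x)  # no single matrix reproduces this for every x
```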
```python
def build_module():
    model = models.Sequential()
    # Layer 1: convolution layer; the first layer must specify input_shape
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    # Layer 2: max pooling layer
    model.add(layers.MaxPooling2D((2, 2)))
    # Layer 3: convolution layer
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    # Layer 4: max pooling layer
    model.add(layers.MaxPooling2D((2, 2)))
    # Layer 5: convolution layer
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    # Layer 6: Flatten layer, flattens the 3D tensor into a vector
    model.add(layers.Flatten())
    # Layer 7: fully connected layer
    model.add(layers.Dense(64, activation='relu'))
    # Layer 8: softmax layer for classification into the 10 digits
    model.add(layers.Dense(10, activation='softmax'))
    return model
```
Use model.summary() to view the built network structure:
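For the network above, the summary should report the following layer output shapes and parameter counts (reconstructed here from the architecture; exact formatting depends on the TensorFlow version):

Layer | Output shape | Parameters |
---|---|---|
Conv2D (32, 3×3) | (26, 26, 32) | 320 |
MaxPooling2D | (13, 13, 32) | 0 |
Conv2D (64, 3×3) | (11, 11, 64) | 18,496 |
MaxPooling2D | (5, 5, 64) | 0 |
Conv2D (64, 3×3) | (3, 3, 64) | 36,928 |
Flatten | (576,) | 0 |
Dense (64, relu) | (64,) | 36,928 |
Dense (10, softmax) | (10,) | 650 |

Total trainable parameters: 93,322.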
3. Network configuration
After the network is built, a key step is configuring it, for example:
- optimizer: the specific method by which the network updates its parameters via gradient descent
- loss function: measures the distance between the generated value and the target value
- evaluation metrics: used to monitor the model during training and testing

These are all configured by passing arguments to model.compile().
Let's take a look at the source of model.compile():
```python
def compile(self,
            optimizer='rmsprop',
            loss=None,
            metrics=None,
            loss_weights=None,
            weighted_metrics=None,
            run_eagerly=None,
            steps_per_execution=None,
            **kwargs):
    """Configures the model for training.
```
parameter | meaning |
---|---|
optimizer | Optimizer: SGD, Adagrad, Adadelta, Adam, Adamax, Nadam |
loss | Loss function: mean_squared_error, mean_absolute_error, mean_absolute_percentage_error, mean_squared_logarithmic_error, squared_hinge, hinge, categorical_hinge, logcosh |
metrics | Metrics used to evaluate model performance during training and testing. Typical usage: metrics = ['accuracy'] |
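For the multi-class, single-label task in this article, the configuration used later in the complete code is:

```python
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```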
About the optimizer
The optimizer argument can be a string (the optimizer's name) or an optimizer instance.
Passing a string uses the optimizer's default parameters, for example:

```python
model.compile(optimizer='rmsprop', loss='mean_squared_error')
```

Passing an optimizer instance lets you set its parameters explicitly:

```python
keras.optimizers.RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
```
It is recommended to keep the optimizer's default parameters (except the learning rate lr, which can be tuned freely).
Parameters:
- lr: float >= 0. Learning rate.
- rho: float >= 0. Decay rate of RMSProp's moving average of squared gradients.
- epsilon: float >= 0. Fuzz factor. If None, defaults to K.epsilon().
- decay: float >= 0. Learning-rate decay applied after each parameter update.
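For example, to keep the defaults but lower the learning rate (a minimal sketch; note that in newer TensorFlow versions the argument is named learning_rate rather than lr):

```python
from tensorflow.keras import optimizers

# All other RMSprop parameters keep their default values
opt = optimizers.RMSprop(lr=1e-4)
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```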
There are many other optimizers as well, such as SGD, Adagrad, Adadelta, Adam, Adamax, Nadam, etc.
With so many optimizers, how do you choose? See: Summary and comparison of various optimizers.
About the loss function
The choice depends on the specific task; generally speaking, the loss function should describe the task well. For example:
- Regression problems: we want the network's output to be as close as possible to the ground truth, so a loss that describes distance, such as L1 loss or MSE loss, is more appropriate.
- Classification problems: we want the predicted class to match the ground-truth class, so a loss that describes a class distribution, such as cross-entropy, is more appropriate.
For common concrete choices, see the table on loss-function selection at the beginning of this article.
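As a quick illustration (an added sketch, not part of the original article), categorical cross-entropy for a single one-hot sample reduces to the negative log of the probability the network assigns to the true class:

```python
import numpy as np

y_true = np.array([0, 0, 1])        # one-hot ground truth: class 2
y_pred = np.array([0.1, 0.2, 0.7])  # softmax output of the network
loss = -np.sum(y_true * np.log(y_pred))
print(loss)  # ~0.357, i.e. -log(0.7); the loss shrinks as p(true class) -> 1
```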
About metrics
For ordinary use, the list above is enough. As for user-defined evaluation functions: they are passed in at compile time; the function must take (y_true, y_pred) as inputs and return a tensor as its output.
```python
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
```
4. Network training and testing
- Training (fitting)
Use model.fit(), which accepts a long list of parameters:
```python
def fit(self,
        x=None,
        y=None,
        batch_size=None,
        epochs=1,
        verbose=1,
        callbacks=None,
        validation_split=0.,
        validation_data=None,
        shuffle=True,
        class_weight=None,
        sample_weight=None,
        initial_epoch=0,
        steps_per_epoch=None,
        validation_steps=None,
        validation_batch_size=None,
        validation_freq=1,
        max_queue_size=10,
        workers=1,
        use_multiprocessing=False):
```
The source of this function runs to more than 300 lines; a detailed walkthrough will have to wait for another article.
We feed the training data to the network in mini-batches of 64 samples and iterate over all of the data 5 times (epochs):
```python
model.fit(train_images, train_labels, epochs=5, batch_size=64)
```
5. Plotting how loss and accuracy change with epochs
model.fit() returns a History object, whose history attribute records all the data from the training process.
We use matplotlib.pyplot for drawing. See the complete code below for details.
```python
    Returns:
        A `History` object. Its `History.history` attribute is a record of
        training loss values and metrics values at successive epochs, as well
        as validation loss values and validation metrics values (if applicable).
```
Click here: Matplotlib beginner tutorial
```python
def draw_loss(history):
    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)
    plt.subplot(1, 2, 1)  # first subplot: training loss
    plt.plot(epochs, loss, 'bo', label='Training loss')
    plt.title("Training loss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.subplot(1, 2, 2)  # second subplot: training accuracy
    accuracy = history.history['accuracy']
    plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
    plt.title("Training accuracy")
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.suptitle("Train data")
    plt.legend()
    plt.show()
```
6. Complete code
```python
from tensorflow.keras.datasets import mnist
from tensorflow.keras import models
from tensorflow.keras import layers
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt
import numpy as np


def data_preprocess():
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    train_images = train_images.reshape((60000, 28, 28, 1))
    train_images = train_images.astype('float32') / 255
    #print(train_images[0])
    test_images = test_images.reshape((10000, 28, 28, 1))
    test_images = test_images.astype('float32') / 255
    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)
    return train_images, train_labels, test_images, test_labels


# Build the network
def build_module():
    model = models.Sequential()
    # Layer 1: convolution layer
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    # Layer 2: max pooling layer
    model.add(layers.MaxPooling2D((2, 2)))
    # Layer 3: convolution layer
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    # Layer 4: max pooling layer
    model.add(layers.MaxPooling2D((2, 2)))
    # Layer 5: convolution layer
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    # Layer 6: Flatten layer, flattens the 3D tensor into a vector
    model.add(layers.Flatten())
    # Layer 7: fully connected layer
    model.add(layers.Dense(64, activation='relu'))
    # Layer 8: softmax layer for classification
    model.add(layers.Dense(10, activation='softmax'))
    return model


def draw_loss(history):
    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)
    plt.subplot(1, 2, 1)  # first subplot: training loss
    plt.plot(epochs, loss, 'bo', label='Training loss')
    plt.title("Training loss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.subplot(1, 2, 2)  # second subplot: training accuracy
    accuracy = history.history['accuracy']
    plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
    plt.title("Training accuracy")
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.suptitle("Train data")
    plt.legend()
    plt.show()


if __name__ == '__main__':
    train_images, train_labels, test_images, test_labels = data_preprocess()
    model = build_module()
    print(model.summary())
    model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    history = model.fit(train_images, train_labels, epochs=5, batch_size=64)
    draw_loss(history)
    test_loss, test_acc = model.evaluate(test_images, test_labels)
    print('test_loss=', test_loss, ' test_acc = ', test_acc)
```
Changes of loss and accuracy during iterative training
Because the dataset is relatively simple, even this casually designed neural network reaches about 99.2% accuracy on the test set.
END!