Deep learning with TensorFlow (project): recognize your own handwritten digits (based on a CNN convolutional neural network)

Contents

Basic theory  

Part 1: Training the CNN convolutional neural network

1. Load data

2. Change data dimension

3. Normalization

4. One-hot encoding

5. Build CNN convolutional neural network

5-1. Layer 1: the first convolution layer

5-2. Layer 2: the second convolution layer

5-3. Flattening

5-4. Layer 3: the first fully connected layer

5-5. Layer 4: the second fully connected layer (output layer)

6. Compile

7. Training

8. Save model

Full code

Part 2: Recognizing your own handwritten digits (images)

1. Load data

2. Load the trained model

3. Load your own handwritten digit image and resize it

4. Convert to grayscale

5. Invert to white-on-black and normalize

6. Convert to four-dimensional data

7. Predict

8. Display image

Results

Full code

Basic theory  

 

The network has four layers:

Layer 1: convolution layer.

Layer 2: convolution layer.

Layer 3: fully connected layer.

Layer 4: output layer (fully connected).

        Each original handwritten digit image is 28 × 28 pixels and black and white, so the number of channels is 1 and the input data is 28 × 28 × 1. For a color image, the number of channels would be 3.
        The network structure is a 4-layer convolutional neural network (when counting layers, only layers with weights count as a layer; a pooling layer is not counted separately here, since the pooling computation is carried out within each convolution block).
Convolving with multiple filters produces multiple feature maps, which amounts to extracting several different features from the image at the same time.

        The more feature maps there are, the more features the convolutional network extracts. Too few feature maps tends to cause underfitting; too many tends to cause overfitting, so the number should be set to an appropriate value.
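To make this concrete, here is a hand-computed shape walkthrough of the network built in Part 1 (a sketch assuming the 'same' padding and strides used in the code below):

# input:                                      (28, 28, 1)
# conv1 (32 filters, 5x5, stride 1, 'same') -> (28, 28, 32)
# pool1 (2x2, stride 2)                      -> (14, 14, 32)
# conv2 (64 filters, 5x5, stride 1, 'same') -> (14, 14, 64)
# pool2 (2x2, stride 2)                      -> (7, 7, 64)
# flatten                                    -> 7 * 7 * 64 = 3136 values
# dense1                                     -> 1024
# dense2 (output, softmax)                   -> 10 (one per digit)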

 

 

Part 1: Training the CNN convolutional neural network

1. Load data

# 1. Load data
mnist = tf.keras.datasets.mnist
(train_data, train_target), (test_data, test_target) = mnist.load_data()
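As a quick sanity check (these lines are not in the original code), the raw arrays are uint8 images of shape (number of samples, 28, 28) with integer labels 0 to 9:

print(train_data.shape, train_data.dtype)   # (60000, 28, 28) uint8
print(test_data.shape)                      # (10000, 28, 28)
print(train_target[:5])                     # the first five labels, e.g. [5 0 4 1 9]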

 

2. Change data dimension

Note: in TensorFlow, the data must be reshaped into a 4-dimensional format before convolution.
The four dimensions are: number of samples, image height, image width, and number of channels.
# 2. Change data dimension
train_data = train_data.reshape(-1, 28, 28, 1)
test_data = test_data.reshape(-1, 28, 28, 1)
# Note: TensorFlow's convolution layers expect 4-dimensional input
# The four dimensions are: number of samples, image height, image width, number of channels
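The -1 in reshape lets NumPy infer the first dimension, so the sample count never needs to be hard-coded:

print(train_data.shape)   # (60000, 28, 28, 1)
print(test_data.shape)    # (10000, 28, 28, 1)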

3. Normalization

# 3. Normalization (helps speed up training)
train_data = train_data/255.0
test_data = test_data/255.0

4. One-hot encoding

# 4. One-hot encoding
train_target = tf.keras.utils.to_categorical(train_target, num_classes=10)
test_target = tf.keras.utils.to_categorical(test_target, num_classes=10)    # 10 classes
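As a worked example, the label 3 becomes a length-10 vector with a single 1 at index 3:

print(tf.keras.utils.to_categorical(3, num_classes=10))
# -> [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]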

5. Build CNN convolutional neural network

model = Sequential()

 

5-1. Layer 1: the first convolution layer

The first convolution block: a convolution layer followed by a pooling layer.

# 5-1. Layer 1: convolution layer + pooling layer
# First convolution layer
model.add(Convolution2D(input_shape=(28,28,1), filters=32, kernel_size=5, strides=1, padding='same', activation='relu'))
# arguments: input shape, number of filters, kernel size, stride, padding mode ('same'), activation function
# First pooling layer
model.add(MaxPooling2D(pool_size=2, strides=2, padding='same'))
# max pooling: pooling window size, stride, padding mode
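As a hand-computed check (not part of the original code): with padding='same' and stride 1, the convolution keeps the 28 × 28 spatial size and only the channel count changes; the 2 × 2 stride-2 pool then halves the height and width.

# conv1 output: (None, 28, 28, 32); after pooling: (None, 14, 14, 32)
# conv1 trainable weights: 5*5 kernel * 1 input channel * 32 filters + 32 biases
print(5 * 5 * 1 * 32 + 32)   # 832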

 

5-2. Layer 2: the second convolution layer

# 5-2. Layer 2: convolution layer + pooling layer
# Second convolution layer
model.add(Convolution2D(64, 5, strides=1, padding='same', activation='relu'))
# 64: number of filters; 5: convolution kernel size
# Second pooling layer
model.add(MaxPooling2D(2, 2, 'same'))

5-3. Flattening

Flatten turns each sample's stack of 7 × 7 × 64 feature maps into a single vector, so a batch of shape (64, 7, 7, 64) becomes (64, 7 × 7 × 64) = (64, 3136):

# 5-3. Flattening ((64, 7, 7, 64) -> (64, 7 * 7 * 64))
model.add(Flatten())

5-4. Layer 3: the first fully connected layer

# 5-4. Layer 3: the first fully connected layer
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))   # randomly drop 50% of units during training to reduce overfitting

5-5. Layer 4: the second fully connected layer (output layer)

# 5-5. Layer 4: the second fully connected layer (output layer)
model.add(Dense(10, activation='softmax'))
# 10: number of output neurons (one per digit class)
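Before compiling, it is worth inspecting what was built. The parameter counts below are hand-computed and should match what model.summary() prints:

# conv1:   5*5*1*32 + 32    =       832 params, output (None, 28, 28, 32)
# pool1:                              0 params, output (None, 14, 14, 32)
# conv2:   5*5*32*64 + 64   =    51,264 params, output (None, 14, 14, 64)
# pool2:                              0 params, output (None, 7, 7, 64)
# flatten:                            0 params, output (None, 3136)
# dense1:  3136*1024 + 1024 = 3,212,288 params, output (None, 1024)
# dropout:                            0 params
# dense2:  1024*10 + 10     =    10,250 params, output (None, 10)
# total trainable parameters: 3,274,634
model.summary()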

6. Compile

Set the optimizer, loss function, and evaluation metric.

# 6. Compile
model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
# optimizer (Adam), loss function (categorical cross-entropy), evaluation metric (accuracy)
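For reference, categorical cross-entropy compares the one-hot target y with the softmax output p. For a single sample it reduces to the negative log of the probability assigned to the true class:

# loss = -sum_i(y_i * log(p_i)) = -log(p_true)   for one-hot y
# e.g. if the true digit is 3 and the model gives it probability 0.9:
import math
print(-math.log(0.9))   # ~0.105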

7. Training

# 7. Training
model.fit(train_data, train_target, batch_size=64, epochs=10, validation_data=(test_data, test_target))
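A small arithmetic check explains the "938/938" step count in the training log below:

import math
# 60,000 training images at batch_size=64, with the last partial batch kept:
print(math.ceil(60000 / 64))   # 938 batches per epoch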

8. Save model

# 8. Save model
model.save('mnist.h5')

Training output:

Epoch 1/10
938/938 [==============================] - 142s 151ms/step - loss: 0.3319 - accuracy: 0.9055 - val_loss: 0.0895 - val_accuracy: 0.9728
Epoch 2/10
938/938 [==============================] - 158s 169ms/step - loss: 0.0911 - accuracy: 0.9721 - val_loss: 0.0515 - val_accuracy: 0.9830
Epoch 3/10
938/938 [==============================] - 146s 156ms/step - loss: 0.0629 - accuracy: 0.9807 - val_loss: 0.0389 - val_accuracy: 0.9874
Epoch 4/10
938/938 [==============================] - 120s 128ms/step - loss: 0.0498 - accuracy: 0.9848 - val_loss: 0.0337 - val_accuracy: 0.9889
Epoch 5/10
938/938 [==============================] - 119s 127ms/step - loss: 0.0424 - accuracy: 0.9869 - val_loss: 0.0273 - val_accuracy: 0.9898
Epoch 6/10
938/938 [==============================] - 129s 138ms/step - loss: 0.0338 - accuracy: 0.9897 - val_loss: 0.0270 - val_accuracy: 0.9907
Epoch 7/10
938/938 [==============================] - 124s 133ms/step - loss: 0.0302 - accuracy: 0.9904 - val_loss: 0.0234 - val_accuracy: 0.9917
Epoch 8/10
938/938 [==============================] - 132s 140ms/step - loss: 0.0264 - accuracy: 0.9916 - val_loss: 0.0240 - val_accuracy: 0.9913
Epoch 9/10
938/938 [==============================] - 139s 148ms/step - loss: 0.0233 - accuracy: 0.9926 - val_loss: 0.0235 - val_accuracy: 0.9919
Epoch 10/10
938/938 [==============================] - 139s 148ms/step - loss: 0.0208 - accuracy: 0.9937 - val_loss: 0.0215 - val_accuracy: 0.9924

After 10 epochs of training, the validation accuracy exceeds 99%, which is quite good for such a simple network.
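As a minimal follow-up sketch (not in the original code), the saved model can be reloaded and evaluated to confirm the final test accuracy:

model = tf.keras.models.load_model('mnist.h5')
loss, acc = model.evaluate(test_data, test_target, verbose=0)
print(acc)   # should be close to the final val_accuracy above (~0.992)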

 

 

Full code

# Handwritten numeral recognition -- CNN neural network training
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Convolution2D, MaxPooling2D, Flatten
from tensorflow.keras.optimizers import Adam

# 1. Load data
mnist = tf.keras.datasets.mnist
(train_data, train_target), (test_data, test_target) = mnist.load_data()

# 2. Change data dimension
train_data = train_data.reshape(-1, 28, 28, 1)
test_data = test_data.reshape(-1, 28, 28, 1)
# Note: TensorFlow's convolution layers expect 4-dimensional input
# The four dimensions are: number of samples, image height, image width, number of channels

# 3. Normalization (helps speed up training)
train_data = train_data/255.0
test_data = test_data/255.0

# 4. One-hot encoding
train_target = tf.keras.utils.to_categorical(train_target, num_classes=10)
test_target = tf.keras.utils.to_categorical(test_target, num_classes=10)    # 10 classes

# 5. Build CNN convolutional neural network
model = Sequential()
# 5-1. Layer 1: convolution layer + pooling layer
# First convolution layer
model.add(Convolution2D(input_shape=(28,28,1), filters=32, kernel_size=5, strides=1, padding='same', activation='relu'))
# arguments: input shape, number of filters, kernel size, stride, padding mode ('same'), activation function
# First pooling layer
model.add(MaxPooling2D(pool_size=2, strides=2, padding='same'))
# max pooling: pooling window size, stride, padding mode

# 5-2. Layer 2: convolution layer + pooling layer
# Second convolution layer
model.add(Convolution2D(64, 5, strides=1, padding='same', activation='relu'))
# 64: number of filters; 5: convolution kernel size
# Second pooling layer
model.add(MaxPooling2D(2, 2, 'same'))

# 5-3. Flattening ((64, 7, 7, 64) -> (64, 7 * 7 * 64))
model.add(Flatten())

# 5-4. Layer 3: the first fully connected layer
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.5))   # randomly drop 50% of units during training to reduce overfitting

# 5-5. Layer 4: the second fully connected layer (output layer)
model.add(Dense(10, activation='softmax'))
# 10: number of output neurons (one per digit class)

# 6. Compile
model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
# optimizer (Adam), loss function (categorical cross-entropy), evaluation metric (accuracy)

# 7. Training
model.fit(train_data, train_target, batch_size=64, epochs=10, validation_data=(test_data, test_target))

# 8. Save model
model.save('mnist.h5')

Part 2: Recognizing your own handwritten digits (images)

1. Load data

# 1. Load data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

A sample image from the dataset:

 

2. Load the trained model

# 2. Load the trained model
model = load_model('mnist.h5')

3. Load your own handwritten digit image and resize it

# 3. Load your own handwritten digit image and resize it
img = Image.open('5.jpg')
# Resize to match the dataset images (28 x 28)
img = img.resize((28, 28))

 

4. Convert to grayscale

# 4. Convert to grayscale
gray = np.array(img.convert('L'))       # convert('L') converts the image to grayscale

The result is black writing on a white background, the opposite of the white-on-black images in the dataset, so we invert it:

5. Invert to white-on-black and normalize

The MNIST images are white digits on a black background, with values between 0 and 1 after normalization, so we transform our image to match.

# 5. Invert to white-on-black and normalize
gray_inv = (255 - gray) / 255.0
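One optional tweak (not in the original): a photographed page is rarely pure white, so faint gray background pixels survive the inversion. Zeroing them out can make the input look more like MNIST; the 0.2 threshold here is a hypothetical starting point to tune for your own photo:

gray_inv[gray_inv < 0.2] = 0.0   # suppress faint background noise (hypothetical threshold)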

6. Convert to four-dimensional data

The CNN expects four-dimensional input for prediction: (batch size, height, width, channels).

# 6. Convert to four-dimensional data (required for CNN prediction)
image = gray_inv.reshape((1, 28, 28, 1))

7. Predict

# 7. Predict
prediction = model.predict(image)            # (1, 10) array of class probabilities
prediction = np.argmax(prediction, axis=1)   # index of the largest probability = the predicted digit
print('Prediction:', prediction)

8. Display image

# 8. Display
# Set up the plt figure
f, ax = plt.subplots(2, 2, figsize=(5, 5))
# Show a dataset image
ax[0][0].set_title('train_model')
ax[0][0].axis('off')
ax[0][0].imshow(x_train[18], 'gray')
# Show the original image
ax[0][1].set_title('img')
ax[0][1].axis('off')
ax[0][1].imshow(img, 'gray')
# Show the grayscale image (black on white)
ax[1][0].set_title('gray')
ax[1][0].axis('off')
ax[1][0].imshow(gray, 'gray')
# Show the inverted image (white on black) with the predicted digit
ax[1][1].set_title(f'predict:{prediction}')
ax[1][1].axis('off')
ax[1][1].imshow(gray_inv, 'gray')

plt.show()

 

Results

 

 

 

Full code

# Recognize your own handwritten digits (image prediction)
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'

import tensorflow as tf
from tensorflow.keras.models import load_model
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

# 1. Load data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# 2. Load the trained model
model = load_model('mnist.h5')

# 3. Load your own handwritten digit image and resize it
img = Image.open('5.jpg')
# Resize to match the dataset images (28 x 28)
img = img.resize((28, 28))

# 4. Convert to grayscale
gray = np.array(img.convert('L'))       # convert('L') converts the image to grayscale

# 5. Invert to white-on-black and normalize
gray_inv = (255 - gray) / 255.0

# 6. Convert to four-dimensional data (required for CNN prediction)
image = gray_inv.reshape((1, 28, 28, 1))

# 7. Predict
prediction = model.predict(image)            # (1, 10) array of class probabilities
prediction = np.argmax(prediction, axis=1)   # index of the largest probability = the predicted digit
print('Prediction:', prediction)

# 8. Display
# Set up the plt figure
f, ax = plt.subplots(2, 2, figsize=(5, 5))
# Display dataset image
ax[0][0].set_title('train_model')
ax[0][0].axis('off')
ax[0][0].imshow(x_train[18], 'gray')
# Show original
ax[0][1].set_title('img')
ax[0][1].axis('off')
ax[0][1].imshow(img, 'gray')
# Show the grayscale image (black on white)
ax[1][0].set_title('gray')
ax[1][0].axis('off')
ax[1][0].imshow(gray, 'gray')
# Show the inverted image (white on black) with the predicted digit
ax[1][1].set_title(f'predict:{prediction}')
ax[1][1].axis('off')
ax[1][1].imshow(gray_inv, 'gray')

plt.show()
