Convolutional neural network learning notes (taking handwritten digit recognition as an example)


1, Several knowledge points

1. Difference between convolutional neural network and artificial neural network

        A traditional artificial neural network has only an input layer, hidden layers, and an output layer; the number of hidden layers depends on the need. Its construction goes from features to values, and the features are selected by hand. A convolutional neural network adds a feature-learning part on top of this multilayer structure: partially connected convolution layers and pooling layers are placed in front of the original fully connected layers. Its construction goes from signal to features to values, and the features are selected by the network itself.
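To make the difference concrete, here is a minimal illustrative sketch in Keras (toy layer sizes of my own, not the model built later in these notes): the traditional network feeds flattened pixels straight into fully connected layers, while the convolutional network places partially connected convolution and pooling layers in front of the same fully connected tail.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

#traditional ANN: (hand-chosen) features -> value
ann = Sequential([
    Flatten(input_shape = (28, 28, 1)),   #raw pixels used directly as features
    Dense(120, activation = 'relu'),      #hidden layer
    Dense(10, activation = 'softmax')     #output layer
])

#CNN: signal -> learned features -> value
cnn = Sequential([
    Conv2D(8, (3, 3), activation = 'relu', input_shape = (28, 28, 1)),   #feature learning
    MaxPooling2D((2, 2)),                                                #down-sampling
    Flatten(),
    Dense(120, activation = 'relu'),
    Dense(10, activation = 'softmax')
])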

2. Basic composition of convolutional neural network

Convolution layer: convolution is the inner product of a local region of the image with the convolution kernel; it extracts local features of the image. The convolution kernel, also called a filter, is in fact a group of neurons with fixed weights used to extract a specific feature. The extracted features are generally called a feature map. A convolution layer is formed by stacking multiple filters.

Each filter has its own set of weights. After a filter slides to a position, it multiplies its weights element-wise with the local data, sums the products, and finally adds a bias to obtain the filter's output at that position.
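As a minimal numpy sketch of that single-position computation (toy numbers of my own, not values from the model below):

import numpy as np

patch = np.array([[1, 2, 0],
                  [0, 1, 3],
                  [2, 1, 1]], dtype = np.float32)    #3x3 local region of the image
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype = np.float32)  #3x3 filter weights
bias = 0.5

#inner product of the local region and the kernel, plus the bias
output = np.sum(patch * kernel) + bias
print(output)   #(1 + 0 + 2) - (0 + 3 + 1) + 0.5 = -0.5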

Activation function: an activation function follows the convolution layer, and the output of that layer is the result after passing through the activation function.
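For example, with the ReLU activation used in the code below, each output value x simply becomes max(0, x). A one-line illustrative sketch:

import numpy as np
relu = lambda x: np.maximum(0.0, x)        #negative responses are zeroed, positive ones pass through
print(relu(np.array([-1.5, 0.0, 2.3])))    #[0.  0.  2.3]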

Pooling layer: the pooling layer effectively reduces the size of the feature map, which in turn reduces the number of parameters in the final fully connected layers. Common pooling methods are average pooling, which takes the average value of an image region as the pooled value of that region, and max pooling, which takes the maximum value of the region. An activation function may also follow the pooling layer.
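A small numpy sketch of 2x2 max pooling and average pooling with stride 2, on a toy 4x4 feature map of my own:

import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 2],
              [2, 2, 3, 4]], dtype = np.float32)

#split the 4x4 map into non-overlapping 2x2 blocks: shape (2, 2, 2, 2)
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)
max_pooled = blocks.max(axis = (2, 3))    #[[4. 2.] [2. 5.]]
avg_pooled = blocks.mean(axis = (2, 3))   #[[2.5 1.] [1.25 3.5]]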

The number of convolution layers and pooling layers is chosen as required.

Fully connected layer: the fully connected layer sits at the tail of the convolutional neural network; it is connected in the same way as in an artificial neural network and plays the role of the "classifier".

3. Stochastic Gradient Descent (SGD) optimization method with Momentum parameter
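Momentum SGD keeps a running "velocity" that accumulates a decaying history of past gradients, so updates keep moving in a consistent direction and oscillations are damped. This is the optimizer used in the code below (SGD(lr = 0.01, momentum = 0.9)). A minimal illustrative sketch of one update step (my own code, not the Keras internals):

def sgd_momentum_step(w, grad, v, lr = 0.01, mu = 0.9):
    #velocity: decayed previous direction minus a step along the new gradient
    v = mu * v - lr * grad
    #move the weight along the accumulated velocity
    w = w + v
    return w, v

#usage: start with zero velocity and call once per mini-batch
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, grad = 0.3, v = v)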

2, Using TensorFlow and Keras to build a convolutional neural network

The code is as follows:

#Cnn_HandWritten.py
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.model_selection import KFold
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import utils
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import SGD

#load train & test dataset
def load_dataset():
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    #reshape dataset to have a single channel
    X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
    X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
    #one hot representation
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
    return X_train, y_train, X_test, y_test

#scale pixels
def pre_pixels(train, test):
    #convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    #normalize inputs from 0-255 to 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    return train_norm, test_norm

#define a model
def define_model():
    model = Sequential()
    #add a convolutional layer
    model.add(Conv2D(8, (3, 3), activation = 'relu', kernel_initializer = 'he_uniform', input_shape = (28, 28, 1)))
    #add a pooling layer
    model.add(MaxPooling2D((2, 2)))
    #the output size per dimension = (input - kernel + 2 * padding) / stride + 1
    model.add(Flatten())
    #add a hidden layer
    model.add(Dense(120, activation = 'relu', kernel_initializer = 'he_uniform'))
    #add an output layer
    model.add(Dense(10, activation = 'softmax'))
    #compile the model
    opt = SGD(lr = 0.01, momentum = 0.9)
    model.compile(optimizer = opt, loss = 'categorical_crossentropy', metrics = ['accuracy'])
    print(model.summary())
    return model

#evaluate a model using k-fold cross-validation
def evaluate_model(dataX, dataY, n_folds = 5):
    scores, histories = list(), list()
    #prepare cross validation
    kfold = KFold(n_folds, shuffle = True, random_state = 1)
    for train_ix, test_ix in kfold.split(dataX):
        model = define_model()
        train_x, train_y, test_x, test_y = dataX[train_ix], dataY[train_ix], dataX[test_ix], dataY[test_ix]
        history = model.fit(train_x, train_y, epochs = 10, batch_size = 60, validation_data = (test_x, test_y), verbose = 0)
        print(history.history.keys())
        #evaluate model
        _, acc = model.evaluate(test_x, test_y, verbose = 0)
        print('> acc: %.3f' % (acc * 100.0))
        #store scores
        scores.append(acc)
        histories.append(history)
    print('scores', scores)
    print('histories,len', len(histories))
    return scores, histories

'''
#plot diagnostic learning curves
def summarize_diagnostics(histories):
    for i in range(len(histories)):
        #plot loss
        pyplot.subplot(2, 1, 1)
        pyplot.title('Cross Entropy Loss')
        pyplot.plot(histories[i].history['loss'], color = 'blue', label = 'train')
        pyplot.plot(histories[i].history['val_loss'], color = 'orange', label = 'test')
        pyplot.ylabel('loss')
        pyplot.xlabel('epoch')
        pyplot.legend(['train', 'test'], loc = 'upper right')
        #plot accuracy
        pyplot.subplot(2, 1, 2)
        pyplot.title('Classification Accuracy')
        pyplot.plot(histories[i].history['accuracy'], color = 'blue', label = 'train')
        pyplot.plot(histories[i].history['val_accuracy'], color = 'orange', label = 'test')
        pyplot.ylabel('accuracy')
        pyplot.xlabel('epoch')
        pyplot.legend(['train', 'test'], loc = 'upper right')
    pyplot.show()
'''

#summarize performance of the model
def summarize_performance(scores):
    #print summary
    print('Accuracy: mean = %.3f std = %.3f, n=%d' % (mean(scores) * 100, std(scores) * 100, len(scores)))

#run the test harness for evaluating a model
def run_mymodel_test():
    #load dataset
    X_train, y_train, X_test, y_test = load_dataset()
    #scale pixels
    X_train, X_test = pre_pixels(X_train, X_test)
    #evaluate a model
    scores, histories = evaluate_model(X_train, y_train)
    #plot diagnostic learning curves
    #summarize_diagnostics(histories)
    #summarize performance of the model
    summarize_performance(scores)

#Main program entry
run_mymodel_test()
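As a sanity check on the output-size comment in define_model(): with Keras' defaults of 'valid' padding and stride 1, the formula (input - kernel + 2 * padding) / stride + 1 gives (28 - 3 + 0) / 1 + 1 = 26, so the convolution layer outputs 26x26x8 feature maps; the 2x2 max pooling halves that to 13x13x8, and Flatten() therefore feeds 13 * 13 * 8 = 1352 values into the Dense(120) layer. These shapes should match what model.summary() prints.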

Run results:

3, Problems encountered and solutions

When I first ran this code, the following error was reported:

tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

The solution offered online was to add the following code at the beginning of the program:

physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
tf.config.experimental.set_memory_growth(physical_devices[0], True)

After I added it, the error became:

ValueError: Memory growth cannot differ between GPU devices

The solution offered online for this one was to add the following code at the beginning of the program:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

After running again, the error above disappeared, but the original error came back... I had gone around in a circle and ended up right back at the starting point.
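For reference, another suggestion that comes up for the "Memory growth cannot differ between GPU devices" message (I did not try it myself, so treat it as an assumption) is to apply the same memory-growth setting to every visible GPU rather than only the first one:

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    #apply the identical setting to every GPU so the values cannot differ between devices
    tf.config.experimental.set_memory_growth(gpu, True)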

I suddenly realized that this trial-and-error approach would not work; after all, everyone's versions and configuration may be different. So I went back and read the error output carefully, and sure enough I found this line:

Loaded runtime CuDNN library: 7.1.4 but source was compiled with: 7.6.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

So the cuDNN version was incompatible? My first thought was to find a way to upgrade cuDNN. I found a method, but I did not use it, because I came across this post: [cudnn error resolution] Loaded runtime CuDNN library: 7.0.5 but source was compiled with: 7.2.1 (https://blog.csdn.net/jy1023408440/article/details/82887479). Luckily I had not rushed to upgrade cuDNN first; if that had not solved it, I would have been stuck for good. Since the main approach in that post is to lower the TensorFlow version, I simply switched to a virtual environment with TensorFlow 1.12.0, left cuDNN untouched, and the problem was finally solved.
