
Implementing a convolutional neural network and visualizing its convolutional features

This post implements a simple convolutional neural network and visualizes the features extracted during the convolution process.

Convolutional neural networks were first used for image recognition; they are now also applied to time-series data and text. A convolutional network needs no hand-crafted features: during training, it learns to extract the salient features automatically.

The network takes all the pixels of the raw image directly as input, but internally it is not fully connected. Because image data is spatially organized, each pixel is related to the pixels around it and has little connection with pixels far away. Each neuron therefore only needs to accept local pixels as input; the global picture is then obtained by aggregating the local information.

Two operations, weight sharing and pooling, greatly reduce the number of parameters in the model and improve training efficiency.
- Weight sharing: a convolutional layer can contain multiple convolution kernels. Each kernel, convolved with the input image, maps to a new 2D feature map, and every pixel of that map is produced by the same kernel. This is weight sharing.
- Pooling (downsampling): for the feature map produced by convolution (filtering) and the activation function, keep only the pixel with the highest value in each block, so that the most salient feature is retained. For example, 2x2 max pooling reduces each 2x2 pixel block to a single pixel; see the sketch after this list.
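To make the 2x2 max-pooling step concrete, here is a minimal sketch (the 4x4 input values are made up for illustration) using the same tf.nn.max_pool call the network uses later:

```python
import numpy as np
import tensorflow as tf

# A single 4x4 single-channel "image"; values chosen arbitrarily for illustration
img = np.array([[1, 3, 2, 0],
                [4, 2, 1, 5],
                [0, 6, 7, 1],
                [2, 1, 3, 8]], dtype=np.float32).reshape(1, 4, 4, 1)

x = tf.constant(img)
# 2x2 window, stride 2: each 2x2 block is replaced by its maximum
pooled = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

with tf.Session() as sess:
    print(sess.run(pooled).reshape(2, 2))
    # [[4. 5.]
    #  [6. 8.]]
```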
```python
# Training data for the convolutional network is MNIST (28x28 grayscale images)
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data
```
Training parameters
```python
train_epochs = 100      # number of training epochs
batch_size = 100        # size of the randomly drawn batch
display_step = 1        # interval for displaying training results
learning_rate = 0.0001  # learning rate
drop_prob = 0.5         # regularization: dropout fraction
fch_nodes = 512         # number of neurons in the fully connected hidden layer
```
Network structure
(Figure: network structure diagram, from output_16_11.png)
- Input layer: the grayscale input image, -1 x 28 x 28 x 1
- First convolution: kernel size, depth and count (5, 5, 1, 16); feature tensor after pooling: -1 x 14 x 14 x 16
- Second convolution: kernel size, depth and count (5, 5, 16, 32); feature tensor after pooling: -1 x 7 x 7 x 32
- Fully connected layer: weight matrix 1568 x 512 (1568 = 7 x 7 x 32)
- Between the fully connected hidden layer and the output layer: 512 x 10
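These shapes can be sanity-checked with a few lines of arithmetic; the sketch below assumes SAME padding, under which convolution preserves the spatial size and each 2x2/stride-2 pooling halves it:

```python
size = 28
for depth in (16, 32):
    # SAME padding keeps the spatial size through the convolution;
    # 2x2 max pooling with stride 2 then halves it
    size //= 2
    print('after pooling: %d x %d x %d' % (size, size, depth))
print('flattened size feeding the fully connected layer:', 7 * 7 * 32)  # 1568
```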
Some auxiliary functions
```python
# Helper functions required by the network model

# Weight (convolution kernel) initialization
# tf.truncated_normal() differs from tf.random_normal(): returned values
# never deviate from the mean by more than two standard deviations
# The shape parameter is a list, e.g. [5, 5, 1, 32]:
# 5 and 5 are the kernel size; 1 is the number of input channels
# (3 for color images, 1 for monochrome grayscale)
# The last number, 32, is the number of kernels
# (i.e. the number of features the convolutional layer extracts)
# Remember to declare the data type explicitly
def weight_init(shape):
    weights = tf.truncated_normal(shape, stddev=0.1, dtype=tf.float32)
    return tf.Variable(weights)

# Bias initialization
def biases_init(shape):
    biases = tf.random_normal(shape, dtype=tf.float32)
    return tf.Variable(biases)

# Randomly select a mini-batch: returns the start and end indices of a
# random contiguous window, not an independent random sample
def get_random_batchdata(n_samples, batchsize):
    start_index = np.random.randint(0, n_samples - batchsize)
    return (start_index, start_index + batchsize)
```
```python
# Xavier initialization for the fully connected layer weights
def xavier_init(layer1, layer2, constant=1):
    Min = -constant * np.sqrt(6.0 / (layer1 + layer2))
    Max = constant * np.sqrt(6.0 / (layer1 + layer2))
    return tf.Variable(tf.random_uniform((layer1, layer2),
                                         minval=Min, maxval=Max,
                                         dtype=tf.float32))
```
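The bounds above are the Xavier/Glorot uniform rule, limit = sqrt(6 / (fan_in + fan_out)). As a quick check, for the first fully connected layer of this network:

```python
import numpy as np

# fan_in = 7*7*32 = 1568 flattened features, fan_out = 512 hidden nodes
print(np.sqrt(6.0 / (7*7*32 + 512)))  # ~0.0537: weights drawn from U(-0.0537, 0.0537)
```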
```python
# Convolution
def conv2d(x, w):
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

# The source lives under tensorflow/python/ops in nn_impl.py and nn_ops.py
# The function takes two tensors: x holds the image pixels, w is the convolution kernel
# Dimensions of x: [batch, height, width, channels]
# Dimensions of w: [height, width, in_channels, out_channels]
# tf.nn.conv2d() is a two-dimensional convolution
# strides is the step of the kernel; the four 1s are the steps along the
# four dimensions of the x tensor
# padding='SAME' pads the original input with zero pixels around the border,
# so the 2D feature map produced by the convolution has the same size as the input
# Without padding, a 32x32 image convolved with a 5x5 kernel maps to a 28x28 image
```
Padding
The way the convolution kernel samples the image at its borders is called padding, and it comes in two modes: SAME and VALID. The kernel's stride may not evenly divide the image width, so at the border some pixels cannot be reached by the kernel. Sampling that never crosses the edge is VALID padding, and the feature map after convolution is smaller than the original image. To let the kernel cover every pixel, the border can be filled with zero pixels before convolving; this cross-edge sampling is SAME padding. With a stride of 1, SAME padding yields a feature map the same size as the original image; with a larger stride, even SAME padding produces a feature map smaller than the original, as the short sketch below shows.
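A minimal sketch of the two modes (the zero tensors stand in for an arbitrary 28x28 input and a 5x5 kernel; only the output shapes matter here):

```python
import numpy as np
import tensorflow as tf

x = tf.constant(np.zeros((1, 28, 28, 1), dtype=np.float32))
k = tf.constant(np.zeros((5, 5, 1, 1), dtype=np.float32))

same = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')
valid = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='VALID')
same_s2 = tf.nn.conv2d(x, k, strides=[1, 2, 2, 1], padding='SAME')

print(same.get_shape())     # (1, 28, 28, 1): zero-padded, size preserved
print(valid.get_shape())    # (1, 24, 24, 1): 28 - 5 + 1 = 24, no padding
print(same_s2.get_shape())  # (1, 14, 14, 1): ceil(28 / 2) = 14, smaller despite SAME
```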
```python
# Pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Pooling is somewhat similar to convolution
# x is the image after convolution and nonlinear activation
# ksize is the pooling window; its dimensions [batch, height, width, channels]
# match the dimensions of the x tensor
# strides=[1, 2, 2, 1] is the moving step along each of those dimensions
# padding works as in the convolution function; padding='VALID' would leave
# the input without zero padding
```
```python
# x holds the pixel values of the handwritten images, y the corresponding labels
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])

# Reshape the flat grayscale vector into a 28x28 two-dimensional structure
x_image = tf.reshape(x, [-1, 28, 28, 1])
# -1 means any number of samples; each sample is 28x28 with a depth of 1
# (the 784-element vector is represented as a 28 x 28 x 1 tensor)
```
First layer convolution + pooling
```python
w_conv1 = weight_init([5, 5, 1, 16])  # 5x5 kernels, depth 1, 16 of them
b_conv1 = biases_init([16])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)  # output tensor: 28x28x16
h_pool1 = max_pool_2x2(h_conv1)                           # after pooling: 14x14x16
# h_pool1: 16 feature maps of 14x14
```
Second layer convolution + pooling
```python
w_conv2 = weight_init([5, 5, 16, 32])  # 5x5 kernels, depth 16, 32 of them
b_conv2 = biases_init([32])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)  # output tensor: 14x14x32
h_pool2 = max_pool_2x2(h_conv2)                           # after pooling: 7x7x32
# h_pool2: 32 feature maps of 7x7
```
Fully connected layer
```python
# h_pool2 is a 7x7x32 tensor; flatten it into a one-dimensional vector
h_fpool2 = tf.reshape(h_pool2, [-1, 7*7*32])
# Fully connected layer with 512 hidden nodes
# Weight initialization
w_fc1 = xavier_init(7*7*32, fch_nodes)
b_fc1 = biases_init([fch_nodes])
h_fc1 = tf.nn.relu(tf.matmul(h_fpool2, w_fc1) + b_fc1)
```
```python
# Fully connected hidden layer -> output layer
# To keep the network from overfitting, dropout (regularization) is applied to
# the fully connected hidden layer: part of it is randomly discarded during training
# Dropout discards features by setting node outputs to 0; this happens only
# during training, while prediction still uses the full set of features
# Note: keep_prob in tf.nn.dropout is the fraction of node outputs that is KEPT
#keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob=drop_prob)

# Weight initialization between hidden layer and output layer
w_fc2 = xavier_init(fch_nodes, 10)
b_fc2 = biases_init([10])
# Pre-activation output
y_ = tf.add(tf.matmul(h_fc1_drop, w_fc2), b_fc2)
# Activated output
y_out = tf.nn.softmax(y_)
```
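A minimal sketch of what tf.nn.dropout does at training time; kept entries are scaled by 1/keep_prob so the expected activation is unchanged (the all-ones vector is a made-up toy input):

```python
import tensorflow as tf

v = tf.ones([10])  # toy activations, all 1.0
dropped = tf.nn.dropout(v, keep_prob=0.5)

with tf.Session() as sess:
    print(sess.run(dropped))
    # roughly half the entries become 0.0; the survivors become 2.0 (= 1.0 / 0.5)
```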
```python
# Cross-entropy cost function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_out), reduction_indices=[1]))

# TensorFlow also ships a method for computing the cross entropy directly;
# it takes the pre-activation output and the corresponding true labels
#cross_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_))

# Adam is chosen as the optimizer (one of several possibilities)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

# Accuracy
# The prediction for each sample is a (1, 10) vector
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_out, 1))
# tf.cast converts the boolean values to floating-point numbers
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```
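As an aside, the hand-written cross entropy above can hit log(0) if a softmax output underflows; a common safeguard (a sketch, not part of the original code) clips the probabilities first:

```python
# Clipping keeps tf.log away from log(0) = -inf when a softmax output underflows
eps = 1e-10
safe_cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y * tf.log(tf.clip_by_value(y_out, eps, 1.0)),
                   reduction_indices=[1]))
```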
```python
# Operation that initializes the global variables
init = tf.global_variables_initializer()
```
```python
# Load the MNIST dataset
mnist = input_data.read_data_sets('MNIST/mnist', one_hot=True)
n_samples = int(mnist.train.num_examples)
total_batches = int(n_samples / batch_size)
```
```python
# Session
with tf.Session() as sess:
    sess.run(init)
    Cost = []
    Accuracy = []
    for i in range(train_epochs):
        for j in range(100):
            start_index, end_index = get_random_batchdata(n_samples, batch_size)
            batch_x = mnist.train.images[start_index: end_index]
            batch_y = mnist.train.labels[start_index: end_index]
            _, cost, accu = sess.run([optimizer, cross_entropy, accuracy],
                                     feed_dict={x: batch_x, y: batch_y})
            Cost.append(cost)
            Accuracy.append(accu)
        if i % display_step == 0:
            print('Epoch : %d , Cost : %.7f' % (i+1, cost))
    print('training finished')

    # Cost function curve
    fig1, ax1 = plt.subplots(figsize=(10, 7))
    plt.plot(Cost)
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Cost')
    plt.title('Cross Loss')
    plt.grid()
    plt.show()

    # Accuracy curve
    fig7, ax7 = plt.subplots(figsize=(10, 7))
    plt.plot(Accuracy)
    ax7.set_xlabel('Epochs')
    ax7.set_ylabel('Accuracy Rate')
    plt.title('Train Accuracy Rate')
    plt.grid()
    plt.show()

    # ---------------------- Feature visualization of each layer ----------------------
    # Input image
    fig2, ax2 = plt.subplots(figsize=(2, 2))
    ax2.imshow(np.reshape(mnist.train.images[11], (28, 28)))
    plt.show()

    # Feature maps output by the first convolutional layer
    input_image = mnist.train.images[11:12]
    conv1_16 = sess.run(h_conv1, feed_dict={x: input_image})        # [1, 28, 28, 16]
    conv1_transpose = sess.run(tf.transpose(conv1_16, [3, 0, 1, 2]))
    fig3, ax3 = plt.subplots(nrows=1, ncols=16, figsize=(16, 1))
    for i in range(16):
        ax3[i].imshow(conv1_transpose[i][0])  # tensor slice [row, column]
    plt.title('Conv1 16x28x28')
    plt.show()

    # Feature maps of the first layer after pooling
    pool1_16 = sess.run(h_pool1, feed_dict={x: input_image})        # [1, 14, 14, 16]
    pool1_transpose = sess.run(tf.transpose(pool1_16, [3, 0, 1, 2]))
    fig4, ax4 = plt.subplots(nrows=1, ncols=16, figsize=(16, 1))
    for i in range(16):
        ax4[i].imshow(pool1_transpose[i][0])
    plt.title('Pool1 16x14x14')
    plt.show()

    # Feature maps output by the second convolutional layer
    conv2_32 = sess.run(h_conv2, feed_dict={x: input_image})        # [1, 14, 14, 32]
    conv2_transpose = sess.run(tf.transpose(conv2_32, [3, 0, 1, 2]))
    fig5, ax5 = plt.subplots(nrows=1, ncols=32, figsize=(32, 1))
    for i in range(32):
        ax5[i].imshow(conv2_transpose[i][0])
    plt.title('Conv2 32x14x14')
    plt.show()

    # Feature maps of the second layer after pooling
    pool2_32 = sess.run(h_pool2, feed_dict={x: input_image})        # [1, 7, 7, 32]
    pool2_transpose = sess.run(tf.transpose(pool2_32, [3, 0, 1, 2]))
    fig6, ax6 = plt.subplots(nrows=1, ncols=32, figsize=(32, 1))
    plt.title('Pool2 32x7x7')
    for i in range(32):
        ax6[i].imshow(pool2_transpose[i][0])
    plt.show()
```
```
Epoch : 1 , Cost : 1.7629557
Epoch : 2 , Cost : 0.8955871
Epoch : 3 , Cost : 0.6002768
Epoch : 4 , Cost : 0.4222347
Epoch : 5 , Cost : 0.4106165
Epoch : 6 , Cost : 0.5070749
Epoch : 7 , Cost : 0.5032627
Epoch : 8 , Cost : 0.3399751
Epoch : 9 , Cost : 0.1524799
Epoch : 10 , Cost : 0.2328545
Epoch : 11 , Cost : 0.1815660
Epoch : 12 , Cost : 0.2749544
Epoch : 13 , Cost : 0.2539429
Epoch : 14 , Cost : 0.1850740
Epoch : 15 , Cost : 0.3227096
Epoch : 16 , Cost : 0.0711472
Epoch : 17 , Cost : 0.1688010
Epoch : 18 , Cost : 0.1442217
Epoch : 19 , Cost : 0.2415594
Epoch : 20 , Cost : 0.0848383
Epoch : 21 , Cost : 0.1879225
Epoch : 22 , Cost : 0.1355369
Epoch : 23 , Cost : 0.1578972
Epoch : 24 , Cost : 0.1017473
Epoch : 25 , Cost : 0.2265745
Epoch : 26 , Cost : 0.2625684
Epoch : 27 , Cost : 0.1950202
Epoch : 28 , Cost : 0.0607868
Epoch : 29 , Cost : 0.0782418
Epoch : 30 , Cost : 0.0744723
Epoch : 31 , Cost : 0.0848689
Epoch : 32 , Cost : 0.1038134
Epoch : 33 , Cost : 0.0848786
Epoch : 34 , Cost : 0.1219746
Epoch : 35 , Cost : 0.0889094
Epoch : 36 , Cost : 0.0605406
Epoch : 37 , Cost : 0.0478896
Epoch : 38 , Cost : 0.1100840
Epoch : 39 , Cost : 0.0168766
Epoch : 40 , Cost : 0.0479708
Epoch : 41 , Cost : 0.1187883
Epoch : 42 , Cost : 0.0707371
Epoch : 43 , Cost : 0.0471128
Epoch : 44 , Cost : 0.1206998
Epoch : 45 , Cost : 0.0674985
Epoch : 46 , Cost : 0.1218394
Epoch : 47 , Cost : 0.0840694
Epoch : 48 , Cost : 0.0468497
Epoch : 49 , Cost : 0.0899443
Epoch : 50 , Cost : 0.0111846
Epoch : 51 , Cost : 0.0653627
Epoch : 52 , Cost : 0.1446207
Epoch : 53 , Cost : 0.0320902
Epoch : 54 , Cost : 0.0792156
Epoch : 55 , Cost : 0.1250363
Epoch : 56 , Cost : 0.0477339
Epoch : 57 , Cost : 0.0249218
Epoch : 58 , Cost : 0.0571465
Epoch : 59 , Cost : 0.0152223
Epoch : 60 , Cost : 0.0373616
Epoch : 61 , Cost : 0.0417238
Epoch : 62 , Cost : 0.0710011
Epoch : 63 , Cost : 0.0654174
Epoch : 64 , Cost : 0.0234730
Epoch : 65 , Cost : 0.0267291
Epoch : 66 , Cost : 0.0329132
Epoch : 67 , Cost : 0.0344089
Epoch : 68 , Cost : 0.1151591
Epoch : 69 , Cost : 0.0555586
Epoch : 70 , Cost : 0.0213475
Epoch : 71 , Cost : 0.0567649
Epoch : 72 , Cost : 0.1207196
Epoch : 73 , Cost : 0.0407380
Epoch : 74 , Cost : 0.0580697
Epoch : 75 , Cost : 0.0352901
Epoch : 76 , Cost : 0.0420529
Epoch : 77 , Cost : 0.0016548
Epoch : 78 , Cost : 0.0184542
Epoch : 79 , Cost : 0.0657262
Epoch : 80 , Cost : 0.0185127
Epoch : 81 , Cost : 0.0211956
Epoch : 82 , Cost : 0.0709701
Epoch : 83 , Cost : 0.1013358
Epoch : 84 , Cost : 0.0876017
Epoch : 85 , Cost : 0.1351897
Epoch : 86 , Cost : 0.1239478
Epoch : 87 , Cost : 0.0147001
Epoch : 88 , Cost : 0.0155131
Epoch : 89 , Cost : 0.0425102
Epoch : 90 , Cost : 0.0912542
Epoch : 91 , Cost : 0.0445287
Epoch : 92 , Cost : 0.0823120
Epoch : 93 , Cost : 0.0155016
Epoch : 94 , Cost : 0.0869377
Epoch : 95 , Cost : 0.0641734
Epoch : 96 , Cost : 0.0498264
Epoch : 97 , Cost : 0.0289681
Epoch : 98 , Cost : 0.0271511
Epoch : 99 , Cost : 0.0131940
Epoch : 100 , Cost : 0.0418167
training finished
```
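As an aside on the visualization code above: the [3, 0, 1, 2] permutation moves the channel axis to the front so each feature map can be indexed directly. A shape-only sketch:

```python
import numpy as np
import tensorflow as tf

t = tf.constant(np.zeros((1, 28, 28, 16), dtype=np.float32))  # [batch, height, width, channels]
tt = tf.transpose(t, [3, 0, 1, 2])                            # [channels, batch, height, width]
print(tt.get_shape())  # (16, 1, 28, 28): tt[i][0] is the i-th 28x28 feature map
```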
Training cross-entropy cost

Training accuracy

A sample from the training data

Features extracted by the first convolutional layer

Features after 2x2 pooling

Features extracted by the second convolutional layer

Features after 2x2 pooling