Implementing a convolutional neural network and visualizing convolution features

This article implements a simple convolutional neural network and visualizes the features extracted during the convolution process.

Convolutional neural networks were first used for image recognition and are now also applied to time-series data and text. They do not require hand-engineered features: during training, the network automatically learns to extract the main features.
A convolutional network takes all the pixels of the raw image directly as input, but its interior is not fully connected. Because image data is organized spatially, each pixel is related to the pixels around it and only weakly to distant ones, so each neuron only needs to receive a local patch of pixels as input; global information is then obtained by aggregating the local information.
Two operations, weight sharing and pooling, greatly reduce the number of parameters in the model and improve training efficiency.

  • Weight sharing:

A convolution layer can contain multiple convolution kernels. Each kernel, convolved with the input image, maps to a new 2D feature map, and every pixel of that map is produced by the same kernel; this is weight sharing.

  • Pooling:
    Downsampling: after convolution (filtering) and the activation function, only the pixel with the highest value in each pixel block is kept, so the most important feature is retained. For example, 2x2 max pooling reduces each 2x2 pixel block to a single pixel (see the sketch below).
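As a concrete illustration, here is a minimal NumPy sketch of 2x2 max pooling (illustration only, not part of the network below):

# 2x2 max pooling on a 4x4 "feature map" (NumPy sketch)
import numpy as np

img = np.array([[1, 3, 2, 0],
                [4, 2, 1, 5],
                [0, 1, 3, 2],
                [2, 6, 1, 1]], dtype=np.float32)

# Group into non-overlapping 2x2 blocks and keep each block's maximum
pooled = img.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)    # [[4. 5.]
                 #  [6. 3.]]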
# The training data is MNIST (28x28 grayscale images)
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.examples.tutorials.mnist import input_data

Training parameters

train_epochs = 100     # Number of training epochs
batch_size   = 100     # Mini-batch size
display_step = 1       # Interval (in epochs) for printing training results
learning_rate= 0.0001  # Learning rate
drop_prob    = 0.5     # Dropout: fraction of node outputs kept (passed to tf.nn.dropout as keep_prob)
fch_nodes    = 512     # Number of neurons in the fully connected hidden layer

Network structure

[Figure: network structure (output_16_11.PNG; original image link broken)]

Input layer (grayscale image):                    -1 x 28 x 28 x 1
First convolution kernel (size, depth, count):    (5, 5, 1, 16)
Feature tensor after first pooling:               -1 x 14 x 14 x 16
Second convolution kernel (size, depth, count):   (5, 5, 16, 32)
Feature tensor after second pooling:              -1 x 7 x 7 x 32
Fully connected layer weight matrix:              1568 x 512
Output layer from fully connected hidden layer:   512 x 10
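These sizes can be verified with the SAME-padding formula ceil(n / stride); a quick sanity check (a sketch that mirrors the code below, with stride-1 convolutions and 2x2 stride-2 pooling):

# Check the layer sizes listed above
import math

size = 28
for depth in (16, 32):
    size = math.ceil(size / 1)   # SAME convolution with stride 1: size unchanged
    size = math.ceil(size / 2)   # 2x2 max pooling with stride 2: size halved
    print(size, depth)           # prints 14 16, then 7 32

print(7 * 7 * 32)                # 1568 inputs to the fully connected layer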

Some auxiliary functions

# Helper functions needed by the network model
# Weight initialization (convolution kernel initialization)
# tf.truncated_normal() differs from tf.random_normal(): values more than two standard deviations from the mean are discarded and re-drawn
# The shape parameter is a list, e.g. [5, 5, 1, 32]
# The first 5 and 5 are the kernel size; 1 is the number of input channels (3 for color images, 1 for grayscale)
# The last number, 32, is the number of kernels (i.e. the number of features the convolution layer extracts)
# Remember to declare the data type explicitly
def weight_init(shape):
    weights = tf.truncated_normal(shape, stddev=0.1,dtype=tf.float32)
    return tf.Variable(weights)

# Initialization of offset
def biases_init(shape):
    biases = tf.random_normal(shape,dtype=tf.float32)
    return tf.Variable(biases)

# Select a random contiguous mini-batch (returns start and end indices)
def get_random_batchdata(n_samples, batchsize):
    start_index = np.random.randint(0, n_samples - batchsize)
    return (start_index, start_index + batchsize)
# Xavier initialization for the fully connected layer weights
def xavier_init(layer1, layer2, constant = 1):
    Min = -constant * np.sqrt(6.0 / (layer1 + layer2))
    Max = constant * np.sqrt(6.0 / (layer1 + layer2))
    return tf.Variable(tf.random_uniform((layer1, layer2), minval = Min, maxval = Max, dtype = tf.float32))
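For the 1568 x 512 fully connected layer used below, the Xavier bound works out as follows (a quick check, not part of the original code):

# Xavier/Glorot uniform bound: sqrt(6 / (fan_in + fan_out))
import numpy as np
print(np.sqrt(6.0 / (1568 + 512)))   # ~0.0537, so weights start in roughly (-0.054, 0.054)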
# convolution
def conv2d(x, w):
    return tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')

# The source code is under tensorflow/python/ops in nn_impl.py and nn_ops.py
# This function takes two parameters: x is the image pixels and w is the convolution kernel
# Dimensions of the x tensor: [batch, height, width, channels]
# Dimensions of the w kernel: [height, width, in_channels, out_channels]
# tf.nn.conv2d() is a two-dimensional convolution function
# strides is the step the kernel moves; the four 1s are the strides along the four dimensions of the x tensor
# padding='SAME' pads the original input so that, with stride 1, the 2D image mapped by the convolution has the same size as the original
# Padding means surrounding the original image's pixel matrix with 0 pixels
# Without padding, a 32x32 image convolved with a 5x5 kernel maps to a 28x28 image

Padding

Whether the convolution kernel may cross the image border during feature extraction is called padding, and it works in two modes: SAME and VALID. The kernel's stride may not divide the image width evenly, so at the border some pixels cannot be covered. Sampling that never crosses the edge is VALID padding, and the convolved image is smaller than the original. To let the kernel cover every pixel, the edges can first be filled with 0 pixels and then convolved; this cross-edge sampling is SAME padding, and with a stride of 1 it yields an image of the same size as the original.
    If the stride is larger than 1, even SAME padding produces a feature map smaller than the original image (see the sketch below).
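The output sizes for the two modes follow simple formulas; a short sketch (the sizes here are illustrative, not tied to the network below):

# Output size for the two padding modes (n = input size, k = kernel size, s = stride)
import math

def out_valid(n, k, s):
    return math.ceil((n - k + 1) / s)   # no zero padding; the kernel must fit inside the image

def out_same(n, k, s):
    return math.ceil(n / s)             # zero padded; depends only on input size and stride

print(out_valid(32, 5, 1))   # 28 -- matches the 32x32 / 5x5 example above
print(out_same(28, 5, 1))    # 28 -- SAME with stride 1 preserves the size
print(out_same(28, 5, 2))    # 14 -- SAME with stride 2 still shrinks the output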
# Pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Pooling is somewhat similar to convolution
# x is the image after convolution and nonlinear activation
# ksize is the size of the pooling window
# The dimensions of ksize, [batch, height, width, channels], match the x tensor
# strides [1, 2, 2, 1]: the moving step along the corresponding dimensions above
# padding works as in the convolution function; padding='VALID' would mean no 0 filling of the original image
# x is the pixel value of the handwritten image, and y is the label corresponding to the image
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
# Convert the one-dimensional vector of gray image into 28x28 two-dimensional structure
x_image = tf.reshape(x, [-1, 28, 28, 1])
# -1 means any number of samples; each sample is 28x28 with a depth of 1
# (the flat 784-vector is reshaped into a 28x28x1 tensor; the final 1 is the single gray channel)

First layer convolution + pooling

w_conv1 = weight_init([5, 5, 1, 16])                        # 5x5 kernel, 1 input channel, 16 kernels
b_conv1 = biases_init([16])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1)    # Output tensor size: 28x28x16
h_pool1 = max_pool_2x2(h_conv1)                             # Tensor size after pooling: 14x14x16
# h_pool1: 16 feature maps of 14x14

Second layer convolution + pooling

w_conv2 = weight_init([5, 5, 16, 32])                       # 5x5 kernel, 16 input channels, 32 kernels
b_conv2 = biases_init([32])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2)    # Output tensor size: 14x14x32
h_pool2 = max_pool_2x2(h_conv2)                             # Tensor size after pooling: 7x7x32
# h_pool2: 32 feature maps of 7x7

Fully connected layer

# h_pool2 is a 7x7x32 tensor, which is converted into a one-dimensional vector
h_fpool2 = tf.reshape(h_pool2, [-1, 7*7*32])
# Full connection layer, 512 hidden layer nodes
# Weight initialization
w_fc1 = xavier_init(7*7*32, fch_nodes)
b_fc1 = biases_init([fch_nodes])
h_fc1 = tf.nn.relu(tf.matmul(h_fpool2, w_fc1) + b_fc1)
# Fully connected hidden layer / output layer
# To prevent overfitting, dropout (regularization) is applied to the fully connected hidden layer: part of the nodes are randomly discarded during training
# Dropout sets the dropped nodes' outputs to 0, discarding those feature values; this happens only during training
# The full set of features is still used for prediction
# The argument is the fraction of node outputs to keep
#keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob=drop_prob)
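# Note: with keep_prob hard-coded, dropout stays active whenever the graph is
# evaluated. A common alternative (a sketch, not the original code) feeds
# keep_prob through the placeholder commented out above so it can be set to
# 1.0 at test time:
#     h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob=keep_prob)
#     sess.run(optimizer, feed_dict={x: batch_x, y: batch_y, keep_prob: 0.5})  # training
#     sess.run(accuracy,  feed_dict={x: batch_x, y: batch_y, keep_prob: 1.0})  # evaluation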

# Hide layer and output layer weight initialization
w_fc2 = xavier_init(fch_nodes, 10)
b_fc2 = biases_init([10])

# Pre-activation output (logits)
y_ = tf.add(tf.matmul(h_fc1_drop, w_fc2), b_fc2)
# Activated output (softmax)
y_out = tf.nn.softmax(y_)
# Cross entropy cost function
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(y_out), reduction_indices = [1]))

# TensorFlow also provides a built-in, more numerically stable cross entropy
# It takes the pre-activation output (logits) and the corresponding true labels
#cross_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_))

# Adam is chosen as the optimizer (other choices are possible)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cross_entropy)

# Accuracy
# The prediction result of each sample is a (1,10) vector
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_out, 1))
# tf.cast converts the bool value to a floating point number
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Op that initializes all global variables
init = tf.global_variables_initializer()
# Load dataset MNIST
mnist = input_data.read_data_sets('MNIST/mnist', one_hot=True)
n_samples = int(mnist.train.num_examples)
total_batches = int(n_samples / batch_size)
# Session
with tf.Session() as sess:
    sess.run(init)
    Cost = []
    Accuracy = []
    for i in range(train_epochs):

        for j in range(100):   # 100 random mini-batches per epoch (total_batches is computed above but not used here)
            start_index, end_index = get_random_batchdata(n_samples, batch_size)
            
            batch_x = mnist.train.images[start_index: end_index]
            batch_y = mnist.train.labels[start_index: end_index]
            _, cost, accu = sess.run([ optimizer, cross_entropy,accuracy], feed_dict={x:batch_x, y:batch_y})
            Cost.append(cost)
            Accuracy.append(accu)
        if i % display_step == 0:
            print ('Epoch : %d ,  Cost : %.7f'%(i+1, cost))
    print ('training finished')
    # Cost function curve
    fig1,ax1 = plt.subplots(figsize=(10,7))
    plt.plot(Cost)
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Cost')
    plt.title('Cross Loss')
    plt.grid()
    plt.show()
    # Accuracy curve
    fig7,ax7 = plt.subplots(figsize=(10,7))
    plt.plot(Accuracy)
    ax7.set_xlabel('Epochs')
    ax7.set_ylabel('Accuracy Rate')
    plt.title('Train Accuracy Rate')
    plt.grid()
    plt.show()
#----------------------------------Feature visualization of each layer-------------------------------
    # Input image
    fig2,ax2 = plt.subplots(figsize=(2,2))
    ax2.imshow(np.reshape(mnist.train.images[11], (28, 28)))
    plt.show()
    
    # Feature maps output by the first convolution layer
    input_image = mnist.train.images[11:12]
    conv1_16 = sess.run(h_conv1, feed_dict={x:input_image})     # [1, 28, 28 ,16] 
    conv1_transpose = sess.run(tf.transpose(conv1_16, [3, 0, 1, 2]))
    fig3,ax3 = plt.subplots(nrows=1, ncols=16, figsize = (16,1))
    for i in range(16):
        ax3[i].imshow(conv1_transpose[i][0])                      # Channel i of sample 0
     
    plt.title('Conv1 16x28x28')
    plt.show()
    
    # Feature maps of the first layer after pooling
    pool1_16 = sess.run(h_pool1, feed_dict={x:input_image})     # [1, 14, 14, 16]
    pool1_transpose = sess.run(tf.transpose(pool1_16, [3, 0, 1, 2]))
    fig4,ax4 = plt.subplots(nrows=1, ncols=16, figsize=(16,1))
    for i in range(16):
        ax4[i].imshow(pool1_transpose[i][0])
     
    plt.title('Pool1 16x14x14')
    plt.show()
    
    # Feature maps output by the second convolution layer
    conv2_32 = sess.run(h_conv2, feed_dict={x:input_image})          # [1, 14, 14, 32]
    conv2_transpose = sess.run(tf.transpose(conv2_32, [3, 0, 1, 2]))
    fig5,ax5 = plt.subplots(nrows=1, ncols=32, figsize = (32, 1))
    for i in range(32):
        ax5[i].imshow(conv2_transpose[i][0])
    plt.title('Conv2 32x14x14')
    plt.show()
    
    # Feature maps of the second layer after pooling
    pool2_32 = sess.run(h_pool2, feed_dict={x:input_image})         #[1, 7, 7, 32]
    pool2_transpose = sess.run(tf.transpose(pool2_32, [3, 0, 1, 2]))
    fig6,ax6 = plt.subplots(nrows=1, ncols=32, figsize = (32, 1))
    plt.title('Pool2 32x7x7')
    for i in range(32):
        ax6[i].imshow(pool2_transpose[i][0])
    
    plt.show()
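    # The four visualization blocks above repeat one pattern; a small helper
    # (a sketch, not part of the original) could factor it out:
    #     def plot_feature_maps(tensor, title):   # tensor shape: [1, H, W, C]
    #         maps = sess.run(tf.transpose(tensor, [3, 0, 1, 2]),
    #                         feed_dict={x: input_image})
    #         fig, ax = plt.subplots(nrows=1, ncols=maps.shape[0],
    #                                figsize=(maps.shape[0], 1))
    #         for i in range(maps.shape[0]):
    #             ax[i].imshow(maps[i][0])        # channel i of sample 0
    #         plt.title(title)
    #         plt.show()
    #     plot_feature_maps(h_conv1, 'Conv1 16x28x28')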
    
Epoch : 1 ,  Cost : 1.7629557
Epoch : 2 ,  Cost : 0.8955871
Epoch : 3 ,  Cost : 0.6002768
Epoch : 4 ,  Cost : 0.4222347
Epoch : 5 ,  Cost : 0.4106165
Epoch : 6 ,  Cost : 0.5070749
Epoch : 7 ,  Cost : 0.5032627
Epoch : 8 ,  Cost : 0.3399751
Epoch : 9 ,  Cost : 0.1524799
Epoch : 10 ,  Cost : 0.2328545
Epoch : 11 ,  Cost : 0.1815660
Epoch : 12 ,  Cost : 0.2749544
Epoch : 13 ,  Cost : 0.2539429
Epoch : 14 ,  Cost : 0.1850740
Epoch : 15 ,  Cost : 0.3227096
Epoch : 16 ,  Cost : 0.0711472
Epoch : 17 ,  Cost : 0.1688010
Epoch : 18 ,  Cost : 0.1442217
Epoch : 19 ,  Cost : 0.2415594
Epoch : 20 ,  Cost : 0.0848383
Epoch : 21 ,  Cost : 0.1879225
Epoch : 22 ,  Cost : 0.1355369
Epoch : 23 ,  Cost : 0.1578972
Epoch : 24 ,  Cost : 0.1017473
Epoch : 25 ,  Cost : 0.2265745
Epoch : 26 ,  Cost : 0.2625684
Epoch : 27 ,  Cost : 0.1950202
Epoch : 28 ,  Cost : 0.0607868
Epoch : 29 ,  Cost : 0.0782418
Epoch : 30 ,  Cost : 0.0744723
Epoch : 31 ,  Cost : 0.0848689
Epoch : 32 ,  Cost : 0.1038134
Epoch : 33 ,  Cost : 0.0848786
Epoch : 34 ,  Cost : 0.1219746
Epoch : 35 ,  Cost : 0.0889094
Epoch : 36 ,  Cost : 0.0605406
Epoch : 37 ,  Cost : 0.0478896
Epoch : 38 ,  Cost : 0.1100840
Epoch : 39 ,  Cost : 0.0168766
Epoch : 40 ,  Cost : 0.0479708
Epoch : 41 ,  Cost : 0.1187883
Epoch : 42 ,  Cost : 0.0707371
Epoch : 43 ,  Cost : 0.0471128
Epoch : 44 ,  Cost : 0.1206998
Epoch : 45 ,  Cost : 0.0674985
Epoch : 46 ,  Cost : 0.1218394
Epoch : 47 ,  Cost : 0.0840694
Epoch : 48 ,  Cost : 0.0468497
Epoch : 49 ,  Cost : 0.0899443
Epoch : 50 ,  Cost : 0.0111846
Epoch : 51 ,  Cost : 0.0653627
Epoch : 52 ,  Cost : 0.1446207
Epoch : 53 ,  Cost : 0.0320902
Epoch : 54 ,  Cost : 0.0792156
Epoch : 55 ,  Cost : 0.1250363
Epoch : 56 ,  Cost : 0.0477339
Epoch : 57 ,  Cost : 0.0249218
Epoch : 58 ,  Cost : 0.0571465
Epoch : 59 ,  Cost : 0.0152223
Epoch : 60 ,  Cost : 0.0373616
Epoch : 61 ,  Cost : 0.0417238
Epoch : 62 ,  Cost : 0.0710011
Epoch : 63 ,  Cost : 0.0654174
Epoch : 64 ,  Cost : 0.0234730
Epoch : 65 ,  Cost : 0.0267291
Epoch : 66 ,  Cost : 0.0329132
Epoch : 67 ,  Cost : 0.0344089
Epoch : 68 ,  Cost : 0.1151591
Epoch : 69 ,  Cost : 0.0555586
Epoch : 70 ,  Cost : 0.0213475
Epoch : 71 ,  Cost : 0.0567649
Epoch : 72 ,  Cost : 0.1207196
Epoch : 73 ,  Cost : 0.0407380
Epoch : 74 ,  Cost : 0.0580697
Epoch : 75 ,  Cost : 0.0352901
Epoch : 76 ,  Cost : 0.0420529
Epoch : 77 ,  Cost : 0.0016548
Epoch : 78 ,  Cost : 0.0184542
Epoch : 79 ,  Cost : 0.0657262
Epoch : 80 ,  Cost : 0.0185127
Epoch : 81 ,  Cost : 0.0211956
Epoch : 82 ,  Cost : 0.0709701
Epoch : 83 ,  Cost : 0.1013358
Epoch : 84 ,  Cost : 0.0876017
Epoch : 85 ,  Cost : 0.1351897
Epoch : 86 ,  Cost : 0.1239478
Epoch : 87 ,  Cost : 0.0147001
Epoch : 88 ,  Cost : 0.0155131
Epoch : 89 ,  Cost : 0.0425102
Epoch : 90 ,  Cost : 0.0912542
Epoch : 91 ,  Cost : 0.0445287
Epoch : 92 ,  Cost : 0.0823120
Epoch : 93 ,  Cost : 0.0155016
Epoch : 94 ,  Cost : 0.0869377
Epoch : 95 ,  Cost : 0.0641734
Epoch : 96 ,  Cost : 0.0498264
Epoch : 97 ,  Cost : 0.0289681
Epoch : 98 ,  Cost : 0.0271511
Epoch : 99 ,  Cost : 0.0131940
Epoch : 100 ,  Cost : 0.0418167
training finished

Training cross-entropy cost

Training accuracy

A sample of the training data

Features extracted by the first convolution layer

Features after 2x2 pooling

Features extracted by the second convolution layer

Features after 2x2 pooling