Artificial Intelligence Learning (23) Deep Learning: 09 - Captcha Recognition

01_Knowledge Review

In the previous discussion of neural networks, several concepts were introduced: perceptrons, neural networks, and convolutional neural networks.
A neural network consists of an input layer, hidden layers, and a fully connected output layer.
A convolutional neural network adds convolution, activation, and pooling layers, all of which belong to the hidden layers. Let's review the convolutional neural network.
The convolution layer defines a filter (an "observer"), which requires a filter size, a stride, and zero padding. These factors determine the output size of the convolution layer.
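For reference, with input size N, filter size F, zero padding P, and stride S, the convolution layer's output size along each dimension is (N - F + 2P) / S + 1.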
Then ReLU is used for activation rather than sigmoid, because sigmoid is more expensive to compute and is prone to vanishing gradients when the network has many layers.
Next comes a pooling layer, which reduces the amount of data.

02_Analysis of the Captcha Recognition Principle

Next, we'll explain the following:

Captchas appear on websites in a variety of forms; the following is a typical image captcha:

So how do we recognize such a captcha? Generally speaking, there are two approaches: character segmentation and whole-image recognition.

Segmentation has drawbacks: if two letters are very close together or very far apart, how do we define where to split? With a neural network, segmentation is unnecessary; we generally recognize the image as a whole.

As shown below:

Previously, we fed in a picture and got the probability of each category it belongs to. So what should the output be when we feed in a captcha like NZPP? What we want to recognize is that it contains N, Z, P, P, which means there are four target values after recognition.

If each position is a letter A-Z, there are only 26 possibilities per position. Previously, for handwritten digit recognition, the output had ten values: softmax converted them into probabilities over ten categories. For our captcha NZPP, the network finds that, out of the 26 letters, N has the maximum probability at the first position (alongside the other letters' probabilities), and the same goes for Z, P, and P. As follows:

We take the letter with the maximum probability in each group, and we want to get NZPP. So, compared with before, each sample here outputs four target values, and these should be compared against the actual letters. What if the captcha has more than four characters, say 5 or 6? The one-hot encoding is constructed the same way. That is, each sample's output is a [4, 26] matrix. Previously, handwriting recognition output [None, 10]; captcha recognition now outputs [None, 4, 26], because there are four target values per sample. Once we have this output, we need to compute the cross-entropy against it. The 26 letters A-Z can be represented by the numbers 0-25 so that the computer can work with them. Later, we'll combine the target values into one column, such as:
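As a minimal sketch (a hypothetical helper, not part of the program below), mapping letters to the numbers 0-25 can be done like this:

def letters_to_indices(text):
    # A=0, B=1, ..., Z=25
    return [ord(c) - ord("A") for c in text]

print(letters_to_indices("NZPP"))  # [13, 25, 15, 15]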

03_Captcha Recognition Procedure and Image Data Processing

How do we set up the output of the fully connected layer? There are 104 output values; softmax converts each group of 26 into probabilities. During training, the cross-entropy loss is computed against the actual values. Each of the four positions has its own group of 26 letters.

There are 104 outputs in the fully connected layer, which softmax converts into probabilities matched against the 104 target values, i.e., 4 * 26.
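Here is a rough sketch of that shape handling (the placeholder is a hypothetical stand-in for the network's output; tf.nn.softmax operates on the last axis by default):

import tensorflow as tf

# Hypothetical logits from the fully connected layer: [batch, 4 * 26]
logits = tf.placeholder(tf.float32, [None, 4 * 26])

# Reshape so each of the 4 positions gets its own group of 26 scores
logits_3d = tf.reshape(logits, [-1, 4, 26])

# softmax over the last axis turns each group of 26 into probabilities
probs = tf.nn.softmax(logits_3d)  # [None, 4, 26]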

So how do we write such a program? First, we process the data: the images and the label file. The label file must correspond one-to-one with the images, and there is a big pitfall here: if we read the file names directly with os.listdir, their order is arbitrary rather than sorted, so images and labels can end up mismatched. That approach is unreliable. What can we do? We construct the image-label pairing ourselves.
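A minimal sketch of the pitfall and its fix (the directory path is hypothetical): os.listdir returns names in arbitrary order, so sort them before pairing images with labels.

import os

image_dir = "./captcha/images"  # hypothetical path
file_names = sorted(os.listdir(image_dir))  # sort for a stable order
for name in file_names:
    print(os.path.join(image_dir, name))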

We use the tfrecords format to store labels and images together. To recognize the captcha, we need the following steps:
1. Read from tfrecords; each sample contains an image and a label. If the batch size is 100, image is [100, 20, 80, 3] and label is [100, 4].
2. Build a model and feed the data straight into it. A single fully connected layer is enough to start: x = [100, 20*80*3], w = [20*80*3, 4*26], y_predict = [100, 4*26].
3. Build the loss: softmax plus cross-entropy.
4. The target values we store and read are represented as numbers; for example, the captcha APCD is read as [0, 15, 2, 3], and we need to convert it to one-hot encoding before computing the loss. Gradient descent optimization is then applied.

So how does the one-hot conversion work? One-hot encoding analysis:


Introduction to one-hot API:
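A minimal usage sketch of tf.one_hot, matching the call used later in the program (the example indices correspond to NZPP and TXUQ):

import tensorflow as tf

# Two samples of four letter indices each
label = tf.constant([[13, 25, 15, 15],
                     [19, 23, 20, 16]])

# depth=26 letters; axis=2 puts the one-hot dimension last -> [2, 4, 26]
label_onehot = tf.one_hot(label, depth=26, on_value=1.0, axis=2)

with tf.Session() as sess:
    print(sess.run(label_onehot).shape)  # (2, 4, 26)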

If we want to monitor the accuracy during gradient descent optimization, how should we calculate it? A sample counts as correct only if all four letters are predicted correctly. We still use tf.argmax(), but the difference is that we now apply it to a 3-D tensor of the form [None, 4, 26].
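A sketch of that accuracy calculation (the placeholders are hypothetical stand-ins for the tensors in the program below; tf.reduce_all enforces the all-four-letters-correct rule):

import tensorflow as tf

# Hypothetical stand-ins: one-hot targets and raw logits
y_true = tf.placeholder(tf.float32, [None, 4, 26])
y_predict = tf.placeholder(tf.float32, [None, 4 * 26])

# Reshape logits to [None, 4, 26] and take argmax over the 26 letters
equal = tf.equal(tf.argmax(y_true, 2),
                 tf.argmax(tf.reshape(y_predict, [-1, 4, 26]), 2))  # [None, 4]

# Strict per-sample accuracy: all 4 positions must match
sample_correct = tf.reduce_all(equal, axis=1)
accuracy = tf.reduce_mean(tf.cast(sample_correct, tf.float32))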

04_Implementation of Captcha Recognition

Write the following code:

import tensorflow as tf

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string("captcha_dir", "./tfrecords/captcha.tfrecords", "Path to the captcha tfrecords data")
tf.app.flags.DEFINE_integer("batch_size", 100, "Number of samples per training batch")
tf.app.flags.DEFINE_integer("label_num", 4, "Number of target values per sample")
tf.app.flags.DEFINE_integer("letter_num", 26, "Number of possible letters per target value")


# Define a function to initialize weights
def weight_variables(shape):
    w = tf.Variable(tf.random_normal(shape=shape, mean=0.0, stddev=1.0))
    return w


# Define a function to initialize biases
def bias_variables(shape):
    b = tf.Variable(tf.constant(0.0, shape=shape))
    return b


def read_and_decode():
    """
    //Read Authentication Code Data API
    :return: image_batch, label_batch
    """
    # 1. Build a file queue
    file_queue = tf.train.string_input_producer([FLAGS.captcha_dir])

    # 2. Build a reader, read the contents of the file, default one sample
    reader = tf.TFRecordReader()

    # Read Content
    key, value = reader.read(file_queue)

    # tfrecords format example, needs to be parsed
    features = tf.parse_single_example(value, features={
        "image": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.string),
    })

    # Decode content, string content
    # 1. First decode the image feature values
    image = tf.decode_raw(features["image"], tf.uint8)
    # 2. Then decode the target values (labels)
    label = tf.decode_raw(features["label"], tf.uint8)

    # print(image, label)

    # Change Shape
    image_reshape = tf.reshape(image, [20, 80, 3])

    label_reshape = tf.reshape(label, [4])

    print(image_reshape, label_reshape)

    # Batch the data: 100 samples per batch, i.e. the samples for each training step
    image_batch, label_batch = tf.train.batch([image_reshape, label_reshape], batch_size=FLAGS.batch_size, num_threads=1, capacity=FLAGS.batch_size)

    print(image_batch, label_batch)
    return image_batch, label_batch


def fc_model(image):
    """
    //Make predictions
    :param image: 100 Picture eigenvalues[100, 20, 80, 3]
    :return: y_predict predicted value[100, 4 * 26]
    """
    with tf.variable_scope("model"):
        # Reshape the image data to two dimensions
        image_reshape = tf.reshape(image, [-1, 20 * 80 * 3])

        # 1. Randomly initialize weights and biases
        # matrix[100, 20 * 80 * 3] * [20 * 80 * 3, 4 * 26] + [104] = [100, 4 * 26]
        weights = weight_variables([20 * 80 * 3, 4 * 26])
        bias = bias_variables([4 * 26])

        # Fully connected layer computation: [100, 4 * 26]
        y_predict = tf.matmul(tf.cast(image_reshape, tf.float32), weights) + bias

    return y_predict


def predict_to_onehot(label):
    """
    //Convert target values from read files to one-hot encoding
    :param label: [100, 4]      [[13, 25, 15, 15], [19, 23, 20, 16]......]
    :return: one-hot
    """
    # One-hot encode for the cross-entropy loss and accuracy calculations: [100, 4, 26]
    label_onehot = tf.one_hot(label, depth=FLAGS.letter_num, on_value=1.0, axis=2)

    print(label_onehot)

    return label_onehot


def captcharec():
    """
    //Authentication Code Identifier
    :return:
    """
    # 1. Read the data: image_batch [100, 20, 80, 3], label_batch [100, 4]
    image_batch, label_batch = read_and_decode()

    # 2. Establish a model by inputting picture feature data to get the prediction result
    # A single-layer, fully connected network for prediction
    # matrix [100, 20 * 80 * 3] * [20 * 80 * 3, 4 * 26] + [104] = [100, 4 * 26]
    y_predict = fc_model(image_batch)

    #  [100, 4 * 26]
    print(y_predict)

    # 3. First convert the target value to one-hot encoding [100, 4, 26]
    y_true = predict_to_onehot(label_batch)

    # 4. softmax calculation, cross-entropy loss calculation
    with tf.variable_scope("soft_cross"):
        # Find the average cross-entropy loss, y_true [100, 4, 26]---> [100, 4*26]
        loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            labels=tf.reshape(y_true, [FLAGS.batch_size, FLAGS.label_num * FLAGS.letter_num]),
            logits=y_predict))
    # 5. Optimized loss of gradient descent
    with tf.variable_scope("optimizer"):

        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # 6. Compute each batch's prediction accuracy via a 3-D comparison
    with tf.variable_scope("acc"):

        # Compare predicted and target letters at each of the 4 positions: y_predict [100, 4 * 26] --> [100, 4, 26]
        equal_list = tf.equal(tf.argmax(y_true, 2), tf.argmax(tf.reshape(y_predict, [FLAGS.batch_size, FLAGS.label_num, FLAGS.letter_num]), 2))

        # equal_list has shape [100, 4]; averaging it gives per-letter accuracy
        # (strict per-sample accuracy would use tf.reduce_all over axis 1)
        accuracy = tf.reduce_mean(tf.cast(equal_list, tf.float32))

    # Define an op for an initialization variable
    init_op = tf.global_variables_initializer()

    # Open Session Training
    with tf.Session() as sess:
        sess.run(init_op)

        # Define a thread coordinator (the reader threads feed file data to the model)
        coord = tf.train.Coordinator()

        # Start the queue-runner threads that read the files
        threads = tf.train.start_queue_runners(sess, coord=coord)

        # Train the recognizer
        for i in range(5000):

            sess.run(train_op)

            print("Batch %d accuracy: %f" % (i, accuracy.eval()))

        # Stop and join the threads
        coord.request_stop()

        coord.join(threads)

    return None


if __name__ == "__main__":
    captcharec()