Computer Vision 1.16: Learning Rate Schedulers

Learning Rate Scheduler

We previously trained the MiniVGGNet network on the CIFAR-10 dataset. To help alleviate overfitting, we introduced the concept of learning rate decay when applying SGD.

This article discusses learning rate schedulers, sometimes called adaptive learning rates. By adjusting the learning rate across epochs, we can reduce loss, improve accuracy, and in some cases even shorten the time it takes to train the network.

We can think of adjusting the learning rate as a two-step process:

  1. Use a higher learning rate early in training to find a reasonable set of weights.
  2. Then fine-tune those weights with a smaller learning rate until the optimal weights are found.

There are two basic types of learning rate schedulers you may encounter:

  1. Schedules that gradually reduce the learning rate as a function of the epoch number, such as linear, polynomial, or exponential decay

  2. Schedules that drop the learning rate at specific epochs, like a piecewise function (a small sketch of both families follows below).
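
To make the two families concrete, here is a small illustrative sketch (not from the original article; the function names, the exponential constant k, and the drop interval are my own assumptions) that prints both kinds of schedule for a few epochs:

import numpy as np

def continuous_decay(epoch, init_lr=0.01, k=0.05):
    # Type 1: the learning rate shrinks smoothly every epoch (exponential decay shown here)
    return init_lr * np.exp(-k * epoch)

def piecewise_drop(epoch, init_lr=0.01, factor=0.5, drop_every=10):
    # Type 2: the learning rate stays constant, then drops by `factor` every `drop_every` epochs
    return init_lr * (factor ** np.floor(epoch / drop_every))

for e in (0, 5, 10, 20, 39):
    print(e, round(continuous_decay(e), 5), round(piecewise_drop(e), 5))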

1. The standard decay schedule in Keras

Look back at the code we used to initialize SGD:

print("[INFO] compiling model...")
opt = SGD(learning_rate=0.01, decay=0.01/40, momentum=0.9, nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

Here we use a learning rate of α = 0.01, a momentum of 0.9, and indicate that we want to use Nesterov accelerated gradient. We then set the decay to the learning rate divided by the total number of epochs: 0.01/40 = 0.00025.

Under the hood, Keras applies the following rule to update the learning rate at each epoch, where e is the epoch index and γ is the decay value:

$\alpha_{e+1} = \alpha_e \times \frac{1}{1 + \gamma \cdot e}$

When γ = 0, the learning rate remains unchanged.

When γ = 0.01/40, the learning rate starts to decrease at the end of every epoch.
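
As a sanity check, here is a small sketch (the per-epoch application and the printed epochs are my own illustration; the values come from the article) of how this rule shrinks the learning rate over 40 epochs:

init_lr = 0.01
decay = 0.01 / 40   # learning rate divided by the total number of epochs

lr = init_lr
for epoch in range(1, 41):
    # apply the update rule from the formula above once per epoch
    lr = lr * (1.0 / (1.0 + decay * epoch))
    if epoch in (1, 10, 20, 40):
        print(f"epoch {epoch:2d}: lr = {lr:.6f}")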

2. Step-based decay

This scheduler automatically lowers the learning rate after specific epochs. We can think of it as a piecewise function: the learning rate stays constant for a number of epochs, drops suddenly, stays constant again for a few epochs, drops suddenly again, and so on.

When using this kind of step-based decay, we have two options:

  1. Define a piecewise function of the learning rate

  2. While training the network, notice that validation performance is poor, stop the script with ctrl+c, adjust the learning rate, and then continue training

This article focuses mainly on the first method. The second method is more advanced: it is often used when training deep neural networks on large datasets, where it is impossible to predict in advance when the learning rate will need adjusting, so we fall back on stopping and resuming training by hand (sketched below).
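
As a rough sketch of that second workflow (the checkpoint filename, the new learning rate, and the extra epoch count are illustrative assumptions, and trainX/trainY/testX/testY are the arrays loaded in the training script below), one might do something like:

from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import SGD

# After stopping the original run with ctrl+c (having saved it first with model.save("checkpoint.h5")),
# reload the partially trained model...
model = load_model("checkpoint.h5")

# ...recompile it with a lower learning rate...
model.compile(loss="categorical_crossentropy",
              optimizer=SGD(learning_rate=0.001, momentum=0.9, nesterov=True),
              metrics=["accuracy"])

# ...and continue training where we left off
model.fit(trainX, trainY, validation_data=(testX, testY),
          batch_size=64, epochs=10, verbose=1)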

3. A custom learning rate scheduler in Keras

The Keras library provides the LearningRateScheduler class, which lets us define a custom learning rate function and have it applied automatically during training.

This user-defined function takes the epoch number as a parameter and returns the corresponding learning rate computed by the rule we define.

We define a piecewise function that drops the learning rate by a factor F every D epochs. Our function is:

$\alpha_{E+1} = \alpha_{1} \times F^{\lfloor (1+E)/D \rfloor}$

where $\alpha_{1}$ is the initial learning rate, E is the epoch index, F is the factor controlling how quickly the learning rate drops, and D is the number of epochs between drops. The corresponding line of code is:

alpha = initAlpha * (factor ** np.floor((1 + epoch) / dropEvery))
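
To see the piecewise shape this produces, here is a quick illustrative check (it simply evaluates the line above with the same values used by the step_decay function in the script below: initAlpha = 0.01, factor = 0.25, dropEvery = 5):

import numpy as np

initAlpha, factor, dropEvery = 0.01, 0.25, 5

for epoch in range(15):
    alpha = initAlpha * (factor ** np.floor((1 + epoch) / dropEvery))
    print(f"epoch {epoch:2d}: lr = {alpha:.6f}")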

Open a Python file named cifar10_lr_decay.py and write the following code:

import matplotlib

matplotlib.use("Agg")

from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from nn.conv.minivggnet import MiniVGGNet
from tensorflow.keras.callbacks import LearningRateScheduler
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import cifar10
import matplotlib.pyplot as plt
import numpy as np


def step_decay(epoch):
    # Initialize the base learning rate, drop factor, and how many epochs pass between drops
    initAlpha = 0.01
    factor = 0.25
    dropEvery = 5

    # Compute the learning rate for the current epoch
    alpha = initAlpha * (factor ** np.floor((1 + epoch) / dropEvery))

    return float(alpha)


# Path of the output loss/accuracy plot (a file name is appended here so plt.savefig has a file target)
output = "/Users/liushanlin/PycharmProjects/DLstudy/result/cifar10_lr_decay.png"

# Load the training and test data and scale pixel intensities to the range [0, 1]
print("[INFO] loading cifar_10 data...")
((trainX, trainY), (testX, testY)) = cifar10.load_data()
trainX = trainX.astype("float") / 255.0
testX = testX.astype("float") / 255.0

# Convert labels from integers to vectors
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# Initialize the label names of the CIFAR-10 dataset
labelNames = ["airplane", "automobile", "bird", "cat", "deer", "dog",
              "frog", "horse", "ship", "truck"]

# Define the set of callbacks to be passed to the model during training
callbacks = [LearningRateScheduler(step_decay)]

# Initialize model and optimizer
opt = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model = MiniVGGNet.build(width=32, height=32, depth=3, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

# Train the network
H = model.fit(trainX, trainY, validation_data=(testX, testY),
              batch_size=64, epochs=40, callbacks=callbacks, verbose=1)

# Evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(testX, batch_size=64)
print(classification_report(testY.argmax(axis=1), predictions.argmax(axis=1), target_names=labelNames))

# Plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 40), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 40), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 40), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, 40), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy on CIFAR-10")
plt.xlabel("Epoch#")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.savefig(output)

The results of running the script:

              precision    recall  f1-score   support

    airplane       0.83      0.72      0.77      1000
  automobile       0.92      0.80      0.86      1000
        bird       0.72      0.56      0.63      1000
         cat       0.59      0.54      0.56      1000
        deer       0.64      0.80      0.71      1000
         dog       0.65      0.68      0.67      1000
        frog       0.73      0.88      0.80      1000
       horse       0.85      0.77      0.81      1000
        ship       0.82      0.89      0.85      1000
       truck       0.79      0.88      0.83      1000

    accuracy                           0.75     10000
   macro avg       0.76      0.75      0.75     10000
weighted avg       0.76      0.75      0.75     10000

It can be seen that our network only reaches about 75% accuracy, and the learning rate decreases very quickly: by epoch 15 it has already dropped to 0.01 × 0.25³ ≈ 0.00016, which means the steps our network takes are tiny.

If we set factor=0.5, what will be the result?

Change as follows:

def step_decay(epoch):
    # Initialize the base learning rate, drop factor, and how many epochs pass between drops
    initAlpha = 0.01
    factor = 0.5
    dropEvery = 5

    # Compute the learning rate for the current epoch
    alpha = initAlpha * (factor ** np.floor((1 + epoch) / dropEvery))

    return float(alpha)

Run the program again and the results are as follows:

              precision    recall  f1-score   support

    airplane       0.81      0.81      0.81      1000
  automobile       0.90      0.88      0.89      1000
        bird       0.71      0.61      0.66      1000
         cat       0.65      0.55      0.60      1000
        deer       0.69      0.78      0.74      1000
         dog       0.67      0.69      0.68      1000
        frog       0.79      0.86      0.82      1000
       horse       0.82      0.83      0.83      1000
        ship       0.87      0.90      0.88      1000
       truck       0.83      0.89      0.86      1000

    accuracy                           0.78     10000
   macro avg       0.78      0.78      0.78     10000
weighted avg       0.78      0.78      0.78     10000


With the gentler drop factor of 0.5, the learning rate stays larger for longer and the network's accuracy improves from roughly 75% to 78%.
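
For completeness, here is a quick illustrative comparison (simply re-evaluating the step-decay formula with both factors) showing how much larger the learning rate remains under factor = 0.5:

import numpy as np

initAlpha, dropEvery = 0.01, 5

for factor in (0.25, 0.5):
    # evaluate the full 40-epoch schedule for this drop factor
    lrs = [initAlpha * (factor ** np.floor((1 + e) / dropEvery)) for e in range(40)]
    print(f"factor={factor}: lr at epoch 15 = {lrs[15]:.6f}, lr at epoch 39 = {lrs[39]:.2e}")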
