Sometimes technology enhances art, sometimes it destroys art.
Colorizing black-and-white films is a very old idea that dates back to 1902. For decades, many filmmakers opposed colorizing their black-and-white films and considered it a destruction of their art. Today, it is widely accepted as an enhancement of the art form.
Wouldn't it be cool if an algorithm could colorize a photo without any user input?
1. Defining the colorization problem
Let's first define the colorization problem in terms of the CIE Lab color space. Like the RGB color space, it is a 3-channel color space, but unlike RGB, the color information is encoded only in the a (green-red component) and b (blue-yellow component) channels. The L (lightness) channel encodes only intensity information.
The grayscale image we want to colorize can be regarded as the L channel of an image in the Lab color space, and our goal is to find the a and b components. The resulting Lab image can then be converted back to RGB using a standard color space transformation. In OpenCV, for example, this can be done with cvtColor using the COLOR_BGR2Lab flag (and COLOR_Lab2BGR for the reverse direction).
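As a quick sanity check, here is a minimal sketch of that round trip in OpenCV (my own illustration; a random image stands in for a real photo so the snippet is self-contained):

```python
import cv2 as cv
import numpy as np

# Any BGR image scaled to float32 in [0, 1]; a random image serves as a stand-in here
bgr = np.random.rand(64, 64, 3).astype(np.float32)

# BGR -> Lab: L ends up in [0, 100], a and b roughly in [-127, 127]
lab = cv.cvtColor(bgr, cv.COLOR_BGR2Lab)
L, a, b = cv.split(lab)

# Lab -> BGR: the round trip recovers the original image (up to numerical precision)
bgr_back = cv.cvtColor(lab, cv.COLOR_Lab2BGR)
print(np.abs(bgr - bgr_back).max())  # small reconstruction error
```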
To simplify the computation, the ab space of the Lab color space is quantized into 313 bins, as shown in the figure below. Thanks to this quantization, instead of finding an a and b value for each pixel, we only need to find a bin number between 0 and 312. Another way to think about the problem is that we already have the L channel, which takes values from 0 to 255, and we need to find an ab channel that takes values between 0 and 312. The color prediction task has therefore become a multi-class classification problem in which every gray pixel has 313 classes to choose from.
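To make the quantization concrete, here is a small sketch (my own illustration, not the authors' code) that maps an arbitrary (a, b) pair to the nearest of the 313 bins; it assumes you have downloaded pts_in_hull.npy, the file of bin centers used later in this post:

```python
import numpy as np

# 313 ab bin centers, stored with shape (313, 2)
pts_in_hull = np.load('pts_in_hull.npy')

def ab_to_bin(a, b):
    """Return the index (0..312) of the quantized bin closest to the (a, b) pair."""
    dists = np.sum((pts_in_hull - np.array([a, b])) ** 2, axis=1)
    return int(np.argmin(dists))

print(ab_to_bin(40.0, 20.0))  # some index between 0 and 312
```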
2. CNN architecture for colorization
The architecture proposed by Zhang et al. is a VGG-style network with multiple convolutional blocks. Each block has two or three convolutional layers followed by a Rectified Linear Unit (ReLU) and terminated by a Batch Normalization (BN) layer. Unlike the VGG network, it has no pooling or fully connected layers.
The input image is rescaled to 224 × 224. Let $X$ denote this rescaled grayscale input image. As it passes through the neural network shown in the figure above, it is transformed into $\hat{Z}$. Mathematically, this transformation performed by the network can be written as $\hat{Z} = \mathcal{G}(X)$.
The dimensions of $\hat{Z}$ are $H \times W \times Q$, where $H (= 56)$ and $W (= 56)$ are the height and width of the last convolutional output. For each of the $H \times W$ pixels, $\hat{Z}$ contains a vector of $Q (= 313)$ values, each of which is the probability that the pixel belongs to that class. Our goal is to find, for each probability distribution $\hat{Z}_{h,w}$, a single pair of ab channel values.
3. Recovering the color image from $\hat{Z}$
The CNN shown above takes the resized input image $X$ and gives us the set of distributions $\hat{Z}$. Let's see how to recover a single ab value pair from each distribution in $\hat{Z}$.
You might think we could simply take the mean of the distribution and pick the ab pair corresponding to the nearest quantized bin center. Unfortunately, this distribution is not Gaussian, and its mean corresponds to unnatural, desaturated colors. To understand why, think of the color of the sky: it is sometimes blue and sometimes orange, so the distribution of sky colors is bimodal. When colorizing the sky, either blue or orange would produce a plausible color, but the average of blue and orange is an uninteresting gray.
So why not use the mode of the distribution to get a blue or an orange sky? The authors did try this: although it produces vibrant colors, it sometimes breaks spatial consistency. Their solution is to interpolate between the mean and the mode estimates, yielding a quantity called the annealed mean. A parameter called the temperature (T) controls the degree of interpolation; the final value of T = 0.38 is used as a trade-off between the two extremes.
The annealed mean with temperature (T) is used to interpolate between the mean and the mode of the distribution.
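To make the annealed mean concrete, here is a minimal NumPy sketch (my own illustration of the idea described above, not the authors' implementation): the per-pixel distribution is sharpened by exponentiating its log-probabilities by 1/T and renormalizing, and the ab prediction is the expectation of the bin centers under this sharpened distribution.

```python
import numpy as np

def annealed_mean(z_hw, bin_centers, T=0.38):
    """z_hw: length-313 probability vector for one pixel.
    bin_centers: (313, 2) array of ab bin centers.
    Returns the interpolated (a, b) prediction."""
    logits = np.log(z_hw + 1e-8) / T   # sharpen the distribution
    q = np.exp(logits)
    q /= q.sum()                       # renormalize to a probability distribution
    return q @ bin_centers             # expectation -> a single (a, b) pair

# T close to 0 approaches the mode (vivid colors); T = 1 gives the plain mean (desaturated)
```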
The ab pair corresponding to the annealed mean of the distribution $\hat{Z}_{h,w}$ is denoted $\hat{Y}_{h,w}$ and can be written as a transformation of the original distribution: $\hat{Y} = \mathcal{H}(\hat{Z})$.
Note that as the image passes through the CNN, its size is reduced to 56 × 56, so the predicted ab image $\hat{Y}$ also has dimensions 56 × 56. To obtain the color image, it is upsampled to the original image size and then combined with the lightness channel L to produce the final color image.
4. Multinomial loss function with color rebalancing
Every neural network is trained by defining a loss function; the goal of training is to minimize that loss over the training set. In the colorization problem, the training data consists of thousands of color images together with their grayscale versions.
The output of the CNN is $\hat{Z}$ and the input image is $X$. To train the network, we need to convert every color image in the training set into its corresponding target distribution $Z$. Mathematically, we simply want to invert the mapping $\mathcal{H}$: $Z = \mathcal{H}^{-1}(Y)$.
For each pixel $Y_{h,w}$ of an output image $Y$, we could simply find the nearest ab bin and represent $Z_{h,w}$ as a one-hot vector, assigning 1 to the nearest ab bin and 0 to all other 312 bins. However, to obtain better results, the authors consider the five nearest bins and build the distribution $Z_{h,w}$ with a Gaussian kernel whose weights depend on the distance from the ground-truth ab value.
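Here is a small sketch of this soft encoding (my own illustration, not the released training code; the k = 5 neighbors and the Gaussian width σ = 5 follow the paper's description):

```python
import numpy as np

def soft_encode(ab_true, bin_centers, k=5, sigma=5.0):
    """ab_true: ground-truth (a, b) pair. bin_centers: (313, 2) bin centers.
    Returns a length-313 target vector that is non-zero only at the k nearest bins,
    weighted by a Gaussian on the distance to the true ab value."""
    d2 = np.sum((bin_centers - np.asarray(ab_true)) ** 2, axis=1)
    nearest = np.argsort(d2)[:k]                        # indices of the k closest bins
    weights = np.exp(-d2[nearest] / (2 * sigma ** 2))   # Gaussian weighting by distance
    z = np.zeros(len(bin_centers))
    z[nearest] = weights / weights.sum()                # normalize to a distribution
    return z
```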
If you have used CNNs before, you might be tempted to compare the ground truth $Z$ with the prediction $\hat{Z}$ using the standard cross-entropy loss:

$$L(\hat{Z}, Z) = -\frac{1}{HW}\sum_{h,w}\sum_{q} Z_{h,w,q}\,\log\left(\hat{Z}_{h,w,q}\right)$$
Unfortunately, this loss function on its own produces very dull colors, because the color distribution of ImageNet is heavily concentrated around the gray line (desaturated ab values). To counter this, the authors rebalance the loss by weighting each pixel's term according to the rarity of its ground-truth color bin, which pushes the network toward rarer, more saturated colors.
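As a sketch of how rebalancing changes the loss (again an illustration of the idea, not the released training code; the weight table is assumed to be precomputed from the color statistics), each pixel's cross-entropy term is scaled by a weight that is larger for rare color bins:

```python
import numpy as np

def rebalanced_cross_entropy(Z_hat, Z, bin_weights):
    """Z_hat, Z: (H, W, 313) predicted and soft-encoded target distributions.
    bin_weights: (313,) weights, larger for rare (saturated) color bins."""
    per_pixel_ce = -np.sum(Z * np.log(Z_hat + 1e-8), axis=-1)   # (H, W) cross-entropy map
    pixel_weight = bin_weights[np.argmax(Z, axis=-1)]           # weight each pixel by its dominant target bin
    return np.mean(pixel_weight * per_pixel_ce)
```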
5. Colorization results
The authors shared two versions of the trained Caffe model: one with and one without color rebalancing. We tried both versions and share the results in the figure below. The middle column shows the version without color rebalancing, and the last column shows the rebalanced version.
As we can see, color rebalancing makes many images look much more lively, and most of the predicted colors are plausible. On the other hand, it sometimes adds unwanted patches of saturated color to some images.
Remember that when we convert a grayscale image to a color image, there can be several plausible solutions. Therefore, the way to evaluate a colorization is not how well it matches the ground truth, but how believable and pleasant it looks to the human eye.
5.1 Animals
The model performs very well on animal images, especially cats and dogs. This is because ImageNet contains a very large collection of these animals.
5.2 Outdoor scenes
The model also performs very well on outdoor scenes with blue skies and green vegetation. Note also that, given the silhouette of a tree, the model predicts an orange sky, which suggests it has captured the concept of a sunset.
5.3 Sketches
Finally, even for sketches, the model produces reasonable colorizations.
6. Colorization in OpenCV
The authors provide the pretrained models and network details on GitHub. Next, we will walk through Python and C++ code that uses these pretrained models to colorize a given grayscale image. Our code is based on the OpenCV sample code. We used OpenCV version 4.5.1. We also provide code to colorize a given grayscale video.
Link: https://pan.baidu.com/s/14_my8daL2SxgFVymQy-0Dg Extraction code: 1v8m
(1) Python

Image colorization code:
```python
# colorizeImage.py
# Usage:
# python colorizeImage.py --input greyscaleImage.png

import numpy as np
import cv2 as cv
import argparse
import os.path

parser = argparse.ArgumentParser(description='Colorize GreyScale Image')
parser.add_argument('--input', help='Path to image.', default="greyscaleImage.png")
args = parser.parse_args()

if args.input is None:
    print('Please give the input greyscale image name.')
    print('Usage example: python3 colorizeImage.py --input greyscaleImage.png')
    exit()

if not os.path.isfile(args.input):
    print('Input file does not exist')
    exit()

# Read the input image
frame = cv.imread(args.input)

# Specify the paths of the two model files
protoFile = "./models/colorization_deploy_v2.prototxt"
weightsFile = "./models/colorization_release_v2.caffemodel"
# weightsFile = "./models/colorization_release_v2_norebal.caffemodel"

# Load the cluster centers
pts_in_hull = np.load('./pts_in_hull.npy')

# Read the network into memory
net = cv.dnn.readNetFromCaffe(protoFile, weightsFile)

# Populate the cluster centers as 1x1 convolution kernels
pts_in_hull = pts_in_hull.transpose().reshape(2, 313, 1, 1)
net.getLayer(net.getLayerId('class8_ab')).blobs = [pts_in_hull.astype(np.float32)]
net.getLayer(net.getLayerId('conv8_313_rh')).blobs = [np.full([1, 313], 2.606, np.float32)]

# Network input size (from the OpenCV sample)
W_in = 224
H_in = 224

img_rgb = (frame[:, :, [2, 1, 0]] * 1.0 / 255).astype(np.float32)
img_lab = cv.cvtColor(img_rgb, cv.COLOR_RGB2Lab)
img_l = img_lab[:, :, 0]  # pull out the L channel

# Resize the lightness channel to the network input size
img_l_rs = cv.resize(img_l, (W_in, H_in))
img_l_rs -= 50  # subtract 50 for mean-centering

net.setInput(cv.dnn.blobFromImage(img_l_rs))
ab_dec = net.forward()[0, :, :, :].transpose((1, 2, 0))  # predicted ab channels

(H_orig, W_orig) = img_rgb.shape[:2]  # original image size
ab_dec_us = cv.resize(ab_dec, (W_orig, H_orig))
img_lab_out = np.concatenate((img_l[:, :, np.newaxis], ab_dec_us), axis=2)  # concatenate with the original L channel
img_bgr_out = np.clip(cv.cvtColor(img_lab_out, cv.COLOR_Lab2BGR), 0, 1)

outputFile = args.input[:-4] + '_colorized.png'
cv.imwrite(outputFile, (img_bgr_out * 255).astype(np.uint8))
print('Colorized image saved as ' + outputFile)
print('Done !!!')
```
Video colorization code:
```python
# colorizeVideo.py
# Usage:
# python colorizeVideo.py --input greyscaleVideo.mp4

import numpy as np
import cv2 as cv
import argparse
import os.path

parser = argparse.ArgumentParser(description='Colorize GreyScale Video')
parser.add_argument('--input', help='Path to video file.', default="greyscaleVideo.mp4")
args = parser.parse_args()

if args.input is None:
    print('Please give the input greyscale video file.')
    print('Usage example: python colorizeVideo.py --input greyscaleVideo.mp4')
    exit()

if not os.path.isfile(args.input):
    print('Input file does not exist')
    exit()

# Read the input video
cap = cv.VideoCapture(args.input)
hasFrame, frame = cap.read()

outputFile = args.input[:-4] + '_colorized.avi'
vid_writer = cv.VideoWriter(outputFile, cv.VideoWriter_fourcc('M', 'J', 'P', 'G'), 60,
                            (frame.shape[1], frame.shape[0]))

# Specify the paths of the two model files
protoFile = "./models/colorization_deploy_v2.prototxt"
weightsFile = "./models/colorization_release_v2.caffemodel"
# weightsFile = "./models/colorization_release_v2_norebal.caffemodel"

# Load the cluster centers
pts_in_hull = np.load('./pts_in_hull.npy')

# Read the network into memory
net = cv.dnn.readNetFromCaffe(protoFile, weightsFile)

# Populate the cluster centers as 1x1 convolution kernels
pts_in_hull = pts_in_hull.transpose().reshape(2, 313, 1, 1)
net.getLayer(net.getLayerId('class8_ab')).blobs = [pts_in_hull.astype(np.float32)]
net.getLayer(net.getLayerId('conv8_313_rh')).blobs = [np.full([1, 313], 2.606, np.float32)]

# Network input size (from the OpenCV sample)
W_in = 224
H_in = 224

while cv.waitKey(1):
    hasFrame, frame = cap.read()
    if not hasFrame:
        break

    img_rgb = (frame[:, :, [2, 1, 0]] * 1.0 / 255).astype(np.float32)
    img_lab = cv.cvtColor(img_rgb, cv.COLOR_RGB2Lab)
    img_l = img_lab[:, :, 0]  # pull out the L channel

    # Resize the lightness channel to the network input size
    img_l_rs = cv.resize(img_l, (W_in, H_in))
    img_l_rs -= 50  # subtract 50 for mean-centering

    net.setInput(cv.dnn.blobFromImage(img_l_rs))
    ab_dec = net.forward()[0, :, :, :].transpose((1, 2, 0))  # predicted ab channels

    (H_orig, W_orig) = img_rgb.shape[:2]  # original frame size
    ab_dec_us = cv.resize(ab_dec, (W_orig, H_orig))
    img_lab_out = np.concatenate((img_l[:, :, np.newaxis], ab_dec_us), axis=2)  # concatenate with the original L channel
    img_bgr_out = np.clip(cv.cvtColor(img_lab_out, cv.COLOR_Lab2BGR), 0, 1)

    vid_writer.write((img_bgr_out * 255).astype(np.uint8))

vid_writer.release()
print('Colorized video saved as ' + outputFile)
print('Done !!!')
```
(2) C++

Image colorization code:
// colorizeImage.cpp // Usage // ./colorizeImage.out greyscaleImage.png #include <opencv2/dnn.hpp> #include <opencv2/imgproc.hpp> #include <opencv2/highgui.hpp> #include <iostream> using namespace cv; using namespace cv::dnn; using namespace std; // From PTS_ in_ 313 ab cluster centers of hull.npy (transposed) static float hull_pts[] = { -90., -90., -90., -90., -90., -80., -80., -80., -80., -80., -80., -80., -80., -70., -70., -70., -70., -70., -70., -70., -70., -70., -70., -60., -60., -60., -60., -60., -60., -60., -60., -60., -60., -60., -60., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 50., 60., 70., 80., 90., 20., 30., 40., 50., 60., 70., 80., 90., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., -110., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., -110., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., -110., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., -110., -100., 
-90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0. }; int main(int argc, char **argv) { string imageFileName; // Get parameters from the command line if (argc < 2) { cout << "Please input the greyscale image filename." << endl; cout << "Usage example: ./colorizeImage.out greyscaleImage.png" << endl; return 1; } imageFileName = argv[1]; Mat img = imread(imageFileName); if (img.empty()) { cout << "Can't read image from file: " << imageFileName << endl; return 2; } string protoFile = "./models/colorization_deploy_v2.prototxt"; string weightsFile = "./models/colorization_release_v2.caffemodel"; //string weightsFile = "./models/colorization_release_v2_norebal.caffemodel"; double t = (double) cv::getTickCount(); // Fixed input size of pre training network const int W_in = 224; const int H_in = 224; Net net = dnn::readNetFromCaffe(protoFile, weightsFile); // Set additional layers: int sz[] = {2, 313, 1, 1}; const Mat pts_in_hull(4, sz, CV_32F, hull_pts); Ptr<dnn::Layer> class8_ab = net.getLayer("class8_ab"); class8_ab->blobs.push_back(pts_in_hull); Ptr<dnn::Layer> conv8_313_rh = net.getLayer("conv8_313_rh"); conv8_313_rh->blobs.push_back(Mat(1, 313, CV_32F, Scalar(2.606))); // Extract the L channel and subtract the average Mat lab, L, input; img.convertTo(img, CV_32F, 1.0/255); cvtColor(img, lab, COLOR_BGR2Lab); extractChannel(lab, L, 0); resize(L, input, Size(W_in, H_in)); input -= 50; // Run L channel over network Mat inputBlob = blobFromImage(input); net.setInput(inputBlob); Mat result = net.forward(); // Retrieve the calculated a,b channels from the network output Size siz(result.size[2], result.size[3]); Mat a = Mat(siz, CV_32F, result.ptr(0,0)); Mat b = Mat(siz, CV_32F, result.ptr(0,1)); resize(a, a, img.size()); resize(b, b, img.size()); // Merge and convert back to BGR Mat color, chn[] = {L, a, b}; merge(chn, 3, lab); cvtColor(lab, color, COLOR_Lab2BGR); t = ((double)cv::getTickCount() - t)/cv::getTickFrequency(); cout << "Time taken : " << t << " secs" << endl; string str = imageFileName; str.replace(str.end()-4, str.end(), ""); str = str+"_colorized.png"; color = color*255; color.convertTo(color, CV_8U); imwrite(str, color); cout << "Colorized image saved as " << str << endl; return 0; }
Video colorization code:
// colorizeVideo.cpp // Usage // ./colorizeVideo.out greyscaleVideo.mp4 #include <opencv2/dnn.hpp> #include <opencv2/imgproc.hpp> #include <opencv2/highgui.hpp> #include <iostream> using namespace cv; using namespace cv::dnn; using namespace std; // From PTS_ in_ 313 ab cluster centers of hull.npy (transposed) static float hull_pts[] = { -90., -90., -90., -90., -90., -80., -80., -80., -80., -80., -80., -80., -80., -70., -70., -70., -70., -70., -70., -70., -70., -70., -70., -60., -60., -60., -60., -60., -60., -60., -60., -60., -60., -60., -60., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -50., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -40., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -30., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -20., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., -10., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 10., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 20., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 30., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 40., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 50., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 60., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 70., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 80., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 90., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 50., 60., 70., 80., 90., 20., 30., 40., 50., 60., 70., 80., 90., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., 100., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., 90., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., -110., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., -110., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., 80., -110., -100., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., -110., -100., 
-90., -80., -70., -60., -50., -40., -30., -20., -10., 0., 10., 20., 30., 40., 50., 60., 70., -90., -80., -70., -60., -50., -40., -30., -20., -10., 0. }; int main(int argc, char **argv) { string videoFileName; // Get parameters from the command line if (argc < 2) { cout << "Please input the greyscale video filename." << endl; cout << "Usage example: ./colorizeVideo.out greyscaleVideo.mp4" << endl; return 1; } videoFileName = argv[1]; cv::VideoCapture cap(videoFileName); if (!cap.isOpened()) { cerr << "Unable to open video" << endl; return 1; } string protoFile = "./models/colorization_deploy_v2.prototxt"; string weightsFile = "./models/colorization_release_v2.caffemodel"; //string weightsFile = "./models/colorization_release_v2_norebal.caffemodel"; Mat frame, frameCopy; int frameWidth = cap.get(CAP_PROP_FRAME_WIDTH); int frameHeight = cap.get(CAP_PROP_FRAME_HEIGHT); string str = videoFileName; str.replace(str.end()-4, str.end(), ""); string outVideoFileName = str+"_colorized.avi"; VideoWriter video(outVideoFileName, VideoWriter::fourcc('M','J','P','G'), 60, Size(frameWidth,frameHeight)); // Fixed input size of pre training network const int W_in = 224; const int H_in = 224; Net net = dnn::readNetFromCaffe(protoFile, weightsFile); // Set additional layers int sz[] = {2, 313, 1, 1}; const Mat pts_in_hull(4, sz, CV_32F, hull_pts); Ptr<dnn::Layer> class8_ab = net.getLayer("class8_ab"); class8_ab->blobs.push_back(pts_in_hull); Ptr<dnn::Layer> conv8_313_rh = net.getLayer("conv8_313_rh"); conv8_313_rh->blobs.push_back(Mat(1, 313, CV_32F, Scalar(2.606))); for(;;) { cap >> frame; if (frame.empty()) break; frameCopy = frame.clone(); // Extract the L channel and subtract the average Mat lab, L, input; frame.convertTo(frame, CV_32F, 1.0/255); cvtColor(frame, lab, COLOR_BGR2Lab); extractChannel(lab, L, 0); resize(L, input, Size(W_in, H_in)); input -= 50; // Run L channel over network Mat inputBlob = blobFromImage(input); net.setInput(inputBlob); Mat result = net.forward(); // Extract the calculated a and B channels from the network output Size siz(result.size[2], result.size[3]); Mat a = Mat(siz, CV_32F, result.ptr(0,0)); Mat b = Mat(siz, CV_32F, result.ptr(0,1)); resize(a, a, frame.size()); resize(b, b, frame.size()); // Merge and convert back to BGR Mat coloredFrame, chn[] = {L, a, b}; merge(chn, 3, lab); cvtColor(lab, coloredFrame, COLOR_Lab2BGR); coloredFrame = coloredFrame*255; coloredFrame.convertTo(coloredFrame, CV_8U); video.write(coloredFrame); } cout << "Colorized video saved as " << outVideoFileName << endl << "Done !!!" << endl; cap.release(); video.release(); return 0; }
7. Code analysis
7.1 Reading the model
We provide the paths to the protoFile and weightsFile in the code; choose the appropriate model depending on whether you want color rebalancing (we use the color-rebalanced model by default). We then read the input image, define the network input size as 224 × 224, and read the network into memory.
Python

```python
# Specify the paths of the two model files
protoFile = "./models/colorization_deploy_v2.prototxt"
weightsFile = "./models/colorization_release_v2.caffemodel"
# weightsFile = "./models/colorization_release_v2_norebal.caffemodel"

# Read the input image
frame = cv.imread("./dog-greyscale.png")

# Network input size
W_in = 224
H_in = 224

# Read the network into memory
net = cv.dnn.readNetFromCaffe(protoFile, weightsFile)
```
C++
```cpp
// Specify the paths of the two model files
string protoFile = "./models/colorization_deploy_v2.prototxt";
string weightsFile = "./models/colorization_release_v2.caffemodel";
//string weightsFile = "./models/colorization_release_v2_norebal.caffemodel";

Mat img = imread(imageFile);

// Network input size
const int W_in = 224;
const int H_in = 224;

// Read the network into memory
Net net = readNetFromCaffe(protoFile, weightsFile);
```
7.2 Loading the quantized ab bin cluster centers
Next, we load the quantized ab bin cluster centers, reshape them into 1×1 convolution kernels, and assign them to the corresponding layer of the network. Finally, we fill the scaling layer with a non-zero constant (2.606).
Python

```python
# Load the ab bin cluster centers
pts_in_hull = np.load('./pts_in_hull.npy')

# Populate the cluster centers as 1x1 convolution kernels
pts_in_hull = pts_in_hull.transpose().reshape(2, 313, 1, 1)
net.getLayer(net.getLayerId('class8_ab')).blobs = [pts_in_hull.astype(np.float32)]
net.getLayer(net.getLayerId('conv8_313_rh')).blobs = [np.full([1, 313], 2.606, np.float32)]
```
C++
```cpp
// Populate the cluster centers as 1x1 convolution kernels
int sz[] = {2, 313, 1, 1};
const Mat pts_in_hull(4, sz, CV_32F, hull_pts);
Ptr<dnn::Layer> class8_ab = net.getLayer("class8_ab");
class8_ab->blobs.push_back(pts_in_hull);
Ptr<dnn::Layer> conv8_313_rh = net.getLayer("conv8_313_rh");
conv8_313_rh->blobs.push_back(Mat(1, 313, CV_32F, Scalar(2.606)));
```
7.3 Converting the image to the CIE Lab color space
Scale the input RGB image so that its values are in the range 0 to 1, then convert it to the Lab color space and extract the lightness channel.
Python

```python
# Convert the RGB values of the input image to the range 0 to 1
img_rgb = (frame[:, :, [2, 1, 0]] * 1.0 / 255).astype(np.float32)
img_lab = cv.cvtColor(img_rgb, cv.COLOR_RGB2Lab)
img_l = img_lab[:, :, 0]  # pull out the L channel
```
C++
```cpp
// Extract the L channel
Mat lab, L, input;
img.convertTo(img, CV_32F, 1.0/255);
cvtColor(img, lab, COLOR_BGR2Lab);
extractChannel(lab, L, 0);
```
The lightness channel of the original image is resized to the network input size, in this case 224 × 224. The lightness channel typically ranges from 0 to 100, so we subtract 50 to center it at 0.
Python

```python
# Resize the lightness channel to the network input size
img_l_rs = cv.resize(img_l, (W_in, H_in))
img_l_rs -= 50  # subtract 50 for mean-centering
```
C++

```cpp
// Resize the lightness channel to the network input size and mean-center it
resize(L, input, Size(W_in, H_in));
input -= 50;
```
We then feed the resized, mean-centered lightness channel to the network as the input for a forward pass. The output of the forward pass is the predicted ab channels of the image. These are resized back to the original image size and combined with the original-resolution lightness channel (extracted earlier) to obtain the output Lab image, which is then converted to the BGR color space to produce the final color image that we can save.
Python

```python
net.setInput(cv.dnn.blobFromImage(img_l_rs))
ab_dec = net.forward()[0, :, :, :].transpose((1, 2, 0))  # predicted ab channels

(H_orig, W_orig) = img_rgb.shape[:2]  # original image size
ab_dec_us = cv.resize(ab_dec, (W_orig, H_orig))
img_lab_out = np.concatenate((img_l[:, :, np.newaxis], ab_dec_us), axis=2)  # concatenate with the original L channel
img_bgr_out = np.clip(cv.cvtColor(img_lab_out, cv.COLOR_Lab2BGR), 0, 1)

cv.imwrite('dog_colorized.png', (img_bgr_out * 255).astype(np.uint8))
```
C++
```cpp
// Run the L channel through the network
Mat inputBlob = blobFromImage(input);
net.setInput(inputBlob);
Mat result = net.forward();

// Extract the predicted a and b channels from the network output and resize them to the original image size
Size siz(result.size[2], result.size[3]);
Mat a = Mat(siz, CV_32F, result.ptr(0,0));
Mat b = Mat(siz, CV_32F, result.ptr(0,1));
resize(a, a, img.size());
resize(b, b, img.size());

// Merge and convert back to BGR
Mat color, chn[] = {L, a, b};
merge(chn, 3, lab);
cvtColor(lab, color, COLOR_Lab2BGR);
```
Results
References
https://learnopencv.com/convolutional-neural-network-based-image-colorization-using-opencv/
https://github.com/richzhang/colorization/blob/caffe/colorization/demo/colorization_demo_v2.ipynb