Image Interpolation: Theory and Python Implementation

Preface

Reference paper: Deep Learning for Image Super-resolution: A Survey

Interpolation Method

Simply put, interpolation uses known points to estimate ("guess") unknown points. In image processing, interpolation is most often used to resize an image: each pixel of the new image matrix is computed from pixels of the old image matrix, and different interpolation algorithms compute it in different ways.
There are many interpolation algorithms. Here are three closely related ones:

  • Nearest neighbor interpolation: the fastest to compute, but gives the poorest visual quality.
  • Bilinear interpolation: uses 4 (2*2) points in the original image to compute 1 point in the new image. The quality is slightly lower than that of bicubic interpolation, but it is faster. It is a good balance between speed and quality and is the default algorithm in many frameworks.
  • Bicubic interpolation: computes 1 point in the new image from 16 (4*4) points in the original image. The quality is better, but the computational cost is higher.

Nearest neighbor interpolation

(1) Theory

Nearest neighbor interpolation is the simplest gray-level interpolation, also known as zero-order interpolation: the gray value of each output pixel equals the gray value of the nearest input pixel.
The coordinate transformation for nearest neighbor interpolation is:
  srcX=dstX*(srcWidth/dstWidth)
  srcY=dstY*(srcHeight/dstHeight)
In the above formula, dstX and dstY are the horizontal and vertical coordinates of a pixel in the target image, dstWidth and dstHeight are the width and height of the target image, srcWidth and srcHeight are the width and height of the original image, and srcX and srcY are the coordinates in the original image that correspond to the point (dstX, dstY). As in OpenCV, the upper left corner is the (0,0) coordinate.

In the example figure, the image on the right is the enlarged target image (a 2x2 source enlarged to 4x4), and the pixel marked ? has coordinates (3,2). From the formula:
srcX=3*(2/4)=1.5, srcY=2*(2/4)=1
Therefore the pixel at ? should take the value of pixel (1.5,1) in the original image, but pixel coordinates cannot be fractional. Rounding is generally used to select the nearest neighbor, so the final result is (2,1), which corresponds to the orange color of the original image. Note that with 0-based indexing, rounding up can land one past the last valid index, which is why the implementation rounds down instead.
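
The mapping in this example can be checked with a few lines of Python. This is only a sketch: map_to_source and round_half_up are hypothetical helper names, not part of OpenCV, and the 2x2 source and 4x4 target sizes are taken from the worked numbers above.

```python
import math

def map_to_source(dst_x, dst_y, src_w, src_h, dst_w, dst_h):
    """Map a target pixel to (generally fractional) source coordinates."""
    return dst_x * (src_w / dst_w), dst_y * (src_h / dst_h)

def round_half_up(value):
    # Python's built-in round() rounds halves to the nearest even integer,
    # so use floor(value + 0.5) for conventional rounding.
    return math.floor(value + 0.5)

src_x, src_y = map_to_source(3, 2, src_w=2, src_h=2, dst_w=4, dst_h=4)
rounded = (round_half_up(src_x), round_half_up(src_y))
floored = (math.floor(src_x), math.floor(src_y))
print(src_x, src_y)   # 1.5 1.0
print(rounded)        # (2, 1)
print(floored)        # (1, 1)
```

Rounding gives x=2, which is outside a source that is only 2 pixels wide (valid indices 0 and 1), while rounding down always stays inside the image.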

(2) Python implementation

import numpy as np
import cv2

# Nearest neighbor interpolation
def nearest_neighbour(src, dst_shape):
    # Get Original Dimension
    src_height, src_width = src.shape[0], src.shape[1]
    # Calculate New Graph Dimensions
    dst_height, dst_width, channels = dst_shape[0], dst_shape[1], dst_shape[2]

    dst = np.zeros(shape = (dst_height, dst_width, channels), dtype=np.uint8)
    for dst_x in range(dst_height):          # dst_x is the row index
        for dst_y in range(dst_width):       # dst_y is the column index
            # Find the corresponding coordinates in the source image
            src_x = dst_x * (src_height / dst_height)
            src_y = dst_y * (src_width / dst_width)

            # Rounding to nearest could run past the last index, so round down
            # instead (1.5 -> 1 rather than 1.5 -> 2)
            src_x = int(src_x)
            src_y = int(src_y)

            # Copy the nearest source pixel
            dst[dst_x, dst_y, :] = src[src_x, src_y, :]
    return dst

src = cv2.imread('me.jpg')
dst = nearest_neighbour(src, dst_shape=(720, 540, 3))

# display 
cv2.namedWindow('src', cv2.WINDOW_NORMAL) 
cv2.imshow("src",src) 
cv2.namedWindow('dst', cv2.WINDOW_NORMAL) 
cv2.imshow("dst",dst)
cv2.waitKey(0)
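
The double loop is easy to follow but slow in pure Python. As a sketch of an alternative, the same round-down mapping can be expressed with NumPy index arrays. The function name nearest_neighbour_vectorized is hypothetical, and it takes the target height and width directly instead of a dst_shape tuple.

```python
import numpy as np

def nearest_neighbour_vectorized(src, dst_height, dst_width):
    """Nearest neighbour resize using integer index arrays (hypothetical helper)."""
    src_height, src_width = src.shape[0], src.shape[1]
    # Source row/column for every target row/column, rounded down
    rows = (np.arange(dst_height) * src_height / dst_height).astype(int)
    cols = (np.arange(dst_width) * src_width / dst_width).astype(int)
    # Guard against any index landing past the last row/column
    rows = np.minimum(rows, src_height - 1)
    cols = np.minimum(cols, src_width - 1)
    # Broadcasting a column of row indices against a row of column indices
    # selects one source pixel per target pixel
    return src[rows[:, None], cols]

small = np.array([[[10], [20]],
                  [[30], [40]]], dtype=np.uint8)   # 2x2 image, 1 channel
big = nearest_neighbour_vectorized(small, 4, 4)
print(big[..., 0])
```

Each source pixel of the 2x2 test image becomes a 2x2 block in the 4x4 result, which is the blocky look typical of nearest neighbor enlargement.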

Bilinear interpolation

Before looking at bilinear interpolation, first look at how single linear interpolation works; bilinear interpolation is then easy to understand.

(1) Single Linear Interpolation

Given two points P1 and P2 in the graph with coordinates (x1, y1) and (x2, y2), we want the y-value at a position x on the straight line, with x inside the interval [x1, x2].

From junior high school math, the straight line through two points satisfies (this is the only basic formula required for bilinear interpolation):
  (y-y1)/(x-x1) = (y2-y1)/(x2-x1)

Rearranged into a simpler form:
  y = (x2-x)/(x2-x1)*y1 + (x-x1)/(x2-x1)*y2
Here y1 and y2 stand for the pixel values in the original image, so the formula can be written as:
  f(P) = (x2-x)/(x2-x1)*f(P1) + (x-x1)/(x2-x1)*f(P2)
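
Single linear interpolation between two known points can be sketched in a few lines. The function name lerp and the sample numbers are illustrative assumptions.

```python
def lerp(x, x1, y1, x2, y2):
    """Value at x on the straight line through (x1, y1) and (x2, y2)."""
    if x1 == x2:
        raise ValueError("x1 and x2 must differ")
    w1 = (x2 - x) / (x2 - x1)   # weight of y1: large when x is close to x1
    w2 = (x - x1) / (x2 - x1)   # weight of y2: large when x is close to x2
    return w1 * y1 + w2 * y2

print(lerp(1.5, 1, 10, 2, 20))   # 15.0
print(lerp(1.0, 1, 10, 2, 20))   # 10.0
```

The two weights always add up to 1, and the closer x is to one endpoint, the more that endpoint's value contributes.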

(2) Bilinear interpolation

Given Q11(x1,y1), Q12(x1,y2), Q21(x2,y1), Q22(x2,y2), find the value at an interior point P(x,y).

Bilinear interpolation performs three single linear interpolations in two directions, as shown in the figure. First, two single linear interpolations in the X direction give two temporary points, R1(x, y1) and R2(x, y2). Then one single linear interpolation in the Y direction gives P(x, y). (The same result is obtained if the order of the two axes is swapped, i.e. Y first and then X.)
(1) X-direction single linear interpolation, substituting directly into the last formula of the previous step:
  f(R1) = (x2-x)/(x2-x1)*f(Q11) + (x-x1)/(x2-x1)*f(Q21)
  f(R2) = (x2-x)/(x2-x1)*f(Q12) + (x-x1)/(x2-x1)*f(Q22)

(2) Y-direction single linear interpolation:
  f(P) = (y2-y)/(y2-y1)*f(R1) + (y-y1)/(y2-y1)*f(R2)

Substituting the result of the first step into the second step:
  f(P) = [ f(Q11)(x2-x)(y2-y) + f(Q21)(x-x1)(y2-y) + f(Q12)(x2-x)(y-y1) + f(Q22)(x-x1)(y-y1) ] / ( (x2-x1)(y2-y1) )
When bilinear interpolation is applied to images, the four known points are adjacent pixels, so the following relationship holds:
  x2 = x1+1, y2 = y1+1

All denominators in the above formula are then 1, which gives:
  f(P) = f(Q11)(x2-x)(y2-y) + f(Q21)(x-x1)(y2-y) + f(Q12)(x2-x)(y-y1) + f(Q22)(x-x1)(y-y1)
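
The three-step construction can be sketched directly in Python to confirm that the order of the axes does not matter and that both orders agree with the one-step weighted sum. This is a minimal sketch assuming unit pixel spacing, with u = x-x1 and v = y-y1; the function names are hypothetical.

```python
def lerp(t, a, b):
    """Linear blend between a (t=0) and b (t=1)."""
    return (1 - t) * a + t * b

def bilinear_x_then_y(q11, q21, q12, q22, u, v):
    r1 = lerp(u, q11, q21)        # along x at y1
    r2 = lerp(u, q12, q22)        # along x at y2
    return lerp(v, r1, r2)        # along y

def bilinear_y_then_x(q11, q21, q12, q22, u, v):
    left = lerp(v, q11, q12)      # along y at x1
    right = lerp(v, q21, q22)     # along y at x2
    return lerp(u, left, right)   # along x

def bilinear_closed_form(q11, q21, q12, q22, u, v):
    return ((1 - u) * (1 - v) * q11 + u * (1 - v) * q21
            + (1 - u) * v * q12 + u * v * q22)

args = (10.0, 20.0, 30.0, 40.0, 0.25, 0.5)
print(bilinear_x_then_y(*args), bilinear_y_then_x(*args), bilinear_closed_form(*args))
# 22.5 22.5 22.5
```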

A remaining issue: suppose the origin (0, 0) of both the source and target images is taken to be the upper left corner, and each pixel of the target image is then computed with the interpolation formula. If a 5x5 image is reduced to 3x3, the correspondence between the pixels of the source and target images is as follows:

The figure shows that when the upper left corner is chosen as the origin (0, 0), the rightmost column and bottom row of pixels contribute little or nothing to the calculation, and the gray value computed for each target pixel is biased toward the upper left of the source image.
So what about adding 1 to the coordinates, or choosing the lower right corner as the origin? Unfortunately the same problem appears, except that this time the image is biased toward the lower right.
The best approach is to make the geometric centers of the two images coincide, so that the pixels of the target image are evenly spaced and leave the same margin on both sides. This is what MATLAB and OpenCV do, as shown in the following figure:

The coordinate mapping corrected for the geometric center problem is as follows:
  srcX=(dstX+0.5)*(srcWidth/dstWidth)-0.5
  srcY=(dstY+0.5)*(srcHeight/dstHeight)-0.5
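
The effect of center alignment can be seen by printing the source coordinates for the 5-to-3 reduction along one axis. This is a small sketch; corner_aligned and center_aligned are hypothetical names for the mapping without and with the 0.5 offsets.

```python
def corner_aligned(dst_index, src_size, dst_size):
    return dst_index * (src_size / dst_size)

def center_aligned(dst_index, src_size, dst_size):
    return (dst_index + 0.5) * (src_size / dst_size) - 0.5

src_size, dst_size = 5, 3
corner = [corner_aligned(d, src_size, dst_size) for d in range(dst_size)]
center = [center_aligned(d, src_size, dst_size) for d in range(dst_size)]
print([round(c, 3) for c in corner])   # [0.0, 1.667, 3.333]
print([round(c, 3) for c in center])   # [0.333, 2.0, 3.667]
```

The center of a 5-pixel axis is index 2. The corner-aligned samples average about 1.667, so they lean toward the origin, while the center-aligned samples are symmetric around 2.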

(3) Calculation process

With all these formulas it may still be unclear how to actually compute a value: we want the value at (x, y), but x and y also appear in the expression. In image processing, we first use
  src_x = (dst_x+0.5) * (src_width/dst_width) - 0.5
  src_y = (dst_y+0.5) * (src_height/dst_height) - 0.5
to compute where the target pixel falls in the source image. src_x and src_y are generally floating point numbers, for example f(1.2, 3.4). This pixel is virtual, so we first find the four actual pixels adjacent to it:
  (1,3) (2,3)
  (1,4) (2,4)
Write this as f(i+u,j+v); then i=1, j=3, u=0.2, v=0.4.
Interpolating along the X direction gives f(R1) = u(f(Q21)-f(Q11)) + f(Q11).
The Y direction is computed in the same way. Alternatively, combine the steps into a single formula:
f(i+u,j+v) = (1-u)(1-v)f(i,j) + (1-u)vf(i,j+1) + u(1-v)f(i+1,j) + uvf(i+1,j+1)
[Figure: handwritten worked calculation]
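
The one-step formula can also be checked numerically for the point f(1.2, 3.4). The sketch below assumes a made-up test image whose value at (i, j) is 10*i + j, chosen because bilinear interpolation reproduces such a linear ramp exactly. bilinear_sample is a hypothetical helper and does no bounds checking.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample a 2-D array at fractional (x, y) = (i+u, j+v), first index i."""
    i, j = int(np.floor(x)), int(np.floor(y))
    u, v = x - i, y - j
    return ((1 - u) * (1 - v) * img[i, j] + (1 - u) * v * img[i, j + 1]
            + u * (1 - v) * img[i + 1, j] + u * v * img[i + 1, j + 1])

# Test image whose value is 10*i + j, so the exact answer is easy to predict
img = np.fromfunction(lambda i, j: 10 * i + j, (4, 6))
value = bilinear_sample(img, 1.2, 3.4)
print(round(float(value), 6))   # 15.4
```

By hand: 0.8*0.6*13 + 0.8*0.4*14 + 0.2*0.6*23 + 0.2*0.4*24 = 6.24 + 4.48 + 2.76 + 1.92 = 15.4.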

(4) Python implementation

# bilinear interpolation
def bilinear_interpolation(src, dst_shape):
    # Get Original Dimension
    src_height, src_width = src.shape[0], src.shape[1]
    # Calculate New Graph Dimensions
    dst_height, dst_width, channels = dst_shape[0], dst_shape[1], dst_shape[2]
    
    dst = np.zeros(shape = (dst_height, dst_width, channels), dtype=np.uint8)
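    for dst_x in range(dst_height):          # dst_x is the row index
        for dst_y in range(dst_width):       # dst_y is the column index
            # Find the corresponding (center-aligned) coordinates in the source image
            src_x = (dst_x + 0.5) * (src_height / dst_height) - 0.5
            src_y = (dst_y + 0.5) * (src_width / dst_width) - 0.5
            # Clamp so that the four neighbours stay inside the image
            src_x = min(max(src_x, 0), src_height - 1)
            src_y = min(max(src_y, 0), src_width - 1)

            # Calculate the interpolation (assumes the source is at least 2x2)
            i, j = min(int(src_x), src_height - 2), min(int(src_y), src_width - 2)
            u, v = src_x - i, src_y - j
            f = (1-u)*(1-v)*src[i,j] + (1-u)*v*src[i,j+1] + u*(1-v)*src[i+1,j] + u*v*src[i+1,j+1]
            f = np.clip(f, 0, 255)   # Handle data that is out of bounds

            # Write the interpolated pixel
            dst[dst_x, dst_y,:] = f.astype(np.uint8)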
    return dst
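
For larger images the per-pixel loop becomes slow. A possible vectorized variant is sketched below. It is an illustration under stated assumptions rather than a drop-in replacement: the name bilinear_vectorized is hypothetical, it takes the target height and width directly, it expects an H x W x C uint8 source of at least 2x2 pixels, and it rounds to the nearest integer instead of truncating.

```python
import numpy as np

def bilinear_vectorized(src, dst_height, dst_width):
    """Bilinear resize of an H x W x C uint8 image (source at least 2x2)."""
    src_height, src_width = src.shape[0], src.shape[1]
    # Center-aligned source coordinates, clamped to the valid range
    xs = (np.arange(dst_height) + 0.5) * (src_height / dst_height) - 0.5
    ys = (np.arange(dst_width) + 0.5) * (src_width / dst_width) - 0.5
    xs = np.clip(xs, 0, src_height - 1)
    ys = np.clip(ys, 0, src_width - 1)
    # Top-left neighbour, kept one step away from the last row/column
    i = np.minimum(np.floor(xs).astype(int), src_height - 2)
    j = np.minimum(np.floor(ys).astype(int), src_width - 2)
    u = (xs - i)[:, None, None]   # row fractions, shape (dst_height, 1, 1)
    v = (ys - j)[None, :, None]   # column fractions, shape (1, dst_width, 1)
    img = src.astype(np.float64)
    i, j = i[:, None], j[None, :]
    out = ((1 - u) * (1 - v) * img[i, j] + (1 - u) * v * img[i, j + 1]
           + u * (1 - v) * img[i + 1, j] + u * v * img[i + 1, j + 1])
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# A 2x2 image that is 0 in the left column and 100 in the right column
ramp = np.array([[[0], [100]],
                 [[0], [100]]], dtype=np.uint8)
print(bilinear_vectorized(ramp, 2, 4)[..., 0])
# [[  0  25  75 100]
#  [  0  25  75 100]]
```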

Bicubic Interpolation

(1) Theory

Bicubic interpolation is also known as cubic convolution interpolation.
The algorithm performs cubic interpolation using the gray values of the 16 points around the sample point, taking into account not only the gray levels of the four immediate neighbors but also the rate of change of the gray values between neighbors.
Assume the source image A has size m*n and the target image B, after scaling by a factor K, has size M*N, i.e. K=M/m. Every pixel of A is known and every pixel of B is unknown. To find the value of a pixel (X,Y) in the target image B, we first find the corresponding position (x,y) in the source image A, and then use the 16 pixels of A nearest to (x,y) as the inputs for computing the pixel value B(X,Y). The weights of the 16 pixels are computed with the BiCubic basis function, and the value of B(X,Y) is the weighted sum of the 16 pixels.

Based on the scale relation x/X=m/M=1/K, the coordinates on A corresponding to B(X,Y) are A(x,y)=A(X/K,Y/K). As shown in the figure, the point P is the position in the source image A that corresponds to (X,Y) in the target image B. The coordinates of P generally have a fractional part, so we write them as P(x+u,y+v), where x,y are the integer parts and u,v are the fractional parts (the distance from the blue dot to the red dot in the a11 square). The 16 nearest pixels are then located as shown in the figure and are denoted a(i,j) (i,j=0,1,2,3).
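
Splitting the mapped position into integer and fractional parts, and listing the indices of the 4x4 neighborhood, can be sketched as follows. neighbourhood_4x4 is a hypothetical helper that uses the plain X/K mapping from the text, without center alignment or boundary handling.

```python
import math

def neighbourhood_4x4(X, Y, K):
    """Split the source position of B(X, Y) and list its 4x4 neighbours."""
    x_f, y_f = X / K, Y / K                     # position in A, may be fractional
    x, y = math.floor(x_f), math.floor(y_f)     # integer parts
    u, v = x_f - x, y_f - y                     # fractional parts in [0, 1)
    xs = [x - 1, x, x + 1, x + 2]               # 4 indices along the first axis
    ys = [y - 1, y, y + 1, y + 2]               # 4 indices along the second axis
    return (x, y), (u, v), xs, ys

print(neighbourhood_4x4(5, 7, 2))
# ((2, 3), (0.5, 0.5), [1, 2, 3, 4], [2, 3, 4, 5])
```

Near the image border some of these indices fall outside the image, so a real implementation has to clamp or pad them.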
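BiCubic function:
  W(x) = (a+2)|x|^3 - (a+3)|x|^2 + 1        for |x| <= 1
  W(x) = a|x|^3 - 5a|x|^2 + 8a|x| - 4a      for 1 < |x| < 2
  W(x) = 0                                  otherwise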

The BiCubic basis function is one-dimensional while the pixels are two-dimensional, so we compute the row and column weights separately.
The parameter x in the BiCubic function is the distance from a pixel to the point P. For example, the offset from a00 to P(x+u,y+v) is (1+u,1+v), so the horizontal weight of a00 is i_0=W(1+u) and the vertical weight is j_0=W(1+v).
Important: the weight is determined by the distance.
The contribution of a00 to B(X,Y) is: (a00 pixel value)* i_0* j_0.
Therefore, the horizontal weights for the point P are W(1+u), W(u), W(1-u), W(2-u), and the vertical weights are W(1+v), W(v), W(1-v), W(2-v).
The parameter a is not required to be -0.5, but -0.5 is the value the original author suggested.

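S(x) is the cubic interpolation kernel, i.e. the weight function. It can be approximated by the following formula, which corresponds to a=-1 and is the form used in the code below:
  S(x) = 1 - 2|x|^2 + |x|^3            for |x| <= 1
  S(x) = 4 - 8|x| + 5|x|^2 - |x|^3     for 1 < |x| < 2
  S(x) = 0                             otherwise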
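
A cubic convolution kernel with a free parameter a, together with the 4x4 weight matrix it produces, can be sketched as below. The names cubic_kernel and weights are hypothetical, the default a=-0.5 is an assumption for illustration, and the sample offsets u=0.3, v=0.7 are made up.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel W(x) with free parameter a."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def weights(u, a=-0.5):
    """Weights of the 4 taps at offsets -1, 0, +1, +2 from the integer part."""
    return np.array([cubic_kernel(1 + u, a), cubic_kernel(u, a),
                     cubic_kernel(1 - u, a), cubic_kernel(2 - u, a)])

u, v = 0.3, 0.7
w2d = np.outer(weights(u), weights(v))   # 4x4 weight of each a(i,j)
print(w2d.shape, round(float(w2d.sum()), 6))   # (4, 4) 1.0
```

The four weights along each axis sum to 1 for any a, so the 16 combined weights also sum to 1 and a constant image stays constant after interpolation. The outer taps are negative, which is why the interpolated result can overshoot and has to be clipped to the 0 to 255 range.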

(2) Python implementation

# Kernel function of cubic convolution
def S(x):
    if abs(x) <= 1: 
        y = 1- 2*np.power(x,2) + abs(np.power(x,3))
    elif abs(x)>1 and abs(x)<2:
        y = 4 - 8*abs(x) + 5*np.power(x,2) - abs(np.power(x,3))
    else:
        y = 0
    return y

# cubic convolution interpolation
def bicubic_interpolation(src, dst_shape):
    # Get Original Dimension
    src_height, src_width = src.shape[0], src.shape[1]
    # Calculate New Graph Dimension Note that channel Number should be the same
    dst_height, dst_width, channels = dst_shape[0], dst_shape[1], dst_shape[2]
    
    dst = np.zeros(shape = (dst_height, dst_width, channels), dtype=np.uint8)
    for dst_x in range(dst_height):
        for dst_y in range(dst_width):
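            # Find the corresponding (center-aligned) coordinates in the source image
            src_x = (dst_x + 0.5) * (src_height / dst_height) - 0.5
            src_y = (dst_y + 0.5) * (src_width / dst_width) - 0.5
            # Use floor (not int) so that negative coordinates split correctly
            i, j = int(np.floor(src_x)), int(np.floor(src_y))
            u, v = src_x - i, src_y - j

            # Boundary condition: clamp the 4x4 neighbourhood indices to the
            # image, which replicates edge pixels and keeps the weights aligned
            rows = np.clip(np.arange(i - 1, i + 3), 0, src_height - 1)
            cols = np.clip(np.arange(j - 1, j + 3), 0, src_width - 1)

            # Calculate the bicubic interpolation
            A = np.array([S(u+1), S(u), S(u-1), S(u-2)])      # row weights
            C = np.array([S(v+1), S(v), S(v-1), S(v-2)])      # column weights
            B = src[np.ix_(rows, cols)].astype(np.float64)    # 4 x 4 x channels patch
            f0 = [A @ B[..., c] @ C for c in range(channels)]
            f1 = np.stack(f0)
            f = np.clip(f1, 0, 255)  # Handle data that is out of bounds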

            # interpolation
            dst[dst_x, dst_y,:] = f.astype(np.uint8)
    return dst


If you found this useful, please give it some encouragement. Thanks~

Tags: Python OpenCV Machine Learning

Posted on Sat, 02 Oct 2021 12:32:15 -0400 by gerry123