Introduction to graphics: image binarization

       My project has recently involved image processing and grid processing, so I'm taking the opportunity to write a few blog posts about them.
       Binarization
       A typical color image carries too much information to be convenient for calculation and processing, so it is first converted to grayscale and then to a binary (black-and-white) image.
       1. Grayscale
       Images are commonly encoded in one of two ways: RGB and YUV. RGB is the usual choice, but YUV is the mainstream where data compression matters, such as network transmission, because it keeps image quality high even after discarding part of the data.
       YUV achieves this compression as follows:
       Common formats for recording color images include RGB, YUV and CMYK. The earliest idea for color television was to transmit the three RGB primaries simultaneously, which needed three times the bandwidth of black-and-white television and was not a good design at the time. RGB is built around the human eye's perception of color, while YUV is built around its sensitivity to brightness. Y represents brightness and UV represents chromaticity, so a black-and-white film can omit UV altogether. U and V are written as Cb and Cr respectively, and YUV is usually recorded in Y:UV format.
       Because the human eye is less sensitive to chromaticity than to brightness, the U and V components are downsampled by half in each direction, so each holds a quarter as much data as Y. For a 2x2 block of pixels this cuts the 12 samples of RGB to 6 samples in YUV (I420): 4 Y, 1 U and 1 V.
       (Source: Baidu, YUV entry)
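       The sample counts above can be checked with a few lines of plain C#. This is only a sketch: the class and method names are made up for illustration, and it assumes 8-bit samples and an even width and height.

```csharp
using System;

public static class YuvSizeSketch
{
    // 8-bit RGB stores three samples (R, G, B) for every pixel.
    public static int Rgb24Bytes(int width, int height)
    {
        return width * height * 3;
    }

    // 8-bit I420 stores a full-resolution Y plane plus U and V planes
    // that are halved in both directions (assumes even width and height).
    public static int I420Bytes(int width, int height)
    {
        int ySamples = width * height;
        int chromaSamples = (width / 2) * (height / 2); // per U or V plane
        return ySamples + 2 * chromaSamples;
    }
}
```

       For a 2x2 block, Rgb24Bytes(2, 2) gives 12 and I420Bytes(2, 2) gives 6. For a 1920x1080 frame the figures are 6,220,800 and 3,110,400 bytes, which is the same halving.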
       YUV matters here because converting to grayscale is the RGB to YUV conversion keeping only Y, that is, extracting the brightness (gray). The brightness part of the conversion formula (see the Wikipedia article on YUV) is:
       Y = 0.299 R + 0.587 G + 0.114 B
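       Before moving to the GPU, the brightness formula Y = 0.299 R + 0.587 G + 0.114 B (the same weights the shader below uses) can be tried on the CPU. A minimal sketch, with a hypothetical class name and channel values assumed to be in [0, 1]:

```csharp
using System;

public static class GraySketch
{
    // Weighted sum of the three channels; the weights add up to 1,
    // so white (1, 1, 1) stays at brightness 1.
    public static float Luma(float r, float g, float b)
    {
        return 0.299f * r + 0.587f * g + 0.114f * b;
    }
}
```

       Pure green (0, 1, 0) comes out at 0.587 and pure blue (0, 0, 1) at only 0.114, which reflects how much brighter green looks to the eye than blue.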
       Next, we deal with the conversion from color image to gray image:

#pragma kernel CSMain

RWTexture2D<float4> Source;
RWTexture2D<float4> Result;

// The weighted sum of R, G and B gives the brightness (Y) of the pixel.
float4 grayPixel(float4 rgba)
{
    float gray = 0.299 * rgba.x + 0.587 * rgba.y + 0.114 * rgba.z;
    float4 px = float4(gray, gray, gray, 1);
    return px;
}

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    Result[id.xy] = grayPixel(Source[id.xy]);
}

C# call:

using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class TestBinarization : MonoBehaviour
{
    public Texture2D sourceTex;
    public ComputeShader grayCS;

    private int texWidth;
    private int texHeight;

    public RenderTexture sourceRT;
    public RenderTexture grayRT;

    void Start()
    {
        texWidth = sourceTex.width;
        texHeight = sourceTex.height;
        GetSourceRT();
        GetGrayRT();
    }

    // Copy the source texture into a render texture the compute shader can read.
    public void GetSourceRT()
    {
        sourceRT = new RenderTexture(texWidth, texHeight, 0, RenderTextureFormat.ARGB32);
        sourceRT.enableRandomWrite = true;
        sourceRT.Create();
        Graphics.Blit(sourceTex, sourceRT);
    }

    // Run the grayscale kernel over the whole image.
    public void GetGrayRT()
    {
        grayRT = new RenderTexture(texWidth, texHeight, 0, RenderTextureFormat.ARGB32);
        grayRT.enableRandomWrite = true;
        grayRT.Create();
        int kl = grayCS.FindKernel("CSMain");
        grayCS.SetTexture(kl, "Source", sourceRT);
        grayCS.SetTexture(kl, "Result", grayRT);
        // One thread group covers 8x8 pixels. Integer division means a width or
        // height that is not a multiple of 8 leaves the edge pixels unprocessed.
        grayCS.Dispatch(kl, texWidth / 8, texHeight / 8, 1);
    }
}

       The results are as follows:
       Original image:

       Grayscale image:

       Next, the gray image is binarized:

#pragma kernel CSMain

RWTexture2D<float4> Source;
RWTexture2D<float4> Result;

// The image is already gray, so R = G = B and the red channel is enough.
float4 binaryPixel(float4 rgba)
{
    float r = rgba.x;
    float4 px;
    if (r < 0.7)
    {
        px = float4(0, 0, 0, 1);
    }
    else
    {
        px = float4(1, 1, 1, 1);
    }
    return px;
}

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    Result[id.xy] = binaryPixel(Source[id.xy]);
}

       The result is as follows:

       However, I picked the threshold of 0.7 by eye; it was not computed automatically. So how can the threshold be determined automatically?
       There are three common algorithms: the average method, the OTSU method and the Kittler method.

Average method:
The average gray value of all pixels in the gray image is used as the threshold:

public float GetAvgThreshold(RenderTexture rt)
{
    int width = rt.width;
    int height = rt.height;

    // Read the grayscale render texture back to the CPU.
    RenderTexture.active = rt;
    Texture2D tex = new Texture2D(width, height);
    tex.ReadPixels(new Rect(0, 0, width, height), 0, 0);
    tex.Apply();
    RenderTexture.active = null;

    // Sum the gray value (red channel) of every pixel, then divide by the count.
    float threshold = 0f;
    Color[] cols = tex.GetPixels();
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            threshold += cols[y * width + x].r;
        }
    }
    threshold /= (width * height);
#if UNITY_EDITOR
    // The {0} placeholder is required, otherwise the value is never printed.
    Debug.LogFormat("GetAvgThreshold threshold = {0}", threshold);
#endif
    return threshold;
}

public void GetBinaryRT()
{
    binaryRT = new RenderTexture(texWidth, texHeight, 0, RenderTextureFormat.ARGB32);
    binaryRT.enableRandomWrite = true;
    binaryRT.Create();
    // Look the kernel up on binaryCS, the shader that is actually dispatched.
    int kl = binaryCS.FindKernel("CSMain");
    float thre = BinarizationFactory.Instance.GetAvgThreshold(grayRT);
    // The binarization shader must declare "float threshold;" and compare
    // against it instead of the hard-coded 0.7.
    binaryCS.SetFloat("threshold", thre);
    binaryCS.SetTexture(kl, "Source", grayRT);
    binaryCS.SetTexture(kl, "Result", binaryRT);
    binaryCS.Dispatch(kl, texWidth / 8, texHeight / 8, 1);
}

The result is as follows:

As you can see, the result is not very good.

       OTSU method:
       The OTSU method is known in Chinese as the Dajin method (see the Baidu entry). The principle is to assume a threshold gray value that divides the gray image into a foreground area (black) and a background area (white), and to compute the between-class variance of the two areas. Candidate thresholds are tried one step at a time over the range 0f/255f to 255f/255f, and the one that gives the maximum between-class variance is chosen. The formula is as follows:
       Between-class variance = proportion of foreground pixels x (average gray value of foreground - average gray value of whole image) ^ 2 + proportion of background pixels x (average gray value of background - average gray value of whole image) ^ 2
       The quantities used in the calculation are:
       1. Threshold (t)
       2. Proportion of foreground (black) pixels (r1)
       3. Average gray of the foreground (black) (g1)
       4. Average gray of the whole image (g)
       5. Proportion of background (white) pixels (r2)
       6. Average gray of the background (white) (g2)
       7. Between-class variance (v)

       Substituting g = r1 x g1 + r2 x g2 and r1 + r2 = 1 into the formula simplifies it to v = r1 x r2 x (g1 - g2) ^ 2. The threshold t that produces the maximum v during the iteration is the threshold we want.
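       The simplification follows from g1 - g = r2 x (g1 - g2) and g2 - g = -r1 x (g1 - g2), which turn the definition into r1 x r2 x (g1 - g2) ^ 2 x (r1 + r2). As a quick numerical check, the sketch below evaluates both forms. The class and method names are hypothetical and the inputs are arbitrary sample values.

```csharp
using System;

public static class OtsuVarianceSketch
{
    // Definition: each class mean's squared distance from the global mean,
    // weighted by the proportion of pixels in that class.
    public static double ByDefinition(double r1, double g1, double g2)
    {
        double r2 = 1.0 - r1;
        double g = r1 * g1 + r2 * g2; // average gray of the whole image
        return r1 * (g1 - g) * (g1 - g) + r2 * (g2 - g) * (g2 - g);
    }

    // Simplified form: v = r1 * r2 * (g1 - g2)^2.
    public static double Simplified(double r1, double g1, double g2)
    {
        double r2 = 1.0 - r1;
        return r1 * r2 * (g1 - g2) * (g1 - g2);
    }
}
```

       With r1 = 0.3, g1 = 0.2 and g2 = 0.8, both methods give about 0.0756, so the simplified form can be used without computing g at all.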
       Next write the code:

public float GetOTSUThreshold(RenderTexture rt)
{
    int width = rt.width;
    int height = rt.height;
    int pxlen = width * height;

    // Read the grayscale render texture back to the CPU.
    RenderTexture.active = rt;
    Texture2D tex = new Texture2D(width, height);
    tex.ReadPixels(new Rect(0, 0, width, height), 0, 0);
    tex.Apply();
    RenderTexture.active = null;
    Color[] cols = tex.GetPixels();

    // Iterate with a step of 0.02f
    float iter = 0.02f;
    // v = r1*r2*(g1-g2)^2
    float threshold = 0;
    float maxv = float.MinValue;
    for (float t = 0; t < 1f; t += iter)
    {
        float r1 = 0, r2 = 0, g1 = 0f, g2 = 0f;
        for (int k = 0; k < pxlen; k++)
        {
            float gray = cols[k].r;
            if (gray <= t) // Foreground (black)
            {
                r1++;
                g1 += gray;
            }
            else // Background (white)
            {
                r2++;
                g2 += gray;
            }
        }
        // Skip thresholds that leave one class empty (avoids 0/0 = NaN).
        if (r1 == 0 || r2 == 0)
        {
            continue;
        }
        g1 /= r1;
        r1 /= (float)pxlen;
        g2 /= r2;
        r2 /= (float)pxlen;
        float v = r1 * r2 * (g1 - g2) * (g1 - g2);
        if (maxv < v)
        {
            maxv = v;
            threshold = t;
        }
    }
#if UNITY_EDITOR
    // The {0} placeholder is required, otherwise the value is never printed.
    Debug.LogFormat("GetOTSUThreshold threshold = {0}", threshold);
#endif
    return threshold;
}

The result is as follows:

Instead of stepping by 1f/255f, I step by 0.02f to speed up the calculation. If higher precision is required, use a smaller step at the cost of more iterations. Even so, the result is fairly good.
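       If the full 1/255 resolution is wanted without rescanning every pixel for each candidate threshold, one common option is to build a 256-bin histogram first and run the search over the bins. The sketch below is an alternative to the loop above, not the code used for the screenshots. The class and method names are hypothetical, and the input is assumed to be the gray values in [0, 1] (for example the .r channel of each pixel).

```csharp
using System;

public static class OtsuHistogramSketch
{
    // Returns the threshold that maximizes r1 * r2 * (g1 - g2)^2,
    // testing all 256 gray levels after a single pass over the pixels.
    public static float Threshold(float[] grays)
    {
        var hist = new int[256];
        foreach (float gray in grays)
        {
            float clamped = Math.Min(1f, Math.Max(0f, gray));
            hist[(int)(clamped * 255f + 0.5f)]++; // round to the nearest level
        }

        long total = grays.Length;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) sumAll += (double)i * hist[i];

        long count1 = 0;   // pixels with level <= t (foreground, black)
        double sum1 = 0;   // sum of their levels
        double maxV = -1;
        int best = 0;
        for (int t = 0; t < 256; t++)
        {
            count1 += hist[t];
            sum1 += (double)t * hist[t];
            long count2 = total - count1;
            if (count1 == 0 || count2 == 0) continue; // one class is empty

            double r1 = (double)count1 / total;
            double r2 = (double)count2 / total;
            double g1 = sum1 / count1;
            double g2 = (sumAll - sum1) / count2;
            double v = r1 * r2 * (g1 - g2) * (g1 - g2);
            if (v > maxV) { maxV = v; best = t; }
        }
        // Half a level above "best", so that a "gray < threshold" test like
        // the one in the shader keeps level "best" in the foreground.
        return (best + 0.5f) / 255f;
    }
}
```

       The inner search now costs 256 steps regardless of image size, so the only per-pixel work is filling the histogram.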

       Kittler method
       The Kittler method, also known as the minimum error threshold method, has a very confident name. The core idea is to compute the gradient-weighted average of the gray values of the whole image and take that average as the threshold.
       Specifically, the pixels are iterated line by line, and for each pixel the larger of its horizontal and vertical gradients is taken. The horizontal gradient is the absolute value of the difference between the gray values of the left and right neighbors, and the vertical gradient is the same for the upper and lower neighbors.

       Then, for each pixel, compute the product gp of the maximum gradient g and the pixel gray p.r. After the iteration, dividing the sum Σgp by the sum Σg gives the threshold, as follows:

public float GetKittlerThreshold(RenderTexture rt)
{
    int width = rt.width;
    int height = rt.height;

    // Read the grayscale render texture back to the CPU.
    RenderTexture.active = rt;
    Texture2D tex = new Texture2D(width, height);
    tex.ReadPixels(new Rect(0, 0, width, height), 0, 0);
    tex.Apply();
    RenderTexture.active = null;
    Color[] cols = tex.GetPixels();

    float gp = 0f;   //∑gp
    float gsum = 0f; //∑g
    // Border pixels are skipped because they lack one of the four neighbors.
    for (int y = 1; y < height - 1; y++)
    {
        for (int x = 1; x < width - 1; x++)
        {
            int px = y * width + x;
            int left = y * width + x - 1;
            int right = y * width + x + 1;
            int up = (y - 1) * width + x;
            int down = (y + 1) * width + x;
            float gh = Mathf.Abs(cols[left].r - cols[right].r);
            float gv = Mathf.Abs(cols[up].r - cols[down].r);
            //Get the maximum gradient g
            float g = gh > gv ? gh : gv;
            //accumulation
            gp += (g * cols[px].r);
            gsum += g;
        }
    }
    // A completely flat image has no gradient, which would give 0/0 = NaN.
    // Falling back to 0.5f here is an arbitrary choice.
    float threshold = gsum > 0f ? gp / gsum : 0.5f;
#if UNITY_EDITOR
    // The {0} placeholder is required, otherwise the value is never printed.
    Debug.LogFormat("GetKittlerThreshold threshold = {0}", threshold);
#endif
    return threshold;
}
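       To see what this weighted average does, the same calculation can be run on a tiny hand-made image outside Unity. The sketch below is a plain C# restatement under stated assumptions: the class name is hypothetical, the image is a row-major array of gray values in [0, 1], and a flat image falls back to 0.5, which is an arbitrary choice.

```csharp
using System;

public static class GradientThresholdSketch
{
    public static float Threshold(float[] gray, int width, int height)
    {
        double gp = 0;   // sum of gradient * gray
        double gsum = 0; // sum of gradient
        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                int i = y * width + x;
                float gh = Math.Abs(gray[i - 1] - gray[i + 1]);         // left vs right
                float gv = Math.Abs(gray[i - width] - gray[i + width]); // up vs down
                float g = Math.Max(gh, gv);
                gp += g * gray[i];
                gsum += g;
            }
        }
        // No gradient anywhere: return the arbitrary fallback.
        return gsum > 0 ? (float)(gp / gsum) : 0.5f;
    }
}
```

       For a 4x4 image whose left two columns are 0.2 and right two columns are 0.8, only the pixels next to the edge have a gradient, half of them dark and half bright, so the threshold comes out at about 0.5, midway between the two levels. Flat regions contribute nothing, which is why the result follows the edges rather than the overall brightness.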

       The result is as follows:

       As you can see, the result is not bad.
       These are a few of the commonly used methods for computing a binarization threshold. Many more binarization algorithms can be found through Baidu and Google and implemented as needed. When I have time, I will write about the grid processing in the project.

31 October 2021, 16:05 | Views: 3968
