Web technology sharing | The front-end secret art of "face changing"

Hello everyone, I'm Early Winter. I've always wanted to study artificial intelligence, but I kept waiting for the right moment. In the last issue I wrote about front-end image processing, and I wanted to strike while the iron was hot, so I worked hard on the idea of changing faces on the front end and wrote a face-changing DEMO, which is the one shown in the picture below.

Today, let's use this DEMO to see how to change faces on the front end with tfjs + canvas.

Technical analysis

The prerequisite of front-end face changing is obtaining the extent of the face, and at present only AI can play the key role here: the crux lies in recognizing the face and calculating the position and size of the facial features, so that we can meet our unrestrained needs. How could it be as simple as a snap of Thanos's fingers?

First, we need to determine accurately whether there is a face in the image (picture or video), as well as the boundary of the face and the positions of the facial features. In professional terms, detecting these facial feature points is called face keypoint detection.

Secondly, as the chief surgeon, after we get the report on the positions of the facial features and communicate with the customer to clarify the requirements, we carry out another careful analysis. Only then can we confidently and boldly move the "knife".

Don't forget the anesthetic. I forgot it several times; the customer fainted from the pain, which saved the cost of the anesthetic. I'm really a hard-working and thrifty good doctor.

In addition to the positions of the facial features, we also need to know their sizes. To be honest, that's a little difficult. Not everyone's eyes are as tiny as Du Haitao's or Li Ronghao's (easy to measure with a microscope), and not everyone's mouth can swallow mountains and rivers like Yao Chen's or Shu Qi's.

I believe some of you have already fallen into anxiety. Don't worry, friend, you're not the one under the knife anyway 🤪. There are always more solutions than difficulties, right? Let's move.

Technology selection

To obtain the positions of the facial features, we must first recognize the face, and then obtain the range and position of each feature. Here we have to rely on the power of AI. A preliminary survey of intelligent face-detection libraries turns up face-api.js, tracking.js, clmtrackr.js, and tfjs.

Meanwhile, full of the chief surgeon's initial confidence, I looked forward to starting with face-api.js. I tried them one by one, failed repeatedly, retried repeatedly, and almost gave up right at the start. Could it be that this generation's miracle doctor would decline from here? Suddenly I stood up from the ruins of self-doubt. After double-checking my preparation and workflow, I found that the repository was three years old, or perhaps the problem was mine. I resolutely abandoned face-api.js, tracking.js, and clmtrackr.js, and placed my trust in tfjs.

Soon, on the tfjs official website, I found some already-trained models, including exactly the face detection model we need. A coincidence? No, it opened the door to the face-changing technique for us. Google really didn't disappoint me 😄.

After a long and pleasant round of testing, I found that the Blazeface library can detect:

  • Start and end coordinates of face range
  • Position of left and right eyes
  • Position of left and right ears
  • Position of nose
  • Position of mouth

A picture is worth a thousand words. Seeing is believing.

Technical implementation

If this technology sharing only talked theory, wouldn't that be too dry and inconsistent with my style? As for the technical implementation, let's start with a simple DEMO: adding a 💗 over the eyes.

Let me present a rendering of the result first.

The whole implementation process can be divided into the following steps:

  • Introduce tfjs and load the AI model (face recognition)
  • Get the information of all faces in the image
  • Calculate the size of each person's eyes
  • Draw the image in canvas and add a 💗 over the eyes

Here we use overlays (stickers) instead of changing the image data. Of course, we could also change the ImageData directly, but Mr. Luo Xiang doesn't recommend it. Changing the image data directly not only involves a large amount of computation, causing janky drawing or even freezing the browser; in addition, the model cannot guarantee accuracy, which may distort the image. After all, for now we rely on the coordinates provided by the AI model's analysis. We look forward to more complete and accurate models in the future.

Step 1: introduce the Blazeface detector library and load the AI model

The Blazeface detector library depends on tfjs, so we need to load tfjs first.

Two introduction methods

npm import

require('@tensorflow/tfjs'); // Blazeface depends on tfjs, so load it first
const blazeface = require('@tensorflow-models/blazeface');

script tag

<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/blazeface"></script>

Load the AI model

⚠ïļ Scientific Internet access is required because the model needs to be loaded from the TFHUB. It is expected that you can choose where to load the model in the future (the proposal has been accepted first).

async function main() {
  // Loading model
  const model = await blazeface.load();
  // TODO...

}

main();

Step 2: get the information of all faces in the image

After making sure the model is loaded, we can use the estimateFaces method to detect all the face information in the image. This method returns an array whose length is the number of detected faces. Each face is an object containing the following information:

  • topLeft
    Coordinates of the ↖️ corner of the face boundary.
  • bottomRight
    Coordinates of the ↘️ corner of the face boundary. Combined with topLeft, the width and height of the face can be calculated.
  • probability
    The confidence that this is a face.
  • landmarks
    An array containing the positions of the facial features, in order: right eye, left eye, nose, mouth, right ear, left ear.

The estimateFaces method receives two parameters:

  • input
    The input element; it can be a video or an image.
  • returnTensors
    A boolean that controls the return type. If false, plain values (x and y coordinates, etc.) are returned; if true, tensors are returned instead.

Example:

// We pass video or image objects (or nodes) to the model
const input = document.querySelector("img");
// This method will return an array containing face bounding box and facial features coordinates. Each item of the array corresponds to a human face.
const predictions = await model.estimateFaces(input, false);
/*
`predictions` is an array of objects describing each detected face, for example:
[
  {
	topLeft: [232.28, 145.26],
	bottomRight: [449.75, 308.36],
	probability: [0.998],
	landmarks: [
	  [295.13, 177.64], // Coordinates of the right eye
	  [382.32, 175.56], // Coordinates of the left eye
	  [341.18, 205.03], // Coordinates of nose
	  [345.12, 250.61], // Mouth coordinates
	  [252.76, 211.37], // Coordinates of the right ear
	  [431.20, 204.93] // Coordinates of the left ear
	]
  }
]
*/

Step 3: calculate the size of each person's eyes

In step 2 we already obtained the coordinates of each person's eyes. Next we need to calculate the size of the eyes. Careful readers may have noticed that the data returned by the model provides no eye-size attribute, so how do we judge how big an eye is?
In Figure 3 above, we can see that the eye coordinates sit on the lower eyelids, the nose coordinate is the tip of the nose, and the mouth position is its center point, each with a certain offset. Careful observation also shows that the face's angle affects the apparent size of the eyes, but there is a common pattern: the distance from the top boundary of the face down to the lower eyelid roughly covers the eye itself, so we can use that distance as the eye size. Therefore: eye size = eye Y coordinate - Y coordinate of the top boundary.

for (let i = 0; i < predictions.length; i++) {
	const start = predictions[i].topLeft;
	const end = predictions[i].bottomRight;
	const size = [end[0] - start[0], end[1] - start[1]];

	const rightEyeP = predictions[i].landmarks[0];
	const leftEyeP = predictions[i].landmarks[1];

	// Eye size
	const fontSize = rightEyeP[1] - start[1];
	context.font = `${fontSize}px/${fontSize}px serif`;
}

Step 4: draw the image and the 💗 in canvas

Because we use the overlay approach here, we need to draw the original image first, and then draw the 💗 at the eye positions via the CanvasRenderingContext2D.fillText() method. We could also use images here, but I think text is faster, since images need to be loaded 😛.

// The step of drawing the original image is omitted here; see the source code for details
// ...
// Traverse the face information array
for (let i = 0; i < predictions.length; i++) {
	const start = predictions[i].topLeft;
	const end = predictions[i].bottomRight;
	const size = [end[0] - start[0], end[1] - start[1]];
	const rightEyeP = predictions[i].landmarks[0];
	const leftEyeP = predictions[i].landmarks[1];

	// Draw a ❤️ over each eye
	const fontSize = rightEyeP[1] - start[1];
	context.font = `${fontSize}px/${fontSize}px serif`;
	context.fillStyle = 'red';
	context.fillText('❤️', rightEyeP[0] - fontSize / 2, rightEyeP[1]);
	context.fillText('❤️', leftEyeP[0] - fontSize / 2, leftEyeP[1]);
}

Source code. Hands itching, can't wait to have a try? Don't worry, I also provide some other interesting DEMOs below. Don't forget to like and bookmark.

Advanced

This article only explains some entry-level image-processing techniques. Advanced image processing is far more complex than we might think, and it also requires the relevant algorithms. Interested readers can consult the relevant materials and documents; for example, you can search for image-tracking algorithms, image-processing algorithms, binarization, 256-color-to-grayscale conversion, and so on.
In addition, I hope to help you consolidate, from point to surface, by sharing a few more demos and how they are realized.
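
As a small taste, here is a minimal sketch of "changing the ImageData directly": grayscale conversion plus binarization on a canvas. It assumes an existing canvas 2D context with an image already drawn on it, and the threshold parameter is an illustrative knob, not a tuned value. It also shows why per-pixel work is the heavy computation mentioned earlier.

// A minimal sketch: grayscale + binarization directly on ImageData.
// `context` already has an image drawn on it; `threshold` (0-255) is a hypothetical cutoff.
function binarize(context, width, height, threshold = 128) {
	const imageData = context.getImageData(0, 0, width, height);
	const data = imageData.data; // RGBA, 4 bytes per pixel

	for (let i = 0; i < data.length; i += 4) {
		// Luminance-weighted grayscale
		const gray = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
		// Binarization: above the threshold becomes white, the rest black
		const v = gray >= threshold ? 255 : 0;
		data[i] = data[i + 1] = data[i + 2] = v;
	}

	context.putImageData(imageData, 0, 0);
}

Note that this loop touches every pixel on every call, which is exactly the cost the overlay approach above avoids.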

Epidemic prevention expert

The epidemic is fierce and keeps dragging on. In the hot summer, everyone must be tired of wearing masks, so why don't I give you an invisible mask instead? No sooner said than done: find a mask PNG with a transparent background, and then we can start our performance.

Steps:

  • Get the position of the mouth through the model
  • Calculate the width of the mouth
  • Draw the image and the mask in canvas

Analysis:

  • The mask is an image, so it needs to be drawn with the CanvasRenderingContext2D.drawImage() method
  • The center point of the mask is approximately the center point of the mouth
for (let i = 0; i < predictions.length; i++) {
	const start = predictions[i].topLeft;
	const end = predictions[i].bottomRight;
	const size = [end[0] - start[0], end[1] - start[1]];

	// landmarks[3] is the center point of the mouth
	const mouthP = predictions[i].landmarks[3];

	// Epidemic prevention expert
	const image = new Image();
	image.src = "./assets/images/mouthMask.png";
	image.onload = function () {
		// The mouth is the center: half of the mask above it, half below
		context.drawImage(image, mouthP[0] - size[0] / 2, mouthP[1] - size[1] / 2, size[0], size[1]);
	};
}

Source code

Flame red lips

Steps:

  • Get the position of the mouth through the model
  • Calculate the width of the mouth
  • Draw the image and the 👄 in canvas

Analysis:
Similarly, the model does not return the mouth size, and everyone's mouth is different. Drawing inferences from the previous example, let's calmly analyze and look for a breakthrough:

  • The nose and mouth are roughly on the same vertical axis
  • The ear-to-ear width can't serve as the mouth's size
  • The distance between the eyes? Could that match the width of the mouth?

A flash of inspiration: yes, the distance between the eyes seems to be about the same as the width of the mouth. I turned my curious gaze to my colleagues, and took out an unopened tub of Copico chips from the cabinet on my left. After observing, I found that as long as people don't open their mouths, the mouth is about as wide as the distance between their eyes. Of course there are special cases, but they don't affect my conclusion; this should be the golden ratio.

So, we get the following code:

for (let i = 0; i < predictions.length; i++) {
	const start = predictions[i].topLeft;
	const end = predictions[i].bottomRight;
	const size = [end[0] - start[0], end[1] - start[1]];

	const rightEyeP = predictions[i].landmarks[0];
	const leftEyeP = predictions[i].landmarks[1];
	const mouthP = predictions[i].landmarks[3];

	// Add a flaming red lip
	// The mouth is about as wide as the distance between the eyes
	const fontSize = Math.abs(rightEyeP[0] - leftEyeP[0]);
	context.font = `${fontSize}px/${fontSize}px serif`;
	context.fillStyle = 'red';
	context.fillText('👄', mouthP[0] - fontSize / 2, mouthP[1] + fontSize / 2);
}

Source code

Video processing

Earlier we covered image processing; video processing works the same way. The difference is that an image is drawn only once, whereas with video we need to draw every frame of the video and then apply the secondary processing.

Analysis:

  • The size of the canvas equals the size of the video, and the video's size can only be obtained after its metadata has loaded
  • We need to process every frame of the video. Since this is cyclic processing, we could use setTimeout, setInterval, or requestAnimationFrame. setInterval is obviously unsuitable, and when the per-frame workload is heavy, setTimeout can hurt responsiveness, so we choose requestAnimationFrame for the loop.

Steps:

  • Load the video
  • Once the video has loaded, initialize the canvas
  • After initialization, load the AI model
  • Process every frame of the video (see the sketch below)
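
A minimal sketch of that flow, assuming a video and a canvas element already on the page; the element ids and the per-frame overlay logic are placeholders, see the source code for the full version:

const video = document.querySelector("#video");   // assumed element id
const canvas = document.querySelector("#canvas"); // assumed element id
const context = canvas.getContext("2d");

video.addEventListener("loadedmetadata", async () => {
	// The canvas matches the video's intrinsic size
	canvas.width = video.videoWidth;
	canvas.height = video.videoHeight;

	// Load the model once, not once per frame
	const model = await blazeface.load();

	async function renderFrame() {
		// Draw the current frame first, then overlay on top of it
		context.drawImage(video, 0, 0, canvas.width, canvas.height);
		const predictions = await model.estimateFaces(video, false);
		// TODO: draw 💗 / mask / 👄 from `predictions`, as in the image DEMOs
		requestAnimationFrame(renderFrame);
	}
	requestAnimationFrame(renderFrame);
});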

Source code

Summary of technical knowledge

TensorFlow.js, tfjs for short, is an open-source machine learning library for JavaScript developed by Google.

We used the Blazeface detector, a face detection model provided by tfjs.

The Blazeface detector provides two methods that return Promise objects:

  • The load method is used to load the model
    The latest model has to be fetched from TensorFlow Hub, so you need to be able to get over the wall. Friends who can't may check the open-source repository for a mirror address and map to it by editing the local hosts file.
  • estimateFaces is used to detect face information
    It detects all faces and returns an array; each element represents one face's information, including the face boundary and the positions of the facial features.
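
As a recap, the two calls chain together in a few lines. A minimal sketch, assuming img is any image or video element on the page:

const img = document.querySelector("img");
// Both load() and estimateFaces() return Promises, so they chain naturally
blazeface.load()
	.then((model) => model.estimateFaces(img, false))
	.then((predictions) => console.log(predictions));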

In addition, we also applied some canvas basics: CanvasRenderingContext2D.fillText() and CanvasRenderingContext2D.drawImage().

Homework after class

Friends, how about putting a pair of 🕶️ on yourself?
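
If you'd like a starting point, here is a minimal sketch of one possible approach, reusing the eye-distance trick from the flame-red-lips DEMO. It assumes the same predictions array and canvas context as above; the scale factors are guesses to tweak, not measured values.

for (let i = 0; i < predictions.length; i++) {
	const rightEyeP = predictions[i].landmarks[0];
	const leftEyeP = predictions[i].landmarks[1];

	// Make the glasses a bit wider than the eye-to-eye distance (1.5 is a guess)
	const fontSize = Math.abs(rightEyeP[0] - leftEyeP[0]) * 1.5;
	const centerX = (rightEyeP[0] + leftEyeP[0]) / 2;
	const centerY = (rightEyeP[1] + leftEyeP[1]) / 2;
	context.font = `${fontSize}px serif`;
	// Roughly center the glyph between the eyes
	context.fillText('🕶️', centerX - fontSize / 2, centerY + fontSize / 3);
}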

You can send your work to my email or open a Pull Request on GitHub; if it passes review, it will be included in my Demo.

If you think my article is interesting, please comment and tell me your thoughts and opinions.

Problem collection

  • The DEMO has no effect?
    • ⚠️ You need to be able to access the Internet scientifically (over the wall), because the model is loaded from TF Hub. In the future you are expected to be able to choose where to load the model from (the proposal has already been accepted).
    • Use nginx, python, or a node server locally to serve the files (for example, python -m http.server), so that resources don't hit cross-origin problems (some resources must be fetched locally).
    • If you are testing face detection on video or camera, first check whether a local camera exists; if not, use the MP4 provided by the project for testing.
  • Other issues
    • Waiting for you to raise them

Related links

Tags: AI webrtc
