Advanced face detection: a faster 5-point facial landmark detector

The goal today is to introduce dlib's new facial landmark detector, which is faster (by 8-10%), more efficient, and smaller (by a factor of 10) than the original version.

In the first part of this blog post, we will discuss dlib's new, faster, smaller 5-point facial landmark detector and compare it with the original 68-point facial landmark detector distributed with the library.

Then we will implement facial landmark detection using Python, dlib, and OpenCV, run it, and view the results.

Finally, we will discuss some limitations of the 5-point facial landmark detector and highlight scenarios where you should use the 68-point version instead.

The 68-point detector localizes 68 points along the eyes, eyebrows, nose, mouth, and jawline. The 5-point facial landmark detector reduces this information to:

  • Left eye 2 points

  • Right eye 2 points

  • Nose 1 point
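
To make the layout concrete, here is a minimal sketch that groups five landmark coordinates by region. The helper name `split_five_points` and the index ordering (0-1 for the left eye, 2-3 for the right eye, 4 for the nose) are assumptions for illustration; verify the ordering against the model you actually load.

```python
def split_five_points(points):
    """Group five (x, y) landmarks by facial region.

    ASSUMPTION: indices 0-1 are the left eye, 2-3 the right eye and
    4 the nose; check this against the model you actually load.
    """
    if len(points) != 5:
        raise ValueError("expected exactly 5 landmarks")
    return {
        "left_eye": list(points[0:2]),
        "right_eye": list(points[2:4]),
        "nose": [points[4]],
    }


# made-up coordinates, for illustration only
example = [(120, 60), (100, 55), (60, 45), (40, 40), (80, 90)]
regions = split_five_points(example)
print({name: len(pts) for name, pts in regions.items()})
```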

The most appropriate use case for the 5-point facial landmark detector is face alignment.

In terms of speed, I found the new 5-point detector to be 8-10% faster than the original version, but the real win here is the model size: 9.2MB.

It is also important to note that facial landmark detectors tend to be very fast to begin with (especially when they are implemented correctly, as they are in dlib).

dlib installation tutorial:

Face detector model:

Facial landmark implementation using dlib, OpenCV and Python

Open a new file, name it, and insert the following code:

# import the necessary packages
from imutils.video import VideoStream
from imutils import face_utils
import argparse
import imutils
import time
import dlib
import cv2

This imports the necessary packages, most notably dlib and two modules from imutils.

The imutils package has been updated to handle both the 68-point and 5-point facial landmark models. Make sure to upgrade it in your environment with:

pip install --upgrade imutils

Again, updating imutils is what allows you to work with both the 68-point and 5-point facial landmarks.

Next, parse the command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--shape-predictor", required=True,
	help="path to facial landmark predictor")
args = vars(ap.parse_args())

We have one command line argument: --shape-predictor. It lets us change the path to the facial landmark predictor that will be loaded at runtime.
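
One detail worth seeing in isolation: argparse stores the value of `--shape-predictor` under the key `shape_predictor` (the dash becomes an underscore), which is why the script later reads `args["shape_predictor"]`. The sketch below wraps the same parser in a hypothetical helper, `parse_cli`, so it can be called with an explicit argument list instead of the real command line.

```python
import argparse


def parse_cli(argv):
    """Parse an explicit argument list (hypothetical helper for illustration)."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-p", "--shape-predictor", required=True,
                    help="path to facial landmark predictor")
    # vars() turns the Namespace into a plain dict
    return vars(ap.parse_args(argv))


args = parse_cli(["--shape-predictor", "shape_predictor_5_face_landmarks.dat"])
# the key uses an underscore, not the dash from the flag name
print(args["shape_predictor"])
```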

Then, let's load the shape predictor and initialize our video stream:

# initialize dlib's face detector (HOG-based) and then create the
# facial landmark predictor
print("[INFO] loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(args["shape_predictor"])
# initialize the video stream and sleep for a bit, allowing the
# camera sensor to warm up
print("[INFO] camera sensor warming up...")
vs = VideoStream(src=1).start()
# vs = VideoStream(usePiCamera=True).start() # Raspberry Pi
time.sleep(2.0)

Here we initialize dlib's pre-trained HOG + Linear SVM face detector and load the shape_predictor file.

To access the camera, we use the VideoStream class in imutils.

You can choose (by commenting/uncommenting the two VideoStream lines above) whether to use:

1. A built-in / USB webcam

2. Or the PiCamera on a Raspberry Pi

From there, let's go through the frames and do some work:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream, resize it to
	# have a maximum width of 400 pixels, and convert it to
	# grayscale
	frame = vs.read()
	frame = imutils.resize(frame, width=400)
	gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
	# detect faces in the grayscale frame
	rects = detector(gray, 0)
	# check to see if a face was detected, and if so, draw the total
	# number of faces on the frame
	if len(rects) > 0:
		text = "{} face(s) found".format(len(rects))
		cv2.putText(frame, text, (10, 20), cv2.FONT_HERSHEY_SIMPLEX,
			0.5, (0, 0, 255), 2)

First, we read a frame from the video stream, resize it, and convert it to grayscale.

Then we use our HOG + Linear SVM detector to detect faces in the grayscale image.

From there, we check that at least one face was detected and, if so, draw the total number of faces on the original frame.

Next, let's loop over the face detections and draw the landmarks:

	# loop over the face detections
	for rect in rects:
		# compute the bounding box of the face and draw it on the
		# frame
		(bX, bY, bW, bH) = face_utils.rect_to_bb(rect)
		cv2.rectangle(frame, (bX, bY), (bX + bW, bY + bH),
			(0, 255, 0), 1)
		# determine the facial landmarks for the face region, then
		# convert the facial landmark (x, y)-coordinates to a NumPy
		# array
		shape = predictor(gray, rect)
		shape = face_utils.shape_to_np(shape)
		# loop over the (x, y)-coordinates for the facial landmarks
		# and draw each of them
		for (i, (x, y)) in enumerate(shape):
			cv2.circle(frame, (x, y), 1, (0, 0, 255), -1)
			cv2.putText(frame, str(i + 1), (x - 10, y - 10),
				cv2.FONT_HERSHEY_SIMPLEX, 0.35, (0, 0, 255), 1)

We begin by looping over rects.

We use the face_utils module from imutils to compute the face bounding box, then draw it on the original frame.

Then we pass the face to the predictor to determine the facial landmarks, and convert the landmark coordinates to a NumPy array.

Now for the fun part. To visualize the landmarks, we draw tiny dots and number each coordinate.

We loop over the landmark coordinates, drawing a small filled circle and the landmark number on the original frame for each one.

Let's finish our facial landmark script:

	# show the frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF
	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

To run our facial landmark detector, execute the following command:

python --shape-predictor shape_predictor_5_face_landmarks.dat

Is dlib's 5-point detector faster than the 68-point facial landmark detector?

In my own tests, I found that dlib's 5-point facial landmark detector is 8-10% faster than the original 68-point facial landmark detector.

An 8-10% speedup is significant; however, what matters more here is the size of the model.

The original 68-point facial landmark detector is nearly 100MB.

The 5-point facial landmark detector is under 10MB, at only 9.2MB. That is a model more than 10 times smaller!

When you build your own application that uses facial landmarks, you now have a much smaller model file to distribute with the rest of the application.
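
If you want to reproduce this kind of comparison yourself, a rough sketch is shown below. It is not the benchmark used for the numbers above: the helper names are hypothetical, `fn` stands in for any callable (for example, a wrapper around a landmark predictor), and "percent faster" is defined here as the throughput gain, which is only one of several reasonable definitions.

```python
import time


def mean_call_seconds(fn, arg, repeats=100):
    """Average wall-clock seconds per call of fn(arg)."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(arg)
    return (time.perf_counter() - start) / repeats


def percent_faster(slow_seconds, fast_seconds):
    """Throughput gain of the faster model, in percent (one possible definition)."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def size_ratio(large_mb, small_mb):
    """How many times larger the big model file is."""
    return large_mb / small_mb


# model sizes quoted in this post: 99.7MB (68-point) vs 9.2MB (5-point)
print("size ratio: {:.1f}x".format(size_ratio(99.7, 9.2)))
```

With real predictors you would pass the same face crops to both models and compare the two averages over many frames, since a single call is too noisy to measure an 8-10% difference.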

Limitations of the 5-point facial landmark detector

For face alignment, the 5-point facial landmark detector can be considered a drop-in replacement for the 68-point detector; the same general algorithm applies:

1. Compute the 5-point facial landmarks

2. Compute the center of each eye from the two landmarks for that eye

3. Compute the angle between the eye centroids, using the midpoint between the eyes

4. Obtain a canonical alignment of the face by applying an affine transformation
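
The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the alignment code from any particular library: the function names are hypothetical, and the landmark ordering (two corners of one eye, two corners of the other eye, then the nose, with the first eye appearing further right in the image) is assumed. The resulting 2x3 matrix has the layout that cv2.warpAffine expects.

```python
import math


def eye_center(p, q):
    """Midpoint of the two landmarks that mark one eye."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def alignment_transform(landmarks):
    """Return (angle in degrees, pivot, 2x3 matrix) that levels the eyes.

    ASSUMPTION: landmarks = [eye A, eye A, eye B, eye B, nose], and
    eye A lies to the right of eye B in image coordinates.
    """
    a = eye_center(landmarks[0], landmarks[1])
    b = eye_center(landmarks[2], landmarks[3])
    # angle of the line from eye B to eye A (image y grows downward)
    theta = math.atan2(a[1] - b[1], a[0] - b[0])
    # rotate about the midpoint between the two eye centers
    cx, cy = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    c, s = math.cos(theta), math.sin(theta)
    matrix = [
        [c, s, (1 - c) * cx - s * cy],
        [-s, c, s * cx + (1 - c) * cy],
    ]
    return math.degrees(theta), (cx, cy), matrix


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# made-up landmarks for a slightly tilted face
lm = [(120, 60), (100, 55), (60, 45), (40, 40), (80, 90)]
angle, pivot, m = alignment_transform(lm)
a2 = apply_affine(m, eye_center(lm[0], lm[1]))
b2 = apply_affine(m, eye_center(lm[2], lm[3]))
print("eye heights after alignment:", round(a2[1], 3), round(b2[1], 3))
```

After the transform, both eye centers sit at the same height, which is the property face alignment is after. A full implementation would typically also scale and translate the face to a fixed output size.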

Although the 68-point facial landmark detector may give us a slightly better approximation of the eye centers, in practice you will find that the 5-point facial landmark detector works just as well.

The 5-point facial landmark detector is certainly smaller (9.2MB versus 99.7MB), but it cannot be used in every situation. For example, drowsiness detection requires computing the eye aspect ratio (EAR), which is the ratio of the eye landmark height to the eye landmark width.

With the 68-point facial landmark detector, we have six points per eye, which enables us to perform this calculation.

With the 5-point facial landmark detector, however, we have only two points per eye, which is not enough to compute the eye aspect ratio.
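
To see why six points are needed, here is a minimal sketch of the eye aspect ratio in its usual formulation: the sum of two vertical eyelid distances divided by twice the horizontal corner-to-corner distance. The function name and the point ordering (p1 and p4 are the eye corners, p2 and p3 lie on the upper lid, p6 and p5 on the lower lid) are assumptions for illustration.

```python
import math


def _dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def eye_aspect_ratio(eye):
    """EAR for six eye landmarks p1..p6.

    ASSUMPTION: p1 and p4 are the horizontal eye corners, p2 and p3
    sit on the upper lid, and p6 and p5 sit below them on the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = _dist(p2, p6) + _dist(p3, p5)
    horizontal = _dist(p1, p4)
    return vertical / (2.0 * horizontal)


# made-up coordinates: an open eye and a nearly closed one
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(eye_aspect_ratio(open_eye) > eye_aspect_ratio(closed_eye))
```

With only the two corner points per eye from the 5-point model, the horizontal distance is available but neither vertical distance is, so the ratio cannot be formed.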

If you plan to build a drowsiness detector or any other application that requires more points on the face, including facial landmarks along the:

  • Eyes

  • Eyebrows

  • Nose

  • Mouth

  • Jawline

then you should use the 68-point facial landmark detector instead of the 5-point one.

In today's blog post, we discussed dlib's new, faster, more compact 5-point facial landmark detector.

This 5-point facial landmark detector can be considered a drop-in replacement for the 68-point landmark detector originally distributed with the dlib library.

After discussing the differences between the two facial landmark detectors, I provided an example script that applies the 5-point version to detect the eye and nose regions of my face.

In my tests, I found the 5-point facial landmark detector to be 8-10% faster and 10 times smaller than the 68-point version.

Tags: Python OpenCV Computer Vision

Posted on Mon, 06 Dec 2021 16:07:35 -0500 by amthree