Advanced face detection: liveness detection using OpenCV

Liveness detection using OpenCV

In this blog post, you will learn how to perform liveness detection using OpenCV. You will create a liveness detector that can spot fake faces presented to a face recognition system and so perform face anti-spoofing.

In the first part of the tutorial, we will discuss liveness detection, including what it is and why we need it to improve our face recognition system.

From there, we will review the dataset we will use to perform liveness detection, including:

  • How to build a dataset for liveness detection
  • Our example real and fake face images

We will also review the project structure of the liveness detector project.

In order to create the liveness detector, we will train a deep neural network that can distinguish between real and fake faces. Therefore, we need to:

  • Build the image dataset itself.
  • Implement a CNN capable of performing liveness detection (we call this network "LivenessNet").
  • Train the liveness detector network.
  • Create a Python + OpenCV script that can take our trained liveness detector model and apply it to real-time video.

Let's start!

What is liveness detection and why do we need it?

Face recognition systems are becoming more common than ever before. From face recognition on iPhones and other smartphones to large-scale face recognition surveillance in China, face recognition systems are everywhere.

However, face recognition systems are easily fooled by "spoofed" and "non-real" faces. Simply holding a person's photo (whether printed or on a smartphone) up to the face recognition camera can be enough to bypass the system.

In order to make face recognition systems more secure, we need to be able to detect such fake / non-real faces. Liveness detection is the term used to refer to such algorithms.

There are a variety of liveness detection methods, including:

  • Texture analysis, which includes computing local binary patterns (LBPs) over the face region and using an SVM to classify the face as real or spoofed.
  • Frequency analysis, such as examining the Fourier domain of the face.
  • Variable focusing analysis, such as examining the variation of pixel values between two consecutive frames.
  • Heuristic-based algorithms, including eye movement, lip movement and blink detection. This set of algorithms attempts to track eye movement and blinks to ensure that the user is not holding up a photo of another person (since a photo will not blink or move its lips).
  • Optical flow algorithms, which examine the differences and properties of the optical flow generated from 3D objects and 2D planes.
  • 3D face shape, similar to what is used in Apple's iPhone face recognition system, which enables the system to distinguish between real faces and printouts / photos / images of another person.

By combining the approaches above, face recognition system engineers can pick the liveness detection models suited to their particular application.
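
As a rough illustration of the texture-analysis idea in the list above, the sketch below computes a basic 8-neighbour local binary pattern code for each interior pixel of a small grayscale image and turns the codes into a histogram. It is plain Python written for readability, not an optimized implementation, and the names lbp_codes and lbp_histogram are made up for this example. A histogram like this is the kind of feature vector that could then be handed to an SVM.

```python
def lbp_codes(gray):
    """Return the 8-bit LBP code for every interior pixel of a 2D list."""
    # neighbour offsets (dy, dx), clockwise starting at the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    rows, cols = len(gray), len(gray[0])
    codes = []
    for y in range(1, rows - 1):
        row_codes = []
        for x in range(1, cols - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # set the bit when the neighbour is at least as bright
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes (a texture descriptor)."""
    hist = [0] * 256
    total = 0
    for row in lbp_codes(gray):
        for code in row:
            hist[code] += 1
            total += 1
    return [count / total for count in hist]


# a flat patch: every neighbour equals the centre, so all 8 bits are set
flat = [[10] * 4 for _ in range(4)]
flat_hist = lbp_histogram(flat)
```

On the flat patch every interior pixel receives the code 255, so the whole histogram mass sits in the last bin. Real and re-photographed faces would be expected to spread their mass differently, which is what a classifier would learn from.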

A full review of liveness detection algorithms can be found in Chakraborty and Das's 2014 paper, "An Overview of Face Liveness Detection".

For the purposes of today's tutorial, we will treat liveness detection as a binary classification problem.

Given an input image, we will train a convolutional neural network capable of distinguishing real faces from fake / spoofed faces.

Project structure

$ tree --dirsfirst --filelimit 10
├── dataset
│   ├── fake [150 entries]
│   └── real [161 entries]
├── face_detector
│   ├── deploy.prototxt
│   └── res10_300x300_ssd_iter_140000.caffemodel
├── model
│   ├──
│   └──
├── fake
├── real  
├── le.pickle
├── liveness.model
└── plot.png

There are four main directories in our project:

dataset/: our dataset directory consists of two classes of images:

  • Fake images, captured by a camera pointed at my screen while it played a video of my face.
  • Real images of me, captured from a selfie video recorded with my phone.

face_detector/: contains our pre-trained Caffe face detector, which is used to locate face ROIs.

model/: this module contains our LivenessNet class.

videos/: I provided two input videos for training our LivenessNet classifier.

fake: stores the fake pictures, taken with a mobile phone.

real: stores the real pictures.

Today we will review three Python scripts in detail. By the end of the article, you will be able to run them on your own data and input video feeds. In the order in which they appear in this tutorial, the three scripts are:

  • The first script grabs face ROIs from the input video files and helps us create a deep learning face liveness dataset.
  • The second script trains our LivenessNet classifier. We will use Keras and TensorFlow to train the model. The training process produces several files:

le.pickle: our class label encoder.

liveness.model: our serialized Keras model, which detects face liveness.

plot.png: the training history plot shows accuracy and loss curves so that we can assess our model (i.e. over / underfitting).

  • The third script, our demo, launches your webcam to grab frames and perform face liveness detection in real time.

Detecting and extracting face ROIs from our training (video) dataset

Now that we have had a chance to review our initial dataset and project structure, let's see how to extract both real and fake face images from our input videos. The end goal of this script is to populate two directories:

dataset/fake/: contains the face ROIs from the fake folder.

dataset/real/: holds the face ROIs from the real folder.

Given these frames, we will later train a deep learning-based liveness detector on the images. Open the gather_ file and insert the following code:

# import the necessary packages
import numpy as np
import argparse
import cv2
import os
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", type=str, required=True,
	help="path to input video")
ap.add_argument("-o", "--output", type=str, required=True,
	help="path to output directory of cropped faces")
ap.add_argument("-d", "--detector", type=str, required=True,
	help="path to OpenCV's deep learning face detector")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
ap.add_argument("-s", "--skip", type=int, default=16,
	help="# of frames to skip before applying face detection")
args = vars(ap.parse_args())

We begin by importing the required packages. Besides the built-in Python modules, this script only needs OpenCV and NumPy.

The script then parses the command line arguments:

  • --input: the path to our input video file.
  • --output: the path to the output directory where each cropped face will be stored.
  • --detector: the path to the face detector. We will use OpenCV's deep learning face detector. For convenience, this Caffe model is included in today's download.
  • --confidence: the minimum probability used to filter out weak face detections. By default, this value is 50%.
  • --skip: we don't need to detect and store every image because adjacent frames are similar. Instead, we skip N frames between detections. You can use this parameter to change the default value of 16.
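
The listing in this post does not show the skipping logic itself, so the sketch below is only one simple way the --skip idea could work: run detection on every N-th frame and ignore the rest. The function name frames_to_process and the frame counts are hypothetical, chosen to show how a larger skip value on a longer video can even out the number of faces collected per class.

```python
def frames_to_process(total_frames, skip):
    """Yield the indices of the frames on which detection would run."""
    for index in range(total_frames):
        # only every `skip`-th frame is handed to the face detector
        if index % skip == 0:
            yield index


# with the default skip of 16, a 50-frame clip yields four candidate frames
selected = list(frames_to_process(total_frames=50, skip=16))

# hypothetical clips: the longer "real" video uses a larger skip value so
# that both classes end up with the same number of candidate frames
real_count = len(list(frames_to_process(total_frames=200, skip=8)))
fake_count = len(list(frames_to_process(total_frames=100, skip=4)))
```

Here both hypothetical clips produce the same number of candidate frames even though one is twice as long as the other.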

Let's continue to load the face detector and initialize our video stream:

# load our serialized face detector from disk
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
	"res10_300x300_ssd_iter_140000.caffemodel"])
net = cv2.dnn.readNetFromCaffe(protoPath, modelPath)
# list the images in the input directory and initialize the total
# number of frames read and saved thus far
images = os.listdir(args["input"])
read = 0
saved = 0

This loads OpenCV's deep learning face detector.

Next, we list all the input images. We also initialize two counters, one for the number of frames read and one for the number of frames saved while the loop runs. Let's continue by creating a loop to process the frames:

for filename in images:
	filepath = os.path.join(args["input"], filename)
	img = cv2.imread(filepath)

We loop over the images, reading each one from disk.

Next, we perform face detection:

	(h, w) = img.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
		(300, 300), (104.0, 177.0, 123.0))
	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()
	# ensure at least one face was found
	if len(detections) > 0:
		# we're making the assumption that each image has only ONE
		# face, so find the bounding box with the largest probability
		i = np.argmax(detections[0, 0, :, 2])
		confidence = detections[0, 0, i, 2]

To perform face detection, we need to create a blob from the image.

This blob has a width and height of 300 × 300 to fit our Caffe face detector. The bounding boxes will need to be scaled back to the frame dimensions later.

The blob is then sent through the deep learning face detector in a forward pass. Our script assumes that there is only one face in each frame of the video, which helps prevent false positives. If you are working with a video that contains more than one face, I suggest you adjust the logic accordingly. We therefore grab the index of the face detection with the highest probability and use that index to extract the detection's confidence.

Let's filter out weak detections and write the face ROI to disk:

		# ensure that the detection with the largest probability also
		# meets our minimum probability test (thus helping filter out
		# weak detections)
		if confidence > args["confidence"]:
			# compute the (x, y)-coordinates of the bounding box for
			# the face and extract the face ROI
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")
			face = img[startY:endY, startX:endX]
			# write the frame to disk (the file name pattern is an
			# assumption; the original line is truncated)
			p = os.path.sep.join([args["output"],
				"{}.png".format(saved)])
			cv2.imwrite(p, face)
			saved += 1
			print("[INFO] saved {} to disk".format(p))


We first make sure that the face detection meets the minimum confidence threshold, which reduces false positives. From there, we extract the bounding box coordinates of the face ROI and the face ROI itself. We generate a path + filename for the face ROI and write it to disk. At this point, we can increment the count of saved faces. Once processing is complete, we perform cleanup.
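
The selection and scaling steps described above can be tried without OpenCV. The sketch below imitates the detector output as a nested list with the same layout the script indexes (one row per detection, confidence at index 2, normalized box corners at indices 3 to 6). The function name best_face_box and the sample numbers are made up for illustration.

```python
def best_face_box(detections, width, height, min_confidence=0.5):
    """Return (startX, startY, endX, endY) of the strongest detection.

    Returns None when there is no detection above the threshold.
    """
    rows = detections[0][0]  # N rows of 7 values each
    if not rows:
        return None
    # index 2 of each row holds the confidence score
    best = max(rows, key=lambda row: row[2])
    if best[2] <= min_confidence:
        return None
    # indices 3..6 are box corners normalized to [0, 1], so they are
    # multiplied by the frame width or height to get pixel coordinates
    scale = (width, height, width, height)
    return tuple(int(value * factor)
                 for value, factor in zip(best[3:7], scale))


# made-up detections for a 640x480 frame: one weak and one strong
detections = [[[
    [0, 1, 0.30, 0.10, 0.10, 0.20, 0.20],
    [0, 1, 0.90, 0.25, 0.50, 0.75, 1.00],
]]]
box = best_face_box(detections, width=640, height=480)
```

Only the stronger of the two sample detections survives, and its normalized corners become pixel coordinates that can be used to slice the face out of the frame.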

Building our liveness detection image dataset

Now that we have implemented the script, let's put it to work.

Open a terminal and execute the following command to extract faces for our "fake / spoofed" class:

python --input fake --output dataset/fake --detector face_detector

Similarly, we can do the same thing for the "real" class:

python --input real --output dataset/real --detector face_detector

Since the "real" video file is longer than the "fake" video file, we will use a longer frame-skipping value to help balance the number of output face ROIs for each class. After executing the scripts, you should have the following image counts:

  • Fake: 55 images
  • Real: 84 images
  • Total: 139 images

Implementing "LivenessNet", our deep learning liveness detector

The next step is to implement "LivenessNet", our deep learning-based liveness detector.

At its core, LivenessNet is actually just a simple convolutional neural network. We will deliberately keep the network as shallow and with as few parameters as possible for two reasons:

  • To reduce the chance of overfitting on our small dataset.
  • To ensure that our liveness detector is fast and can run in real time (even on resource-constrained devices, such as the Raspberry Pi).

Now let's implement LivenessNet. Open the file and insert the following code:

# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras import backend as K
class LivenessNet:
	@staticmethod
	def build(width, height, depth, classes):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1
		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

We start with our imports. For an in-depth look at each of these layers and functions, be sure to refer to Deep Learning for Computer Vision with Python.

Next, we define the LivenessNet class. It contains a single static method, build, which accepts four parameters:

  • width: the width of the image / volume.
  • height: the height of the image.
  • depth: the number of channels of the image (in this case 3, because we will be working with RGB images).
  • classes: the number of classes. We have two classes in total: "real" and "fake".

The method initializes the model, then defines inputShape and the channel ordering. Let's start adding layers to our CNN:

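		# NOTE: ReLU/BN/dropout restored from the prose; dropout rate assumed
		# first CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(16, (3, 3), padding="same", activation="relu",
			input_shape=inputShape))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(16, (3, 3), padding="same", activation="relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))
		# second CONV => RELU => CONV => RELU => POOL layer set
		model.add(Conv2D(32, (3, 3), padding="same", activation="relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(32, (3, 3), padding="same", activation="relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(2, 2)))
		model.add(Dropout(0.25))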

Our CNN has VGG-like qualities. It is very shallow, with only a few learned filters. Ideally, we should not need a deep network to distinguish between real and spoofed faces.

The first CONV => RELU => CONV => RELU => POOL layer set also adds batch normalization and dropout. The second CONV => RELU => CONV => RELU => POOL layer set follows the same pattern. Finally, we add our FC => RELU layers:

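		# first (and only) set of FC => RELU layers (the layer width of
		# 64 is an assumption; the original lines are missing)
		model.add(Flatten())
		model.add(Dense(64, activation="relu"))
		# softmax classifier
		model.add(Dense(classes, activation="softmax"))
		# return the constructed network architecture
		return model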

This last part consists of a fully connected layer with a ReLU activation, followed by a softmax classifier head.

Finally, the model is returned.
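
To get a feel for how small this network is, the arithmetic below traces a 32 × 32 × 3 input through the two layer sets, assuming "same"-padded 3 × 3 convolutions and 2 × 2 max pooling as in the listing above. The helper names trace_shapes and conv_params are made up for this sketch, and only the convolution weights are counted.

```python
def trace_shapes(height, width, layer_sets):
    """Return (height, width, channels) after each CONV/POOL layer set."""
    shapes = []
    for filters in layer_sets:
        # "same"-padded convolutions keep height and width unchanged,
        # while the 2x2 max pooling that ends each set halves both
        height, width = height // 2, width // 2
        shapes.append((height, width, filters))
    return shapes


def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of one kernel x kernel convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


shapes = trace_shapes(32, 32, layer_sets=[16, 32])
final_h, final_w, final_c = shapes[-1]
flattened = final_h * final_w * final_c  # inputs to the FC layer

# the four convolutions: 3 -> 16 -> 16 -> 32 -> 32 channels
total_conv_params = (conv_params(3, 3, 16) + conv_params(3, 16, 16)
                     + conv_params(3, 16, 32) + conv_params(3, 32, 32))
```

The feature maps shrink from 32 × 32 to 8 × 8 before flattening, and the four convolution layers together hold fewer than twenty thousand weights, which is in line with the goal of a shallow, fast model.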

Creating the liveness detector training script

Given our dataset of real / spoofed images and our implementation of LivenessNet, we are now ready to train the network. Open the file and insert the following code:

# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")
# import the necessary packages
from pyimagesearch.livenessnet import LivenessNet
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from imutils import paths
import matplotlib.pyplot as plt
import numpy as np
import argparse
import pickle
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", required=True,
	help="path to input dataset")
ap.add_argument("-m", "--model", type=str, required=True,
	help="path to trained model")
ap.add_argument("-l", "--le", type=str, required=True,
	help="path to label encoder")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
	help="path to output loss/accuracy plot")
args = vars(ap.parse_args())

Our face liveness training script begins with a fair number of imports. Let's review them:

  • matplotlib: used to generate the training plot. We specify the "Agg" backend right after importing it so that we can easily save our plot to disk.
  • LivenessNet: the liveness CNN we defined in the previous section.
  • train_test_split: a function from scikit-learn that constructs our data splits for training and testing.
  • classification_report: also from scikit-learn, this tool generates a brief statistical report on our model's performance.
  • ImageDataGenerator: used to perform data augmentation, providing us with batches of randomly mutated images.
  • Adam: an optimizer that works well for this model (alternatives include SGD, RMSprop, etc.).
  • paths: from my imutils package, this module helps us gather the paths to all image files on disk.
  • pyplot: used to generate a nice training plot.
  • numpy: a numerical processing library for Python. It is also a requirement of OpenCV.
  • argparse: used to process command line arguments.
  • pickle: used to serialize our label encoder to disk.
  • cv2: our OpenCV bindings.
  • os: this module can do many things, but we only use it for the operating system path separator.

The rest of the script should now be easier to follow. This script accepts four command line arguments:

  • --dataset: the path to the input dataset. Earlier in this article, we created the dataset with the gather_ script.
  • --model: our script will generate an output model file; here you provide its path.
  • --le: the path to the output serialized label encoder file, which you also need to provide.
  • --plot: the training script will generate a plot. If you want to override the default value of "plot.png", specify this value on the command line.

The next code block will perform some initialization and build our data:

# initialize the initial learning rate, batch size, and number of
# epochs to train for
INIT_LR = 1e-4
BS = 8
# NOTE: the EPOCHS line is missing from this listing; 50 is an assumed
# value, so adjust it to suit your dataset
EPOCHS = 50
# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))
data = []
labels = []
# loop over all image paths
for imagePath in imagePaths:
	# extract the class label from the filename, load the image and
	# resize it to be a fixed 32x32 pixels, ignoring aspect ratio
	label = imagePath.split(os.path.sep)[-2]
	image = cv2.imread(imagePath)
	image = cv2.resize(image, (32, 32))
	# update the data and labels lists, respectively
	data.append(image)
	labels.append(label)
# convert the data into a NumPy array, then preprocess it by scaling
# all pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0

We first set the training parameters, including the initial learning rate, batch size and number of epochs.

From there, our imagePaths are gathered. We also initialize two lists to hold our data and class labels. The loop then builds our data and labels lists. The data consists of our images, each loaded and resized to 32 × 32 pixels. Each image has a corresponding label stored in the labels list.
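
Two small details of the listing above are easy to check in isolation: the class label is simply the name of the directory holding each image, and pixel values are divided by 255 to land in [0, 1]. The sketch below shows both with plain Python; the helper names and the example path are invented for illustration.

```python
import os


def label_from_path(image_path):
    # the class label is the name of the directory that holds the image
    return image_path.split(os.path.sep)[-2]


def scale_pixels(pixels):
    # map 8-bit intensities (0..255) onto the range [0, 1]
    return [value / 255.0 for value in pixels]


# a made-up path built with the platform's own separator
example_path = os.path.sep.join(["dataset", "real", "0.png"])
label = label_from_path(example_path)
scaled = scale_pixels([0, 51, 255])
```

Because the label comes from the directory name, keeping the dataset/real and dataset/fake layout intact is what makes the labelling work.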

All pixel intensities are scaled to the range [0, 1] while the list is converted into a NumPy array. Now let's encode the labels and partition the data:

# encode the labels (which are currently strings) as integers and then
# one-hot encode them
le = LabelEncoder()
labels = le.fit_transform(labels)
labels = to_categorical(labels, 2)
# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
	test_size=0.25, random_state=42)
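
To make the label handling in the listing above concrete, here is a dependency-free sketch of what the two encoding steps produce: string labels become integer indices (with classes in sorted order, as scikit-learn's LabelEncoder does), and each index becomes a one-hot vector. The function names encode_labels and one_hot are made up for this example and are not the library implementations.

```python
def encode_labels(labels):
    """Map string labels to integer indices, classes in sorted order."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[name] for name in labels]


def one_hot(indices, num_classes):
    """Turn each integer index into a one-hot vector."""
    return [[1.0 if position == idx else 0.0
             for position in range(num_classes)]
            for idx in indices]


classes, encoded = encode_labels(["real", "fake", "real"])
vectors = one_hot(encoded, num_classes=len(classes))
```

With two classes, "fake" ends up at index 0 and "real" at index 1, which is also the order in which the class names are later read back from the label encoder.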

The labels are one-hot encoded. We then use scikit-learn to partition our data: 75% for training and 25% for testing. Next, we will initialize our data augmentation object and compile + train our face liveness model:

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
	width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
	horizontal_flip=True, fill_mode="nearest")
# initialize the optimizer and model
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model = LivenessNet.build(width=32, height=32, depth=3,
	classes=len(le.classes_))
model.compile(loss="binary_crossentropy", optimizer=opt,
	metrics=["accuracy"])
# train the network
print("[INFO] training network for {} epochs...".format(EPOCHS))
H = model.fit(x=aug.flow(trainX, trainY, batch_size=BS),
	validation_data=(testX, testY), steps_per_epoch=len(trainX) // BS,
	epochs=EPOCHS)

We construct a data augmentation object that generates images with random rotations, zooms, shifts, shears and flips.
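
As a quick sanity check on the numbers involved, the sketch below works out how many images land in each split and how many batches make up one epoch, using the 139-image total quoted earlier, the 25% test share and the batch size of 8 from the listing. It assumes the test share is rounded up to a whole image; the helper name split_sizes is made up.

```python
import math


def split_sizes(num_samples, test_fraction):
    """Return (num_train, num_test), rounding the test share up."""
    num_test = math.ceil(num_samples * test_fraction)
    return num_samples - num_test, num_test


num_train, num_test = split_sizes(139, 0.25)
# same expression as the steps_per_epoch argument in the listing
steps_per_epoch = num_train // 8
```

With these figures each epoch consists of 13 batches of 8 augmented images, so even many epochs amount to very little computation.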

We then build and compile the LivenessNet model, and training begins. Given our shallow network and small dataset, this process is relatively quick. Once the model has been trained, we can evaluate the results and generate a training plot:

# evaluate the network
print("[INFO] evaluating network...")
predictions = model.predict(x=testX, batch_size=BS)
print(classification_report(testY.argmax(axis=1),
	predictions.argmax(axis=1), target_names=le.classes_))
# save the network to disk
print("[INFO] serializing network to '{}'...".format(args["model"]))
model.save(args["model"], save_format="h5")
# save the label encoder to disk
f = open(args["le"], "wb")
f.write(pickle.dumps(le))
f.close()
# plot the training loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, EPOCHS), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, EPOCHS), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, EPOCHS), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, EPOCHS), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(args["plot"])

We make predictions on the test set. From there, a classification report is generated and printed to the terminal. The LivenessNet model is then serialized to disk along with the label encoder.
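
The evaluation step boils down to taking the arg-max of each softmax output and comparing it with the arg-max of the one-hot target. The sketch below does this by hand on made-up numbers; it only illustrates the comparison and is not a replacement for the full classification report.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, targets):
    """Fraction of rows whose predicted class matches the target class."""
    predicted = [argmax(row) for row in predictions]
    expected = [argmax(row) for row in targets]
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# made-up softmax outputs and one-hot targets for four test images
predictions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
targets = [[1, 0], [0, 1], [0, 1], [0, 1]]
score = accuracy(predictions, targets)
```

In this toy example the third image is misclassified, so three of the four predictions are correct.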

Finally, a training history plot is generated for later inspection.

Training LivenessNet

python --dataset dataset --model liveness.model --le le.pickle

Liveness detection using OpenCV

The last step is to put all the pieces together:

  • We will access our webcam / video stream.
  • Apply face detection to each frame.
  • For each face detected, apply our liveness detector model.

Open the file and insert the following code:

# import the necessary packages
from imutils.video import VideoStream
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.models import load_model
import numpy as np
import argparse
import imutils
import pickle
import time
import cv2
import os
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, required=True,
	help="path to trained model")
ap.add_argument("-l", "--le", type=str, required=True,
	help="path to label encoder")
ap.add_argument("-d", "--detector", type=str, required=True,
	help="path to OpenCV's deep learning face detector")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args())

We import the packages we need. Notably, we will use:

  • VideoStream to access our camera feed.
  • img_to_array so that our frame is in a compatible array format.
  • load_model to load our serialized Keras model.
  • imutils for its convenience functions.
  • cv2 for our OpenCV bindings.

Let's parse our command line arguments:

  • --model: the path to our pre-trained Keras model for liveness detection.
  • --le: the path to our label encoder.
  • --detector: the path to OpenCV's deep learning face detector, used to find the face ROIs.
  • --confidence: the minimum probability threshold used to filter out weak detections.

Now let's go ahead and initialize the face detector, the LivenessNet model + label encoder, and our video stream:

# load our serialized face detector from disk
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
	"res10_300x300_ssd_iter_140000.caffemodel"])
net = cv2.dnn.readNetFromCaffe(protoPath, modelPath)
# load the liveness detector model and label encoder from disk
print("[INFO] loading liveness detector...")
model = load_model(args["model"])
le = pickle.loads(open(args["le"], "rb").read())
# initialize the video stream and allow the camera sensor to warmup
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)

OpenCV's face detector is loaded first.

From there, we load our serialized, pre-trained model (LivenessNet) and the label encoder. Our VideoStream object is instantiated and our camera is allowed two seconds to warm up. At this point, it is time to start looping over frames to detect real versus fake / spoofed faces:

# loop over the frames from the video stream
while True:
	# grab the frame from the threaded video stream and resize it
	# to have a maximum width of 600 pixels
	frame = vs.read()
	frame = imutils.resize(frame, width=600)
	# grab the frame dimensions and convert it to a blob
	(h, w) = frame.shape[:2]
	blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
		(300, 300), (104.0, 177.0, 123.0))
	# pass the blob through the network and obtain the detections and
	# predictions
	net.setInput(blob)
	detections = net.forward()

We open an infinite while loop and start by grabbing and resizing a single frame.

After resizing, we grab the dimensions of the frame so that we can scale the bounding boxes later. Using OpenCV's blobFromImage function, we generate a blob and then run inference by passing it through the face detector network.

Now we are ready for the fun part, liveness detection using OpenCV and deep learning:

	# loop over the detections
	for i in range(0, detections.shape[2]):
		# extract the confidence (i.e., probability) associated with the
		# prediction
		confidence = detections[0, 0, i, 2]
		# filter out weak detections
		if confidence > args["confidence"]:
			# compute the (x, y)-coordinates of the bounding box for
			# the face and extract the face ROI
			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
			(startX, startY, endX, endY) = box.astype("int")
			# ensure the detected bounding box does not fall outside the
			# dimensions of the frame
			startX = max(0, startX)
			startY = max(0, startY)
			endX = min(w, endX)
			endY = min(h, endY)
			# extract the face ROI and then preprocess it in the exact
			# same manner as our training data
			face = frame[startY:endY, startX:endX]
			face = cv2.resize(face, (32, 32))
			face = face.astype("float") / 255.0
			face = img_to_array(face)
			face = np.expand_dims(face, axis=0)
			# pass the face ROI through the trained liveness detector
			# model to determine if the face is "real" or "fake"
			preds = model.predict(face)[0]
			j = np.argmax(preds)
			label = le.classes_[j]
			# draw the label and bounding box on the frame
			label = "{}: {:.4f}".format(label, preds[j])
			cv2.putText(frame, label, (startX, startY - 10),
				cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
			cv2.rectangle(frame, (startX, startY), (endX, endY),
				(0, 0, 255), 2)

We loop over the face detections. Inside the loop, we:

  • Filter out weak detections.
  • Extract the face bounding box coordinates and ensure that they do not fall outside the dimensions of the frame.
  • Extract the face ROI and preprocess it in the same way as our training data.
  • Use our liveness detector model to determine whether the face is "real" or "fake / spoofed".
  • This is where you would insert your own code to perform face recognition, but only on real images. The pseudocode would be similar to: if label == "real": run_face_recognition().
  • Finally (for this demo), draw the label text and a rectangle around the face.
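
The pseudocode in the list above can be turned into a small gate that only calls face recognition for faces judged to be real. Everything in this sketch is hypothetical: handle_face, the extra confidence check and the stand-in recognizer are illustrations, not part of the demo script.

```python
def handle_face(label, confidence, recognize, min_confidence=0.5):
    """Run recognition only for faces judged real with enough confidence."""
    if label == "real" and confidence >= min_confidence:
        return recognize()
    # spoofed or low-confidence faces never reach the recognizer
    return None


def fake_recognizer():
    # hypothetical stand-in for a real face recognition call
    return "person_42"


accepted = handle_face("real", 0.97, fake_recognizer)
rejected = handle_face("fake", 0.99, fake_recognizer)
```

Passing the recognizer in as a function keeps the gate independent of whichever face recognition library is actually used.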

Let's show our results and clean up:

	# show the output frame and wait for a key press
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF
	# if the `q` key was pressed, break from the loop
	if key == ord("q"):
		break
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

Deploying our liveness detector to real-time video

Open a terminal and execute the following command:

python --model liveness.model --le le.pickle \
	--detector face_detector


In this tutorial, you learned how to perform liveness detection using OpenCV. With this liveness detector, you can now spot fake faces in your own face recognition systems and perform anti-spoofing. To create our liveness detector, we used OpenCV, deep learning, and Python.

The first step was to gather our real versus fake dataset. To accomplish this task, we:

  • First recorded a video of ourselves with our smartphone (i.e. "real" faces).
  • Held the smartphone up to our laptop / desktop, replayed the same video, and then recorded the replay using our webcam (i.e. "fake" faces).
  • Applied face detection to both sets of videos to form our final liveness detection dataset.

After building our dataset, we implemented "LivenessNet", a Keras + deep learning CNN.

This network is deliberately shallow, which reduces the chance of overfitting on our small dataset and lets the model itself run in real time (including on the Raspberry Pi).

Overall, our liveness detector obtained 99% accuracy on our validation set. To demonstrate the full liveness detection pipeline, we created a Python + OpenCV script that loads our liveness detector and applies it to real-time video streams. As our demo showed, our liveness detector was able to distinguish between real and fake faces.

I hope you enjoyed today's blog post on liveness detection with OpenCV.

Posted on Thu, 25 Nov 2021 17:06:47 -0500 by WendyB