Face Anti-Spoofing: Technologically Identifying a Fraudster by Their Face

META


Biometric identification is one of the oldest ideas for recognizing people that engineers have tried to implement in technology. Passwords can be stolen, spied on, or forgotten, and keys can be forged, but a person's own unique characteristics are much harder to fake or lose. These include fingerprints, voice, the pattern of blood vessels in the retina, gait, and more.

Of course, biometric systems can be deceived too, and that is exactly what we will talk about today: how attackers try to bypass face recognition systems by impersonating another person, and how such attempts can be detected.

A video version of this story can be watched here, and those who prefer reading are invited to continue below.

According to Hollywood directors and science-fiction writers, fooling biometric identification is quite simple. One only needs to present the system with the “required parts” of the real user, either separately or by bringing the whole user along as a hostage. Another option is to “wear someone else's face”, for example by physically transplanting someone else's face, or even to present fake genetic characteristics.

Real criminals also try to pass themselves off as someone else. For example, one robber held up a bank while wearing a mask of a Black man, as shown in the image below.

Face recognition appears to be a very promising direction for use in the mobile sector. While fingerprint authentication has long been widely adopted and voice technologies are gradually evolving in a predictable way, the situation with face identification has developed in a rather unusual way and deserves a brief historical overview.


---

How It All Began: From Science Fiction to Reality

Modern recognition systems demonstrate extremely high accuracy. Thanks to large datasets and complex architectures, face recognition error rates have fallen as low as 0.000001 (one error per million), which makes these systems suitable for mobile platforms. Their main weakness, however, is their vulnerability to spoofing.

To impersonate another person in the real world, rather than in movies, attackers most often use masks. They attempt to trick computer systems by presenting someone else’s face instead of their own.

Masks can vary widely in quality—from a printed photograph held in front of the face to complex 3D masks with heating elements. They can be presented separately (for example, a sheet of paper or a screen) or worn on the head.

Significant attention was drawn to the issue when researchers successfully deceived the Face ID system on the iPhone X using a sophisticated mask made of stone powder with inserts around the eyes that imitated the warmth of a living face via infrared radiation.

Such vulnerabilities are extremely dangerous for banking or government authentication systems based on facial recognition, where unauthorized access may lead to serious losses.


---

Terminology

The field of face anti-spoofing is relatively new and does not yet have a fully established terminology.

Let us call an attempt to deceive an identification system by presenting a fake biometric parameter (in this case, a face) a spoofing attack.

The set of protective measures designed to resist such deception will be called anti-spoofing. It may be implemented through a wide variety of technologies and algorithms integrated into the identification system pipeline.

ISO standards propose a broader terminology set, including:

Presentation attack – attempts to cause incorrect identification or avoid identification by presenting images, recorded videos, etc.

Bona fide presentation – a genuine presentation during normal system operation (not an attack).

Presentation attack instrument – the tool used to perform the attack, such as an artificially created body part.

Presentation attack detection – automated means of detecting such attacks.


However, the standards are still under development, so the terminology is not yet fully established, especially in Russian.


---

Evaluation Metrics

The performance of such systems is often measured using HTER (Half-Total Error Rate):

HTER = (FAR + FRR) / 2

Where:

FAR (False Acceptance Rate) – the fraction of attack attempts that are wrongly accepted

FRR (False Rejection Rate) – the fraction of genuine (bona fide) attempts that are wrongly rejected
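Below is a minimal Python sketch of the formula above. It assumes the detector produces one liveness score per attempt and accepts the attempt when the score is at least a chosen threshold. The function names and toy scores are illustrative and are not taken from any particular system.

```python
def error_rates(attack_scores, genuine_scores, threshold):
    """Return (FAR, FRR) for a liveness detector that accepts scores >= threshold.

    FAR: fraction of attack attempts wrongly accepted as live.
    FRR: fraction of genuine (bona fide) attempts wrongly rejected.
    """
    far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def hter(attack_scores, genuine_scores, threshold):
    """Half Total Error Rate: the plain average of FAR and FRR."""
    far, frr = error_rates(attack_scores, genuine_scores, threshold)
    return (far + frr) / 2


# Hypothetical scores: 1 of 4 attacks passes, 1 of 4 genuine users is rejected.
attacks = [0.1, 0.2, 0.3, 0.7]
genuine = [0.4, 0.8, 0.9, 0.95]
print(error_rates(attacks, genuine, 0.5))  # (0.25, 0.25)
print(hter(attacks, genuine, 0.5))         # 0.25
```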


Biometric systems usually focus on minimizing FAR, preventing attackers from entering the system. However, reducing FAR often increases FRR, meaning legitimate users are mistakenly rejected.

For government or military systems this may be acceptable. But mobile technologies, which operate at huge scale and prioritize user experience, are very sensitive to rejection errors. If you want to reduce the number of phones smashed against the wall after the tenth failed authentication attempt, you should pay attention to FRR.
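A small sketch makes this trade-off concrete. It keeps the same assumptions: a higher score means more likely live, and an attempt is accepted when score >= threshold. The hypothetical helper raises the threshold until a target FAR is met, then reports the FRR that this costs.

```python
def frr_at_far(attack_scores, genuine_scores, target_far):
    """Return (threshold, FAR, FRR) for the lowest threshold whose FAR is at
    most target_far. Raising the threshold can only lower FAR and raise FRR."""
    if target_far < 0:
        raise ValueError("target_far must be non-negative")
    # Candidate thresholds: every observed score, plus one above all of them
    # (which rejects every attack, so FAR = 0 is always reachable).
    candidates = sorted(set(attack_scores) | set(genuine_scores))
    candidates.append(candidates[-1] + 1e-9)
    for t in candidates:  # ascending order
        far = sum(s >= t for s in attack_scores) / len(attack_scores)
        if far <= target_far:
            frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
            return t, far, frr


attacks = [0.1, 0.2, 0.3, 0.7]
genuine = [0.4, 0.8, 0.9, 0.95]
print(frr_at_far(attacks, genuine, 0.25))  # (0.4, 0.25, 0.0)
print(frr_at_far(attacks, genuine, 0.0))   # (0.8, 0.0, 0.25)
```

In this toy data, tightening the FAR target from 25% to 0% pushes the threshold from 0.4 to 0.8, and one genuine user in four starts being rejected.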


---

Types of Attacks

Let us examine how attackers deceive face recognition systems.

Mask Attack

The most obvious method is wearing a mask of another person.

Printed Attack

A photo of a person is printed on paper and shown to the camera.

Replay Attack

A more advanced attack involves displaying a pre-recorded video of another person on a screen.

This method is very effective because many systems rely on time-based signals such as:

blinking

micro head movements

facial expressions

breathing


All these can be reproduced in video.


---

Typical Attack Artifacts

Printed attacks often show:

reduced texture quality due to printing

halftone printing artifacts

horizontal banding left by the printer

absence of local motion

visible borders of the printed image


Replay attacks often reveal:

moiré patterns

reflections or glare

flat image without depth

visible screen edges



---

Classical Attack Detection Algorithms

One of the earliest approaches (2007–2008) relied on blink detection.

The idea was to train a classifier capable of distinguishing open and closed eyes in a video sequence. The system can then ask the user to perform a random sequence of actions, such as:

turning the head

blinking

smiling


Because the sequence is chosen at random, an attacker cannot prepare a matching recording in advance.

However, honest users sometimes also fail these tests, which reduces usability.
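The blink-based challenge-response scheme could be sketched roughly as follows. Everything here is an assumption for illustration: the action names, the 0.5 threshold, and the premise that upstream classifiers (not shown) already supply per-frame eye-openness probabilities and a list of recognized actions.

```python
import random

# Hypothetical action vocabulary produced by upstream classifiers.
ACTIONS = ["turn_left", "turn_right", "blink", "smile"]


def make_challenge(length=3, rng=random):
    """Random action sequence the user must perform, generated per session."""
    return [rng.choice(ACTIONS) for _ in range(length)]


def count_blinks(eye_open_probs, threshold=0.5):
    """Count open -> closed -> open transitions in per-frame eye-openness
    probabilities (e.g. the output of an open/closed-eye classifier)."""
    blinks, closed = 0, False
    for p in eye_open_probs:
        if p < threshold:
            closed = True
        elif closed:  # eyes reopened after being closed
            blinks += 1
            closed = False
    return blinks


def passes_challenge(challenge, observed_actions):
    """True if the challenge actions appear in order within the observed
    actions; unrelated actions in between are tolerated."""
    it = iter(observed_actions)
    # `action in it` consumes the iterator, so this is an ordered subsequence test.
    return all(action in it for action in challenge)
```

A replayed video can contain natural blinks, but it is unlikely to contain a specific action sequence that is chosen only after the session starts.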

Another approach analyzes image quality degradation caused by printing or screen display.

A popular method uses Local Binary Patterns (LBP). For each pixel, the algorithm compares it with its eight neighbors and builds an 8-bit pattern describing local texture. These patterns are then aggregated into histograms and fed into an SVM classifier.

The HTER of this approach was around 15%, meaning many attackers still succeeded.
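The LBP descriptor described above can be sketched with NumPy as follows. This is the basic 3x3, 8-neighbour variant with a 256-bin histogram. The neighbour order is an arbitrary but fixed choice, and the SVM stage is left out.

```python
import numpy as np


def lbp_image(gray):
    """Basic 3x3 Local Binary Pattern: each interior pixel becomes an 8-bit
    code, one bit per neighbour whose value is >= the centre pixel."""
    g = np.asarray(gray, dtype=np.float64)
    center = g[1:-1, 1:-1]
    # Neighbour offsets in a fixed clockwise order starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    h, w = g.shape
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view of the image aligned with the interior pixels.
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) << bit
    return codes


def lbp_histogram(gray):
    """Normalised 256-bin histogram of LBP codes: the texture feature vector
    that would then be passed to a classifier such as an SVM."""
    hist = np.bincount(lbp_image(gray).ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

Practical systems often add "uniform" patterns, several radii, and per-block histograms. Those refinements are omitted here.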

Later work reduced HTER to around 3% by computing LBP features separately on individual color channels instead of on a single grayscale image.
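Here is a sketch of the color variant. It assumes YCbCr as the color space, since the text above only says "different color channels" and other spaces such as HSV are equally possible. Each channel gets its own LBP histogram, and the three histograms are concatenated into one feature vector.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JPEG-style) RGB -> YCbCr, values in 0..255."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def lbp_codes(channel):
    """Basic 8-neighbour LBP codes for the interior pixels of one channel."""
    c = channel[1:-1, 1:-1]
    h, w = channel.shape
    codes = np.zeros(c.shape, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        n = channel[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (n >= c).astype(np.uint8) << bit
    return codes


def color_lbp_feature(rgb):
    """Concatenate normalised per-channel LBP histograms (3 x 256 = 768 values)."""
    ycbcr = rgb_to_ycbcr(rgb)
    hists = []
    for k in range(3):
        hist = np.bincount(lbp_codes(ycbcr[..., k]).ravel(), minlength=256)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)
```

The chroma channels (Cb, Cr) are where the color distortions introduced by printers and screens are expected to show up, which a grayscale LBP cannot see.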


---

Detecting Moiré Patterns

Another method focuses on detecting moiré artifacts: periodic interference patterns that appear when a camera sensor's pixel grid captures the pixel grid of a screen.

This approach achieved around 6% HTER on several datasets.
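One simple way to look for such periodic patterns is in the frequency domain. The sketch below is an illustrative heuristic, not the method behind the 6% figure. It measures how strongly the largest peak outside the low-frequency region stands out from the rest of the spectrum. The `low_freq_radius` parameter is a hypothetical tuning knob.

```python
import numpy as np


def moire_score(gray, low_freq_radius=4):
    """Ratio of the strongest mid/high-frequency magnitude to the median
    magnitude. Strong isolated peaks suggest periodic, moiré-like patterns."""
    g = np.asarray(gray, dtype=np.float64)
    g = g - g.mean()  # remove the DC component
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(g)))
    h, w = spectrum.shape
    yy, xx = np.ogrid[:h, :w]
    # Ignore low frequencies near the centre, which mostly carry the face itself.
    outside = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > low_freq_radius ** 2
    band = spectrum[outside]
    return band.max() / (np.median(band) + 1e-12)
```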


---

Motion-Based Detection

Instead of analyzing a single image, some methods analyze motion across frames using optical flow.

One approach achieved HTER of 1.52%.
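To illustrate motion-based cues, here is a minimal single-window Lucas-Kanade estimate of the global shift of an image patch, plus a hypothetical comparison of face motion against background motion. The assumption is that when a photo or screen fills the frame, the face and the scenery printed around it move together, while a live head moves relative to a static scene. This is a simplified sketch, not the method behind the 1.52% figure.

```python
import numpy as np


def lucas_kanade_shift(prev, curr):
    """Estimate one global (dx, dy) translation between two grayscale patches
    by solving the Lucas-Kanade equations Ix*dx + Iy*dy = -It in least squares."""
    prev = np.asarray(prev, dtype=np.float64)
    curr = np.asarray(curr, dtype=np.float64)
    iy, ix = np.gradient(prev)  # derivatives along rows (y) and columns (x)
    it = curr - prev            # temporal derivative
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    (dx, dy), *_ = np.linalg.lstsq(a, -it.ravel(), rcond=None)
    return dx, dy


def motion_similarity(face_prev, face_curr, bg_prev, bg_curr):
    """Distance between face and background motion vectors. A value near zero
    means they move together, which is one weak hint of a photo or replay attack."""
    f = np.array(lucas_kanade_shift(face_prev, face_curr))
    b = np.array(lucas_kanade_shift(bg_prev, bg_curr))
    return float(np.linalg.norm(f - b))
```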

Another method applied Eulerian video magnification, amplifying subtle color changes related to pulse signals, and combined this with optical-flow-based features.

This method achieved extremely good results on certain datasets.
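Here is a heavily simplified sketch of the Eulerian idea. Each pixel's intensity over time is band-pass filtered with an ideal FFT filter, and the filtered band is amplified and added back. The spatial pyramid of the published technique is omitted, and the pulse band used in the example (0.8 to 3 Hz) is an assumption.

```python
import numpy as np


def eulerian_magnify(frames, fps, low_hz, high_hz, alpha):
    """Amplify temporal variations within [low_hz, high_hz].

    frames: array of shape (T, H, W), one colour channel or grayscale.
    Spatial filtering from the published method is intentionally omitted.
    """
    frames = np.asarray(frames, dtype=np.float64)
    spectrum = np.fft.rfft(frames, axis=0)  # per-pixel temporal spectrum
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fps)
    keep = (freqs >= low_hz) & (freqs <= high_hz)
    spectrum[~keep] = 0  # ideal temporal band-pass
    band = np.fft.irfft(spectrum, n=frames.shape[0], axis=0)
    return frames + alpha * band
```

In the test-style example of a constant face tinted by a faint 1.2 Hz "pulse", `alpha=10` makes that oscillation 11 times larger while leaving the average brightness unchanged.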


---

Deep Learning Approaches

Eventually the field shifted toward deep learning.

One early deep learning approach analyzed depth maps, based on the observation that printed photos and screens are flat and therefore lack the depth of a real face.

A neural network predicted depth maps for facial patches and combined them with the global image depth map.

This approach achieved HTER of 3.78%.
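The flatness argument can be illustrated with a small check on a depth map, assuming some network has already predicted it (the depth maps in the test are synthetic). Fitting a plane and measuring the leftover separates a flat, possibly tilted surface from a face-like shape. This illustrates the intuition only, not the architecture described above.

```python
import numpy as np


def depth_flatness(depth):
    """RMS residual after least-squares fitting of a plane z = a*x + b*y + c.
    A photo or screen, even when tilted, gives a near-zero residual; a face
    (nose, eye sockets, cheeks) does not."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    yy, xx = np.mgrid[0:h, 0:w]
    design = np.column_stack([xx.ravel(), yy.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(design, depth.ravel(), rcond=None)
    residual = depth.ravel() - design @ coeffs
    return float(np.sqrt(np.mean(residual ** 2)))
```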

However, many later works simply combined multiple neural networks trained on existing datasets.

Some extremely complex architectures achieved HTER as low as 0.04%—but only on specific datasets.


---

The Dataset Problem

When models trained on one dataset were tested on another, performance dropped dramatically.

For example:

Training on Idiap → testing on MSU: 90.5% TPR

Training on MSU → testing on Idiap: 47.2% TPR

Training on MSU → testing on CASIA: 10.8% TPR


This shows that many systems fail to generalize.


---

Competitions

In 2017, the University of Oulu organized a competition based on a new dataset (OULU-NPU) designed for mobile scenarios.

Protocols included:

1. Different lighting and backgrounds

2. Different printers and screens

3. Different smartphone cameras

4. Combination of all factors

The best system achieved about 10% error on the hardest protocol.

Most participants used familiar techniques:

pretrained CNNs

texture analysis

color features

frame-pair analysis



---

Alternative Approaches

One promising approach uses remote photoplethysmography (rPPG) to detect a person’s pulse from video.

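The volume of blood in the skin changes with every heartbeat, which slightly changes how much light the skin absorbs and reflects. These tiny periodic color changes make it possible to recover the heartbeat from ordinary video.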

Masks and printed photos cannot reproduce this signal, and a replayed video carries at most a weakened copy of the original pulse.
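Here is a minimal rPPG sketch. It assumes a face tracker (not shown) already provides the mean green-channel value of the face region for every frame. The green channel is commonly used because it tends to carry the strongest pulse signal. The 0.7 to 4 Hz band is an assumed range of plausible heart rates.

```python
import numpy as np


def estimate_pulse_bpm(green_means, fps, min_hz=0.7, max_hz=4.0):
    """Estimate heart rate (beats per minute) from per-frame mean green values.

    Steps: remove the linear trend (slow lighting drift), apply a window,
    take the power spectrum, and return the strongest frequency in the band.
    """
    x = np.asarray(green_means, dtype=np.float64)
    t = np.arange(len(x))
    x = x - np.polyval(np.polyfit(t, x, 1), t)  # linear detrend
    x = x * np.hanning(len(x))                  # reduce spectral leakage
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= min_hz) & (freqs <= max_hz)
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz
```

A liveness check could then require a clear, stable peak in this band. Where to set that requirement is a design decision outside this sketch.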

Another method combined depth estimation and rPPG signals in a complex neural network pipeline.

However, cross-dataset accuracy still remained relatively low.


---

Motion Geometry Approach

Another approach analyzes micro-movements of facial features.

For example:

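When a real head turns, the distances between facial landmarks change in a way dictated by the face's 3D shape. When a flat photograph is moved or tilted, all its landmarks lie on one plane, so their relative positions change in a different, more constrained way.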

This method uses recurrent neural networks to analyze frame sequences.
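The source describes a recurrent network for this. As a swapped-in, purely geometric illustration of the same intuition, the sketch below uses a homography test. However a flat photo is moved or tilted, its points map between frames by a single 3x3 homography, so a fitted homography explains their motion almost perfectly. A real 3D face shows parallax that a homography cannot explain. The landmarks are assumed to come from a face landmark detector (not shown) and to be in normalized image coordinates.

```python
import numpy as np


def fit_homography(src, dst):
    """Direct Linear Transform: 3x3 H with dst ~ H @ src, from N >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
    return vt[-1].reshape(3, 3)


def planar_motion_residual(src, dst):
    """Mean reprojection error of the best-fitting homography between two
    landmark sets: about zero for a flat photo, larger for a 3D face."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    h = fit_homography(src, dst)
    pts = np.column_stack([src, np.ones(len(src))]) @ h.T
    proj = pts[:, :2] / pts[:, 2:3]  # back from homogeneous coordinates
    return float(np.mean(np.linalg.norm(proj - dst, axis=1)))
```

In practice, a threshold on this residual would have to tolerate landmark-detector noise, so it is a cue to combine with others rather than a decision on its own.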


---

Active Illumination

Researchers from Tencent proposed an active illumination method.

Instead of passively observing the scene, the system dynamically illuminates the face using patterns displayed on the screen (called light CAPTCHA).

The light reflected and scattered by the face is then analyzed.

This method achieved around 1% error and has reportedly been deployed in real systems.
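Here is a much simplified illustration of the light CAPTCHA idea, not Tencent's actual algorithm. The screen flashes a color sequence chosen at random for each session, and the mean color of the face region in the captured frames should follow it. A pre-recorded video cannot know that sequence in advance. The function names are hypothetical, and the emitted and observed sequences are assumed to be already time-aligned.

```python
import numpy as np


def make_light_sequence(length, rng):
    """Random screen colours (0..1 per RGB channel), generated per session."""
    return rng.random((length, 3))


def light_captcha_score(emitted_rgb, observed_rgb):
    """Mean per-channel Pearson correlation between the emitted screen colours
    and the mean face colour in each captured frame; both shaped (T, 3)."""
    e = np.asarray(emitted_rgb, dtype=np.float64)
    o = np.asarray(observed_rgb, dtype=np.float64)
    corrs = [np.corrcoef(e[:, c], o[:, c])[0, 1] for c in range(3)]
    return float(np.mean(corrs))
```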


---

Conclusion

Several important conclusions can be drawn:

Classical handcrafted features such as LBP, blink detection, breathing detection, and motion analysis remain valuable.

The best solution will likely combine multiple methods: depth analysis, reflection analysis, and physiological signals.

Additional modalities such as voice data may further improve reliability.

Existing datasets have reached saturation; new datasets are needed.

Face recognition technology has progressed faster than face anti-spoofing, creating a security gap.

A system-level approach combining multiple technologies is required.


With the growing interest in facial recognition and the entry of large companies into this field, new opportunities have emerged for ambitious research teams to develop fundamentally new solutions.
 