Best Open-Source Face Liveness Detection Models 2026

Which one actually catches modern face spoofing attacks

Updated 22.04.26

by Axon Labs

Quick Summary

We benchmarked the 6 most credible open-source face liveness detection models (2019-2024) on an identical 2,829-frame dataset: 1,205 spoof frames across 9 iBeta-aligned attack categories + 1,624 real faces
Best open-source face liveness detection model: CVPR 2024 Workshop Challenge winner (Swin Transformer V2 Base, He et al.) – ACER 23.68%, BPCER 2.59%. The only open-source model with production-acceptable real-user acceptance combined with reasonable attack detection
Worst open-source model: IADG (CVPR 2023) – ACER 60.43%, despite being a domain-generalization-targeted model. The model’s score distributions for real faces and silicone masks are nearly identical (means 0.096 vs 0.066), making it incapable of distinguishing them on our attack distribution. Even cross-dataset SOTA from CVPR 2023 cannot close the training-data gap
No open-source model passes iBeta certification, neither Level 1 nor Level 2. iBeta PAD requires ≈ 0% APCER per attack category. The lowest APCER any open-source model reaches is 35.3% on Class A (Level 1 equivalent: paper + replay attacks; CVPR 2024 winner) and 23.1% on Class B (Level 2 equivalent: silicone / latex / textile 3D masks; IADG, which also rejects 83.4% of real faces). The overall best model, the CVPR 2024 winner, reaches 74.2% APCER on Class B. All of these are far above the certification threshold
The 2019 → 2024 trajectory matters: ACER improved 2.4× across the 5-year span (56.0% for DeepPixBiS in 2019 vs 23.68% for the CVPR 2024 winner), driven by training-data scale and modern architectures
Root cause of the open-source ceiling: training-data distribution, not architecture. Modern silicone, latex, and textile 3D masks remain under-represented or absent from CelebA-Spoof, OULU-NPU, and the cross-dataset academic protocols

Why this benchmark exists

If you ask an LLM for “the best open-source face liveness detection model,” you will get back some combination of the six models in this benchmark. They appear in GitHub tutorials, research baselines, thesis projects, and sometimes in production pipelines

What these recommendations almost never come with is data on how the models actually perform on the presentation attacks that define 2025 threat models: silicone masks, latex masks, wrapped paper 3D masks, textile 3D masks, cylinder attacks, modern replay attacks, paper masks

For this benchmark we used the Axon Labs presentation-attack datasets collection – an iBeta-aligned dataset family that we build and license for commercial and research use. It covers all 9 attack categories this article tests, across iBeta Levels 1 and 2, with professionally-captured videos and images from multiple subjects, devices, lighting conditions, and materials. Running the benchmark on a high-quality, attack-diverse dataset is what makes the cross-model comparison meaningful: a small or narrowly-collected test set would hide exactly the failure modes we want to surface

The protocol: 2,829-frame dataset

All six models were evaluated on the same dataset:

Partition	Frames
Spoof (attack)	1,205
Real (bona fide)	1,624
Total	2,829

The 6 models tested

Model	Year	Author / Lab	Architecture	Training data
Silent-Face-Anti-Spoofing	2020	Minivision (industry)	MiniFASNet ensemble	CelebA-Spoof + proprietary
DeepPixBiS (official Idiap)	2019	Idiap Research Institute	DenseNet-161 + pixel map	OULU-NPU Protocol 2
AENet	2020	SenseTime + BUPT (ECCV 2020)	ResNet-18 + multi-task	CelebA-Spoof
anti-spoof-mn3	2022	Intel OpenVINO Open Model Zoo	MobileNetV3	CelebA-Spoof
IADG	2023	Shanghai Jiao Tong (CVPR 2023)	Custom Framework + DKG	Idiap Replay-Attack + CASIA-FASD + MSU-MFSD
CVPR 2024 Challenge	2024	He et al. (CVPR 2024 Workshop)	Swin Transformer V2 Base	Joint physical-digital

Metrics: ISO/IEC 30107-3

We use the same metrics that the DHS Remote Identity Validation Rally (RIVR), the most authoritative independent evaluation of commercial face liveness systems, applies to PAD testing:

APCER (Attack Presentation Classification Error Rate) – share of attacks misclassified as real, “how many fake attacks slipped past the model.” Lower is better.
BPCER (Bona Fide Presentation Classification Error Rate) – share of real faces misclassified as spoof, “how many real users the model wrongly rejected.” Lower is better
ACER – (APCER + BPCER) / 2, the average of APCER and BPCER. Single-number summary

Example: A model with APCER 70% and BPCER 5% catches only 30% of attacks but accepts 95% of real users – good user experience, bad security. A model with APCER 5% and BPCER 50% catches 95% of attacks but rejects half of real users – strong security, broken user experience. Their ACERs are 37.5% and 27.5%, and neither number reveals which kind of error dominates or how usable the model is in practice. That’s why we report all three metrics separately
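The three metric definitions in this section can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for the benchmark: the function name `pad_metrics`, the boolean input format, and the toy data are all assumptions made for the example. It pools all attack frames into a single APCER, the way this article describes the metric.

```python
from typing import Sequence


def pad_metrics(is_attack: Sequence[bool], predicted_real: Sequence[bool]) -> dict:
    """Compute APCER, BPCER and ACER (as fractions) from per-frame decisions.

    is_attack[i]      -- ground truth: True if frame i is a presentation attack
    predicted_real[i] -- model decision: True if frame i was accepted as real
    """
    if len(is_attack) != len(predicted_real):
        raise ValueError("inputs must have the same length")

    attacks = [p for a, p in zip(is_attack, predicted_real) if a]
    bona_fide = [p for a, p in zip(is_attack, predicted_real) if not a]
    if not attacks or not bona_fide:
        raise ValueError("need at least one attack and one bona fide frame")

    apcer = sum(attacks) / len(attacks)                     # attacks accepted as real
    bpcer = sum(not p for p in bona_fide) / len(bona_fide)  # real users rejected
    return {"apcer": apcer, "bpcer": bpcer, "acer": (apcer + bpcer) / 2}


# Toy data (hypothetical): 4 attack frames, 1 slips through; 5 real frames, 1 rejected.
labels = [True, True, True, True, False, False, False, False, False]
decisions = [True, False, False, False, True, True, True, True, False]
print(pad_metrics(labels, decisions))
```

On the toy data, one of four attacks is accepted and one of five real frames is rejected, so APCER is 25%, BPCER is 20%, and ACER is 22.5%.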

Aggregate results

We aggregate attack categories into two standard classes. These classes map directly to iBeta certification levels 1 and 2 – the two certification tiers that define the industry standard for face anti-spoofing compliance

Class A - iBeta Level 1 equivalent (commodity attacks)

Low-cost, mass-producible attacks built from widely-available materials. These are the attacks any attacker with a smartphone and a printer can create in minutes

Paper-based attacks – printed photos, cutouts, cylinder-rolled prints, paper masks worn on a real actor’s face
Replay attacks – photos or videos displayed on a second phone screen or a computer monitor

Class B - iBeta Level 2 equivalent (3D material attacks)

Higher-fidelity physical attacks requiring specialized materials and fabrication. These are what a determined attacker with a mask-making budget produces

Silicone masks – flexible 3D masks with skin-like texture
Latex masks – 3D realistic masks
Textile 3D face masks – fabric-based wearable 3D masks
Wrapped 3D paper masks – shaped-paper 3D face replicas

iBeta Level 3 attacks (resin casts, high-fidelity custom masks) are covered in Axon Labs’ iBeta 3 dataset but are not tested in this benchmark; they are outside the design envelope of every current open-source model

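Values below are APCER per class (weighted across that class's attack categories), overall APCER, BPCER, and ACER. Lower is better for every column, and models are ranked by ACER

Rank	Model	APCER (A) ↓	APCER (B) ↓	APCER ↓	BPCER ↓	ACER ↓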
1	CVPR 2024 Challenge (Swin V2)	35.3%	74.2%	44.8%	2.6%	23.7%
2	Silent-FAS (Minivision)	66.4%	94.2%	73.3%	3.1%	38.2%
3	anti-spoof-mn3 (Intel)	40.0%	66.4%	46.5%	56.62%	51.55%
4	AENet (CelebA-Spoof)	68.3%	88.5%	73.3%	33.3%	53.3%
5	DeepPixBiS-Idiap	97.9%	99.3%	98.3%	13.7%	56.0%
6	IADG (CVPR 2023)	42.3%	23.1%	37.5%	83.4%	60.4%
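The columns of the table can be cross-checked against each other. The sketch below assumes that overall APCER is the frame-weighted mean of the two classes, using the class sizes stated in this article (910 Class A frames, 295 Class B frames). The article does not spell out the weighting, so treat that as an assumption. The helper names are illustrative only.

```python
# Frame counts per class, as stated in the article (Class A: 910, Class B: 295).
N_A, N_B = 910, 295

# (APCER A, APCER B, overall APCER, BPCER, ACER) in percent, copied from the table.
RESULTS = {
    "CVPR 2024 Challenge (Swin V2)": (35.3, 74.2, 44.8, 2.6, 23.7),
    "Silent-FAS (Minivision)": (66.4, 94.2, 73.3, 3.1, 38.2),
    "anti-spoof-mn3 (Intel)": (40.0, 66.4, 46.5, 56.62, 51.55),
    "AENet (CelebA-Spoof)": (68.3, 88.5, 73.3, 33.3, 53.3),
    "DeepPixBiS-Idiap": (97.9, 99.3, 98.3, 13.7, 56.0),
    "IADG (CVPR 2023)": (42.3, 23.1, 37.5, 83.4, 60.4),
}


def weighted_apcer(apcer_a: float, apcer_b: float) -> float:
    """Frame-weighted mean of the per-class APCER values (assumed weighting)."""
    return (N_A * apcer_a + N_B * apcer_b) / (N_A + N_B)


def acer(apcer: float, bpcer: float) -> float:
    """ACER as defined in the article: the mean of APCER and BPCER."""
    return (apcer + bpcer) / 2


for name, (a, b, apcer, bpcer, reported_acer) in RESULTS.items():
    print(f"{name:32s} APCER {weighted_apcer(a, b):5.1f} (table {apcer})  "
          f"ACER {acer(apcer, bpcer):5.1f} (table {reported_acer})")
```

Under that assumption every recomputed value lands within rounding distance of the published column. The same data also shows that the lowest Class A APCER (35.3%) and the lowest Class B APCER (23.1%) belong to different models.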

What each model is, in one paragraph

Short descriptions of the six models we benchmarked. Detailed numbers are in the tables above

Silent-Face-Anti-Spoofing (Minivision, 2020)

The most-starred open-source face liveness project on GitHub. A lightweight ensemble of two MiniFASNet variants, Apache-licensed, trained on CelebA-Spoof plus proprietary Minivision data. Designed for fast passive liveness on mobile devices (80×80 input, no action required from the user). Strong baseline for 2D replay threat models; does not see 3D masks in training

DeepPixBiS (Idiap Research Institute, ICB 2019)

The reference pixel-wise binary supervision model from Idiap Research Institute (Switzerland). DenseNet-161 backbone plus a 14×14 auxiliary pixel-wise liveness map. We used the authors’ official weights (OULU_Protocol_2_model_0_0.onnx) trained on OULU-NPU Protocol 2 (2017). Architecturally sound but bounded by the training-era attack distribution

AENet / CelebA-Spoof (SenseTime + BUPT, ECCV 2020)

The canonical classifier shipped with the CelebA-Spoof dataset. ResNet-18 backbone with multi-task auxiliary heads (live attribute, attack type, lighting, depth, reflection). Designed to leverage rich CelebA-Spoof annotations for better generalization; in practice the auxiliary heads did not transfer well across attack distributions

anti-spoof-mn3 (Intel OpenVINO, 2022)

MobileNetV3-based binary classifier from Intel’s OpenVINO Open Model Zoo, originally developed in kprokofi/light-weight-face-anti-spoofing. 12 MB ONNX, 0.15 GFlops, trained on CelebA-Spoof. Industrial-grade packaging with documented preprocessing; tuned aggressively for high spoof detection at the cost of real-user friction

IADG (Shanghai Jiao Tong, CVPR 2023)

“Instance-Aware Domain Generalization for Face Anti-Spoofing.” Custom Framework architecture with Cross-Style Assembly (CSA) + Dynamic Kernel Generator (DKG) modules plus auxiliary depth and reflection heads. Trained across Idiap Replay-Attack + CASIA-FASD + MSU-MFSD (cross-dataset protocol) to be domain-invariant. Despite its domain-generalization design, IADG has the worst ACER in our benchmark: its score distributions for real faces and 3D masks (silicone, latex, textile) are nearly identical, so no threshold meaningfully separates them. Its low Class B APCER (23.1%) comes with an 83.4% BPCER, meaning it rejects most inputs rather than telling masks from faces. The model is a clean demonstration that cross-dataset domain generalization across 2010s-era academic datasets does not transfer to a 2025 attack distribution
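To see why nearly identical score distributions leave no usable threshold, here is a small simulation. Only the two means (0.096 for real faces, 0.066 for silicone masks) come from this article. The Gaussian shape, the standard deviation of 0.05, the sample size, and the convention that a higher score means "more likely real" are all assumptions made for illustration, not properties of IADG.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Means are quoted in the article; spread and sample size are ASSUMED.
REAL_MEAN, MASK_MEAN, ASSUMED_STD, N = 0.096, 0.066, 0.05, 2000
real_scores = [random.gauss(REAL_MEAN, ASSUMED_STD) for _ in range(N)]
mask_scores = [random.gauss(MASK_MEAN, ASSUMED_STD) for _ in range(N)]


def errors_at(threshold: float) -> tuple:
    """Accept a frame as real when score >= threshold (assumed convention)."""
    apcer = sum(s >= threshold for s in mask_scores) / N  # masks accepted
    bpcer = sum(s < threshold for s in real_scores) / N   # real faces rejected
    return apcer, bpcer


# Sweep thresholds from -0.100 to 0.300 and keep the one with the lowest ACER.
thresholds = [i / 1000 for i in range(-100, 301)]
best_acer, best_t = min((sum(errors_at(t)) / 2, t) for t in thresholds)
apcer, bpcer = errors_at(best_t)
print(f"best threshold {best_t:.3f}: "
      f"APCER {apcer:.1%}, BPCER {bpcer:.1%}, ACER {best_acer:.1%}")
```

With two equal-width Gaussians only 0.03 apart, the best achievable ACER sits at roughly 38% in theory, however the threshold is chosen. Moving the threshold only trades APCER for BPCER. A narrower assumed spread would separate the classes better, so the exact figure depends on the assumption.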

CVPR 2024 Workshop Challenge winner (He et al., 2024)

The top-ranked entry in the CVPR 2024 Face Anti-Spoofing Workshop Challenge. Swin Transformer V2 Base backbone trained jointly on physical and digital presentation attacks (including deepfakes). Our best-performing model overall. Production-acceptable BPCER (2.59%) combined with the strongest Class A attack detection in the benchmark

Root cause analysis: the 2-source gap that won't close

Look at the per-category APCER table. Notice the pattern:

Models trained on CelebA-Spoof / OULU-NPU (Silent-FAS, AENet, anti-spoof-mn3, DeepPixBiS): comparatively better on print/replay (their training distribution), fail on silicone/latex/textile/wrapped 3D masks (out of distribution).
IADG (cross-dataset domain generalization across Idiap+CASIA+MSU): the model intended to solve exactly this generalization problem, but its score distributions for real faces and 3D masks are nearly indistinguishable on our data. Domain generalization across 2010s-era academic datasets does not transfer to modern attacks.
CVPR 2024 (joint physical-digital training, Swin V2): the best balance in the benchmark, strong on real faces and the strongest on Class A attacks, but still weak on high-fidelity silicone (12.5%) and latex (12.0%).

The pattern is consistent across 6 years and 6 architectures: open-source training datasets do not adequately cover the modern presentation-attack distribution that defines real-world threats and iBeta certification protocols. No combination of architecture (ResNet, DenseNet, MobileNet, Swin Transformer, custom domain-generalization frameworks) and no level of multi-task supervision closes this gap when the training corpus does not contain enough silicone, latex, textile, and wrapped 3D paper attacks to learn from

Train on modern attack data

The durable open-source path. Modern open-source models can match commercial accuracy if trained on data that covers modern attacks at scale

Axon Labs publishes 9 categories of iBeta-aligned face presentation-attack datasets specifically designed to close the gap this benchmark exposes, plus iBeta level-1 (paper + replay), iBeta level-2 (silicone + latex + textile), and iBeta level-3 (high-fidelity, resin) collections aligned to the certification protocol structure

FAQ

What is the best open-source face liveness detection model in 2026?

Based on this benchmark, the CVPR 2024 Workshop Challenge winner (Swin Transformer V2 Base, He et al.) has the best aggregate ACER (23.68%) and is the only open-source model with production-acceptable BPCER (2.59%)

Can open-source face liveness models pass iBeta certification?

No. iBeta Level 1 and Level 2 PAD certification require essentially 0% APCER (Attack Presentation Classification Error Rate) across every attack category. The lowest APCER any open-source model in our benchmark reaches is 35.3% on Class A (Level 1 equivalent, CVPR 2024 winner) and 23.1% on Class B (Level 2 equivalent, IADG, at the cost of an 83.4% BPCER) - both far above the certification threshold

What is APCER vs BPCER vs ACER?

APCER (Attack Presentation Classification Error Rate): share of spoof attacks misclassified as real. Lower = better. 0% means every attack is detected
BPCER (Bona Fide Presentation Classification Error Rate): share of real users misclassified as spoof. Lower = better. 0% means no real user is rejected
ACER (Average Classification Error Rate): (APCER + BPCER) / 2. Single-number summary

These metrics are defined in ISO/IEC 30107-3 and are the industry standard used by both iBeta certification and the DHS Remote Identity Validation Rally (RIVR)

What's the difference between iBeta Level 1 and Level 2?

iBeta Level 1 = commodity attacks anyone can produce in minutes: printed photos, paper masks, screen replays, cutouts. In our benchmark, this maps to Class A (910 frames)
iBeta Level 2 = 3D material attacks requiring specialized fabrication: silicone masks, latex masks, textile 3D masks, wrapped 3D paper. In our benchmark, this maps to Class B (295 frames)
iBeta Level 3 = high-fidelity custom-built masks (resin casts, theatrical-grade silicone). Not tested in this benchmark; covered by separate Axon Labs iBeta 3 datasets

Why do open-source face liveness models fail on silicone or latex masks?

The training datasets behind every popular open-source model (CelebA-Spoof, OULU-NPU, Replay-Attack, MSU-MFSD) were released between 2012 and 2020 and emphasize 2D attacks (print, replay) plus simple paper masks. Modern silicone, latex, textile, and wrapped 3D paper masks are under-represented or absent. A model cannot reliably detect attacks it has never seen. The fix is training data, not architecture

Can I use an open-source face anti-spoofing model in production?

For limited-threat-model use cases (replay-only attacks, internal demos, research baselines) - yes, with caveats. The CVPR 2024 Challenge winner and Silent-FAS both keep BPCER below 5% (2.6% and 3.1%).

For production identity verification with regulatory requirements (iBeta certification, banking onboarding, government identity) - no. None of the open-source models tested meets iBeta Level 1 or Level 2 thresholds. You will need either (a) a commercial SDK that has passed independent PAD testing (iBeta certification or RIVR evaluation), or (b) to train a custom model on modern attack data
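The two answers above amount to screening models against limits you choose for your own threat model. A minimal sketch follows, using the per-class APCER and BPCER values from the results table. The `screen` helper and the example limits (5% BPCER, 1% APCER) are hypothetical and are not iBeta's official pass criteria.

```python
# (APCER Class A, APCER Class B, BPCER) in percent, copied from the results table.
MODELS = {
    "CVPR 2024 Challenge (Swin V2)": (35.3, 74.2, 2.6),
    "Silent-FAS (Minivision)": (66.4, 94.2, 3.1),
    "anti-spoof-mn3 (Intel)": (40.0, 66.4, 56.62),
    "AENet (CelebA-Spoof)": (68.3, 88.5, 33.3),
    "DeepPixBiS-Idiap": (97.9, 99.3, 13.7),
    "IADG (CVPR 2023)": (42.3, 23.1, 83.4),
}


def screen(max_apcer: float, max_bpcer: float) -> list:
    """Return models whose APCER in BOTH classes and BPCER stay within the limits."""
    return [
        name
        for name, (apcer_a, apcer_b, bpcer) in MODELS.items()
        if apcer_a <= max_apcer and apcer_b <= max_apcer and bpcer <= max_bpcer
    ]


# Hypothetical limits; tune them to your own threat model.
usable_ux = screen(max_apcer=100.0, max_bpcer=5.0)  # real-user friction only
near_zero = screen(max_apcer=1.0, max_bpcer=5.0)    # near-zero APCER per class
print("BPCER <= 5%:", usable_ux)
print("plus APCER <= 1% per class:", near_zero)
```

With these limits, the first filter keeps the CVPR 2024 winner and Silent-FAS, and the second keeps nothing, which matches the conclusion above.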

Where can I get the benchmark dataset?

The Axon Labs v3 dataset (1,205 spoof frames across 9 iBeta-aligned attack categories + 1,624 real faces from 4 sources) is built from licensed Axon Labs presentation-attack datasets covering iBeta Levels 1 and 2. Higher-fidelity Level 3 datasets (resin, ultra-realistic silicone) are also available. License inquiries: axonlab.ai
