Liveness Detection Datasets

Practical Guide to Face Anti-Spoofing Data

April 2026

by Axon Labs

A face recognition system that does not also detect spoofing is not deployable. Once your verification model is good enough that it stops failing on real users, the next thing that breaks is an attacker holding up a printed photo, a phone screen, a silicone mask, or a deepfake video. Stopping these is the job of presentation attack detection (PAD), also called face anti-spoofing or liveness detection.

This guide walks through the public PAD datasets that face anti-spoofing teams still rely on in 2026, what each one is good and bad for, and where commercial datasets close the gaps that academic ones leave behind. Every entry uses the same evaluation framework so you can compare them at a glance.

If you are building face recognition itself, see our companion guide to face recognition datasets.

Datasets at a glance

| Dataset | Year | Subjects | Volume | Attack types | License |
|---|---|---|---|---|---|
| CASIA-FASD | 2012 | 50 | 600 videos | Print, cut-photo, replay | Research |
| Replay-Attack | 2012 | 50 | 1,300 videos | Print, mobile replay, HD replay | Research |
| 3DMAD | 2013 | 17 | 76,500 frames | 3D paper-craft masks | Research |
| MSU-MFSD | 2014 | 35 | 280 videos | Print, replay (phone, tablet) | Research |
| Replay-Mobile | 2016 | 40 | 1,190 videos | Print, replay, mobile capture | Research |
| HKBU-MARs V2 | 2016 | 12 | 1,008 videos | 6 types of 3D masks, 7 cameras | Research |
| OULU-NPU | 2017 | 55 | 4,950 videos | Print, replay; 6 mobile devices | Research |
| SiW (Spoof in the Wild) | 2018 | 165 | 4,478 videos | Print, replay | Research |
| ROSE-Youtu | 2018 | 20 | 3,350 videos | Print, replay, 3D mask | Research |
| SiW-M | 2019 | 493 | 1,630 videos | 13 attack types incl. masks, makeup, partial | Research |
| CASIA-SURF | 2019 | 1,000 | 21,000 videos | Print, cut-photo | Research |
| CelebA-Spoof | 2020 | 10,177 | 625,537 images | 10 spoof types, 43 attributes | Research |
| FaceForensics++ | 2019 | — | 1,000 originals + 4,000 manipulated | 4 deepfake methods | Research |
| Celeb-DF v2 | 2020 | 59 | 5,639 deepfake + 590 real | Deepfakes | Research |
| Silicone Mask Attack | 2024 | — | 10,000+ videos | 18 silicone masks | Commercial, GDPR |
| Photo Print Attacks | 2025 | 3,000+ | 10,000+ videos | Print (matte, glossy, cut-photo, warped) | Commercial, GDPR |
| Replay Display Attacks | 2025 | 6,500+ | 9,000+ videos | Screen replay (phone, tablet, monitor) | Commercial, GDPR |

A note up front: none of the public datasets above are licensed for commercial use. Every single one is research-only. If you are building a product that has to pass an iBeta certification, a NIST FRVT submission, or a customer security review, you will eventually need a commercially licensed source; see the Axon Labs dataset collection.

Classic 2D PAD benchmarks

These are the datasets the field was built on. They are small and they only cover 2D attacks (print and replay), but every face anti-spoofing paper still references them as a comparability anchor.

CASIA-FASD

Released by the Chinese Academy of Sciences in 2012. 600 videos from 50 subjects, with three attack types: warped printed photo, cut-photo (with eye holes), and video replay on a tablet. Captured at three image qualities (low, normal, high) using different cameras.

CASIA-FASD is the original face anti-spoofing benchmark. It is small enough to be used as an evaluation set and is often paired with Replay-Attack for cross-database evaluation in PAD papers. It has no 3D mask attacks at all.

Replay-Attack

Released by Idiap Research Institute the same year as CASIA-FASD. 1,300 videos from 50 subjects: 200 real-access videos plus 1,000 attack videos and 100 enrollment videos. Attacks include high-resolution print, mobile-phone replay and tablet replay, captured under two lighting conditions.

Replay-Attack and CASIA-FASD together formed the original cross-database benchmark for PAD generalisation. Both are saturated by modern models on the in-database protocol but still useful for cross-database experiments.
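
The cross-database protocol is simple to state: fix an operating threshold on the source dataset (typically at its equal error rate), then score the target dataset at that fixed threshold and report the half total error rate (HTER). The sketch below is a minimal, assumption-laden version with synthetic scores; the function names and score distributions are ours, not part of any official protocol code.

```python
import numpy as np

def eer_threshold(scores, labels):
    """Threshold where false-accept and false-reject rates cross.

    scores: higher = more likely bonafide; labels: 1 = bonafide, 0 = attack.
    """
    best_t, best_gap = None, float("inf")
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)  # attacks accepted as live
        frr = np.mean(scores[labels == 1] < t)   # bonafide rejected
        if abs(far - frr) < best_gap:
            best_gap, best_t = abs(far - frr), t
    return best_t

def hter(scores, labels, threshold):
    """Half Total Error Rate: mean of FAR and FRR at a fixed threshold."""
    far = np.mean(scores[labels == 0] >= threshold)
    frr = np.mean(scores[labels == 1] < threshold)
    return (far + frr) / 2

# Cross-database evaluation: threshold fixed on the source dataset
# (e.g. CASIA-FASD), HTER reported on the target (e.g. Replay-Attack).
rng = np.random.default_rng(0)
src_scores = np.concatenate([rng.normal(0.8, 0.1, 200), rng.normal(0.2, 0.1, 200)])
src_labels = np.concatenate([np.ones(200), np.zeros(200)]).astype(int)
t = eer_threshold(src_scores, src_labels)

tgt_scores = np.concatenate([rng.normal(0.7, 0.2, 100), rng.normal(0.35, 0.2, 100)])
tgt_labels = np.concatenate([np.ones(100), np.zeros(100)]).astype(int)
print(f"cross-database HTER: {hter(tgt_scores, tgt_labels, t):.3f}")
```

The point of the protocol is that the threshold is never re-tuned on the target data: a model that only looks good when re-calibrated per dataset has not generalised.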

MSU-MFSD

Michigan State University's 2014 release. 280 videos from 35 subjects, with bonafide footage captured on a laptop webcam and an Android phone. Attacks are high-resolution printed photos plus video replay on a tablet and a phone screen.

Small by modern standards, but MSU-MFSD completes the classic cross-database set alongside CASIA-FASD and Replay-Attack, and it still appears in cross-database PAD protocols. License is research-only.

Replay-Mobile

Idiap's mobile-era follow-up to Replay-Attack. 1,190 videos from 40 subjects captured on mobile devices (smartphone and tablet) under five lighting conditions. Attacks are matte and high-quality print plus screen replay. It was the first PAD benchmark explicitly designed around mobile capture, which by the late 2010s had become the dominant deployment scenario.

Mobile-era and large-scale 2D benchmarks

OULU-NPU

A 2017 release from the University of Oulu and Northwestern Polytechnical University. 4,950 videos from 55 subjects, captured on six different mobile devices across three sessions with varying illumination and background. Attack types are print and replay, generated with two printers and two display devices.

OULU-NPU is the canonical mobile-era PAD benchmark. Its four evaluation protocols specifically test cross-device, cross-environment and cross-attack-type generalisation, which is what production mobile liveness systems actually have to do. If you are evaluating a mobile PAD model and you only run one public benchmark, run OULU-NPU.

SiW (Spoof in the Wild)

Michigan State's larger PAD release. 4,478 videos from 165 subjects, captured at 1080p with significant variation in pose, expression, illumination and distance. Three protocols cover cross-pose-and-expression generalisation, cross-medium attacks and cross-protocol setups. SiW set the standard for “in the wild” 2D PAD before 3D and multi-modal datasets took over.

SiW-M

The most attack-diverse public PAD dataset. 1,630 videos from 493 subjects, with 13 distinct attack types: print, replay, half mask, silicone mask, transparent mask, paper mask, mannequin head, obfuscation makeup, impersonation makeup, cosmetic makeup, partial paper, partial funny-eye and partial paper-glasses. Designed specifically for zero-shot PAD evaluation: train on a subset of attack types, test on unseen ones.

SiW-M is the closest public proxy for the full attack landscape an iBeta Level 2 / Level 3 evaluation will throw at your model. It is still small (only 1,630 videos), but no other public set covers this many attack types in a single corpus.

ROSE-Youtu

3,350 videos from 20 subjects, captured on five mobile devices. Includes printed-photo, video-replay and 3D paper-mask attacks. Smaller than OULU-NPU, but valuable for cross-device PAD research.

3D mask attack datasets

The 2D benchmarks above will not protect you from a silicone or latex mask attack. These datasets fill that gap, but they are tiny.

3DMAD

The first public 3D mask attack dataset, released by Idiap in 2013. 76,500 video frames from 17 subjects, captured with a Microsoft Kinect (RGB + depth). Masks are paper-craft 3D head models from ThatsMyFace.com: low fidelity by 2026 standards, but still valuable as a cross-modality benchmark.

HKBU-MARs V2

Hong Kong Baptist University, 2016. 1,008 videos from 12 subjects, with six types of 3D masks captured on seven different cameras. Designed specifically for remote photoplethysmography (rPPG) PAD research, which uses subtle skin-colour changes to detect a real face.

The combination of multiple mask types and multiple cameras makes HKBU-MARs more useful than its small subject count suggests, but 12 people is still 12 people. It is not a training set; it is an evaluation set.
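
The rPPG cue can be illustrated with a toy computation: average the green channel over a face crop, remove the DC component, and check how much spectral power falls in the human pulse band. This is a deliberately minimal sketch on synthetic frames, not a production rPPG pipeline (real systems need face tracking, detrending and illumination compensation); all names and numbers here are ours.

```python
import numpy as np

def pulse_band_power(frames, fps=30.0, band=(0.7, 4.0)):
    """Crude rPPG liveness cue: fraction of spectral power in the human
    pulse band (0.7-4 Hz, i.e. 42-240 bpm) of the mean green-channel signal.

    frames: array of shape (T, H, W, 3), RGB face crops over time.
    A live face shows a periodic blood-volume signal in this band;
    a printed photo or rigid mask should not.
    """
    green = frames[:, :, :, 1].mean(axis=(1, 2))   # mean green per frame
    green = green - green.mean()                   # remove DC component
    spectrum = np.abs(np.fft.rfft(green)) ** 2
    freqs = np.fft.rfftfreq(len(green), d=1.0 / fps)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = spectrum[1:].sum()                     # skip the DC bin
    return spectrum[in_band].sum() / total if total > 0 else 0.0

# Synthetic check: a 1.2 Hz "pulse" riding on the green channel should
# concentrate power in the pulse band; pure sensor noise should not.
t = np.arange(300) / 30.0                          # 10 s at 30 fps
rng = np.random.default_rng(1)
base = rng.normal(128, 1.0, size=(300, 8, 8, 3))   # noise-only "spoof"
live = base.copy()
live[:, :, :, 1] += 3.0 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
print(f"live:  {pulse_band_power(live):.2f}")
print(f"spoof: {pulse_band_power(base):.2f}")
```

A detector would threshold this ratio (or feed it to a classifier); the HKBU-MARs camera diversity exists precisely because this signal degrades differently on different sensors.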

Multi-modal and multi-channel PAD

Modern PAD systems increasingly use more than just RGB: depth from a structured-light or time-of-flight sensor, infrared, and sometimes thermal. Several datasets target this.

CASIA-SURF

The first large-scale multi-modal PAD benchmark. 21,000 videos from 1,000 subjects, captured simultaneously in RGB, depth and IR with an Intel RealSense camera. Attack types are limited (print and cut-photo), but the modality coverage and subject volume made CASIA-SURF the standard multi-modal PAD benchmark for several years.

CASIA-SURF CeFA

The cross-ethnicity extension of CASIA-SURF. 23,583 videos from 1,607 subjects across three ethnicities (African, East Asian, Central Asian), in RGB + depth + IR. It adds 3D attacks (silicone and paper masks) on top of the 2D attacks from CASIA-SURF. CeFA is the first PAD dataset with explicit ethnicity labels, which matters if you need to demonstrate that your model is not biased across demographic groups (increasingly a regulatory requirement).

Large-scale 2D spoof datasets

CelebA-Spoof

The largest public PAD dataset by image count. 625,537 images of 10,177 subjects, built on top of CelebA, which is why the subject count is so high. Each image carries 43 attributes: the 40 face attributes inherited from CelebA plus three PAD attributes (spoof type, illumination, environment). Ten spoof types are covered, captured by more than ten sensor models across eight environment-illumination scenes.

CelebA-Spoof is the closest the public ecosystem gets to a “train a deep PAD model from scratch” corpus. The trade-off is that it is image-based, not video, so it cannot be used for temporal liveness signals. RGB only, research-only license.

Deepfake detection datasets

Deepfakes are a different attack category from physical presentation attacks, but most production face verification systems now have to detect both. The two datasets below are the academic standard for deepfake detection.

FaceForensics++

A 2019 release with 1,000 original YouTube videos and corresponding manipulated versions generated by four methods: DeepFakes, Face2Face, FaceSwap and NeuralTextures. It provides four compression levels to test detector robustness.

FaceForensics++ is the deepfake-detection benchmark that almost every paper still reports on. It is small by modern standards, but the four-method coverage and the quality controls make it the canonical evaluation set.

Celeb-DF v2

590 original celebrity videos and 5,639 deepfake videos generated with an improved synthesis pipeline. Celeb-DF is widely used as a cross-dataset test set: train on FaceForensics++ or DFDC, test on Celeb-DF, and see how badly your detector generalises. The answer is usually “very badly”, which is the entire point of the dataset.

Where public datasets stop being enough

PAD is the area where the gap between public datasets and what production needs is the widest. Six recurring problems:

  1. They are too small for the attack types that matter most. SiW-M has 13 attack types, but only 1,630 videos total. HKBU-MARs has six 3D mask types, but 12 subjects. Public 3D-mask data simply does not exist at training scale.
  2. They use low-fidelity attack instruments. Most public 3D mask datasets use paper-craft or low-cost rigid masks. Real attackers use high-quality silicone, custom latex, and increasingly 3D-printed resin. None of these are well represented in academic data.
  3. They do not cover iBeta certification requirements. iBeta Level 1 requires print, paper-mask and replay attacks at scale. iBeta Level 2 requires high-quality 3D masks (silicone, latex, resin). iBeta Level 3 requires very-high-fidelity 3D masks. No public dataset gives you the volume and quality to pass any iBeta level reliably, and a successful iBeta certification is a hard prerequisite for almost every regulated KYC, fintech and identity verification deployment.
  4. They lack matched real-and-spoof pairs. Many datasets contain attack videos and bonafide videos from different people. Production PAD models need genuine and attack videos from the same identity to learn the right discriminative features.
  5. They under-represent demographics. Most use East Asian and European subjects. Skin tone matters for PAD because some texture and rPPG signals behave differently across tones, and a model that fails on darker skin will fail your fairness review.
  6. They are research-only, with no commercial license at all. Every public PAD dataset listed above prohibits commercial use. There is no public PAD dataset you can legally train a shipped product on.
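
Point 4 above is concrete enough to show in code: a training sampler that only yields real/spoof pairs from the same identity, silently dropping identities for which one side is missing. The index format and field names are hypothetical stand-ins for whatever metadata your dataset actually provides.

```python
import random
from collections import defaultdict

def paired_batches(samples, batch_pairs=4, seed=0):
    """Yield batches of (real, spoof) clips from the SAME identity.

    samples: list of dicts like {"id": ..., "label": "real"|"spoof",
    "path": ...} -- a stand-in for your dataset index. Identities that
    lack either a real or a spoof clip are skipped, which is exactly
    the matched data most public PAD sets fail to provide.
    """
    by_id = defaultdict(lambda: {"real": [], "spoof": []})
    for s in samples:
        by_id[s["id"]][s["label"]].append(s)
    usable = [i for i, g in by_id.items() if g["real"] and g["spoof"]]
    rng = random.Random(seed)
    rng.shuffle(usable)
    batch = []
    for ident in usable:
        g = by_id[ident]
        batch.append((rng.choice(g["real"]), rng.choice(g["spoof"])))
        if len(batch) == batch_pairs:
            yield batch
            batch = []
    if batch:
        yield batch

# Toy index: two identities with both labels, one with only an attack clip.
index = [
    {"id": "a", "label": "real",  "path": "a_real.mp4"},
    {"id": "a", "label": "spoof", "path": "a_print.mp4"},
    {"id": "b", "label": "real",  "path": "b_real.mp4"},
    {"id": "b", "label": "spoof", "path": "b_replay.mp4"},
    {"id": "c", "label": "spoof", "path": "c_print.mp4"},  # no real clip: skipped
]
for batch in paired_batches(index, batch_pairs=2):
    for real, spoof in batch:
        print(real["id"], real["path"], "<->", spoof["path"])
```

On a dataset without matched pairs, `usable` collapses toward empty and the sampler starves, which is the practical symptom of problem 4.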

For research and benchmarking, public datasets are fine and you should use them. For shipping a liveness system that has to pass iBeta and a customer security review, you will need a commercial source.

Commercial liveness datasets from Axon Labs

Axon Labs builds and licenses face anti-spoofing datasets specifically for commercial PAD systems. Every dataset is GDPR-compliant, collected with informed consent, and licensed for commercial training and evaluation. 21% of the companies certified by iBeta in 2025 are Axon Labs clients, so these datasets are directly aligned with what passes certification.

iBeta Level 1 Dataset

The dataset for teams preparing an iBeta Level 1 PAD certification. 35,000+ attack videos covering all Level 1 presentation attack instrument types (paper masks, photo prints, video replay), captured under the conditions and protocols that iBeta evaluates against. Aligned bonafide videos from the same identities are included → iBeta Level 1 Dataset

iBeta Level 2 Dataset

For Level 2 certification: 25,000+ videos of higher-fidelity 3D attacks including silicone masks, latex masks and cloth 3D masks, plus the Level 1 attack categories. This is the level most fintech and KYC vendors actually need to pass → iBeta Level 2 Dataset

iBeta Level 3 Dataset

For the highest tier of presentation attack certification: high-fidelity 3D mask attacks designed to defeat sophisticated PAD systems → iBeta Level 3 Dataset

Silicone Mask Attack Dataset

A focused dataset of 10,000+ videos generated from 18 different silicone masks, captured under varied lighting and device conditions. Use it when you need targeted training data for the silicone-mask failure mode rather than a full iBeta corpus → Silicone Mask Attack Dataset

Photo Print Attacks Dataset

Paper-based spoofing data from 3,000+ individuals, covering matte, glossy, cut-photo and warped print attacks across multiple printers and paper grades → Photo Print Dataset

Replay Display Attacks Dataset

Screen-replay spoofing across multiple display devices (smartphones, tablets, monitors) and recording cameras, the second most common 2D attack vector after print → Replay Display Dataset

Specialised mask and PAI datasets

Beyond the iBeta-aligned core, Axon Labs maintains specialised attack datasets for narrower research and evaluation needs: high-fidelity rubber masks, 3D resin masks, wrapped 3D attacks, advanced 3D paper masks, 3D latex masks, cutout prints, and IR+RGB iBeta data. See the full datasets catalogue for the complete list.

Custom liveness data collection

When your problem is narrower than any catalogue dataset (a specific attack instrument, a specific device, a specific country, a specific demographic), Axon Labs runs custom collections under the same consent and compliance framework → Contact us for a scoped quote

Which dataset should you actually use?

A short decision guide based on what we hear most often from PAD teams:

  • You are writing a paper or running a benchmark. Use OULU-NPU for mobile 2D PAD, SiW-M for multi-attack zero-shot, and CASIA-SURF CeFA for multi-modal cross-ethnicity. Cross-database evaluation against CASIA-FASD and Replay-Attack is still expected by most reviewers.
  • You are training a deep PAD model from scratch on RGB only. CelebA-Spoof is the only public option with the volume to support it.
  • You are building a multi-spectral PAD system. Use WMCA or HQ-WMCA; there is no public substitute for their four- and five-channel coverage.
  • You are preparing for iBeta Level 1 certification. Public data will not get you there → iBeta Level 1 Dataset
  • You are preparing for iBeta Level 2 certification → iBeta Level 2 Dataset
  • You need to harden a production PAD model against silicone masks specifically → Silicone Mask Attack Dataset
  • You are also building face recognition. See our companion guide on face recognition datasets.
  • You need to detect deepfakes as well as physical presentation attacks. Train on DFDC, evaluate cross-domain on Celeb-DF and FaceForensics++. Combine with a physical PAD model: deepfake detectors and physical-attack detectors do not generalise to each other.
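
Running a physical PAD model and a deepfake detector in parallel usually comes down to a conservative decision rule: accept only when both checks pass. The sketch below uses illustrative score conventions and thresholds of our own choosing; in practice each threshold is calibrated separately on its own development set.

```python
def accept_as_live(pad_live_score, deepfake_fake_score,
                   pad_threshold=0.5, deepfake_threshold=0.5):
    """Conservative AND-fusion: accept only if the physical PAD model
    says live AND the deepfake detector does not flag manipulation.

    pad_live_score: higher = more likely a genuine physical presentation.
    deepfake_fake_score: higher = more likely a synthetic/manipulated face.
    Scores, names and thresholds here are illustrative, because the two
    detectors are trained on disjoint attack families and do not share
    a calibration.
    """
    physically_live = pad_live_score >= pad_threshold
    not_deepfake = deepfake_fake_score < deepfake_threshold
    return physically_live and not_deepfake

print(accept_as_live(0.9, 0.1))  # live face, no manipulation -> True
print(accept_as_live(0.9, 0.8))  # looks live but flagged as deepfake -> False
print(accept_as_live(0.2, 0.1))  # physical spoof -> False
```

AND-fusion trades a slightly higher bona fide rejection rate for coverage of both attack families, which is usually the right trade in a KYC flow.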

Frequently asked questions

Which face anti-spoofing dataset is the best?

There is no single best dataset; the right choice depends on which attack types and modalities you care about. For mobile 2D PAD, OULU-NPU is the canonical benchmark. For attack-type diversity, SiW-M covers 13 attack categories in one corpus. For multi-modal PAD, WMCA (RGB + depth + IR + thermal) and CASIA-SURF CeFA (RGB + depth + IR with cross-ethnicity labels) are the strongest options. For training a deep model from scratch, CelebA-Spoof is the largest by image count.

Can I use public face anti-spoofing datasets in a commercial product?

No. Every public face anti-spoofing dataset listed above is licensed for academic research only. Training a commercial liveness system on them is a license violation, and most customer security reviews will catch it.

Is there a public dataset that covers iBeta Level 1?

No public dataset covers iBeta Level 1 protocols at the volume and quality required. Commercially licensed datasets such as the iBeta Level 1 Dataset from Axon Labs are built specifically against the iBeta PAI requirements.

Which datasets include silicone mask attacks?

SiW-M includes a small number of silicone-mask videos as one of its 13 attack categories. WMCA and HQ-WMCA include flexible silicone masks alongside other PAI types in a multi-channel setup. None of these are large enough for production training. The Silicone Mask Attack Dataset from Axon Labs provides 10,000+ videos generated from 18 distinct silicone masks under a commercial license.

Do I need separate data for deepfake detection?

Yes. Physical presentation attack detectors (trained on print, replay and mask data) do not transfer to deepfake detection, and vice versa. Production face verification systems usually run two models in parallel, a physical PAD model and a deepfake detector, and need separate training data for each.

How do I measure demographic fairness in a PAD model?

Use CASIA-SURF CeFA, which has explicit ethnicity labels across African, East Asian and Central Asian groups, and report per-group APCER (attack presentation classification error rate) and BPCER (bona fide presentation classification error rate). Treat any group-to-group gap of more than a few percentage points as a finding that needs targeted training data, not just a tweaked threshold.
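
Per-group APCER and BPCER are straightforward to compute once you have binary decisions and group labels. A minimal sketch following the ISO/IEC 30107-3 definitions; the predictions and group labels below are synthetic stand-ins for model output and CeFA-style ethnicity annotations.

```python
import numpy as np

def apcer(pred_live, is_attack):
    """Attack Presentation Classification Error Rate: fraction of attack
    presentations wrongly classified as bona fide (ISO/IEC 30107-3)."""
    return float(np.mean(pred_live[is_attack]))

def bpcer(pred_live, is_attack):
    """Bona Fide Presentation Classification Error Rate: fraction of
    genuine presentations wrongly classified as attacks."""
    return float(np.mean(~pred_live[~is_attack]))

# Synthetic per-sample decisions and demographic group labels.
pred_live = np.array([1, 1, 0, 0, 1, 0, 1, 0], dtype=bool)
is_attack = np.array([0, 0, 1, 1, 0, 1, 1, 1], dtype=bool)
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

for g in np.unique(groups):
    m = groups == g
    print(f"group {g}: APCER={apcer(pred_live[m], is_attack[m]):.2f} "
          f"BPCER={bpcer(pred_live[m], is_attack[m]):.2f}")
```

The per-group loop is the whole trick: a single pooled APCER/BPCER number can hide a group whose error rate is several times the average.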

Accelerate Your AI Development Today

Speed up your AI projects with our high-quality, ready-to-use datasets. Enjoy easy integration, fast deployment, and reliable biometric data collection.

© 2022 – 2026 Copyright protected