Selfie with ID Dataset for Face Identification

Selfie & ID Photo Dataset

150,000+ facial images from 12,000+ unique individuals: 10-15 photos per ID

Check samples on Kaggle

Introduction

The Selfies & ID Photos Face Recognition Dataset is a comprehensive, enterprise-grade collection designed for training robust face recognition, identity verification, and KYC systems. With 10-15 high-quality images per person (diverse selfies + 2 official ID document photos), this dataset provides the depth and variety needed for production-ready AI models.

Why this dataset solves real production challenges

Enterprise Face Recognition systems fail in production not from algorithm weakness, but from training data gaps.
This dataset addresses the 3 critical gaps we identified from our client deployments:

  • Insufficient variation per identity
    Problem: Models trained on 1-3 photos/person fail when users change appearance
    Solution: 10-15 images per person (diverse selfies plus ID photos) = robust to lighting, pose, accessories, time
  • Selfie-to-document matching gap 
    Problem: Models trained only on selfies can’t verify against official ID photos
    Solution: Same person in both casual selfies AND official documents (rare combination)
  • Demographic bias in production
    Problem: Models perform poorly on underrepresented ethnicities
    Solution: Balanced coverage across Caucasian, Asian, African, Latin American, and Arab populations
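
One way to check the third gap on your own model is to report the error rate separately for each demographic group rather than as a single average. The sketch below is a minimal illustration in Python, not part of the dataset tooling: the function name, the tuple layout, and the group labels are assumptions, and the example rows are placeholders rather than measured results.

```python
from collections import defaultdict


def false_non_match_rate_by_group(results):
    """Compute the false non-match rate per group.

    results: iterable of (group, is_same_person, predicted_match) tuples.
    This layout is hypothetical; adapt it to your own evaluation output.
    """
    genuine = defaultdict(int)  # same-person comparisons seen per group
    missed = defaultdict(int)   # same-person comparisons the model rejected
    for group, is_same_person, predicted_match in results:
        if not is_same_person:
            continue  # impostor pairs do not count toward this rate
        genuine[group] += 1
        if not predicted_match:
            missed[group] += 1
    return {group: missed[group] / genuine[group] for group in genuine}


# Placeholder rows, purely illustrative (not measured on this dataset).
example_results = [
    ("group_a", True, True),
    ("group_a", True, False),
    ("group_a", False, False),
    ("group_b", True, True),
    ("group_b", True, True),
]
fnmr = false_non_match_rate_by_group(example_results)
print(fnmr)
```

A large difference between groups in this report suggests that the training data for the weaker group needs more coverage.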

Dataset summary

  • 12,000+ real individuals (not synthetic/AI-generated)
  • 10-15 facial images per person – a mix of selfies and paired ID document photos
  • Selfies + Official ID Photos – unique combination for identity verification
  • Balanced demographics – ages 18-65
  • Multi-ethnic coverage – Caucasian, African, Asian, Latin American
  • Real-world conditions – diverse backgrounds, lighting, expressions

Composition

Parameter | Value
Total Participants | 12,000+ unique individuals
Total Images | 150,000+ facial images
Images per Person | 10-15 (selfies + 2 official ID photos)
Metadata Fields | Demographics, device info, temporal data

Demographics

Category | Coverage
Age Range | 18-65 years (wide distribution)
Gender | Balanced male/female split
Ethnicities | Caucasian, African, East Asian, South Asian, Latin American

Structured Metadata Included

The dataset includes a file with structured metadata for each participant:

  • Demographics – Gender, ethnicity, age group for balanced training
  • Device Information – OS type (Android/iOS/Windows), device model for multi-device analysis  
  • Temporal Data – Capture year of historic photos for age-gap analysis
  • Photo Categories – Indoor/outdoor/lighting conditions for scenario-based filtering
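
As a rough illustration of scenario-based filtering, the Python sketch below loads a metadata table and selects rows by field value. The file format and every column name here (`participant_id`, `gender`, `ethnicity`, `age_group`, `os`, `photo_category`) are assumptions made for the example, and the three rows are placeholders; the actual metadata file may be laid out differently.

```python
import csv
import io
from collections import Counter

# Hypothetical layout: the real file's format and column names may differ.
SAMPLE_METADATA = """participant_id,gender,ethnicity,age_group,os,photo_category
p0001,female,Asian,25-34,Android,indoor
p0002,male,African,35-44,iOS,outdoor
p0003,female,Caucasian,18-24,iOS,indoor
"""


def load_metadata(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, **criteria):
    """Keep rows whose fields equal every given criterion."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


rows = load_metadata(SAMPLE_METADATA)

# Scenario-based filtering: indoor photos taken on iOS devices.
indoor_ios = filter_rows(rows, os="iOS", photo_category="indoor")

# Quick balance check before training.
gender_counts = Counter(row["gender"] for row in rows)

print([row["participant_id"] for row in indoor_ios])
print(dict(gender_counts))
```

The same `filter_rows` call can be reused with any combination of fields, for example to build an evaluation subset for a single age group.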

Source and collection methodology

The Selfies & ID Photos Face Recognition Dataset was collected through a structured, multi-stage process involving a diverse group of participants recruited from multiple geographic regions. All data collection followed strict ethical guidelines, with full informed consent obtained from each participant prior to any image capture.

Training Face Recognition Models for Identity Verification

The Selfies & ID Photos Face Recognition Dataset provides training data for face recognition models, face verification systems, and KYC identity verification pipelines. ML engineers use this data to train face matching models that compare selfie photos against ID document images, the core task in remote onboarding flows and biometric authentication systems across banking, fintech, and government applications.
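
The selfie-to-ID matching task can be sketched in Python as two steps: building labelled selfie/ID pairs, and comparing two face embeddings with a similarity threshold. This is a minimal sketch under stated assumptions, not the dataset's own tooling: the `photos` dictionary layout, the file names, and the `0.5` threshold are all hypothetical, and the embeddings would come from whatever face model you train.

```python
import itertools
import math


def build_pairs(photos):
    """Build (selfie, id_photo, label) triples.

    photos: {person_id: {"selfies": [...], "id_photos": [...]}}
    (a hypothetical layout). Label 1 = same person, 0 = different people.
    """
    pairs = []
    for files in photos.values():
        # Genuine pairs: every selfie against every ID photo of that person.
        for selfie, id_photo in itertools.product(files["selfies"], files["id_photos"]):
            pairs.append((selfie, id_photo, 1))
    for (_, first), (_, second) in itertools.permutations(photos.items(), 2):
        # Impostor pairs: one selfie against another person's ID photo.
        pairs.append((first["selfies"][0], second["id_photos"][0], 0))
    return pairs


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def is_match(selfie_embedding, id_embedding, threshold=0.5):
    """Accept the pair if similarity reaches the threshold (placeholder value)."""
    return cosine_similarity(selfie_embedding, id_embedding) >= threshold


# Placeholder file names, purely illustrative.
example_photos = {
    "person_1": {"selfies": ["p1_s1.jpg", "p1_s2.jpg"], "id_photos": ["p1_id.jpg"]},
    "person_2": {"selfies": ["p2_s1.jpg", "p2_s2.jpg"], "id_photos": ["p2_id.jpg"]},
}
pairs = build_pairs(example_photos)
print(len(pairs))
```

In practice the threshold is tuned on a held-out set of pairs to balance false accepts against false rejects.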

Use cases and applications

  • Face Recognition & Detection. Train robust face recognition and detection models using diverse selfies per person to identify individuals across varying lighting, poses, and environmental conditions in security, surveillance, and access control applications.
  • KYC & Identity Verification. Automate customer identity verification by matching live selfies against official ID document photos for banking onboarding, fintech applications, and regulatory compliance with AML/KYC requirements.
  • Biometric Authentication. Implement secure facial biometrics for mobile device unlocking, payment authorization, multi-factor authentication, and physical access control systems that target sub-second response times and 99%+ accuracy.

Download information

A sample version of this dataset is available on Kaggle. To request additional samples, use the form below.

Have a question?

Each unique individual in the dataset has 10-15 facial images: a mix of selfie photos taken in different conditions and paired official ID document photos. This multi-image structure enables training face recognition models to handle variations in lighting, pose, and time between captures; failing to handle these variations is the most common failure mode in production identity verification systems.

We collect data from our internal team. All information is further verified by our specialists.

Once your enquiry has been sent, we will contact you to discuss the details and complete the necessary paperwork. The timing of receiving the dataset depends on the specific request and additional requirements

Our unique selling point is that we provide legally clean datasets to our customers. We obtain consent from all participants to use their data for AI model development, and we can provide comprehensive reporting on the licensing, data collection, and privacy compliance of our datasets. Although approaches to regulating AI development and deployment differ from region to region, we are able to serve global customers seeking to launch AI products worldwide.

The price depends on your specific requirements. Please submit a request to receive a free consultation.

Contact us

Tell us about yourself, and get access to free samples of the dataset 

© 2022 – 2026 Copyright protected