Summary
50 hours of real footstep audio recordings for training sound event detection, activity recognition, and acoustic biometrics models. Every file is manually verified and was captured in natural indoor and outdoor environments, with detailed metadata for each recording: surface type, footwear, location, and background noise level. Larger than any publicly available footstep audio dataset and licensed for commercial use.
Introduction
This dataset contains 50 hours of real-world footstep audio recorded in natural conditions: indoors and outdoors, across different surfaces, footwear, and noise environments. Every file was manually verified to contain clearly audible footstep sounds, with no synthetic audio, no augmentation, and no AI-generated content.
Each file ships with structured metadata describing the recording context, so the dataset is directly usable for supervised learning in footstep detection, sound event classification, acoustic person identification, and walking surface recognition.
Dataset Features
Scale & Quality
- 50 hours of footstep audio recordings
- Manually verified files – every recording reviewed for clear footstep audibility
- Real-world field recordings – no synthetic audio, no augmentation
- Captured indoors and outdoors in natural conditions
Audio Specifications
- File formats: WAV and M4A
- Sample rate: 48 kHz (majority), with 44.1 kHz and 16 kHz subsets
- Mono and stereo recordings
- Recorded primarily on smartphones, with additional laptop and tablet captures
Metadata for Every File
- Surface type: wood/laminate, tile, carpet, concrete/asphalt, stairs, other
- Footwear: barefoot, slippers, sandals, sneakers, dress shoes/boots, other
- Location: indoor / outdoor
- Background noise level: low / medium / high
- Recording device class: smartphone / laptop / tablet
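As a rough illustration of how per-file metadata like this can drive filtered training sets, here is a minimal sketch in Python. The `Recording` class, its field names, the value spellings, and the example rows are all assumptions made for illustration; the actual metadata file format and column names of the dataset may differ.

```python
from dataclasses import dataclass

# Hypothetical record layout: field names and value spellings are assumptions
# that mirror the metadata categories listed above, not the real schema.
@dataclass(frozen=True)
class Recording:
    path: str
    surface: str    # e.g. "wood/laminate", "tile", "carpet"
    footwear: str   # e.g. "barefoot", "sneakers"
    location: str   # "indoor" or "outdoor"
    noise: str      # "low", "medium" or "high"
    device: str     # "smartphone", "laptop" or "tablet"

def select(recordings, **conditions):
    """Return the recordings whose fields equal every given condition."""
    return [
        r for r in recordings
        if all(getattr(r, field) == value for field, value in conditions.items())
    ]

# Made-up example rows, not taken from the dataset.
catalog = [
    Recording("a.wav", "tile", "sneakers", "indoor", "low", "smartphone"),
    Recording("b.wav", "carpet", "barefoot", "indoor", "low", "laptop"),
    Recording("c.m4a", "concrete/asphalt", "sneakers", "outdoor", "high", "smartphone"),
]

# Keep only quiet indoor recordings, for example to train a clean baseline.
quiet_indoor = select(catalog, location="indoor", noise="low")
print([r.path for r in quiet_indoor])
```

The same `select` call can be reused with any combination of fields, for instance `select(catalog, footwear="sneakers", location="outdoor")`.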
Use cases and applications
- Footstep detection in smart home, security, and IoT systems
- Sound event detection (SED) models that include footsteps as a target class
- Acoustic person identification – biometric models that recognize individuals by their walking sound
- Walking surface classification – distinguishing footsteps on different floor materials
- Foley generation – training data for AI sound design models targeting walking sequences
Why this dataset solves real production challenges
- Largest footstep audio dataset available commercially. Publicly available academic datasets top out at 14 hours (AFPID-II) or contain fewer than 1,000 footstep samples (FSD50K, ESC-50). At 50 hours of curated recordings, this dataset replaces months of in-house data collection.
- Manually verified, not scraped. Every file was reviewed by an annotator to confirm that footsteps are clearly audible. No YouTube extracts, no synthetic generation, no contaminated samples.
- Structured metadata across four dimensions. Surface, footwear, location, and noise level are encoded per file, supporting both filtered training and multi-task learning setups.
Sample dataset
A sample version of this dataset is available on Kaggle and HuggingFace. Leave a request in the form below for additional samples or the full version.
Have a question?
The full dataset contains 50 hours of audio. Each file is 10 to 100 seconds long. By volume of curated footstep audio, this is the largest commercially available dataset in the category, roughly 3.5 to 5 times the size of the most cited academic alternatives (AFPILD with 10 hours, AFPID-II with 14 hours). The full version is licensed for commercial training of production ML models.
Every file ships with four structured metadata fields: surface type (wood/laminate, tile, carpet, concrete/asphalt, stairs, other), footwear (barefoot, slippers, sandals, sneakers, dress shoes/boots, other), location (indoor or outdoor), and background noise level (low, medium, high). This supports filtered training, multi-task learning, and stratified evaluation splits across walking conditions.
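To make "stratified evaluation splits" concrete, the sketch below groups files by a combination of metadata fields and holds out a fraction of each group, so every walking condition is represented in the test set. The dictionary keys (`file`, `surface`, `noise`), the `stratified_split` helper, and the example rows are hypothetical and only illustrate the idea; they are not the dataset's actual schema or tooling.

```python
import random
from collections import defaultdict

def stratified_split(rows, keys, test_fraction=0.2, seed=0):
    """Split rows so each combination of `keys` contributes to the test set."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for _, members in sorted(groups.items()):
        rng.shuffle(members)
        # Hold out at least one file from any group with two or more files.
        n_test = max(1, round(len(members) * test_fraction)) if len(members) > 1 else 0
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Made-up rows: 2 surfaces x 2 noise levels x 5 files each.
rows = [
    {"file": f"{surface}_{noise}_{i}.wav", "surface": surface, "noise": noise}
    for surface in ("tile", "carpet")
    for noise in ("low", "high")
    for i in range(5)
]

train, test = stratified_split(rows, keys=("surface", "noise"))
print(len(train), len(test))
```

Stratifying on more fields (for example adding footwear) produces smaller groups, so very rare combinations may end up only in the training set.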
Yes. This dataset is designed specifically for supervised training of footstep detection, sound event detection, and audio classification models. The 50 hours of verified positive samples cover the most common deployment surfaces and footwear types, with realistic background noise variation.
Yes. Acoustic person identification is a recognized use case for this data. Compared to academic benchmarks like AFPILD (40 subjects) and AFPID-II (41 subjects), our dataset offers a different angle: broader surface and footwear coverage per recording, which lets you train models that are robust to environmental variation.
Files are provided in WAV and M4A formats. The majority of recordings are 48 kHz, with smaller subsets at 44.1 kHz and 16 kHz. This variation matches real-world deployment conditions: smart home microphones, phone-recorded audio, and embedded device captures all sample at different rates, so models trained on this distribution can generalize better than those trained on a single fixed rate.
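Because the sample rate varies between files, it can help to check the rate of each file before building a training pipeline. The sketch below uses Python's standard `wave` module to read the header of PCM WAV files and count files per sample rate. The helper names and the silent demo clips are made up for illustration; M4A files are not covered, since they need a separate decoder.

```python
import os
import tempfile
import wave
from collections import Counter

def wav_format(path):
    """Return (sample_rate_hz, channels) read from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()

def rate_histogram(paths):
    """Count how many WAV files use each sample rate."""
    return Counter(wav_format(p)[0] for p in paths)

def write_silence(path, rate, channels=1, seconds=0.01):
    """Write a short silent 16-bit PCM WAV, used only to demo the helpers."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))

# Demo with made-up clips at the three rates mentioned above.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, rate in enumerate((48000, 48000, 44100, 16000)):
        clip = os.path.join(tmp, f"clip_{i}.wav")
        write_silence(clip, rate)
        paths.append(clip)
    histogram = rate_histogram(paths)

print(dict(histogram))
```

Whether to resample everything to one rate or to train on mixed rates is a design choice that depends on the target deployment hardware.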
Yes. A sample subset is freely available on our Kaggle and HuggingFace pages, where you can download it and explore the audio quality and metadata format directly. For an extended evaluation sample with specific surface or footwear conditions, leave a request through the form below.
Contact us
Tell us about yourself, and get access to free samples of the dataset