video-captions
Provide captions for video content
Approximately 15% of adults have some degree of hearing loss. Captions are essential for deaf and hard-of-hearing users, who cannot rely on audio content. They also benefit users in sound-sensitive environments (libraries, open offices), users watching without headphones in public, non-native speakers, and users with auditory processing disorders. WCAG SC 1.2.2 is a Level A requirement, so it applies at every conformance level, including AA. Missing captions can be a legal compliance failure under the ADA, EN 301 549, and similar regulations worldwide.
Quick Reference
- Prerecorded video with audio: synchronized captions required (WCAG 2.1 SC 1.2.2, Level A)
- Live video with audio: real-time captions required (WCAG 2.1 SC 1.2.4, Level AA)
- Use <track kind='captions'> with a .vtt (WebVTT) file for HTML5 <video> elements
- Captions must include all spoken dialogue, speaker identification, and relevant non-speech audio (music, sound effects)
- Subtitles and captions are different: captions include non-speech audio; subtitles cover dialogue only, usually as a translation
Check
Find all <video> elements and video embeds (<iframe> from YouTube, Vimeo, etc.). For each <video> with audio: check for a <track> child element with kind='captions' and a valid src pointing to a .vtt file. Verify the default attribute is present on at least one track so captions are on by default (or document the UX reason they are off by default). For YouTube/Vimeo embeds: check that the video has captions on the platform and that the platform's caption toggle is accessible. Also check that the .vtt file exists and is valid (not empty, and not only music-note symbols).
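The markup part of this check can be partly automated. The following is a minimal sketch using only the Python standard library; the names `CaptionAudit`, `audit_captions`, `has_caption_text`, and the sample markup are illustrative assumptions, not part of any existing tool. It cannot tell whether a video actually has audio, whether the .vtt file exists, or whether an embedded player exposes captions, so those steps remain manual.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class CaptionAudit(HTMLParser):
    """Record each <video> and whether it has a captions <track>."""

    def __init__(self):
        super().__init__()
        self.videos = []   # one dict per <video>, in document order
        self.embeds = []   # <iframe> src values that need a manual check
        self._open = None  # the <video> currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self._open = {"line": self.getpos()[0], "captions": False, "default": False}
            self.videos.append(self._open)
        elif tag == "track" and self._open is not None:
            kind = (attrs.get("kind") or "").lower()
            path = urlparse(attrs.get("src") or "").path.lower()
            if kind == "captions" and path.endswith(".vtt"):
                self._open["captions"] = True
                # A bare `default` attribute arrives as ("default", None).
                self._open["default"] = self._open["default"] or "default" in attrs
        elif tag == "iframe":
            self.embeds.append(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "video":
            self._open = None


def audit_captions(html):
    """Return (videos, embeds) found in an HTML string."""
    parser = CaptionAudit()
    parser.feed(html)
    parser.close()
    return parser.videos, parser.embeds


def has_caption_text(vtt_text):
    """Return True if any cue contains letters or digits, not only symbols."""
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n"))
    for block in blocks:
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            if "-->" in line:  # timing line; the cue text follows it
                payload = " ".join(lines[i + 1:])
                if any(ch.isalnum() for ch in payload):
                    return True
                break
    return False


# Hypothetical page fragment used only to exercise the audit.
SAMPLE = """
<video controls src="intro.mp4">
  <track kind="captions" srclang="en" label="English" src="captions-en.vtt" default>
</video>
<video controls src="promo.mp4"></video>
<iframe src="https://www.youtube.com/embed/VIDEO_ID" title="Demo"></iframe>
"""

videos, embeds = audit_captions(SAMPLE)
for video in videos:
    status = "ok" if video["captions"] else "MISSING captions track"
    print(f"<video> at line {video['line']}: {status}")
print(f"{len(embeds)} embed(s) need a manual caption check")
```

The text check is deliberately crude: a file whose cues contain only music-note symbols is flagged, but a file containing nothing except '[music]' would still pass, so a human review of caption quality is still needed.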
Fix
For <video> elements without captions: (1) Create a WebVTT (.vtt) file containing synchronized caption text. Include all spoken words, speaker IDs for multi-speaker content, and descriptions of relevant sounds (e.g., '[applause]', '[upbeat music]'). (2) Add <track kind='captions' srclang='en' label='English' src='captions-en.vtt' default> inside the <video> element. (3) For auto-generated captions (YouTube, AI tools): review and correct errors before publishing. Auto-captions are commonly estimated at around 80% accuracy and often fail on proper nouns, technical terms, and accented speech. (4) For live streams: implement real-time captioning via a third-party captioning service or CART (Communication Access Realtime Translation).
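To make step (1) concrete, here is a small sketch that writes WebVTT text from a list of timed cues. The helper names `format_timestamp` and `build_webvtt` and the sample cues are hypothetical. The sketch assumes each cue text is already valid WebVTT cue text (no blank lines, no "-->", special characters escaped). Speaker identification uses WebVTT voice tags of the form <v Name>; a plain "NAME:" prefix is a common alternative.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        if end <= start:
            raise ValueError(f"cue must end after it starts: {text!r}")
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical cues: dialogue with speaker IDs plus non-speech audio.
cues = [
    (0.0, 2.5, "[upbeat music]"),
    (2.5, 6.0, "<v Maria>Welcome to the product tour."),
    (6.0, 9.25, "<v Sam>Thanks, Maria. Let's get started."),
    (9.25, 11.0, "[applause]"),
]

print(build_webvtt(cues))
```

Saving the result as captions-en.vtt next to the video matches the src value used in the <track> element in step (2).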