SUMMARY - Captions, Transcripts, and Inclusion
When a government posts an important video announcement without captions, when a university lecture is recorded but not transcribed, when a viral social media video spreads information that deaf and hard of hearing people cannot access, the question of who belongs in public discourse becomes concrete. Captions and transcripts are not amenities but infrastructure for inclusion.
Who Benefits from Captions and Transcripts
The most obvious beneficiaries are people who are deaf or hard of hearing—approximately 3.2 million Canadians according to Statistics Canada. But the benefits extend far beyond this group.
People learning a language use captions to connect spoken words with text. People watching in noisy environments (transit, waiting rooms, open offices) rely on captions when audio is impractical. People with auditory processing differences, including many autistic people and people with ADHD, often understand content better when they can read along. People searching for specific information can search transcripts far more easily than scanning through video. And anyone can benefit from the flexibility to engage with content in different ways.
Research consistently shows that captions improve comprehension and retention for most viewers, not just those who require them. The curb-cut effect—where accessibility features benefit everyone—applies powerfully to captioning.
The Current State in Canada
Broadcasting Requirements
The CRTC requires Canadian broadcasters to caption 100% of English- and French-language programming during the broadcast day. Quality standards require accuracy, synchronization, and completeness. However, compliance varies, and quality issues—particularly with live captioning—remain common.
Government and Public Sector
Federal accessibility requirements under the Accessible Canada Act and provincial requirements like Ontario's AODA apply to government communications, including video content. In practice, implementation is uneven. Critical public health information during the COVID-19 pandemic was sometimes released without captions or sign language interpretation, demonstrating that compliance lags behind legal requirements.
Online Platforms
Major platforms offer automated captioning with varying accuracy. YouTube's auto-captions have improved significantly but still make errors, particularly with accents, technical terms, proper nouns, and speakers with disabilities affecting speech. TikTok, Instagram, and other platforms have added auto-captioning features, but quality and availability vary.
User-generated content remains largely uncaptioned. Creators may not know how to add captions, may not consider them, or may find the process too time-consuming. Platform design that makes captioning easy and normalized could change this, but current interfaces often treat captions as an afterthought.
Education and Workplace
Post-secondary institutions typically provide accommodation for students requiring captions, but students often must request accommodations, navigate bureaucratic processes, and wait for materials to be captioned—placing the burden on those already facing barriers. Real-time captioning for lectures remains expensive and not universally available.
Workplace meetings, training videos, and corporate communications frequently lack captions, particularly in smaller organizations without dedicated accessibility staff.
Technical Considerations
Types of Captions
Closed captions can be turned on or off by viewers and are the standard for most video content.
Open captions are embedded in the video and cannot be turned off, ensuring visibility but reducing flexibility.
Real-time captions (CART—Communication Access Realtime Translation) provide live captioning for events, meetings, or broadcasts, typically generated by trained stenographers or voice writers.
Automated captions use speech recognition technology to generate captions without human intervention, with quality depending on audio clarity, speaker characteristics, and vocabulary.
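In practice, closed captions are usually delivered as a timed sidecar file such as WebVTT, which players read to display each cue at the right moment. A minimal sketch of parsing such a file into timed cues (the cue text and timings here are invented for illustration):

```python
import re

# A minimal WebVTT caption file (illustrative content, not a real broadcast).
VTT = """WEBVTT

00:00:01.000 --> 00:00:04.000
Good evening, and welcome to the briefing.

00:00:04.500 --> 00:00:07.000
[applause]
"""

# Matches one cue: start timestamp, end timestamp, then the cue text.
CUE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})\n(.+?)(?:\n\n|\Z)",
    re.S,
)

def to_seconds(ts: str) -> float:
    """Convert an HH:MM:SS.mmm timestamp to seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def parse_vtt(text: str):
    """Return a list of (start_seconds, end_seconds, caption_text) cues."""
    return [(to_seconds(a), to_seconds(b), t.strip())
            for a, b, t in CUE_RE.findall(text)]

cues = parse_vtt(VTT)
print(cues[0])  # (1.0, 4.0, 'Good evening, and welcome to the briefing.')
```

Note that the second cue, "[applause]", captions a relevant non-speech sound—part of the completeness standard discussed below.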
Quality Standards
Effective captions must be accurate (reflecting what is actually said), synchronous (appearing when words are spoken), complete (including all dialogue and relevant sounds), and readable (using appropriate line breaks, timing, and positioning).
Automated captions often fail on these dimensions, particularly accuracy. Error rates increase with background noise, multiple speakers, accents, technical terminology, and speakers with speech differences. For critical content—legal proceedings, medical information, emergency communications—automated captions may be legally and ethically inadequate.
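A common way to quantify the accuracy dimension is word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the automated caption into the reference transcript, divided by the reference length. A minimal sketch using the classic edit-distance dynamic program (the sample sentences are invented):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[-1][-1] / len(ref)

# One substitution ("weather" -> "whether") in a four-word reference: WER = 0.25
print(word_error_rate("the weather is mild", "the whether is mild"))
```

Note that WER captures only accuracy; synchronization, completeness, and readability require separate evaluation.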
Transcripts
Transcripts provide a text version of audio or video content, enabling searching, scanning, and engagement without watching video. They are particularly valuable for long-form content and for people who prefer reading to watching. Interactive transcripts that link text to corresponding video moments offer additional functionality.
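Interactive transcripts of this kind rest on a simple data structure: timestamped segments that can be searched and mapped back to a playback position. A minimal sketch (the segment data is invented for illustration):

```python
# Each segment pairs a start time (seconds) with its text, roughly as an
# interactive-transcript player might hold them. Sample data is invented.
SEGMENTS = [
    (0.0,  "Welcome to today's public briefing."),
    (12.5, "First, an update on the vaccination program."),
    (47.0, "Questions from reporters will follow the statement."),
]

def find_in_transcript(segments, query: str):
    """Return (start_seconds, text) for every segment containing the query,
    so a player can jump straight to the matching video moment."""
    q = query.lower()
    return [(start, text) for start, text in segments if q in text.lower()]

hits = find_in_transcript(SEGMENTS, "vaccination")
print(hits)  # [(12.5, "First, an update on the vaccination program.")]
```

This is why transcripts make long-form content searchable in a way raw video is not: a text match resolves directly to a timestamp.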
Economic and Resource Considerations
Professional human captioning typically costs $1-3 per minute of content, making it expensive for high-volume content producers. Real-time captioning costs $100-250 per hour or more. These costs, while justified for accessibility, can be prohibitive for small organizations, independent creators, and cash-strapped public institutions.
Automated captioning has dramatically reduced costs and increased volume, but quality tradeoffs mean that automation alone does not constitute adequate accessibility for all contexts.
A middle approach—automated captioning with human review and correction—balances cost and quality but still requires resources many content producers lack.
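To make the scale concrete, a back-of-envelope comparison using the per-minute rates cited above. The 500-hour catalogue size and the $0.25-per-minute human-review rate are illustrative assumptions, not figures from this document:

```python
# Hypothetical catalogue: 500 hours of video (an assumed figure).
minutes = 500 * 60  # 30,000 minutes

# Full human captioning at the $1-3/minute range cited above.
human_low, human_high = 1.0 * minutes, 3.0 * minutes

# Automated captioning plus human review, at an assumed $0.25/minute.
review_only = 0.25 * minutes

print(f"Full human captioning:    ${human_low:,.0f} - ${human_high:,.0f}")
print(f"Automated + human review: ${review_only:,.0f}")
```

Even under generous assumptions, the gap between roughly $30,000-90,000 and $7,500 for the same catalogue shows why the hybrid approach is attractive—and why even it can exceed what small organizations and independent creators can spend.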
Policy and Cultural Questions
Should captioning be legally required for all public-facing video content? Current requirements apply mainly to broadcasting and government, leaving most online content unregulated. Platform accountability—requiring platforms to provide quality captioning tools and encourage their use—represents one policy approach.
How should quality be balanced against availability? Perfect captions that take days to produce may be less useful than imperfect captions available immediately. Different contexts may warrant different standards.
How can captioning become a cultural norm rather than an accessibility accommodation? When creators caption content by default, rather than only when requested, inclusion is built into practice rather than requiring advocacy.
The Question
If video and audio have become primary ways that information, entertainment, and civic discourse are shared, then exclusion from this content is exclusion from public life. How should Canada ensure that captions and transcripts are available for content that matters—government communications, educational materials, news, and civic information—while navigating the costs and logistics of captioning the vast volume of content created daily? Should platforms bear responsibility for caption availability and quality? And how can captioning shift from accommodation to expectation?