A transcript is a non-synchronized text version of the audio material in a video or audio file.

Harvard’s Digital Accessibility Policy cites the Web Content Accessibility Guidelines (WCAG) 2.1 AA as a standard, which recommends that transcripts be provided for pre-recorded audio content. 

If you’re posting a podcast or audio recording on a Harvard website, you’ll need to include a transcript near the corresponding audio content, either by integrating the transcript on the same page as the audio or by linking to it.
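As a rough illustration of the two placement options, here is a minimal Python sketch that builds the HTML for an audio player followed by either an inline transcript or a link to one. The function name audio_block, the heading level, the link text, and the example paths are all assumptions made for illustration; adapt them to your own site and content management system.

```python
from html import escape


def audio_block(audio_url, transcript_paragraphs=None, transcript_url=None):
    """Return HTML for an audio player with its transcript placed right after it.

    Hypothetical helper: pass transcript_paragraphs to show the transcript
    inline, transcript_url to link to it, or both.
    """
    if not transcript_paragraphs and not transcript_url:
        raise ValueError("Provide an inline transcript or a transcript link.")

    parts = [f'<audio controls src="{escape(audio_url)}"></audio>']

    if transcript_paragraphs:
        # Option 1: the transcript sits on the same page, directly below the player.
        parts.append("<h3>Transcript</h3>")
        parts.extend(f"<p>{escape(p)}</p>" for p in transcript_paragraphs)

    if transcript_url:
        # Option 2: a clearly labeled link placed next to the player.
        parts.append(
            f'<p><a href="{escape(transcript_url)}">Read the transcript</a></p>'
        )

    return "\n".join(parts)


print(audio_block("/media/episode-1.mp3",
                  transcript_url="/media/episode-1-transcript.html"))
```

Either way, the transcript or its link appears immediately after the player, so listeners do not have to hunt for it.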

Who benefits from transcripts?

Transcripts can provide equal access and effective communication for Deaf and hard of hearing people, which makes them essential for full participation. Transcripts can also enhance the experience for non-native speakers, people with learning disabilities, and anyone in a noisy environment.

Transcripts are also searchable, which can boost the Search Engine Optimization (SEO) of your content, and make it easier for listeners to find and return to relevant pieces of the audio.

How do I implement transcripts?

There are two primary approaches to producing transcripts for pre-recorded audio. Select the approach that’s right for your project based on available resources, timeline, and workflow.

Professional vendor (recommended approach)

Professional vendors such as 3Play Media (Harvard’s preferred vendor) and Rev provide transcription on a per-minute, fee-for-service basis. Using a professional vendor is highly recommended, as it offers the greatest accuracy and a quick turnaround with minimal use of internal resources and effort. Consider giving the vendor a “cheat sheet” of speaker names, terminology, or other notes that will help them produce accurate transcripts.

Auto-generated and manual editing

Transcripts generated by Artificial Intelligence (AI) vary in accuracy. This approach works best when the recording features a single person speaking clearly with little background noise. Accuracy rates drop if multiple people speak with overlapping dialog, if there is background noise, or if specialized terminology is used.

To ensure transcripts meet the accuracy standards of Harvard’s Policy, any transcripts produced with AI will need to be edited before the video is posted online. If you use an AI tool (such as YouTube, Microsoft Stream, or IBM Watson), you can then make corrections with an editing tool (such as CADET or Amara) to ensure that the transcript is as accurate as possible. While this approach is low-cost, it requires more time spent on editing and proofreading for accuracy. Watch this video on how to edit captions in YouTube Studio from Harvard Library’s User Research Center.
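Once your captions have been corrected, you may want to turn them into a non-synchronized transcript. The Python sketch below shows one possible way to do that, assuming your captioning tool can export captions as a WebVTT (.vtt) file. The function name vtt_to_transcript and the sample captions are hypothetical, and the parsing covers only the common WebVTT layout, so proofread the result before posting it.

```python
import re

# Matches a WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(
    r"^\s*(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_transcript(vtt_text):
    """Strip timing data from WebVTT captions and return plain transcript text."""
    pieces = []
    # Blocks (the header, notes, and cues) are separated by blank lines.
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        block_lines = block.split("\n")
        # Skip the file header and any note, style, or region blocks.
        if block_lines[0].strip().startswith(("WEBVTT", "NOTE", "STYLE", "REGION")):
            continue
        # The cue text is everything after the timing line.
        for i, line in enumerate(block_lines):
            if TIMING.match(line):
                text_lines = block_lines[i + 1:]
                break
        else:
            continue  # no timing line found, so this is not a cue
        for text in text_lines:
            # Turn voice tags like <v Host> into a "Host: " speaker label.
            text = re.sub(r"<v\s+([^>]+)>", r"\1: ", text)
            # Drop any remaining inline tags (such as <c> or </v>).
            text = re.sub(r"<[^>]+>", "", text).strip()
            # Skip empty lines and immediate repeats.
            if text and (not pieces or pieces[-1] != text):
                pieces.append(text)
    return " ".join(pieces)


# Hypothetical sample captions for illustration.
sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Welcome to the show.

2
00:00:04.500 --> 00:00:07.000
<v Host>Today we discuss accessibility.
"""

print(vtt_to_transcript(sample))
```

The output is a single block of text, so you will likely still want to add paragraph breaks and confirm speaker names by hand.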