Adding closed captions to your livestream through speech-to-text with correction

27 May 2022

Clevercast allows you to add closed captions to your livestream through automatic speech-to-text conversion. Alternatively, you can use Clevercast to generate closed captions for your live stream through human transcription.

The closed captions can be corrected by a human editor before they are shown in the livestream (or translated into other languages).

Closed captions resulting from speech-to-text conversion can be automatically translated into other languages. They can also be combined with audio translations through remote simultaneous interpretation in the same livestream.

 

Delay of a livestream with speech-to-text

Note that a livestream with speech-to-text captions has an unusually long delay of about 2 minutes. This means that the footage you broadcast will appear about 2 minutes later in the (embedded) livestream player, where your viewers can watch it along with the closed captions.

This is necessary because speech-to-text services (currently) still need a relatively long time to generate accurate closed captions, and because we allow for 30 seconds of correction time by a human editor. Therefore, the livestream needs to be delayed so the closed captions remain in sync with the video.

The livestream is delivered using the standard HTTP Live Streaming (HLS) protocol. Clevercast uses adaptive streaming to guarantee smooth streaming in the best possible quality on any device at any time. Using a third-party player is possible, but you would lose player features like event status switching, player countdown and messages, and failover.

 

Creating a livestream with (multilingual) closed captions via speech-to-text

Setting up a livestream with speech-to-text captions is easy. First, create a Multilingual Live event by selecting the language spoken in the livestream and choosing your preferred broadcast protocol (RTMP or SRT). Next, add the speech-to-text language to the event and, optionally, extra languages for automatic translation. You can also specify words and phrases that the speakers are likely to use (e.g. names, technical terms and phrases) to increase the accuracy of the speech-to-text conversion.

Once this is done, you can copy the video player’s embed code to your site or platform. Viewers will initially see the poster image or message that you have set for the event.

When you want to start streaming, set the event in Clevercast to preview and start broadcasting. Given the delay, it will take a while for the stream to appear in the preview player. When the stream appears, you can use the CC menu in the preview player to see the closed captions.

To make the livestream available to your viewers in the embedded player, follow our basic event flow.

  • Press the Start Event button (at least) 2 minutes before you start broadcasting the live action (if necessary, broadcast an intro loop before the live action starts)
  • After the last frame of live action has been broadcast, preferably wait (at least) 4 minutes before stopping or pausing the event in Clevercast. After about 2 minutes, the last frame will appear in the preview player and in the embedded player of most of your viewers. But since iOS devices allow HLS latency to grow to 2 minutes, you should wait an extra 2 minutes.
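The timing in the two steps above can be worked out ahead of the event. The following is a minimal sketch in Python, assuming the approximate figures mentioned in this article (a 2-minute stream delay and up to 2 extra minutes of HLS latency on iOS). The function name, dictionary keys and example times are hypothetical and are not part of Clevercast.

```python
from datetime import datetime, timedelta

# Approximate figures taken from this article; adjust them if your setup differs.
STREAM_DELAY = timedelta(minutes=2)       # broadcast-to-player delay
IOS_EXTRA_LATENCY = timedelta(minutes=2)  # extra HLS latency iOS devices may build up


def event_schedule(live_start: datetime, live_end: datetime) -> dict:
    """Return the latest Start Event time and the earliest safe stop time."""
    if live_end <= live_start:
        raise ValueError("live_end must be after live_start")
    return {
        # Press Start Event at least this early.
        "press_start_event_by": live_start - STREAM_DELAY,
        # Around this time the last frame reaches most players.
        "last_frame_visible_around": live_end + STREAM_DELAY,
        # Do not stop or pause the event before this time.
        "stop_event_not_before": live_end + STREAM_DELAY + IOS_EXTRA_LATENCY,
    }


if __name__ == "__main__":
    plan = event_schedule(
        datetime(2022, 5, 27, 14, 0),  # hypothetical start of the live action
        datetime(2022, 5, 27, 15, 0),  # hypothetical last frame of live action
    )
    for label, moment in plan.items():
        print(f"{label}: {moment:%H:%M}")
```

These are minimum margins: starting the event earlier or stopping it later than the computed times is always safe.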

For more info, see our Closed Caption Event Management manual.

 

Editing the closed captions from speech-to-text conversion

Clevercast allows you to edit the closed captions before they are shown in the video player (or auto-translated into other languages). You are not required to do this. If you don’t, the closed captions will simply appear in the video player without being corrected.

The interface is called a Correction Room. You need a browser with WebRTC capabilities (e.g. Firefox, Chrome) to watch the incoming broadcast while editing the closed captions (displayed in boxes that are in sync with the video). Once a caption box appears, you have about 30 seconds to make corrections. After 30 seconds, the caption box is grayed out and the text is used as a closed caption.

After making a correction, you can also send it to the speech-to-text service by pressing a ‘Boost’ button. This increases the probability that your correction will be recognized over other similar-sounding words or phrases.

For more info, see our Closed Caption Correction manual.

 

Watching the live stream with closed captions

Clevercast player lets your viewers select the closed captions for their preferred language and change the selection at any time. You can embed Clevercast player on your site or on a third-party platform.

Clevercast uses global CDNs (currently Akamai) for livestream delivery, so viewers anywhere in the world receive the stream from a local server. The number of viewers is unlimited. It’s a regular live stream, so all other Clevercast and Clevercast Player features apply (see our FAQ for more info).

 

Server-side recording

Clevercast automatically records the closed captions resulting from transcription as WebVTT files, together with the livestream. The .vtt file for each transcribed language can be downloaded along with the MP4 file.
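If you want to process a downloaded .vtt file yourself (for example, to review the transcript), a WebVTT file is plain text with timed cues. The following is a minimal sketch in Python that reads basic cues into start time, end time and text. The sample captions are invented for illustration, the names are hypothetical, and the parser ignores cue settings and styling, so treat it as a starting point rather than a complete WebVTT implementation.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches "hh:mm:ss.ttt --> hh:mm:ss.ttt" (the hours part is optional in WebVTT).
_TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def _seconds(stamp: str) -> float:
    """Convert a WebVTT timestamp to seconds."""
    parts = stamp.split(":")
    hours = int(parts[-3]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def parse_vtt(content: str) -> List[Cue]:
    """Parse basic cues; blocks without a timing line (header, notes) are skipped."""
    cues = []
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if match:
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(match.group(1)), _seconds(match.group(2)), text))
                break
    return cues


# Invented sample content, not actual Clevercast output.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.500
Welcome to the livestream.

00:00:05.000 --> 00:00:08.250
Captions are stored as timed cues.
"""

if __name__ == "__main__":
    for cue in parse_vtt(SAMPLE):
        print(f"{cue.start:7.3f} - {cue.end:7.3f}  {cue.text}")
```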

If your plan includes support for Video-on-Demand hosting, you can convert a recording to a VoD video with closed captions. The player for this VoD item can be embedded on your site or a third-party platform. Viewers can select the closed captions in the embedded player in the same way as for the livestream.