Adding closed captions to your livestream through speech-to-text

Clevercast allows you to add very accurate closed captions to your livestream through automatic speech-to-text conversion. Alternatively, you can use Clevercast to generate closed captions for your live stream through human transcription.

Optionally, the closed captions can be corrected by a human editor before they are shown in the live stream.

Closed captions resulting from speech-to-text conversion can be automatically translated into other languages. They can also be combined with audio translations through remote simultaneous interpretation in the same live stream.

Delay of a live stream with speech-to-text

It is important to know that a live stream with AI-generated captions currently has an exceptional delay of about 2 minutes. This means that the footage you broadcast will appear two minutes later in the (embedded) live stream player, where your viewers can view it along with the closed captions.

This is necessary because Automatic Speech Recognition (ASR) models currently need sufficient speech context to generate accurate captions, and because we allow 30 seconds of correction time by a human editor. The live stream is therefore delayed so that the closed captions remain in sync with the video.
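As a rough illustration of the delay described above, the following Python sketch estimates when a broadcast frame becomes visible to viewers. The 2-minute total and the 30-second correction window come from the text and are approximate. The function and constant names are made up for this example and are not part of any Clevercast API.

```python
from datetime import datetime, timedelta

# Approximate figures from the text ("about 2 minutes", "30 seconds of correction time").
TOTAL_DELAY = timedelta(minutes=2)
CORRECTION_WINDOW = timedelta(seconds=30)


def viewer_time(broadcast_time: datetime, correction_enabled: bool = True) -> datetime:
    """Estimate when a frame sent at broadcast_time appears in the player.

    Assumption: disabling human correction shortens the delay by exactly
    the correction window; real-world latency will vary.
    """
    delay = TOTAL_DELAY if correction_enabled else TOTAL_DELAY - CORRECTION_WINDOW
    return broadcast_time + delay


if __name__ == "__main__":
    sent = datetime(2024, 1, 1, 14, 0, 0)
    print("with correction:   ", viewer_time(sent))
    print("without correction:", viewer_time(sent, correction_enabled=False))
```

Treat the result as an estimate for planning purposes only, since the actual delay depends on the viewer's device and network.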

The live stream is delivered using the standard HTTP Live Streaming (HLS) protocol. Clevercast uses adaptive streaming to guarantee smooth streaming in the best possible quality on any device at any time. Using a third-party player is possible, but you would lose player features such as event status switching, player countdown and messages, and failover.
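To make adaptive streaming more concrete: in HLS, a master playlist lists several renditions of the same stream, and the player switches between them depending on available bandwidth. The sketch below parses such a playlist and lists its variants. The sample playlist is invented for illustration and is not actual Clevercast output. The function name is likewise hypothetical.

```python
import re

# Hypothetical master playlist, written for illustration only.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360,CODECS="avc1.4d401e,mp4a.40.2"
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
high/index.m3u8
"""

# Matches KEY=value pairs; quoted values may contain commas.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def list_variants(master: str) -> list[dict[str, str]]:
    """Return one dict of attributes (plus URI) per variant stream."""
    variants = []
    lines = [line.strip() for line in master.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:"):
            attr_text = line.split(":", 1)[1]
            attrs = {key: value.strip('"') for key, value in ATTR_RE.findall(attr_text)}
            # The variant playlist URI is the line following the tag.
            if i + 1 < len(lines) and not lines[i + 1].startswith("#"):
                attrs["URI"] = lines[i + 1]
            variants.append(attrs)
    return variants


if __name__ == "__main__":
    for variant in list_variants(SAMPLE_MASTER):
        print(variant["BANDWIDTH"], variant.get("RESOLUTION"), variant.get("URI"))
```

A third-party player performs this selection itself, which is why player-specific features are not available there.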

Creating a live stream with (multilingual) closed captions via speech-to-text

Setting up a live stream with speech-to-text captions is easy. First, create a Multilingual Live event by selecting the language spoken in the livestream and choosing your preferred broadcast protocol (RTMP or SRT). Next, add the speech-to-text language to the event and, optionally, extra languages for automatic translation.

Once this is done, just copy the video player’s embed code to your site or platform. Viewers will initially see the poster image or message that you have set for the event.

When you want to start streaming, set the event in Clevercast to preview and start broadcasting. Given the delay, it will take some time for the stream to appear in the preview player. When the stream appears, you can use the CC menu in the preview player to see the closed captions.

To make the livestream available to your viewers in the embedded player, follow our basic event flow.

Press the Start Event button at least 2 minutes before you start broadcasting the live action (if necessary, broadcast an intro loop before the live action starts).
After the last frame of live action has been broadcast, preferably wait at least 4 minutes before stopping or pausing the event in Clevercast. After about 2 minutes, the last frame will appear in the preview player and in the embedded player of most of your viewers. However, since iOS devices allow HLS latency to grow to 2 minutes, you should wait an extra 2 minutes.
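The two timing rules above can be turned into a small planning calculation. This is a minimal sketch that assumes you know the start and end time of the live action. The names EventSchedule and plan_event are hypothetical helpers for this example, not Clevercast functionality.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Margins taken from the text; both are minimums ("at least").
START_LEAD = timedelta(minutes=2)  # press Start Event this long before the live action
STOP_WAIT = timedelta(minutes=4)   # about 2 min stream delay + 2 min iOS HLS latency margin


@dataclass
class EventSchedule:
    press_start_by: datetime   # latest moment to press Start Event
    stop_not_before: datetime  # earliest moment to stop or pause the event


def plan_event(action_start: datetime, action_end: datetime) -> EventSchedule:
    """Derive the Start Event and stop times from the live action window."""
    if action_end < action_start:
        raise ValueError("action_end must not be earlier than action_start")
    return EventSchedule(
        press_start_by=action_start - START_LEAD,
        stop_not_before=action_end + STOP_WAIT,
    )


if __name__ == "__main__":
    plan = plan_event(datetime(2024, 1, 1, 15, 0), datetime(2024, 1, 1, 16, 0))
    print("Press Start Event by:", plan.press_start_by)
    print("Do not stop before:  ", plan.stop_not_before)
```

Because both margins are minimums, it is safe to start earlier or stop later than the computed times.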

For more info, see our Closed Caption Event Management manual.

Editing the closed captions from speech-to-text conversion

Clevercast allows you to edit the closed captions before they are shown in the video player (or auto-translated into other languages). This is an optional feature. If you turn it off, the delay of the live stream will be 30 seconds shorter.

The interface to edit the captions is called a Correction Room. You need a browser with WebRTC capabilities (e.g. Firefox, Chrome) to watch the incoming broadcast while editing the closed captions (displayed in boxes that are in sync with the video). Once a caption box appears, you have about 30 seconds to make corrections. After 30 seconds, the caption box is grayed out and the caption is added to the live stream.

For more info, see our Closed Caption Correction manual.

Watching the live stream with closed captions

Clevercast player lets your viewers select the closed captions for their preferred language and change the selection at any time. You can embed Clevercast player on your site or on a third-party platform.

Clevercast uses global CDNs (currently Akamai) for livestream delivery, so viewers anywhere in the world receive the stream from a local server. The number of viewers is unlimited. It’s a regular live stream, so all other Clevercast and Clevercast Player features apply (see our FAQ for more info).

Server-side recording

Clevercast automatically records the closed captions produced by transcription as WebVTT files, together with the live stream. The .vtt file for each transcribed language can be downloaded along with the MP4 file.
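If you want to process a downloaded .vtt file yourself, the following Python sketch reads the cues (timing and text) from WebVTT content. It is a simplified parser that assumes well-formed input and ignores cue settings, styles and notes. The sample captions and the names Cue and parse_vtt are made up for this example.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    text: str


# WebVTT timestamps are [hh:]mm:ss.ttt, separated by "-->".
TIMING_RE = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis) -> float:
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_vtt(content: str) -> list[Cue]:
    """Extract cues from WebVTT text; blocks without a timing line are skipped."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):  # blocks are separated by blank lines
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING_RE.match(line.strip())
            if match:
                groups = match.groups()
                text = "\n".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*groups[:4]), _seconds(*groups[4:]), text))
                break
    return cues


# Invented sample content, for illustration only.
SAMPLE_VTT = """WEBVTT

1
00:00:01.000 --> 00:00:04.500
Welcome to the live stream.

00:00:05.000 --> 00:00:07.250
Captions are recorded
alongside the video.
"""

if __name__ == "__main__":
    for cue in parse_vtt(SAMPLE_VTT):
        print(cue.start, cue.end, cue.text.replace("\n", " "))
```

To use it on a real recording, read the file as UTF-8 text and pass its contents to parse_vtt.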

If your plan includes support for Video on-Demand hosting, you can convert a recording to a VoD video with closed captions. The player for this VoD item can be embedded on your site or on a third-party platform. Viewers can select the closed captions in the embedded player in the same way as for the live stream.
