Live captions through speech-to-text conversion

Add multilingual closed captions to your live stream via Automatic Speech Recognition (ASR)

Our live captioning platform lets you make live streams accessible to all audiences. Clevercast’s automated captioning solution is easy to use, reliable and efficient. Its unique speech-to-text technology, which supports just-in-time correction, ensures that captions are more accurate and readable than those of any other solution on the market.

Unique speech-to-text technology

Our technology for auto-generating closed captions leverages the latency that comes with the HTTP Live Streaming protocol. By increasing that latency to about two minutes, we achieve three goals:

  • Complete sentences can be sent to the ASR engine that converts speech to text. With the full sentence available, the engine can better interpret the words and construct correct phrases and sentences, which significantly increases the accuracy of the conversion.
  • A human editor can make just-in-time corrections to the automatically generated captions, before they are translated and shown in the live stream.
  • Captions can be added intelligently to the live stream, letting our player render (partial) sentences. This makes the captions easier to read and understand.
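The way a delay creates room for corrections can be illustrated with a minimal Python sketch. It is not Clevercast's implementation: the class and method names are hypothetical, and the 120-second value is only an assumption based on the two-minute delay mentioned on this page. Captions wait in a queue, remain editable while they wait, and are released once the delay has passed.

```python
from dataclasses import dataclass
from typing import List

DELAY_SECONDS = 120.0  # assumption: roughly the two-minute delay described on this page


@dataclass
class Caption:
    caption_id: int
    spoken_at: float  # seconds since stream start when the sentence was spoken
    text: str


class DelayedCaptionQueue:
    """Hypothetical queue that holds captions until the delay has elapsed."""

    def __init__(self, delay: float = DELAY_SECONDS) -> None:
        self.delay = delay
        self._pending: List[Caption] = []

    def add(self, caption: Caption) -> None:
        """Queue a caption produced by the speech-to-text step."""
        self._pending.append(caption)

    def correct(self, caption_id: int, new_text: str) -> bool:
        """Apply a human correction; returns False if the caption is no longer pending."""
        for caption in self._pending:
            if caption.caption_id == caption_id:
                caption.text = new_text
                return True
        return False

    def release(self, now: float) -> List[Caption]:
        """Return and remove the captions whose delay has elapsed at time `now`."""
        due = [c for c in self._pending if now - c.spoken_at >= self.delay]
        self._pending = [c for c in self._pending if now - c.spoken_at < self.delay]
        return due
```

In this sketch, a correction only succeeds while the caption is still pending, which mirrors the idea that edits have to happen before the caption reaches the viewer.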

Best-in-class solution

Clevercast introduces a two-minute delay to ensure that the ASR engine has the full speech context at its disposal, resulting in a better interpretation of words and sentences. As a result, your broadcast will appear about two minutes later in the embedded player.

Without this delay, the ASR engine would have to rely on single words or very short phrases without context, leading to many errors and short, incomprehensible captions.

In addition, Clevercast lets you increase accuracy by boosting the ASR engine’s recognition of certain words and phrases, like names, abbreviations and technical terms. This can be done before and during the live stream.
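As a rough illustration of what boosting means, the toy Python sketch below picks between candidate transcriptions and gives a small bonus to candidates that contain a boosted phrase. How Clevercast or its ASR engine applies boosting internally is not described here; the function name, the score values and the bonus are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def pick_hypothesis(
    hypotheses: List[Tuple[str, float]],
    boosted_phrases: Iterable[str],
    bonus: float = 0.1,  # assumption: arbitrary bonus per matched phrase
) -> str:
    """Return the candidate text with the highest score after boosting."""
    phrases = [p.lower() for p in boosted_phrases]

    def boosted_score(item: Tuple[str, float]) -> float:
        text, score = item
        lowered = text.lower()
        # Count how many boosted phrases occur in this candidate.
        hits = sum(1 for p in phrases if p in lowered)
        return score + bonus * hits

    return max(hypotheses, key=boosted_score)[0]


candidates = [
    ("welcome to clever cast", 0.62),  # hypothetical engine scores
    ("welcome to Clevercast", 0.58),
]
print(pick_hypothesis(candidates, boosted_phrases=["Clevercast"]))
```

Without the boosted phrase, the first candidate would win on its raw score; with it, the candidate containing the product name is preferred.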

Just-in-time corrections for accuracy

The accuracy of speech-to-text conversion depends on a number of factors, like the clarity of the audio (e.g. volume, background noise) and the speaker (e.g. articulation, speed, accent, dialect).

To improve the accuracy, Clevercast allows a human editor to make just-in-time corrections to the captions, before they are translated or shown in the live stream.

Even though this feature is optional, we strongly recommend using it. Making corrections is a fairly simple task that doesn’t require training. Our intuitive interface allows anyone to edit the captions in a browser with a mouse and keyboard. Even a limited number of corrections (e.g. names, acronyms, industry-specific terminology) can greatly increase the quality of the captions.

Intelligent caption rendering in the player

Because of the two-minute delay, Clevercast can add the captions to the live stream in an intelligent manner. This allows the Clevercast player to show (partial) sentences rather than separate words, which makes the closed captions easier to read and understand.
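To make the idea of showing (partial) sentences concrete, here is a small Python sketch that splits a finished sentence into caption blocks at word boundaries. It only illustrates the general technique: the line length and the number of lines per block are assumed values, not Clevercast player settings, and the function name is hypothetical.

```python
import textwrap
from typing import List

MAX_CHARS_PER_LINE = 42  # assumption: a commonly used caption line length
MAX_LINES_PER_CUE = 2    # assumption: two lines shown at a time


def sentence_to_cues(sentence: str) -> List[str]:
    """Split a sentence into caption blocks of at most MAX_LINES_PER_CUE lines."""
    # textwrap.wrap breaks on whitespace, so words stay intact
    # unless a single word is longer than the line width.
    lines = textwrap.wrap(sentence, width=MAX_CHARS_PER_LINE)
    return [
        "\n".join(lines[i:i + MAX_LINES_PER_CUE])
        for i in range(0, len(lines), MAX_LINES_PER_CUE)
    ]


for cue in sentence_to_cues(
    "Because the full sentence is known before it is displayed, the text "
    "can be divided into readable parts instead of appearing word by word."
):
    print(cue)
    print("---")
```

Splitting like this is only possible when the whole sentence is available before it is displayed, which is what the delay provides.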

Viewers anywhere in the world can watch the live stream and select their preferred caption language in our video player. Our customizable HTML5 player can easily be embedded on any platform and viewed on any device. Just copy the embed code from Clevercast.

Alternatively, you can choose to display the rolling text in a separate widget. This widget also allows your viewers to change their preferred language.

Automatic translation to other caption languages

Clevercast can automatically translate closed captions in real time and make the additional caption languages available in the live stream.

The accuracy of the translations largely depends on the quality of the source. In this case, the captions after just-in-time correction are used as the source. This adds to the importance of having a human editor for your event.

For an event with many languages, we currently recommend using a professional captioner for the initial language, instead of speech-to-text conversion. Although the accuracy of ASR engines is constantly improving, the quality of automatic speech-to-text conversion is still significantly lower than that of professional captioning.

Cloud recording and Video on-Demand

Clevercast makes a cloud recording of the multilingual live stream, which can be downloaded. All caption languages can be downloaded as WebVTT files. This allows you to upload them to YouTube or social media channels for on-demand viewing.
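For readers unfamiliar with WebVTT, the Python sketch below shows the basic layout of such a file: a `WEBVTT` header followed by cues, each with a start and end timestamp and the caption text. It is a generic illustration of the format, not a reproduction of the files Clevercast exports; the function names and sample cues are made up for the example.

```python
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Tuple[float, float, str]]) -> str:
    """Build WebVTT text from (start, end, text) tuples.

    Simplification: cue text is written as-is and is not checked
    for sequences that are not allowed inside a cue.
    """
    parts = ["WEBVTT", ""]
    for start, end, text in cues:
        parts.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        parts.append(text)
        parts.append("")  # a blank line separates cues
    return "\n".join(parts)


print(to_webvtt([
    (0.0, 2.5, "Welcome to our live stream."),
    (2.5, 6.0, "Today we will look at live captions."),
]))
```

A file in this format, one per caption language, is what video platforms that accept WebVTT expect for on-demand captions.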

You can also convert the cloud recording of your live stream to Video on-Demand (VoD). Our VoD player with all closed captions can be added to your site or platform by copying the embed code for your event.

