Voice recognition and video processing



Voice recognition and text recognition, two complementary technologies

Having defined text recognition with OCR technology in the previous article, let’s answer the question left open:

Is it possible to combine text recognition in a video with voice recognition?


In order to satisfy its customers and offer an ever more efficient solution, Authôt presented a new feature of its automatic transcription application on December 8, 2015: OCR technology combined with voice recognition technology.

This combination of technologies has plenty of advantages and makes video processing more complete and efficient.

Automatic transcription relies on voice recognition, so it may misspell names. OCR identifies the words that appear on screen, in the video or in a picture, and uses them to automatically correct the misspelled words in the transcription. In this way, the automatic transcription becomes more reliable.
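To give an idea of how such a correction could work, here is a minimal Python sketch. It is not Authôt's implementation: the function name `correct_with_ocr`, the word-list inputs and the `cutoff` threshold are all assumptions made for illustration. It simply replaces each transcribed word with the closest word read on screen, using the fuzzy matching available in Python's standard `difflib` module.

```python
from difflib import get_close_matches


def correct_with_ocr(transcript_words, ocr_words, cutoff=0.75):
    """Replace transcribed words with a close on-screen (OCR) spelling.

    Hypothetical helper: the inputs are plain lists of words, and
    `cutoff` is an arbitrary similarity threshold between 0 and 1.
    """
    # Compare in lowercase, but keep the spelling seen on screen.
    ocr_lookup = {word.lower(): word for word in ocr_words}
    candidates = list(ocr_lookup)

    corrected = []
    for word in transcript_words:
        matches = get_close_matches(word.lower(), candidates, n=1, cutoff=cutoff)
        # Keep the original word when nothing on screen is similar enough.
        corrected.append(ocr_lookup[matches[0]] if matches else word)
    return corrected


spoken = ["Welcome", "Mr", "Schmidt"]       # from voice recognition
on_screen = ["Schmitt", "Conference"]        # from OCR on a name banner
print(correct_with_ocr(spoken, on_screen))
```

A threshold that is too low would also "correct" ordinary words, so a real system would probably restrict this step to proper nouns or to low-confidence words.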

Besides spelling, OCR technology combined with voice recognition offers other features:

  • To obtain the script/transcription

Whether you have just recorded a video or you have footage of a conference, these technologies let you obtain an automatic transcription of the content, that is, its exact script.

  • To add subtitles to your videos (with or without translation)

For an online course or a recorded interview, these technologies let you add and display subtitles in the speaker's language or translated into the language of your choice. Subtitles have become essential for the digital accessibility of all multimedia content, and they also increase its visibility.

  • To search for precise terms in the video

Through advanced search, text recognition and voice recognition show you whether the terms you are looking for appear in the video and, if they do, how many times and at what moments. This tool is increasingly useful for saving time and for keyword searches.

  • To chapter automatically by slides or topics

For easier navigation in the video, OCR makes it possible to chapter the content by picture change, by topic covered, or by slide; voice recognition adds chaptering by change of speaker.
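Two of the features above, keyword search and chaptering by speaker, can be sketched in a few lines of Python once the transcription is available as timed segments. This is only an illustration under stated assumptions: the `Segment` structure and the functions `find_term` and `chapters_by_speaker` are hypothetical and do not describe Authôt's application.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical structure)."""
    start: float   # seconds from the beginning of the video
    speaker: str
    text: str


def find_term(segments, term):
    """Return one start time per occurrence of `term` (whole word, any case)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for seg in segments:
        hits.extend([seg.start] * len(pattern.findall(seg.text)))
    return hits


def chapters_by_speaker(segments):
    """Open a new chapter each time the speaker changes."""
    chapters = []
    previous = None
    for seg in segments:
        if seg.speaker != previous:
            chapters.append((seg.start, seg.speaker))
            previous = seg.speaker
    return chapters


segments = [
    Segment(0.0, "A", "Welcome to the OCR demo"),
    Segment(12.5, "A", "OCR reads text, OCR helps"),
    Segment(30.0, "B", "Thanks"),
]
hits = find_term(segments, "ocr")
print(len(hits), "occurrences at", hits)
print(chapters_by_speaker(segments))
```

The length of the list returned by `find_term` answers "how many times", and its values answer "when". Chaptering by slide or picture change would follow the same idea, with a change in the recognized on-screen text playing the role of the speaker change.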

Combined, these technologies significantly expand the possibilities for processing and distributing a video. It also becomes easier to link videos to one another through shared keywords or common themes!


Authôt: You speak. We write.