Audio and video recording for automatic transcription



Automatic transcription: know which recording formats are compatible

Before we begin, it is worth briefly explaining what automatic transcription is. An automatic speech-to-text system is essentially software designed to convert a speech recording into its word-for-word text version.
If you are interested in this topic, we invite you to read this article here.
To save time, many companies, institutions, media outlets, universities and schools use an automatic transcription system. So which files are compatible with transcription? Only audio files?

Audio recording formats for transcription

Transcription is a simple process: converting speech to text. When we say 'transcription', the audio file is what comes to mind first. As we will see later on, recording quality is essential to obtaining a reliable transcription.

There are many audio file formats. Some are proprietary technologies, such as the .wma file, or Windows Media Audio, developed by Microsoft.
Others are openly documented, such as the .wav file, or Waveform Audio File Format, a Microsoft and IBM standard for storing an audio bitstream on PCs. This format is compatible with Windows, Macintosh, and Linux operating systems.
WAV files can also be edited and manipulated relatively easily with software.
If you have these extensions in your audio library, or unusual formats such as .3GPA, don't panic! They will be accepted by the transcription system, provided that the recording quality is good, as we'll see later.

For example, on the online platform Authôt, you just have to send your audio file and the system will automatically convert it to .MP3 or .OGG. The conversion to .OGG is needed for compatibility with browsers that do not accept MP3. For more information, we invite you to consult the list of formats accepted on our platform; all of them are supported by the FFmpeg library (PDF).
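To give an idea of what such a conversion can look like, here is a minimal sketch that builds an FFmpeg command line for turning any input file into .MP3 or .OGG. It is not Authôt's actual pipeline: the function name build_ffmpeg_command and the choice of the libmp3lame and libvorbis encoders are assumptions made for illustration, and those encoders are only available if your FFmpeg build includes them.

```python
from pathlib import Path

# Assumed encoder choice per target extension (depends on your FFmpeg build).
AUDIO_ENCODERS = {".mp3": "libmp3lame", ".ogg": "libvorbis"}


def build_ffmpeg_command(source: str, target_ext: str) -> list:
    """Return an ffmpeg argument list converting `source` to `target_ext`."""
    ext = target_ext.lower()
    if ext not in AUDIO_ENCODERS:
        raise ValueError(f"unsupported target format: {target_ext}")
    # Same file name as the source, with the new extension.
    output = str(Path(source).with_suffix(ext))
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it already exists
        "-i", source,      # input file in any format FFmpeg can read
        "-vn",             # ignore any video stream, keep only the audio
        "-c:a", AUDIO_ENCODERS[ext],
        output,
    ]


print(build_ffmpeg_command("interview.wma", ".ogg"))
```

If FFmpeg is installed, the returned list can be passed to subprocess.run(command, check=True) to perform the conversion.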

Video recording formats for transcription

Audio files are not the only ones concerned by transcription! Video files are too; in that case, it is the video's soundtrack that is transcribed.
A digital video is a file that contains images, sound and text (metadata) placed in a container. Within this container, the images, sound and text are compressed. Compression and decompression of these files are performed by a codec.

A container is a file format that holds the audio and video streams, the codec information and the metadata.

A codec (a contraction of coder/decoder) is an algorithm that compresses and decompresses a digital audiovisual signal.
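The difference between container and codec becomes concrete when you inspect a file. The sketch below summarizes the JSON report that the ffprobe tool (shipped with FFmpeg) prints for a file; the function name summarize_streams and the sample data are hypothetical and written by hand for illustration, not taken from a real recording.

```python
import json


def summarize_streams(ffprobe_json: str) -> dict:
    """Return the container name plus the codec names found per stream type."""
    info = json.loads(ffprobe_json)
    summary = {"container": info.get("format", {}).get("format_name")}
    for stream in info.get("streams", []):
        # Group codec names under "audio", "video", "subtitle", ...
        summary.setdefault(stream["codec_type"], []).append(stream["codec_name"])
    return summary


# Hand-written sample in the shape ffprobe prints for:
#   ffprobe -v error -show_format -show_streams -of json clip.mov
SAMPLE = """
{"streams": [{"codec_type": "video", "codec_name": "h264"},
             {"codec_type": "audio", "codec_name": "aac"}],
 "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2"}}
"""

print(summarize_streams(SAMPLE))
```

Here the container is the QuickTime/MP4 family, while the codecs are H.264 for the images and AAC for the sound; only the audio stream matters for transcription.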


As with audio files, there is a multitude of different video file formats, for example .MOV, the file extension associated with QuickTime. (The .VQF format, sometimes cited alongside these, is in fact an audio format that, in the line of .MP3, aims at stronger compression with better quality.)
As with audio files, all formats are accepted by the transcription system. They will be automatically converted to .MP4, which is compatible with most browsers.

For reference, here is the average monthly breakdown of the files our users have sent to our transcription application since the beginning of the year:

  • audio files: 36%
  • video files: 64%

Video files are therefore becoming more and more important, driven by the development of a broad range of new services: captioning, synchronization, subtitle burn-in, translation, etc.

Video is unquestionably the great winner on the web over the past few years!

Recording quality, essential for a good transcription!

The result of the transcription depends mainly on recording quality (our system reaches 95% accuracy with high-quality audio and video recordings).
Today, the majority of recorders are digital. Their advantages: compatibility with computers via a USB connection, a large memory capacity and long battery life.

Not all microphones capture sound in the same manner: some are designed to pick up sound coming from a single direction, while others are sensitive in all directions. The choice of equipment is therefore essential for a good recording, in particular if you are recording in a slightly noisy environment.


In general, you are strongly advised to record in a quiet place without background noise, because microphones do not distinguish between that noise and the voice, and it will therefore greatly reduce transcription quality. For example, a recording made in a restaurant is unlikely to be of good quality because of the many surrounding noises.
By contrast, recording in a car with the windows closed, at a steady speed and with the radio off is a possible solution: the overlap between car noise frequencies and voice frequencies is low and does not prevent a good transcription.

To improve the quality of your recording, instead of speaking loudly, try to get closer to the microphone. Be careful, however: being closer to the microphone also increases the risk of saturating (clipping) the output signal.
Elocution is also a major factor in the recording: speech should not be too fast, and the tone of voice should be regular. Strong accents also affect transcription quality.
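The saturation risk mentioned above can be checked after the fact. The following sketch, which uses only the Python standard library, measures what share of the samples in a 16-bit PCM WAV file sit at full scale; a noticeable share suggests the signal clipped. The function names, the threshold and the in-memory test signal are assumptions chosen for illustration, not part of any transcription system.

```python
import array
import io
import sys
import wave


def clipped_ratio(wav_file, threshold: int = 32767) -> float:
    """Return the share of 16-bit PCM samples at or beyond `threshold`."""
    with wave.open(wav_file, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


def make_test_wav(values) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV from a list of sample values."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        data = array.array("h", values)
        if sys.byteorder == "big":
            data.byteswap()
        wav.writeframes(data.tobytes())
    buffer.seek(0)
    return buffer


# Two of the four samples are at full scale, so half the signal is clipped.
print(clipped_ratio(make_test_wav([0, 32767, -32768, 100])))
```

On a real recording you would pass the file path instead of the in-memory buffer, for example clipped_ratio("meeting.wav"), where the file name is hypothetical.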


Recordings can also be made during meetings. In this case, it is recommended to place at least two microphones at the ends of the table, connected to a recorder, in order to capture everybody's voice.
Each person can also wear a lapel microphone connected to a recorder. Alternatively, the room can be equipped with a conference speakerphone. During meetings, speakers should avoid interrupting each other or talking at the same time, as this strongly degrades the transcription.


Thus, for a reliable transcription, a good recording is necessary! Authôt technology offers a multi-speaker automatic transcription system, ideal for one-to-one conversations and meetings.
Save time by sending your audio or video recordings in the format of your choice on app.authô and obtain your transcription in one click!
The next article will tell you in detail about the different export formats available on the app once the transcription is done.
Stay tuned!


Authôt. You speak. We write.