File formats for automatic transcription
Automatic transcription: discover all the downloadable file formats and their uses
As a reminder, every sector can benefit from an automatic transcription service, which is why a good understanding of file formats is essential. Automatic transcription is speech-to-text conversion. Having already covered the input ("speech") files, we now review the different output ("text") file formats and their uses.
Different file formats for different uses
The Authôt online text editor lets you format your text (underline, bold, italics, alignment, paragraphs, etc.). You can therefore work on your text directly online and export it as .docx. Unlike the plain "text" format, this "doc" format preserves the layout and can be opened with Word, which makes it a real time saver!
With these text formats, you get your transcription either as a raw version or as a version reworked in the online editor.
This is particularly relevant if you are a company or an institution that needs transcriptions of meetings or conferences. The names of the different speakers can be added if necessary.
- the text to display
- for every line of text, the start and end timing
Example of a file exported from the Authôt application in .srt format
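To make the two ingredients listed above (the text to display and its start and end timing) concrete, here is a minimal Python sketch that writes cues in the .srt layout. It is a hand-written illustration, not an Authôt export: the function names `srt_timestamp` and `to_srt` and the sample cues are assumptions made for this example.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timecode: HH:MM:SS,mmm (comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue: a sequence number, a timing line, then the text to display.
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical cues, for illustration only.
    print(to_srt([(0.0, 2.5, "You speak."), (2.5, 4.0, "We write.")]))
```

Each cue carries its own timing line, which is what lets a video player show the right text at the right moment.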
- there is space for optional header data between the first line and the first cue
- timecode fractional values are separated by a full stop instead of a comma
- timecode hours are optional
- cue settings allow the customization of cue positioning on the video
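Two of the points above (the header before the first cue and the full stop in timecodes) are enough to sketch a conversion from .srt text to WebVTT text. The Python below is a minimal illustration under those assumptions; the function name `srt_to_webvtt` is hypothetical, and the sketch does not handle cue settings or styling.

```python
import re

# Matches an SRT timecode such as 00:01:02,345 and captures both sides of the comma.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text):
    """Convert SRT text to WebVTT: add the header line, use full stops in timecodes."""
    lines = []
    for line in srt_text.splitlines():
        if "-->" in line:
            # Only timing lines are rewritten, so commas in the subtitle text survive.
            line = _SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    # "WEBVTT" must be the first line, followed by a blank line before the first cue.
    return "WEBVTT\n\n" + "\n".join(lines).strip() + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,500\nHello, world\n"
    print(srt_to_webvtt(sample))
```

The SRT sequence numbers are kept as they are, since WebVTT accepts an optional identifier line before each timing line.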
Authôt also offers synchronization, burned-in subtitles, and translation, so subtitles can be produced in every language.
Subtitle file formats are also increasingly used in education, with MOOCs, and by public institutions in order to meet digital accessibility standards (February 2005 law).
It is therefore important to respect several rules if you wish to have a reliable and accessible subtitled video. For example, in Teletext, which is used to display subtitles on some broadcast platforms, line length is limited to 37 fixed-width (monospaced) characters, since at least 3 of the 40 available bytes are used for control codes. Guidelines for both platforms are summarised in the table below.
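The 37-character Teletext limit mentioned above is easy to check automatically. The following Python sketch wraps subtitle text to that width and flags lines that exceed it; the helper names `wrap_subtitle` and `too_long_lines` are assumptions for this example, and counting characters with `len` is a simplification that ignores control codes.

```python
import textwrap

TELETEXT_MAX_CHARS = 37  # line-length limit quoted in the text above


def wrap_subtitle(text, width=TELETEXT_MAX_CHARS):
    """Split subtitle text into lines of at most `width` characters."""
    return textwrap.wrap(text, width=width)


def too_long_lines(lines, width=TELETEXT_MAX_CHARS):
    """Return the lines that exceed the limit, for review before export."""
    return [line for line in lines if len(line) > width]


if __name__ == "__main__":
    sentence = "Automatic transcription turns speech into text that can be shown as subtitles."
    for line in wrap_subtitle(sentence):
        print(f"{len(line):2d} | {line}")
```

Wrapping only addresses line length; how many lines to show at once and for how long remain editorial decisions.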
Transcription allows you to subtitle your video file thanks to the .srt and .webvtt formats.
Example of a file exported from the Authôt application in timecode format
In the Authôt application, you can export your transcription in a timecoded .text format, which lets you open the file and read its timecodes.
Timecode files are extremely valuable, notably for audiovisual professionals who edit films. Automatic transcription and .text export are real time savers (interview here): they then have both their scripts and their video rushes.
This format is available in the Authôt application and lets you synchronise audio and text, for your podcasts for example. This article (here) gives some tips on using the .html file format.
Unlike HTML, XML lets you create your own tags, which are entirely customizable, so XML is less rigid. Downloading in .xml format lets you sort and format your data easily.
For example, with Authôt, you can sort your data using the "words" tags, each of which carries a "start" attribute and an "end" attribute. Thanks to XML, you can build a customizable display, create subtitles, etc.
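As an illustration of sorting data through tags with "start" and "end" attributes, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The sample document is hypothetical: the root element `transcript`, the exact spelling of the `words` element, and timings expressed in seconds are all assumptions, and a real export may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element layout is an assumption, not a real Authôt export.
SAMPLE_XML = """
<transcript>
  <words start="0.00" end="0.42">You</words>
  <words start="0.42" end="0.90">speak</words>
  <words start="1.10" end="1.35">We</words>
  <words start="1.35" end="1.80">write</words>
</transcript>
"""


def read_words(xml_text):
    """Return (start, end, text) tuples for every <words> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        (float(el.get("start")), float(el.get("end")), (el.text or "").strip())
        for el in root.iter("words")
    ]


def words_between(words, t0, t1):
    """Keep the words whose timing falls entirely inside [t0, t1] seconds."""
    return [text for start, end, text in words if start >= t0 and end <= t1]


if __name__ == "__main__":
    words = read_words(SAMPLE_XML)
    print(words_between(words, 1.0, 2.0))
```

Once the words are available as timed tuples, grouping them into subtitle cues or highlighting them in sync with audio becomes a matter of filtering on the two attributes.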
Example of a file exported from the Authôt application in .sjson format
Sjson meets very specific needs, but it is available in our application.
Screenshot of Authôt application
A long list of file formats is available on app.authôt.com, in order to meet every specific need!
Need another format? Please contact us 🙂
Authôt. You speak. We write.
Sources:
Vimeo
BBC.github
Sciencepo
Blog Authôt