Speech to Text
September 2020 | rk
The new audio export function in Yasla Pro/Lab 3.4 has a somewhat unusual history: during a conversation with users, the question arose whether speech recognition could be integrated into Yasla. The idea was to make translations into spoken language accessible to non-hearing lecturers, for correction and similar purposes. Time for some research and tests.
Offline vs. Online
The available speech recognition systems fall roughly into two camps: systems where recognition takes place on the local computer (‘offline’ or ‘on device’), and those where the speech recordings are sent to the servers of a service provider (‘online’).
Online systems are currently the state of the art; the large amounts of data collected for training are clearly reflected in their recognition performance. In my opinion, the results of the offline systems are not really usable: even with professional audio recordings, words go missing and many are misrecognized.
The online systems deliver better quality, although it is doubtful whether they are good enough to be useful. They are also problematic in other respects:
all online systems have limitations. Apple’s system only accepts audio of at most one minute in length. Google’s system is free of charge only for small amounts of audio data; regular use and/or larger audio files can result in significant costs.
the recognition takes time, and there are noticeable delays. Depending on the system, recognition is faster than real time, but not by orders of magnitude: a ten-minute recording, for example, usually takes several minutes to recognize.
for use in teaching, the legal situation can be difficult: transmitting voice recordings of third parties to companies that are mostly based in the US is, with good reason, a no-go in terms of data protection.
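To put a number on "faster than real time, but not by orders of magnitude", a common measure is the real-time factor: processing time divided by audio duration, where values below 1 mean faster than real time. The following is only a minimal sketch in Python; the function name is made up for illustration, and the three-minute figure is an assumed stand-in for "several minutes", not a measurement.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration.

    Values below 1.0 mean the recognition ran faster than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: a ten-minute recording recognized in three minutes.
# The three minutes are an assumption, not a measured value.
rtf = real_time_factor(processing_seconds=3 * 60, audio_seconds=10 * 60)
print(f"real-time factor: {rtf:.2f}")  # prints "real-time factor: 0.30"
```

A factor of around 0.3 would be faster than real time, but still far from the 0.01 or less that "orders of magnitude" would imply.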
In addition, there is a structural problem: because the process amounts to translating a translation, it is not always easy to tell who made a mistake: the student in the first translation, or the speech recognition system in the second?
Even if the technology does not yet seem ready for practical integration into Yasla, the idea remains interesting.
With the new function for direct export of audio files in Yasla Pro/Lab 3.4, it is now easier to try out speech recognition systems and services yourself. Happy experimenting!