Speech to Text

September 2020 | rk

The new audio export function in Yasla Pro/Lab 3.4 has a somewhat unusual origin: in a conversation with users, the question arose whether speech recognition could be integrated into Yasla. The idea was to make translations into spoken language accessible to non-hearing lecturers, for correction and similar purposes. Time for some research and tests.

Offline vs. Online

The available speech recognition systems fall roughly into two camps: systems where recognition takes place on the local computer (‘offline’ or ‘on device’), and those where the speech recordings are sent to the servers of a service provider (‘online’).

Online systems are currently the state of the art; the large amounts of data collected for training purposes are clearly reflected in their recognition performance. In my opinion, the results of the offline systems are not really usable: even with professional audio recordings, words are missing and many others are misrecognized.
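For your own tests, an impression like "words are missing, words are misrecognized" can be turned into a number with the word error rate: the number of word insertions, deletions and substitutions needed to turn the recognized text into a reference transcript, divided by the length of the reference. The following is only a minimal sketch; the function name and the example sentences are invented for illustration and have nothing to do with Yasla itself.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing in the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or misrecognized word
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one misrecognized word and one missing word.
reference = "the student signs a short story"
recognized = "the student sings short story"
print(word_error_rate(reference, recognized))
```

The sketch only lowercases the text and splits on whitespace; punctuation and spelling variants would have to be normalized beforehand for a fair comparison.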

Online systems deliver noticeably better results, although it is doubtful whether they are good enough to be useful. They are also problematic from other angles:

In addition, there is a structural problem: because this amounts to translating a translation, it is not always easy to tell who made a mistake. Was it the student in the first translation, or the speech recognition system in the second?

Conclusion

Even if the technology does not yet seem ready for practical integration into Yasla, the idea remains interesting.
With the new function for the direct export of audio files in Yasla Pro/Lab 3.4, it is now easier to try out speech recognition systems and services yourself. Happy experimenting!
