Speech to Text

September 2020 | rk

The new audio export function in Yasla Pro/Lab 3.4 has a somewhat unusual background: during a conversation with users, the question came up whether speech recognition could be integrated into Yasla. The idea was to make translations into spoken language available in written form to deaf lecturers, for correction and similar purposes. Time for some research and tests.

Offline vs. Online

The available speech recognition systems fall roughly into two camps: systems where recognition takes place on the local computer (‘offline’ or ‘on device’), and those where the speech recordings are sent to the servers of a service provider (‘online’).

Online systems currently represent the state of the art; the large amounts of data collected for training purposes are clearly reflected in their recognition performance. In my opinion, the results of the offline systems are not really usable: even with professional audio recordings, words are missing, many words are misrecognized, etc.
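The errors described above (missing words, misrecognized words, and also extra words) are commonly summarized as the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn a recognizer's transcript into a correct reference text, divided by the number of reference words. The following Python sketch is only an illustration and not part of Yasla; it assumes a correct plain-text reference is available, and it compares words case-sensitively without stripping punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev/curr are consecutive rows of the edit-distance table:
    # entry j = distance between the first i reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # substitution (misrecognized word) or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One misrecognized word ("start") and one missing word ("nine"): 2 errors / 5 words.
    print(word_error_rate("the lecture starts at nine", "the lecture start at"))
```

Note that such a measurement needs a correct reference text; because insertions count as errors, the WER can also exceed 1.0.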

The online systems are qualitatively better, although it is doubtful whether they are good enough to be useful. But they are problematic from other angles:

In addition, there is a structural problem. Since the speech recognition system transcribes what is already a translation, it is not always easy to tell who made a given mistake: the student in the translation, or the speech recognition system in the transcription?

Conclusion

Even if the technology does not yet seem ready for a practical integration into Yasla, the idea remains interesting.
With the new function for direct export of audio files in Yasla Pro/Lab 3.4, it is now easier to try out speech recognition systems and services yourself. Happy experimenting!
