AI Speech Recognition for "Smaller" Languages

I often practice languages with ChatGPT using voice messages. I've noticed that while it works well for English, German, Spanish, and Italian, it absolutely doesn't understand my Dutch. What it recognizes and transcribes is often gibberish.

Of course, part of the problem is my pronunciation. I'm currently at an A1/A2 level, and it's still quite hard for me to produce certain Dutch sounds. But I also asked ChatGPT and Perplexity whether there are general issues with automatic speech recognition for Dutch, and both of them suggested that the reason might be a relatively small training dataset for Dutch in general, and especially for non-native speech. As a result, these models perform much better with native speakers. They're simply not as used to hearing foreign accents in Dutch and may "decide" that it's actually another language.

At the same time, when I use a specialized AI tool for language learning, like LanguaTalk, I barely have these issues. Maybe it's because I have to explicitly choose a language and its variety (Dutch from the Netherlands) in the settings, so the model is forced to interpret my speech as Dutch even if it sounds strange :)
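
To make that guess a bit more concrete, here is a toy Python sketch of the difference between letting a recognizer auto-detect the language and pinning it in the settings. The function name `pick_language` and the scores are made up for illustration. This is only my assumption about the general idea, not how ChatGPT or LanguaTalk actually work internally.

```python
# Toy illustration only: the scores below are invented, not real model output.
def pick_language(scores, forced=None):
    """Return the language code the recognizer will transcribe in.

    scores: dict mapping language code -> detection probability
    forced: language code chosen by the user in settings, or None
    """
    if forced is not None:
        # The user picked a language, so detection is skipped entirely.
        return forced
    # No setting: go with whatever the detector finds most likely.
    return max(scores, key=scores.get)


# Hypothetical detector output for a learner's accented Dutch.
accented_dutch = {"nl": 0.35, "de": 0.40, "af": 0.25}

auto = pick_language(accented_dutch)                  # detector's best guess
pinned = pick_language(accented_dutch, forced="nl")   # language set by the user
print(auto, pinned)
```

With these invented numbers, auto-detection lands on German, while the pinned setting keeps the transcription in Dutch no matter how unsure the detector is.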

Have you had similar experiences with Dutch or any other "smaller" language?

P.S. This struggle made me finally book a lesson with a Dutch tutor 🙂
