We all know that communication is about more than just what you say; how you say it is often just as important. That's why Google's latest prototype AI translator doesn't translate only your spoken words, but also the tone and cadence of your voice.
The system is called Translatotron, and Google's researchers describe how it works in a recent blog post. They won't say whether Translatotron will make its way into a commercial product any time soon, but that will probably happen eventually. As The Verge reported earlier this year about Google's translation work, the company's goal is to make its translation tools sound more realistic and convey more nuance.
You can hear what this sounds like in the audio samples below. The first clip is the input; the second is the baseline translation.
[Audio: Translatotron translation samples]
It sounds impressive, though the translation isn't perfectly smooth. More audio samples from Translatotron are available on Google's blog post.
While the mimicry of a speaker's voice is the most attention-grabbing feature, Translatotron's appeal to AI engineers lies elsewhere: it converts audio input directly into audio output, without passing through an intermediate text representation. This kind of AI model is known as an end-to-end system, because there are no intermediate sub-tasks such as transcribing speech to text, translating that text, and then synthesizing new speech. Google says this lets it produce translations more quickly and avoid the errors that compound across multiple translation steps.
Interestingly, the data the model actually processes isn't raw audio. Instead, it works on spectrograms: detailed visual representations of sound. Essentially, that means it translates speech from one language to another by manipulating pictures of sound.
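To make "pictures of sound" concrete, here is a minimal sketch of what a spectrogram is: the magnitude of a short-time Fourier transform of the waveform. This is an illustrative example only, not Google's actual pipeline; Translatotron operates on mel-scaled spectrograms inside a much larger sequence-to-sequence model, and the frame and hop sizes below are arbitrary choices.

```python
import numpy as np

def spectrogram(waveform, frame_len=512, hop=128):
    """Split the waveform into overlapping windowed frames and take
    the FFT magnitude of each, yielding a (frames x freq-bins) image."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(waveform) - frame_len) // hop
    frames = np.stack([waveform[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps only the non-negative frequencies of a real signal
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz tone sampled at 16 kHz
t = np.linspace(0, 1, 16000, endpoint=False)
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # → (122, 257)
```

Each row of the resulting array is one slice of time, and each column one band of frequency; an end-to-end model like Translatotron learns to map an input "image" like this directly to the spectrogram of translated speech, which is then converted back into audio.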
As with all of Google's translation efforts, some skepticism is warranted about how well such a system performs in the real world. The company often shows off ambitious new speech and translation tools, and they frequently perform less fluidly in practice than we'd like. Still, the future is coming, and AI translation keeps getting better.