VALL-E can mimic your voice from a three-second audio clip

A team of researchers at Microsoft has published a paper about a new AI model called VALL-E, which can mimic your voice from just a three-second sample. It can generate a realistic imitation of your speech patterns, capturing not only your voice but your mannerisms and style as well.

Although many may find the new project fascinating, there are still plenty of fears and concerns surrounding it. Should this AI become widely accessible, it could lead to more scams and prank calls, with callers impersonating loved ones or worse.

If you want to hear some of the samples, you can listen to them on Microsoft’s GitHub demo page.

Researchers describe VALL-E as a “neural codec language model” that has been trained on “discrete codes derived from an off-the-shelf neural audio codec model.”

The researchers also say that the AI has been trained on 60,000 hours of speech, “which is hundreds of times larger than existing systems.”

VALL-E can “preserve the speaker’s emotion and acoustic environment” of the prompt. It’s still not perfect at landing tone and emotion, but it could one day become far more advanced than we anticipate.

The Future of AI is Strange

With all of these new AI systems like ChatGPT, Midjourney, DALL-E, and more, artists, writers, and programmers have raised plenty of concerns about the future of their jobs. If ChatGPT can write entire essays and Midjourney and other AI art generators can create hyper-realistic images, what is left for professionals in these fields to do?

Sure, these AI systems are not perfect, but they have greatly reduced the need for people to be creative. In fact, ChatGPT is now a concern at universities and educational institutions globally. Students have been using the AI system to generate entire essays while doing almost no work themselves.

With VALL-E able to mimic your voice this easily, who knows what it could be used for next? It could replace voice acting fairly soon if it becomes even more advanced. Star Wars is already using AI to mimic James Earl Jones’ voice for Darth Vader, so what’s next for the entertainment industry?

It would certainly help the voice acting world, as projects may no longer need to be scrapped if a voice actor passes away while a movie is in production. But the existence of such an AI raises questions about its ethics and whether it can be used to cause harm.

We are certainly living in a strange time with all of this advanced technology, and there’s no telling what advancements are coming next.



Zainah Yousef is the author of The Fallen Age Saga and specializes in gaming, social media advice, and reviews. She's been writing all her life and she probably won't stop anytime soon.