Date: July 16, 2024
Microsoft has developed an advanced artificial intelligence (AI) speech generator that it believes may be too dangerous to put in the public's hands.
Deepfakes, AI-generated media that imitate real people, have alarmed the world for more than a year. Until recently, the technology was mostly associated with video replicas of people that look almost indistinguishable from the real thing; now deepfake voices have joined the nightmare. What would you do if someone on the internet started using your voice to share offensive content? Voice mimicking means imitating a person's distinctive voice so convincingly that listeners cannot tell the difference.
Microsoft's latest text-to-speech model marks a significant leap in AI development, although releasing it may not be in the public's best interest. The tool, VALL-E 2, generates lifelike human speech that is almost impossible to distinguish from a real recording.
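Part of what makes the output so believable is Repetition Aware Sampling, which keeps the model from getting stuck in monotonous loops that repeat the same sounds. While generating speech token by token, the model checks how often the token it has just sampled appeared among its most recent tokens. If that repetition rate is too high, it replaces the token with one drawn from its full predicted distribution instead of only from the small set of most likely tokens (nucleus, or top-p, sampling).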
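To make the sampling step concrete, here is a minimal Python sketch of repetition-aware sampling for a single decoding step. It assumes the mechanism works as summarized above: nucleus (top-p) sampling first, then a fallback to random sampling from the full distribution when the sampled token fills too large a share of the last `window` tokens. The function names and the default values of `top_p`, `window`, and `threshold` are hypothetical illustrations, not Microsoft's code or settings.

```python
import random
from typing import List, Optional, Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample a token id from the smallest set of top tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus: List[int] = []
    cumulative = 0.0
    for token in order:
        nucleus.append(token)
        cumulative += probs[token]
        if cumulative >= top_p:
            break
    return rng.choices(nucleus, weights=[probs[t] for t in nucleus], k=1)[0]


def random_sample(probs: Sequence[float], rng: random.Random) -> int:
    """Sample a token id from the full, untruncated distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    top_p: float = 0.8,      # hypothetical default
    window: int = 10,        # hypothetical: how many past tokens to inspect
    threshold: float = 0.1,  # hypothetical: highest tolerated repetition ratio
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical sketch: nucleus sampling with a random-sampling fallback on repetition."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if rng is None:
        rng = random.Random()
    token = nucleus_sample(probs, top_p, rng)
    # Fraction of the last `window` positions already occupied by the candidate token.
    recent = list(history[-window:])
    ratio = recent.count(token) / window
    if ratio > threshold:
        # Too repetitive: redraw from the full distribution to break the loop.
        token = random_sample(probs, rng)
    return token


# Usage: with no history, a peaked distribution behaves like plain nucleus sampling.
print(repetition_aware_sample([0.9, 0.05, 0.05], history=[], top_p=0.5, rng=random.Random(0)))  # prints 0
```

When the recent history is varied, the sketch behaves exactly like ordinary nucleus sampling; only when one token keeps recurring does it widen the choice to the whole distribution, which gives less likely tokens a chance to end the loop.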
Another advanced feature of VALL-E 2 is Grouped Code Modeling, which shortens the sequence the model has to process. VALL-E 2 represents speech as a sequence of discrete audio codec tokens; grouped code modeling bundles consecutive tokens into fixed-size groups and handles each group in a single step, so a group size of G cuts the sequence length by a factor of G. This speeds up generation and makes long utterances easier to model.
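The effect on sequence length can be shown with a toy Python sketch. It assumes speech has already been encoded into a flat list of integer codec tokens, and it shows only the reshaping step, not how the model predicts each group jointly. `group_codes`, `ungroup_codes`, and the padding value are hypothetical helpers for illustration, not part of any Microsoft API.

```python
from typing import List, Sequence, Tuple


def group_codes(codes: Sequence[int], group_size: int, pad: int = 0) -> List[Tuple[int, ...]]:
    """Bundle consecutive codec tokens into fixed-size groups (hypothetical helper)."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Pad the tail so the total length is a multiple of group_size.
    padded = list(codes) + [pad] * (-len(codes) % group_size)
    return [tuple(padded[i:i + group_size]) for i in range(0, len(padded), group_size)]


def ungroup_codes(groups: Sequence[Tuple[int, ...]], length: int) -> List[int]:
    """Flatten groups back into single tokens and drop any padding (hypothetical helper)."""
    return [code for group in groups for code in group][:length]


codes = list(range(12))  # stand-in for 12 codec tokens
grouped = group_codes(codes, group_size=3)
print(len(codes), "->", len(grouped))  # prints 12 -> 4
assert ungroup_codes(grouped, len(codes)) == codes
```

With 12 tokens and a group size of 3, a model that emits one group per step needs 4 steps instead of 12, which is where the speed-up comes from.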
According to the researchers, VALL-E 2 is the first text-to-speech generator to match human speakers in robustness, naturalness, and similarity to the original voice. The breakthrough could be useful in many areas, but it could just as easily be used to create chaos.
"VALL-E 2 is purely a research project. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public," said the researchers at Microsoft.
While the technology could benefit a wide range of applications in education and entertainment, giving the general public access to it could be extremely dangerous. Microsoft has therefore decided to keep the tool out of public hands to prevent misleading, fake, and scandalous content. Its advanced voice-cloning capability could fool many of the security systems designed to detect fake voices and impersonators.
By Arpit Dubey
Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. With a knack for crafting compelling narratives, he covers a wide range of specialties, from predictive analytics and game development to artificial intelligence (AI), cloud computing, IoT, SaaS, healthcare, and more. Arpit crafts content that is as strategic as it is compelling. With a logician's mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.