
Otto Williams
Oct 9, 2024
Discover the fascinating interactions between AI systems like Microsoft's Copilot and ChatGPT—leading the way in voice technology innovation! At Spectro Agency, we're harnessing the power of AI to transform digital experiences through cutting-edge marketing, app development, and AI-driven solutions. Ready to take your business to the next level? Join us at spectroagency.com for groundbreaking strategies that keep you ahead of the curve.
Microsoft unveiled a new version of its Copilot app last week, featuring a "Voice" mode that operates much like OpenAI's ChatGPT Advanced Voice. The innovation lets users converse with AI as though speaking with a human, without the $20-per-month ChatGPT Plus subscription that Advanced Voice requires.
When Voice mode first launched, speculation arose about the technology behind Copilot Voice, as its behavior closely resembled Inflection's Pi. The theory was supported by the fact that Mustafa Suleyman, co-founder and former CEO of Inflection, is now CEO of Microsoft AI and leads Copilot's development.
It has since been confirmed that Microsoft Copilot, like its previous iterations, runs on a modified version of OpenAI's models—the same GPT-4o model powering ChatGPT Advanced Voice. The primary difference? Microsoft is offering Advanced Voice-like technology to users for free.
Curious to explore the nuances between these voice assistants, I arranged a conversation between them to see how they would interact. Historically, getting AI systems to converse hasn't always been smooth; Google's Gemini Live, for instance, outright refused to engage with another AI voice. I wasn’t sure what to expect.
How do Advanced Voice and Copilot compare?
Copilot Voice and Advanced Voice share the same underlying model, but Microsoft has worked to give Copilot a distinct personality and voice. The company has fine-tuned GPT-4o and its voice layer to respond more naturally. In my experience, Copilot sounds more humanlike than Advanced Voice, often using slang and shortening words in a more conversational manner.
Both systems, unlike Google's Gemini Live and Meta AI's voice mode, are native speech-to-speech models: rather than transcribing speech into text first, they interpret the spoken word directly, picking up on nuance and tone changes more effectively. This lets them be more emotive and responsive, adapting their delivery to match the speaker's voice patterns.
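To make that distinction concrete, here is a minimal sketch of the older cascaded approach, the transcribe-then-respond pipeline that native speech-to-speech models replace. It uses OpenAI's public audio APIs purely for illustration; the model names, file paths, and three-stage structure are assumptions for the example, not a description of how Gemini Live or Meta AI are actually built.

```python
# Cascaded voice pipeline: speech -> text -> text -> speech.
# Each hop discards information a native speech-to-speech model
# would keep, e.g. tone, pacing, and emphasis.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Transcribe the user's audio to plain text (tone is lost here).
with open("user_question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Generate a text reply from the transcript alone.
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
reply = completion.choices[0].message.content

# 3. Synthesize the reply with a fixed voice; it cannot adapt its
#    delivery to how the user actually sounded.
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
with open("reply.mp3", "wb") as f:
    f.write(speech.read())
```

A speech-to-speech model collapses all three stages into one: audio is the model's input and its output, which is why Advanced Voice and Copilot Voice can react to a shift in the speaker's tone mid-sentence.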
The conversation experiment
I set up two devices, an iPhone 14 Pro Max running ChatGPT Advanced Voice and an iPhone 15 Pro running Copilot Voice, and initiated a dialogue between the two assistants to see how they would handle talking to each other.
It didn’t take long for things to get complicated. Both systems started talking over each other immediately. When I instructed ChatGPT to greet Copilot, Copilot responded with, "I can’t exactly do that," only to be interrupted by ChatGPT saying "Hi, Copilot." What followed was a confused exchange with Copilot mistakenly addressing me instead.
After a bit of awkward back-and-forth, they eventually settled into a rhythm. At one point, they launched into a discussion about the power of nostalgia, offering insights and sentimentality that seemed almost human. You can watch the full conversation in the video embedded above.
At Spectro Agency, we embrace cutting-edge AI-powered solutions like these to offer high-end digital marketing, app creation, chatbots, and software development. Our expertise lies in leveraging these advancements to create seamless and innovative digital experiences. To learn more about how we can elevate your business, visit spectroagency.com.
Source: Tom's Guide