A new AI voice model has set the internet abuzz, with reactions oscillating between awe and unease. Sesame AI’s Conversational Speech Model (CSM) doesn’t just sound human—it feels human. Users describe extended, almost emotional interactions with the AI-generated voices, which exhibit breath sounds, hesitations, corrections, and even chuckles. For some, it’s a technological marvel. For others, it’s a glimpse into a future that feels uncomfortably close.
Sesame AI: A voice that feels alive
The core innovation behind Sesame’s CSM lies in its ability to simulate natural, dynamic conversation. Unlike traditional text-to-speech systems that simply read aloud, CSM actively engages. It stumbles over words, corrects itself, and modulates tone in a way that mimics real human unpredictability.
When one tester spoke to the model for 28 minutes, they noted its ability to debate moral topics, reacting naturally to prompts like, “How do you decide what’s right or wrong?” Others found themselves unintentionally forming attachments, with one Reddit user admitting, “I’m almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.”
Sesame’s AI assistants, dubbed “Miles” and “Maya,” are designed not just for information retrieval but for deep, engaging conversations. The company describes its goal as achieving “voice presence”—the magical quality that makes spoken interactions feel real, understood, and valued.
That realism sometimes leads to oddly human quirks. In one viral demo, the AI casually mentioned craving a peanut butter and pickle sandwich—a bizarrely specific comment that only added to the illusion of personality.
The tech behind the voice
So how does Sesame’s CSM achieve such eerily lifelike conversations?
Blind tests have revealed that, in isolated speech samples, human evaluators couldn’t reliably distinguish Sesame’s AI voices from real ones. However, when placed in full conversational context, human speech still won out—suggesting AI has not yet mastered the full complexity of interactive dialogue.
A mixed reception
Not everyone is thrilled by how human this AI sounds.
Technology journalist Mark Hachman described his experience with the voice model as “deeply unsettling.” He compared it to talking with an old friend he hadn’t seen in years, noting that the AI’s voice bore an eerie resemblance to someone he had once dated.
Others have likened Sesame’s model to OpenAI’s Advanced Voice Mode for ChatGPT, with some preferring Sesame’s realism and willingness to roleplay in more dramatic or even angry scenarios—something OpenAI’s models tend to avoid.
One particularly striking demo showcased the AI arguing with a “boss” over an embezzlement scandal. The conversation was so dynamic that listeners struggled to determine which speaker was the human and which was the AI.
The risks of a perfect voice
As with all AI breakthroughs, hyper-realistic voice synthesis brings both promise and peril.
While Sesame’s CSM does not clone real voices, the possibility of similar open-source projects emerging remains a concern. OpenAI has already delayed the wider release of its voice technology over fears of misuse.
What’s next?
Sesame AI plans to open-source key components of its research under the Apache 2.0 license, allowing developers to build upon its work. The company’s roadmap includes:
For now, the demo remains available on Sesame’s website—though demand has already overwhelmed their servers at times. Whether you find it astonishing or unsettling, one thing is clear: the days of robotic, monotone AI voices are over.
From here on, you may never be quite sure who—or what—you’re talking to.
Featured image credit: Kerem Gülen/Imagen 3
All Rights Reserved. Copyright , Central Coast Communications, Inc.