Fred, a trans man, clicked his mouse, and his tenorful tones suddenly sank deeper. He’d switched on voice-changing algorithms that provided what sounded like an instant vocal cord transplant. “This one is ‘Seth,’” he said, of a persona he was testing on a Zoom call with a reporter. Then, he switched to speak as “Joe,” whose voice was more nasal and upbeat.
Fred’s friend Jane, a trans woman also testing the prototype software, chuckled and showcased some artificial voices she liked for their feminine sound. “This one is ‘Courtney’”—bright and upbeat. “Here’s ‘Maya’”—higher pitched, sometimes by too much. “This is ‘Alicia,’ the one I find has the most vocal variance,” she concluded more mellowly. The glitches were slight enough to prompt the fleeting thought that the pair may not have joined the call with their “real” voices to begin with.
Fred and Jane are early testers of technology from startup Modulate that could add new fun, protections, and complications to online socializing. WIRED is not using their real names to protect their privacy; trans people are often targeted by online harassment. The software is the latest example of the tricky potential of artificial intelligence technology that can synthesize real-seeming video or audio, sometimes termed deepfakes.
Modulate’s cofounders Mike Pappas and Carter Huffman initially thought the technology they term “voice skins” could make gaming more fun by letting players take on characters’ voices. As the pair pitched studios and recruited early testers, they also heard a chorus of interest in using voice skins as a privacy shield. More than 100 people asked if the technology could ease the dysphoria caused by a mismatch between their voice and gender identities.
“We realized many people don’t feel they can participate in online communities because their voice puts them at greater risk,” Pappas, Modulate’s CEO, says. The company is now working with game companies to provide voice skins in ways that offer both fun and privacy options, while also pledging to prevent them from becoming a tool of fraud or harassment themselves.
Games such as Fortnite and social apps like Discord have made it common to join voice chats with strangers on the internet. As with the early days of texting via the internet, the voice boom has unlocked both new delights and horrors.
The Anti-Defamation League found last year that almost half of gamers had experienced harassment via voice chat while playing, more than via text. A sexist streak in gaming culture causes women and LGBTQ people to be singled out for special abuse. When Riot Games launched team-based shooter Valorant in 2020, executive producer Anna Donlon said she was stunned to see a culture of sexist harassment quickly spring up. “I do not use voice chat if I’m going in alone,” she told WIRED.
Modulate’s technology is not yet widely available, but Pappas says he is in talks with game companies interested in deploying it. One possible approach is to create modes within a game or community where everyone is assigned a voice skin to match their character, whether a gruff troll or knight in armor; alternatively, voices could be assigned randomly.
In June, two of Modulate’s voices launched inside a preview of an app called Animaze, which transforms a user into a digital avatar in livestreams or video calls. The developer, Holotech Studios, markets the voices as both a privacy feature and a way to “morph your voice to better fit a character with different age, gender, or body type than your own.” Modulate also offers game companies software that automatically notifies moderators of signs of abuse in voice chats.
Modulate’s voice skins are powered by machine learning algorithms that adjust the audio patterns of a person’s voice to make them sound like someone else. To teach its technology to voice many different tones and timbres, the company collected and analyzed audio from hundreds of actors reading scripts crafted to provide a wide range of intonation and emotion. Individual voice skins are created by tuning algorithms to replicate the sound of a specific voice actor.