AI Ventriloquism: Send in the clones [Unfair use? series, Part 3] 🗣️
A look at how common voice cloning tools have become, a few responsible options, and the many that pay no attention to ethics. (Audio; 18:28)
While doing my due diligence on the broad landscape of genAI-based tools for our Part 3 article, I came across a whole bunch of irresponsible voice cloning tools. Here’s a summary of what I found, with recommendations for ethical ways to meet needs for voice cloning and vocal covers.
This article is not a substitute for legal advice and is meant for general information only.
What are voice clones and vocal covers?
AI-based voice clone tools train individual voice models on a specific speaking voice. The tool then applies a voice model to make that voice ‘speak’ words or phrases the speaker most likely never said. If users apply a voice model that isn’t of their own voice, they can create a recording of the voice’s owner ‘saying’ words without that person’s consent.
This is AI-based ventriloquism with real people instead of puppets or dummies.
Voice cloning can make it seem that people spoke words, phrases, sentences, etc. they never did. There are innocuous applications (e.g. text-to-speech for accessibility, or a grandparent ‘singing’ a lullaby or ‘reading’ a bedtime story). But the same tools can just as easily be used to make recordings of people ‘saying’ lies or sentiments they might vehemently disagree with. It’s putting words into virtual clones of people’s mouths, often without their consent.
Heard about scams using ‘deepfakes’ of voice recordings?1 2 These tools create them. Creating and using fake recordings WITHOUT PROPER LICENSING (consent, credit, and compensation3) FROM THE VOICE OWNER is highly unethical, period - and potentially extremely damaging. And that’s what many of these tools enable.
Similarly, vocal cover tools are ‘style transfer’ tools based on generative AI. They apply a model trained on a specific speaking or singing voice to a song by a different artist. They don’t generate new melodies, but they do generate new recordings with a substitute vocal track. Essentially, they use AI to make a recording that sounds like that speaker or vocalist is ‘singing’ a (different) song. There are two ethical issues to consider with vocal covers:
the rights of the speaker or singer whose voice is being applied to the song, and
the rights of the owner of the song.
The rise of AI voice cloning tools
To be clear, it’s no surprise that these voice cloning tools exist. They’re not new, and many people have raised alarms about the potential for scams and disinformation. What distressed me was realizing that the unethical use of voice cloning is seemingly so unquestioned, by so many sites and reviewers (and users). In my research, I found many articles about these tools that don’t even mention ethical concerns, and only a few that do.
Given that surveys have shown that most Americans feel musicians and other artists should be paid for their work4, this nonchalance about voice cloning feels like a disconnect.
And one doesn’t even need to venture into dark corners of the web to find these voice cloning tools. They’re right out there, in public.
It’s one thing for artists to develop and use models of their own voice. A few have. Artist Holly Herndon was one of the first to break ground with AI.5 In 2021 she released her Holly+ AI voice model (“digital vocal twin”), which will “allow for others to make artwork with my voice, and will distribute ownership of my digital likeness through the creation of the Holly+ DAO”6. Grimes also developed and released an Elf.Tech tool to allow others to use her voice7, which she encourages people to use under 50-50 royalty split terms.8
Of the tools not developed directly by artists like Herndon and Grimes, only a few pass the ethical “smell test”. For most AI-driven vocal cover tools I’ve looked at so far with built-in voice models, I have yet to see any indication that the artist whose voice is modeled (or the composer, lyricist, or instrument performers of the covered song) receives any compensation or has given consent.
A handful of responsible options
Relatively little attention has been given in reviews of these tools to ethical considerations. One exception, from Dec. 2023, called out that “AI modelling websites are making money selling the usage of models of famous singers and very few of these websites are talking about these artists getting a cut of the profits. Of those we've tested, only a few, for example Kits and Voice-Swap, mention royalty splits with the original affiliate artists who its AI algorithms are modelled on, and the website also lets these artists have the final say on commercial releases.”9
In searching for tools and reviews, I found several lists, including10. A few tool offerings appear to use AI responsibly: ElevenLabs, Kits.ai, Revocalize, Speechify, and Voice-Swap. Let’s summarize those first.
ElevenLabs - Offers the option to create a model of your own voice (possibly taking some hours to render), with some interesting technical precautions to try to ensure that it’s really your own voice you’re modeling.11 They offer over 1200 voices across 29 languages, and provide ethics resources on their home page.
Kits.ai - Their slogan (“Clone voices. Sing like anyone. Play any instrument. 100% royalty-free.”) is a bit concerning at first blush, but their ethics page indicates they are trying to act responsibly.12
Revocalize - Offers a small number of officially licensed voice models, but their focus is on creating new models, not using famous voices; their offering “allows you to clone, protect, and create unique vocal tracks in any voice”.13
Voice-Swap - Offers “Session Singers” and a roster of exclusive (compensated) artists whose voices are available for use through their tool. The artist list on the site shows 16 names as of this writing. Their FAQ asserts that “As of July 2023, voice-swap is the only platform that works legally with famous artists.”14
Speechify (not for songs) - Offers voices of celebrities they have partnered with, e.g. Gwyneth Paltrow and Snoop Dogg, and they have precautions to ensure only live voices are cloned.15
Ethical voice cloning and text-to-speech tools have clear use cases that can add value. In addition to the accessibility features Speechify promotes, think how much time it might save a book author to AI-generate an audiobook version in their own voice, or a newsletter author to generate a podcast recording to help them reach new listeners!
Did I miss any voice cloning or vocal cover tools (genAI-based or not) that you consider ethical? Let me know!
Lots of irresponsible options
The few sites listed above that let you create voice models ensure it’s truly your own voice (and not e.g. the voice of someone for whom you have a recording but not their consent). This use seems ok ethically and legally.
But creating a unique model of the user’s voice is not the focus of most of the voice cloning tools. (Note that some of these tools offer other features besides voice cloning.) Most other examples in this space don’t appear to care about responsible and ethical use of AI. The main marketing for these tools promotes their prebuilt voice models of people or characters.
If a site has no FAQ or policy statement on how they train their voice models, doesn’t say that the voice training data has been licensed, and has no publicity about successfully negotiating with an artist to use their voice, I assume the voices aren’t licensed and are being used in the cloning tool without consent.
To name just a few celebrities whose voices are offered via these tools (there are thousands): Taylor Swift, Michael Jackson, Donald Trump and Joe Biden, Narendra Modi, Morgan Freeman, Messi, Ronaldo, Freddie Mercury, Donald Duck, and SpongeBob. It is unreasonable to believe that all of these thousands of people, or their estates, or character owners, have consented to these uses of the voices they own.
Examples of such tools include Voicify (now Jammable16), Fineshare’s Singify17 and FineVoice18, Vocloner19, Musicfy20, Uberduck (“The world's greatest AI rap generator”)21, FakeYou (“deep fake text to speech”)22, Lalals23, and Covers.ai (“For personal and parody use only”)24.
FakeYou doesn’t have an ethics page or relevant information; they say only that their mission is to “empower anyone to create full feature-length content from home without institutional capital, large teams, huge amounts of time, or deep reservoirs of highly specialized talent”. As of this writing, they offer 3966 voices for Text To Speech (“Turn text into your favorite character's speaking voice”), including separate speaking and singing voices for Michael Jackson. For Voice to Voice, they offer 8617 voices. They also now have a beta “Voice Designer” feature. All of these seem ethically shady.
Vocloner allows users to “Clone the voice of anyone in seconds”; you “just need one audio file of the voice you want to clone. Upload a sample audio file and enter the text you would like the voice to say.” There is zero information about ethical use. (They also now have an AI Music Generator in beta.)
The home page for Lalals claims “we ensure that our service is compliant and legal and our AI voices are only trained with legal material”, but doesn’t back that up with any details. The voices they claim to have include Paul McCartney and John Lennon. Online reviews for Lalals don’t address ethics, e.g. 25, or even this one, which identifies other creative “perils” of tools like Lalals26.
A Voicify user can select a voice model, pair it with a YouTube URL of any song, and get a ‘cover’ of that song with that voice model. Voicify has quickly expanded its pool of AI vocal models from 800+ to 22,000+, including “voices of famous musicians, politicians, and cartoon and game characters”27. There’s no information on their site about ethics, data sourcing, consent, or compensation, and the supposedly comprehensive “Jammable AI Review: Everything You Should Know (2024)” article by Ana Gajic from March 20, 2024 doesn’t address these ethical concerns at all.
The FAQ for Singify (AI Voice Cover) response to a question about legality is: “The legality of AI-generated song covers depends on factors like intended use, copyright laws, and permissions. Commercial use may require authorization from the original copyright holder, while personal use may fall under fair use exemptions in some jurisdictions.” They do not address licensing or data sourcing. FineVoice offers to “Quickly convert text or audio into the voice of your favorite character and add pauses, emphasis, and even unique personality” and offers no information about consent from the ‘favorite characters’ or other voices they offer.
Musicfy appears to offer the ability to use your own voice or one of their copyright-free pretrained voices. That seems ok. But farther down the home page, they offer “parody” voices such as Barack Obama, Bart Simpson, SpongeBob SquarePants, Mickey Mouse, and Shrek. Another page claims they have 1000+ “celebrity voices”. It’s pretty unlikely that all of those celebrity voice owners consented to this. Musicfy admits that its service raises ethical concerns when using famous voices but states: “You can freely use AI voice cloning, but it’s important to exercise responsibility”.28 Musicfy provides a blog page describing the permissions needed to cover a song - but with no acknowledgement of the contribution of their tool to unethical or illegal use.29
Due to lawsuits from Universal Music and others, Uberduck’s former voice library dropped from 5000+ to “227 TTS voices, 15 AI vocal voices, and one rap voice with several backing tracks” as of March 2530. At this writing, Uberduck offers 213 voices for text-to-speech and only 11 for voice-to-voice (2 of which belong to Grimes).
These voice cloning tools and the unlicensed celebrity voices they provide make it impossible to be confident that a politician or other public figure actually said the words in any ‘voice recording’ we may hear.
This is a clear path to ‘election interference’, folks - in the US and worldwide. It’s already been happening for a few years31 32, and as the tools get better at sounding more realistic & “human”, the problem is only going to get worse.
“The real existential risk is that we all become mad, because we cannot believe what we are seeing, hearing, or reading.” - Carme Artigas, Spain’s Secretary of State for AI and Digitalization 33
What we can do about it
Laws
Unauthorized voice cloning isn’t illegal everywhere yet; hopefully it’s only a matter of time. Although our voices aren’t covered by copyright, most countries recognize a person’s right of publicity. Right of publicity allows a person to control commercial use of some aspects of their identity, such as their name, image and likeness (NIL), with voice considered part of one’s likeness. (In the US, if you’ve heard about NIL funds motivating NCAA athletes to transfer schools to get more favorable deals34, yes, this is the same NIL concept.)
In the US, like in so many other areas of law, we have a patchwork of state laws. The right of publicity isn’t yet guaranteed at the federal level. The US “No AI Fraud Act” introduced in January isn’t universally agreed upon; for instance, the Electronic Frontier Foundation has raised concerns that the draft Act, as currently written, may cause more harm than good35. However, legal precedent is already established for successfully suing for unauthorized use or imitation of a voice, e.g. Bette Midler vs. Ford, and Tom Waits vs. Frito-Lay.
Tennessee’s ELVIS Act36 is a good step towards actually making this illegal and actionable for musicians. But even that doesn’t cover non-musicians.
In follow-up to the EU AI Act, there’s now talk of creating a UN “watchdog” for deepfakes, covering not only voice but also images and video.37 It’s going to take a while to get regional and worldwide legal agreements in place. At least the conversations are happening now.38 We can & should lobby our representatives to take this issue seriously.
Regardless of whether the law forbids unauthorized voice cloning, something so clearly unethical shouldn’t have to be illegal for people to avoid doing it.
Tool Cleanup and Legitimate Use
There have been some efforts, primarily by the large music agencies, to remove unlicensed voices from these tools, and hopefully those efforts will continue to have an impact. Uberduck is one positive example. However, even after Uberduck removed unlicensed voices following the lawsuits, there are still “concerns expressed about potential misuse of voice cloning or copyright compliance, calling for user responsibility.” (quote is from the same reference article as above)
The major record labels are launching initiatives to proactively manage authorized use of their artists’ voices and material.39 Negotiations with Google (for YouTube) are one example that we’ll discuss more in Part 3. These initiatives will help for music and vocal covers, at least for more famous people, but will have little impact on less famous, independent voices or deepfaked speech.
Responsible Use
TL;DR ❌ If you care about ethics, and I hope you do since you’re reading this article, simply don’t use or support ‘cover’ or ‘cloning’ tools that steal people’s voices without consent, credit, and compensation [3].
If you’d like to try a voice cloning tool, choose from among the ethical ones listed above (e.g. ElevenLabs, Kits.ai, Revocalize, Voice-Swap, or Speechify)!
Don’t Trust; Verify
Check out the innovative and useful tools ElevenLabs offers to combat deepfakes created with voice cloning tools. For instance, their safety page40 includes links for their AI Speech Classifier, which tries to detect if a voice is AI-generated.
Also, check out these ethics guidelines for responsible use41.
What’s Next?
OK, I had to get this off my chest and keep it from polluting Part 3 - thanks for ‘listening’. Back to work on it now. 😊 Once Part 3 is out, I may reward myself by squeezing in some time to try cloning my own voice, with a responsible tool that can be trusted not to reuse my voice for unethical purposes.
Edit - P.S. I did some quick analysis on the voiceover tools that I’m considering. See this May 9 article: “Say what? Adding audio voiceovers with AI, ethically”.
End Notes
“Listen carefully: The growing threat of audio deepfake scams”, by Greg Noone/TechMonitor.AI, Feb. 4, 2021
“That panicky call from a relative? It could be a thief using a voice clone, FTC warns”, by Joe Hernandez/NPR, March 22, 2023
https://holly.mirror.xyz/54ds2IiOnvthjGFkokFCoaI4EabytH9xjAYy1irHy94 (via About button on https://holly.plus/)
https://elf.tech/connect (Grimes’ site)
“Grimes invites fans to make songs with an AI-generated version of her voice”, by Vanessa Romo, April 24, 2023
Voicify/Jammable: https://www.voicify.ai, https://deepgram.com/ai-apps/voicify, https://www.toolpilot.ai/products/jammable
Singify: https://singify.fineshare.com/
FineVoice: https://www.fineshare.com/finevoice/
VoCloner: https://vocloner.com/, AI Music Generator: https://aimusicgenerator.pro/
Musicfy: https://musicfy.lol/
Uberduck: https://www.uberduck.ai/, https://app.uberduck.ai/text-to-speech, https://app.uberduck.ai/voice-to-voice
FakeYou: https://fakeyou.com/
Lalals: https://lalals.com/
Covers.ai: https://covers.ai
https://aireviewguys.com/lalals-review/
“Understanding Voicify AI - A Quick Guide”, by RavenArena on Medium, April 16, 2024
“How to make an AI cover song with any artist's voice”, by Matt Mullen/MusicRadar, Nov. 28, 2023
“What Is Uberduck? A brief overview of Uberduck, where it's best used, and the top alternatives”, by ElevenLabs Team, Mar 25, 2024
“How AI deepfakes threaten the 2024 elections”, by Rehan Mirza/The Journalist’s Resource, Feb. 16, 2024
“A fake recording of a candidate saying he’d rigged the election went viral. Experts say it’s only the beginning”, by Curt Devine, Donie O'Sullivan and Sean Lyngaas / CNN, Feb. 1, 2024
“The UN’s role in setting international rules on AI”, Carme Artigas/UN News interview, Jan. 1, 2024
“NCAA name, image and likeness FAQ: What the rule changes mean for the athletes, schools and more”, Dan Murphy / ESPN, Jun 30, 2021
“Tennessee becomes the first state to protect musicians and other artists against AI”, by Rebecca Rosman/NPR, March 22, 2024
“Does the UN need a watchdog to fight deepfakes and other AI threats?”, World Economic Forum, Aug 2, 2023
“AI music isn't going away. Here are 4 big questions about what's next”, by Jewly Hight/NPR, April 25, 2024
“Vocal deepfakes are here to stay: AI voice cloning is about to change pop music forever. Here's why”, by Matt Mullen/MusicRadar, Aug. 21, 2023
Kits.ai and Jen are now certified as Fairly Trained: https://musically.com/2024/08/05/fairly-trained-adds-more-supporters-and-updates-certification/
Subsequent to this page's publication on April 29, I discovered Voicemod via "Fairly Trained" (https://www.fairlytrained.org/certified-models). They belong on the short list of ethical voice cloning tools. A little more information is available here: https://www.theverge.com/2023/11/30/23981835/voicemod-ai-voice-creator-community-voices