Say what? Adding audio voiceovers with AI, ethically 🗣️
🗣️ Comparison of features and pricing on 3 AI-based voice cloning tools, plus next steps in my experiment with AI-generated audio voiceovers for my publications. (audio; 12:44)
Thinking of adding audio voiceovers to your Substack posts? Me too!
Wondering if generative AI tools can help you with your audio voiceovers? Me too!
This article summarizes what I’ve learned in the past week about 3 AI-based voice cloning tools, my planned next steps for evaluating them, and how you can participate!
I use two abbreviations in this article:
PVC - Professional Voice Cloning (ElevenLabs feature, also supported by Wondercraft)
TTS - Text To Speech
Background
Substack has been abuzz lately about the benefits of adding audio (and video) for subscribers and newsletter publishers.1 I decided to explore adding audio voiceover to my Substack posts. It needs to be:
time-efficient (to save more of my weekly effort budget for actual writing), and
minimal cost (because I’m on a shoestring budget).
Using a canned, pre-trained generative AI voice can be free for a few short recordings (some links are in the endnotes2), and it could be efficient. It doesn’t appeal to me, though. I want my own voice saying my own words.
That leaves 3 options to consider:
Manually recording my own voice with conventional tools (no AI) - no cost, but likely to be time-consuming (links for tips on doing this well are in the endnotes3)
Using an ‘instant’ AI-based clone of my voice to generate audio
Using a high-quality “professional” AI-based clone of my voice to generate audio
Options 2 and 3 motivated me to tackle research into AI-based Text To Speech (TTS) voice cloning tools in the past few weeks. As mentioned in my April retrospective, I’m now starting an experiment. The experiment goal is to:
evaluate ethical AI-based voice cloning tools - initially, Speechify and ElevenLabs,
compare cost, quality, and effort vs manual recording for audio voiceovers, and
come to a decision (probably in June).
My April 28 article, “6 P's in AI Pods”, summarized AI-based voice cloning/vocal cover tools and identified which ones appear to be ethical (spoiler: most do not). That article has details on the ethics of ElevenLabs and Speechify. On Tuesday, I learned about Wondercraft.ai voice cloning and started looking into its ethics.
Wondercraft.ai - a quick look at ethics
Wondercraft’s FAQ4 offers an ElevenLabs PVC promo code and links to the ElevenLabs VoiceLab5 for Professional Voice Cloning. However, Wondercraft also lets you configure your own voice model on their site with ~4 minutes of spoken audio input (they even provide a sample script).
The FAQ6 doesn't mention any guardrails to ensure that users create a custom (non-PVC) voice model in Wondercraft only with *their* own voice, not an unauthorized one. Wondercraft only provides a page with a stern warning about legality. 🙄 That’s a potential ethical concern. I also haven’t yet found any documents describing:
the sources of the pre-built voices they offer, or
how they protect your voice model from reuse for other purposes.
(In fairness, I haven’t yet read through all of their many policy PDFs.)
Bottom line: Although Wondercraft may not be as strong ethically as I hope, for now I’m adding it to my short list, along with Speechify and ElevenLabs.
TTS Features and Pricing Models
In all 3 tools, training a TTS model on your own voice requires a paid monthly plan. Higher-quality voice models (PVC) cost more. (All 3 tool providers offer discounts for annual plans.)
Pricing for all plans scales rapidly based on the number of minutes of audio recording needed. To keep my costs down for Option 2 and Option 3, I’ll need to set a minimal ‘minutes budget’ that I can live with.
Caveat: This analysis isn’t a broad evaluation of these 3 TTS tools for all needs. Each tool includes other features (e.g. multiple voices or ‘seats’) which are not relevant to my current needs for audio voiceovers. Features I’m ignoring may be of value to people with other needs, so their evaluations may differ.
Speechify Voice Cloning
For TTS, Speechify Limited is free and Speechify Premium is $139/year. However, neither supports cloning and using your own voice, and the focus seems to be personal use.
TTS plans and prices are summarized in the endnotes7 for Speechify Studio, which includes additional capabilities beyond basic TTS. Discounts for annual plans are substantial (over 50%).
Based on the length of time needed to train the voice model (“in seconds”), Speechify Professional’s voice cloning seems comparable to ‘instant’ (non-PVC) offerings.
ElevenLabs Voice Cloning
ElevenLabs TTS plans and prices for basic and professional-quality voice cloning are summarized in the endnotes.8
Thank you to for sharing his example of a canned voice from his ElevenLabs trial9 👍
Wondercraft Voice Cloning
Wondercraft plans and pricing for basic and professional-quality voice cloning are summarized in the endnotes.10
It’s not clear if the Wondercraft Pro fee includes the ElevenLabs Creator subscription for PVC at $18.33/mo, or if the promo code offsets the cost (once or recurring).
Tool Comparisons
All costs and minute allocations are per month. Monthly prices assume a yearly plan. All plans considered allow commercial use of the generated audio recordings.
Basic cloning of one voice
The table below summarizes comparisons of these 3 tools for basic voice cloning. Tool advantages vary by audio recording time limits (30, 60, 120, or more minutes). Details are in the endnotes.11
Professional cloning for one voice
A key pricing factor is whether the Wondercraft Pro account subsidizes or covers use of the ElevenLabs Voice Lab for PVC (e.g. via the promo code).
If not, then the total Wondercraft Pro cost at all levels will be at least $18.33 more, for at least one month, to build the model and import it to Wondercraft.
It’s not clear whether the PVC model built in ElevenLabs can continue to be used in Wondercraft even if the ElevenLabs Creator account is no longer active.
If not, the additional $18.33/mo will boost the total Wondercraft Pro cost every month.
Assuming that Speechify Professional’s voice cloning is ‘instant’-quality, not professional-quality, only ElevenLabs and Wondercraft are considered here.
The table below summarizes comparisons of these 2 tools for professional voice cloning. As with basic voice cloning, tool advantages vary by audio recording time constraints (120, 150, 300, or more minutes). Detailed commentary is in the endnotes.12
TL;DR Summary of Pricing and Feature Comparisons for Voice Cloning
Assuming that Wondercraft Pro covers the ElevenLabs Creator PVC fee, the table below shows the overall results and costs per month by audio recording limits, up to 500 minutes per month.
Bottom line:
For low-volume basic voice cloning, up to 30 minutes/month, ElevenLabs Starter is the clear choice at $4.17, and ElevenLabs Creator will get you up to 120 minutes/month for $18.33.
For high-volume basic voice cloning, up to 500 minutes/month, Speechify Professional is the clear choice at $32.08.
For professional voice cloning, ElevenLabs Creator or Pro plans look like the best choices, although Wondercraft Pro-150 has an edge on cost in the 121-150 minute window. Depending on Wondercraft promo codes and discounts, it could also be competitive in other windows.
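The lookup behind this bottom line can be sketched in a few lines of Python. This is only an illustrative sketch, not a tool from any of the vendors: the `Plan` type, the `PLANS` list, and the `cheapest_plan` helper are hypothetical names, the prices are the yearly-plan monthly figures listed in the endnotes, only tiers up to 600 minutes are included, and it assumes (as above) that Wondercraft Pro covers the ElevenLabs Creator PVC fee. The tier labels "Pro-300" and "Pro-600" are made up by analogy with "Pro-150".

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    usd_per_month: float  # monthly price when billed yearly
    minutes: int          # audio minutes included per month
    pvc: bool             # True if the plan supports Professional Voice Cloning


# Plans that allow cloning your own voice (figures from the endnotes).
PLANS = [
    Plan("ElevenLabs Starter", 4.17, 30, False),
    Plan("ElevenLabs Creator", 18.33, 120, True),
    Plan("ElevenLabs Pro", 82.50, 600, True),
    Plan("Speechify Professional", 32.08, 500, False),
    Plan("Wondercraft Creator", 29.00, 60, False),
    Plan("Wondercraft Pro-150", 59.00, 150, True),
    Plan("Wondercraft Pro-300", 99.00, 300, True),
    Plan("Wondercraft Pro-600", 199.00, 600, True),
]


def cheapest_plan(minutes_needed: int, need_pvc: bool) -> Optional[Plan]:
    """Return the lowest-priced plan that covers the minutes budget.

    PVC-capable plans also count as candidates for basic cloning.
    Returns None if no listed plan includes enough minutes.
    """
    candidates = [
        p for p in PLANS
        if p.minutes >= minutes_needed and (p.pvc or not need_pvc)
    ]
    return min(candidates, key=lambda p: p.usd_per_month, default=None)


if __name__ == "__main__":
    for minutes, need_pvc in [(30, False), (120, False), (500, False), (130, True), (300, True)]:
        plan = cheapest_plan(minutes, need_pvc)
        kind = "PVC" if need_pvc else "basic"
        print(f"{minutes:>3} min, {kind}: {plan.name} at ${plan.usd_per_month:.2f}/mo")
```

With these figures, the sketch picks ElevenLabs Starter for 30 basic minutes, Speechify Professional for 500 basic minutes, and Wondercraft Pro-150 for a 130-minute PVC budget, matching the bullets above.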
Limitations
This evaluation is, so far, based on reading available online materials about each of the tools; no hands-on experience (yet).
Some of the tool providers may run promotions or referrals which significantly reduce the cost (I’ve heard that Wondercraft does). This could change the tradeoffs and outcomes. Only standard published prices were used for this analysis.
❓Do any of you who have worked with these tools:
notice any considerations I missed, or
see anything I’ve misinterpreted about the tools’ voice cloning capabilities or pricing models?
Conclusions
⚖️ My final evaluation will weigh the tradeoffs among audio quality, time limits, effort, and cost for options 1 & 2 - and option 3, IF higher quality is needed.
Option 1 (recording my own voice without AI) is my baseline. It’s “free” for me; I already have a Yeti microphone, a pop filter, and Audacity. And audio recording time per month is unlimited! The main cost will be my time.
For Option 2, the ElevenLabs Starter account looks affordable and provides the minimum voice cloning capability I’m looking for.
Evaluation Plan13:
Option 1: Make a few sample recordings manually; measure effort & quality
Option 2: Evaluate ElevenLabs with a Starter account; compare effort & quality
❓ Key question: is basic voice cloning ‘good enough’ to add value to my publications?
🤞At this point, I’m crossing my fingers that:
(1) 30 min/month will be enough for now, and
(2) ElevenLabs Starter quality, with basic voice cloning, will be good enough and save me enough time (vs. manual voiceover recordings) to justify the small monthly budget hit.
What’s Next?
🚨 WRITERS: If you are using one of these tools now for Substack audio voiceovers, or decide to try one, I’d love to join forces with you on this evaluation! Please share your experience and lessons learned (let me know if you’d like to guest-post here about it).
📣 I’ll post in a future article about the results of my experiments with ElevenLabs Starter and other options. Subscribe for FREE to be automatically notified when new posts are published (and to let me know this AI topic interests you)!
❓ READERS: What do YOU think about audio voiceovers?
Endnotes:
Examples of free TTS tools for canned voices (& limited character counts) from web searches and via MakeUseOf and TechRadar include Balabolka, Festvox, Hearling, Kukarella, Luvvoice, NaturalReaders, Panopreter Basic, Text2Speech.org, Text2Voice.org, TTSmp3, TTSMaker, and Zabaware. Speechify also offers a somewhat limited, but free, version.
Some tips for manually recording good-quality voiceovers (my Option 1):
https://speechify.com/pricing/
Summary of Speechify Studio pricing and plan capabilities:
Free plan: $0, 10 min - allows trying out the features with 200+ voices. Doesn’t allow downloads, so not a fit for my needs.
Basic plan: $69/mo ($24/mo if yearly), ~240 min - allows commercial use; up to 50 hours of voice generation per year (~4 hours/mo).
Professional plan: $99/mo ($32.08/mo if yearly), 500 min - adds “AI Avatars” and voice cloning, for up to 100 hours of audio generation per year (~8.3 hours/mo)
Based on the length of time needed to train the voice model (“in seconds”), Speechify Professional’s voice cloning seems comparable to ‘instant’ (non-PVC) offerings.
Enterprise plan: Negotiable pricing - can go up to 1000 hours/year per user
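The per-month minutes and the size of the annual discount follow from simple arithmetic on the figures above. A minimal sketch, with hypothetical helper names `minutes_per_month` and `annual_discount_pct`:

```python
def minutes_per_month(hours_per_year: float) -> float:
    """Convert a yearly allowance in hours to an average monthly allowance in minutes."""
    return hours_per_year * 60 / 12


def annual_discount_pct(monthly_price: float, yearly_plan_monthly_price: float) -> float:
    """Percentage saved per month by choosing the yearly plan."""
    return (1 - yearly_plan_monthly_price / monthly_price) * 100


print(minutes_per_month(50))                  # 250.0 (Basic)
print(minutes_per_month(100))                 # 500.0 (Professional)
print(round(annual_discount_pct(69, 24)))     # 65 (Basic)
print(round(annual_discount_pct(99, 32.08)))  # 68 (Professional)
```

The exact Basic allowance works out to 250 minutes per month; the "~240 min" figure comes from rounding 50 hours per year down to about 4 hours per month.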
Summary of ElevenLabs TTS pricing and plan capabilities:
Free account: $0, 10 min - allows use of ‘thousands’ of synthetic voices or creating a custom, synthetic voice (I’m assuming this does not support cloning your own voice). It also supports translation (automatic dubbing).
Starter account: $5/mo ($4.17/mo if yearly), 30 min - supports cloning your own voice “with as little as 1 min of audio”; recordings ok for commercial use.
Creator account: $22/mo ($18.33/mo if yearly), 120 min - enables Professional Voice Cloning, API support for higher quality, and Projects to create long form content with multiple speakers
Pro account: $99/mo ($82.50/mo if yearly), 600 min - similar to Creator, plus even higher quality
Scale account: $330/mo ($275/mo if yearly), 2400 min - similar to Pro, plus very high quality and priority support.
https://www.wondercraft.ai/pricing
Summary of Wondercraft.AI plans and capabilities:
Free account: $0, 4 min - 10 standard voices, watermarked recordings (assumed, since it’s stated that audio recordings in paid accounts don’t have watermarks).
Creator account: $34/mo ($29/mo if yearly), 60 min - 40 premium voices and ‘instant’ voice cloning for 1 voice; removes watermarking.
Pro account: Professional Voice Cloning + instant voice cloning for 5 voices, priced by monthly minutes:
$64/mo ($59/mo if yearly), 150 min
$109/mo ($99/mo if yearly), 300 min
$209/mo ($199/mo if yearly), 600 min
$309/mo ($299/mo if yearly), 900 min
$509/mo ($499/mo if yearly), 1500 min
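To see how these tiers scale, here is a rough per-minute calculation using the yearly-plan prices above. It is a sketch only: the `WONDERCRAFT_TIERS` table and the "Pro-NNN" tier labels are illustrative names, not Wondercraft's own.

```python
# tier label -> (USD per month on a yearly plan, minutes per month)
WONDERCRAFT_TIERS = {
    "Creator": (29, 60),
    "Pro-150": (59, 150),
    "Pro-300": (99, 300),
    "Pro-600": (199, 600),
    "Pro-900": (299, 900),
    "Pro-1500": (499, 1500),
}


def usd_per_minute(price: float, minutes: int) -> float:
    """Effective cost of one included minute of generated audio."""
    return price / minutes


for tier, (price, minutes) in WONDERCRAFT_TIERS.items():
    print(f"{tier:<9} ${usd_per_minute(price, minutes):.2f}/min")
```

On these numbers, the per-minute cost falls from about $0.48 on Creator to about $0.33 at the 300-minute Pro tier and then stays roughly flat, so the larger tiers buy more minutes rather than a better rate.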
Notes on pairwise comparisons of basic voice cloning tools:
Unless otherwise noted, all options are for non-professional, ‘instant’ voice cloning.
>120 min: Speechify Professional, the least expensive Speechify option for basic cloning, is the only option for 121+ min of basic cloning.
$32.08 gets you 500 min with Speechify Professional.
Wondercraft Pro is $59+ for 150 min and ElevenLabs Pro is $82.50 for 600 min, both much more expensive than Speechify Professional. However, both of those plans support PVC, while Speechify does not. So these options aren’t really comparable apples-to-apples vs. Speechify Professional.
ElevenLabs vs Wondercraft:
<= 30 min: ElevenLabs Starter and Wondercraft Creator accounts appear similar. ElevenLabs offers a much lower entry point ($4.17 vs $29) but for only half the time (30 min vs. 60 min).
31-120 min: An ElevenLabs Creator account offers twice as much time as Wondercraft Creator (120 min vs. 60 min) and still costs less ($18.33 vs $29) - plus ElevenLabs Creator supports PVC.
Winner: ElevenLabs Starter or Creator (which plan to choose depends on time needs)
Speechify vs Wondercraft:
Speechify Professional costs slightly more than Wondercraft Creator ($32.08 vs. $29) but offers way more time (500 min vs. 60 min).
Winner: depends on time needs
31-60 min: slight edge to Wondercraft Creator if budget is tight
61+ minutes: Speechify Professional
ElevenLabs vs. Speechify:
<= 30 minutes: As long as the 30-min time constraint is ok, ElevenLabs Starter wins on cost vs. Speechify Professional ($4.17 vs $32.08 for 30 min vs. 500 min).
31-120 minutes: Although it still offers way less recording time (120 min vs. 500 min), ElevenLabs Creator is still significantly less expensive than Speechify Professional ($18.33 vs $32.08) and it includes higher quality PVC.
Winner: ElevenLabs (Starter or Creator - which plan to choose depends on time needs)
Notes on Professional Voice Cloning comparison:
<= 120 min: An ElevenLabs Creator account appears to be as capable as Wondercraft’s Pro account at less than half the cost ($18.33 vs $59), for slightly less recording time (120 min vs. 150 min).
Regardless of whether Wondercraft offsets PVC cost, ElevenLabs is a better deal.
121-150 min: Above 120 min, an ElevenLabs Pro account is needed (600 min). Within the narrow window of 121-150 min, Wondercraft Pro’s plan for 150 min is somewhat more affordable than ElevenLabs Pro ($59 vs $82.50).
If there is no PVC offset, then cost is closer ($77.33 vs $82.50), and going with ElevenLabs Pro to get the full 600 minutes instead of 150 would probably make the most sense.
151-300+ min: Above 150 min, ElevenLabs Pro (600 min for $82.50) offers twice the recording time of Wondercraft Pro’s 300-min tier for less money ($82.50 vs $99), and the same 600 min for less than half the price of Wondercraft Pro’s 600-min tier ($199).
Winner: ElevenLabs (Creator or Pro - which plan to choose depends on time needs) with the possible exception of the 121-150 minute budget window, where Wondercraft Pro has a small edge (larger advantage IF they offset PVC cost).
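The effect of the open PVC-fee question on the 121-150 minute window can be checked with a short sketch. The constant and function names are hypothetical, and the "not covered" case assumes the ElevenLabs Creator fee recurs every month at the yearly-plan rate of $18.33.

```python
ELEVENLABS_CREATOR = 18.33   # USD/month on a yearly plan; needed to build the PVC model
ELEVENLABS_PRO = 82.50       # USD/month on a yearly plan, 600 min
WONDERCRAFT_PRO_150 = 59.00  # USD/month on a yearly plan, 150 min


def wondercraft_pro_150_total(pvc_fee_covered: bool) -> float:
    """Monthly Wondercraft Pro-150 cost, adding the ElevenLabs fee if it is not covered."""
    extra = 0.0 if pvc_fee_covered else ELEVENLABS_CREATOR
    return round(WONDERCRAFT_PRO_150 + extra, 2)


for covered in (True, False):
    total = wondercraft_pro_150_total(covered)
    saving = round(ELEVENLABS_PRO - total, 2)
    print(f"PVC fee covered={covered}: ${total:.2f}/mo, ${saving:.2f}/mo less than ElevenLabs Pro")
```

If the fee is covered, Wondercraft Pro-150 is $23.50 per month cheaper than ElevenLabs Pro; if it is not, the gap shrinks to $5.17 for a quarter of the minutes.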
Evaluation Plan details:
✅ If quality of basic cloning is good enough (option 2):
for <= 30 min/month, I’ll stay with ElevenLabs Starter
for 31-120 min/month, I’ll look at ElevenLabs Creator (bonus: PVC)
for 121-500 min/month, I’ll look at Speechify Professional (500 min included)
❎ If quality of basic cloning is not satisfactory, I’ll look at option 3, using my Option 1 recordings for training the PVC model. Then I’ll compare quality, cost, effort saved, and time limits vs. option 1.
for <= 120 min/month, the best Option 3 choice will be ElevenLabs Creator
for 121-600 min/month, the best Option 3 choice will be ElevenLabs Pro
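Written as a decision rule, the plan above looks roughly like the sketch below. The function name `plan_to_try` is hypothetical, and the cut-offs are taken from each plan's included minutes (30, 120, 500, and 600 per month), which is an assumption about where the boundaries fall.

```python
def plan_to_try(minutes_per_month: int, basic_quality_ok: bool) -> str:
    """Map a monthly minutes budget and the basic-cloning verdict to a plan."""
    if basic_quality_ok:
        # Option 2: basic ('instant') cloning is good enough.
        tiers = [
            (30, "ElevenLabs Starter"),
            (120, "ElevenLabs Creator"),
            (500, "Speechify Professional"),
        ]
    else:
        # Option 3: fall back to Professional Voice Cloning.
        tiers = [
            (120, "ElevenLabs Creator (PVC)"),
            (600, "ElevenLabs Pro (PVC)"),
        ]
    for limit, plan in tiers:
        if minutes_per_month <= limit:
            return plan
    return "no plan in this comparison covers that many minutes"


print(plan_to_try(30, basic_quality_ok=True))
print(plan_to_try(200, basic_quality_ok=True))
print(plan_to_try(200, basic_quality_ok=False))
```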
Kits.ai and Jen have now been certified as Fairly Trained, too: https://musically.com/2024/08/05/fairly-trained-adds-more-supporters-and-updates-certification/
Right as this article was published, ElevenLabs released a preview of a genAI music tool. Their training dataset wasn't clear at first, and the company declined to comment on it (ref: https://venturebeat.com/ai/elevenlabs-previews-music-generating-ai-model/). With no further information forthcoming since then, it seems likely it *wasn't* ethically trained 😞 That takes them out of contention for my ethical shoestring list.
I'll be sticking with "Option 1: Make a few sample recordings manually; measure effort & quality" for now. If you find audio voiceovers helpful, please comment or DM to let me know!