Stability AI and ethics of genAI for music
Summary of Stability AI as an uncertified company working on generative AI for music. Includes some Harmonai and Stable Audio info, key features, and a few insights on ethics of genAI for music.
Our PART 3 article provides the big-picture view of 85 companies offering AI-based tools or platforms in this musical melody space. This summary covers Stability AI (which includes Stable Audio and Harmonai). This profile page is meant to provide insights into why the company was significant enough in mid-2024 to review, even though it wasn’t chosen for deeper dives.
Description from Harmonai site: “A Stability AI Lab releasing open-source generative audio tools to make music production more accessible and fun for everyone”
Descriptions from Stable Audio site:
“Stable Audio 2.0 builds upon Stable Audio 1.0, redefining AI music creation by offering high-quality tracks up to three minutes long with its innovative audio-to-audio generation. Users can now upload audio samples and, using natural language prompts, transform these samples into a wide array of sounds.”
“Stable Audio Open is an open-source model optimized for generating short audio samples, sound effects, and production elements using text prompts. Ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings, and other audio samples, the model was trained on data from Freesound and the Free Music Archive, respecting creator rights.”
Description from PitchBook on Stability AI: “Developer of an artificial intelligence tool designed to create images based on text input given. The company's tool implements given text commands into images and other forms of media using collective intelligence and augmented technology, enabling clients to develop cutting-edge open artificial intelligence models for image, language, audio, video, 3D and biology.”
History and Partnerships
Stability AI was founded in 2019 and is headquartered in London, England. They are privately held (later-stage VC) with 24 investors, including Eric Schmidt, and have about 200 employees. Their latest funding deal, which closed on June 21, 2024, was $102M.
Sept. 2022: Harmonai (Stability AI) released the research project Dance Diffusion (site, repo, link)
Sept. 8, 2023: Stability AI expanded its leadership team, adding Chief People Officer Ozden Onder, Vice President of Communications & Community Jordan Valdés, Vice President of Business Development Scott Trowbridge, and Global Head of Public Policy Ben Brooks. (Stability AI)
Sept. 14, 2023: Stable Audio v1.0 announced (ZDnet ref)
Nov. 23, 2023: Stable Audio head Ed Newton-Rex quit the company in protest over Stability AI’s handling of training data for images (TheNextWeb ref, Engadget)
April 3, 2024: Stable Audio v2.0 announced (ZDnet ref1, ZDnet ref2)
June 5, 2024: Stable Audio Open announced (Stability.ai), an open-source text-to-audio model for creating short audio samples and sound effects from text prompts
Key Features
As of Aug. 2024, Stability AI has three major offerings relating to generating music.
The Stable Diffusion model (currently “Stable Diffusion 3 Medium”). It is available with open weights (not full open source).
Stable Audio Open
Stable Audio 2.
They also offer Stable Video Diffusion, Stable Video 3D, and Stable LM 2 1.6B (open access language models).
Other music generation tools, such as Riffusion, also use the Stable Diffusion model.
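Both Stable Audio 2 and Stable Audio Open are latent diffusion models, built on the same family of techniques as Stable Diffusion rather than on the Stable Diffusion image model itself. Stable Audio 2 is more capable than Stable Audio Open.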
“Stable Audio Open allows anyone to generate up to 47 seconds of high-quality audio data from a simple text prompt. Its specialized training makes it ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design. … Our commercial Stable Audio product produces high-quality, full tracks with coherent musical structure up to three minutes in length, as well as advanced capabilities like audio-to-audio generation and coherent multi-part musical compositions.”
“Stable AI’s latest audio model, Stable Audio 2, creates longer songs of up to 3 minutes at 44.1kHz stereo, a high quality work, that includes elements needed in a song such as melodies, backing track, sound effects, and more. Stability AI collaborated with Audible Magic for identifying and blocking copyrighted content in real time.” (ZDNet)
“With both text-to-audio and audio-to-audio prompting, users can produce melodies, backing tracks, stems, and sound effects, thus enhancing the creative process” (Voicebot.ai)
Training Data & Technologies
“Dance Diffusion and Stable Diffusion are built on Stability-AI, an open source API. Dance Diffusion uses machine learning to generate music from scratch.” (AudioCipher)
The Stable Audio FAQ says “The AI model behind Stable Audio was trained on music from our partners AudioSparx.” and “We will be open sourcing a music generation model soon, trained on different data.”
This Nov. 23, 2023 profile on Ed Newton-Rex’s departure from Stable Audio stated: “Stable Audio was trained on licensed music. The model was fed a dataset of over 800,000 files from the stock music library AudioSparx. Any copyrighted materials had been provided with permission.”
Permission is explained in this ZDnet article from April 3, 2024, which says that AudioSparx “artists were given the option to opt out of the Stable Audio model training.”
From “Stability AI's new audio model creates even longer songs - here's how to try it for free”, Sabrina Ortiz, ZDnet Editor, April 3, 2024:
“To protect creative integrity and artists' rights, the uploads have to be free of copyrighted material. The company uses content recognition technology from Audible Magic to prevent such infringement and ensure users are compliant.”
Ownership, Pricing, and Usage Rights
As of Aug. 2024, Stable Audio offers three types of Licenses:
Personal (for non-commercial projects)
Creator (for commercial projects)
Enterprise (for organizations).
The Personal license is free, with a limit of 20 track generations per month, a maximum track duration of 3 minutes, and uploads of up to 3 minutes, cropped to 30 seconds.
The Creator license comes in three tiers (Pro, Studio, and Max), starting at a $11.99 monthly subscription with a limit of 500 track generations per month, a maximum track duration of 3 minutes, and uploads of up to 30 minutes, cropped to 3 minutes.
In line with their stated mission, Harmonai's tools are provided free of charge as open-source technology.
Ownership and usage rights: It’s not clear who owns the copyright on tracks created with Stable Audio or Harmonai, or whether there are any restrictions on commercial usage of the generated tracks, e.g. on the free Personal plan.
What’s Next?
This post is one of the last pieces in PART 3. PART 4 is coming up next.
REFERENCES
See this “Ethical AI for Music” page for links to all posts related to this series, as well as bonus articles mentioned above on voice cloning and other music-related topics.
Supplemental References
Not fully vetted or integrated; provided for informational purposes only.
2022-10-07 "AI music generators could be a boon for artists — but also problematic", Kyle Wiggers, Amanda Silberling / TechCrunch - Harmonai is backed by Stability AI financially. In September of 2022, Harmonai released Dance Diffusion, an algorithm and tools that can be used to generate music clips.
2023-01-05 "How to Generate AI Music with Dance Diffusion by Harmonai", Ezra Sandzer-Bell / audiocipher - “Dance Diffusion and Stable Diffusion are built on Stability-AI, an open source API. Dance diffusion uses machine learning to generate music from scratch.”
2023-06-29 analyticsindiamag.com (https://analyticsindiamag.com/ai-origins-evolution/openai-gets-slapped-with-another-class-action-lawsuit/) - “Getty Images also filed a lawsuit against the AI start-up Stability AI in February, accusing the company of illegally employing Getty’s photos to train its image-generating bot.” Note: specific to images, not music.
2023-09-08 Expanding Our Leadership Team: Meet Some Of Our New Team Members — Stability AI
2023-09-13 Announcing Stable Audio, a product for music & sound generation — Stability AI
Stability AI launched Stable Audio as one of the first high-quality music generation capabilities built for commercial use. In June 2024, it followed with Stable Audio Open, an open-source text-to-audio model for creating short audio samples and sound effects from text prompts.
2024-04-02 "Stability AI releases augmented text-to-music engine Stable Audio 2 with upload and style transfer features" / voicebot.ai - Stable Audio 2.0
2024-04-03 and 2024-04-10: Stability AI's Stable Audio was mentioned in these two ZDNET articles
2024-05-09 “12 Best AI Text To Music Apps for People of All Skill Levels”, Ezra Sandzer-Bell / audiocipher - “Stability AI is one of the best Gen AI text-to-music that leverages Stable Diffusion, and primarily stands out because of its high quality outputs”
2024-05-22 Harmonai Review 2024: What It Is, How to Use It & Is It Worth It? (aihungry.com) - “Harmonai is an open source generative audio tool allowing users to create unlimited sound library - developed by musicians for musicians.”