Riffusion and ethics of genAI for music
Summary of Riffusion as an uncertified company working on generative AI for music. Includes some product announcements, key features, and a few insights on ethics of genAI for music.
Our PART 3 article provides the big-picture view of 85 companies offering AI-based tools or platforms in this musical melody space. This profile page is meant to provide insights into why Riffusion is significant enough to review, even though it wasn’t chosen for deeper dives.
Acronyms
DAW - digital audio workstation
TTA - text to audio (more general than music; can include spoken vocals as well as sung vocals & instruments)
TTM - text to music
9. Riffusion
Description from PitchBook: “Developer of an open-source artificial intelligence model designed to generate images from text and audio clips from spectrograms. The company uses text prompts to generate image files, which can be put through an inverse Fourier transform and converted into audio files, enabling listeners to virtually visualize the music thoroughly.”
History and Partnerships
Riffusion was founded in 2022 and is headquartered in San Francisco. The company is listed as privately held.
Dec. 2022: Riffusion introduced (site, repo)
https://techcrunch.com/2022/12/15/try-riffusion-an-ai-model-that-composes-music-by-visualizing-it/
(add date for V2)
Oct. 2023: Riffusion raised $4M and introduced an updated website with more tools. (refs: “Riffusion's Text-to-Song Generator: Tutorial & Competitors”, https://techcrunch.com/2023/10/17/ai-generating-music-app-riffusion-turns-viral-success-into-4m-in-funding/)
Jan. 2, 2024: Audacity (an audio editing tool) launched a new set of AI plugins (“OpenVINO AI Effects”) and partnered with Riffusion to “provide a text-to-music experience in app”. (refs: https://www.audiocipher.com/post/ai-music-app, https://www.audiocipher.com/post/generative-audio-workstation, https://www.audacityteam.org/blog/openvino-ai-effects/, https://github.com/riffusion/riffusion)
Key Features
Riffusion is an AI-driven tool that generates a song from user-provided lyrics. The Riffusion website claims “This AI will literally sing whatever you type”. It works by converting users’ text prompts into spectrograms (visual representations of sound), which are then converted into audio. Users are then given some options for refining the audio.
The primary UI feature is a TTM prompt box for entering text to be converted to music. It also includes buttons for “random prompt”, “play”, “share”, “settings”, “debug”, “seed image”, and “denoising”.
Training Data & Technologies
The Riffusion AI tool is open source (repo, link1). No information appears to be readily available on where the music was obtained for training the Riffusion models. One possible source mentioned is LAION-AI’s audio dataset, but this is unconfirmed.
From soundtechinsider.com/riffusion-the-stable-diffusion-of-music, 2023-03-07: “Riffusion is a groundbreaking neural network designed and developed by Seth Forsgren and Hayk Martiros that is capable of generating music using images of sound instead of audio. …
Riffusion, at its core, is essentially a fine-tuned version of Stable Diffusion, an open-source model for generating images from text prompts, but applied to spectrograms.
It applies the same principles of image generation utilized in Stable Diffusion (with the use of img2img functionality) but purely focused on the interpolation and looping of images.
The real magic happens when these generated images go through a computation process called "Short-time Fourier transform" which in conjunction with the use of Torchaudio can extrapolate all the information provided in the spectrograms into audio waves that translate into music.”
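The quoted description compresses the last step. A spectrogram image stores only how loud each frequency is over time, not the phase of each frequency, so turning it back into audio needs an inverse short-time Fourier transform plus some way of estimating the missing phase. A common choice is the Griffin-Lim algorithm. The following is a minimal Python sketch of that idea under stated assumptions: it uses only the standard library, a toy sine tone in place of generated music, and illustrative values for FRAME and HOP. All function names are hypothetical, and none of this is taken from the Riffusion repo or from Torchaudio.

```python
import cmath
import math

FRAME = 32  # samples per analysis window (illustrative value, an assumption)
HOP = 8     # samples between successive windows (illustrative value, an assumption)
# Periodic Hann window so that overlapping frames blend smoothly.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / FRAME) for n in range(FRAME)]
# Precomputed complex exponentials for the naive transform below.
TWIDDLE = [cmath.exp(-2j * math.pi * k / FRAME) for k in range(FRAME)]


def dft(frame, inverse=False):
    """Naive O(N^2) discrete Fourier transform (or its inverse) of one frame."""
    out = []
    for k in range(FRAME):
        total = 0j
        for t in range(FRAME):
            w = TWIDDLE[(k * t) % FRAME]
            total += frame[t] * (w.conjugate() if inverse else w)
        out.append(total / FRAME if inverse else total)
    return out


def stft(signal):
    """Short-time Fourier transform: one complex spectrum per windowed frame."""
    starts = range(0, len(signal) - FRAME + 1, HOP)
    return [dft([signal[s + n] * WINDOW[n] for n in range(FRAME)]) for s in starts]


def istft(spectra, length):
    """Least-squares overlap-add inverse of stft()."""
    signal = [0.0] * length
    weight = [0.0] * length
    for i, spectrum in enumerate(spectra):
        frame = dft(spectrum, inverse=True)
        for n in range(FRAME):
            signal[i * HOP + n] += frame[n].real * WINDOW[n]
            weight[i * HOP + n] += WINDOW[n] ** 2
    # Samples that no window covers with nonzero weight are left at zero.
    return [s / w if w > 1e-12 else 0.0 for s, w in zip(signal, weight)]


def griffin_lim(magnitudes, iterations=20):
    """Estimate audio whose STFT magnitudes approximate `magnitudes`."""
    length = (len(magnitudes) - 1) * HOP + FRAME
    # Start from zero phase, i.e. treat every magnitude as a real number.
    signal = istft([[complex(m) for m in row] for row in magnitudes], length)
    for _ in range(iterations):
        estimate = stft(signal)
        # Keep the target magnitudes, borrow the phases of the current estimate.
        spectra = [
            [m * cmath.exp(1j * cmath.phase(e)) for m, e in zip(m_row, e_row)]
            for m_row, e_row in zip(magnitudes, estimate)
        ]
        signal = istft(spectra, length)
    return signal


def spectral_error(signal, magnitudes):
    """Distance between the signal's STFT magnitudes and the target ones."""
    return math.sqrt(sum(
        (abs(e) - m) ** 2
        for e_row, m_row in zip(stft(signal), magnitudes)
        for e, m in zip(e_row, m_row)
    ))


# Toy stand-in for a generated spectrogram: magnitudes of a short sine tone.
tone = [math.sin(2 * math.pi * 0.11 * n) for n in range(160)]
magnitudes = [[abs(c) for c in row] for row in stft(tone)]

rough = griffin_lim(magnitudes, iterations=0)    # zero-phase guess only
rebuilt = griffin_lim(magnitudes, iterations=20)
print("error with zero phase:   ", spectral_error(rough, magnitudes))
print("error after Griffin-Lim: ", spectral_error(rebuilt, magnitudes))
```

Running the script prints the spectral mismatch before and after phase estimation. The second number should not be larger than the first, because a Griffin-Lim iteration with a least-squares inverse cannot increase that mismatch. A real pipeline would use a fast FFT library and would first have to map image pixels back to magnitudes, which this sketch skips.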
Ownership, Pricing, and Usage Rights
(see website)
What’s Next?
Riffusion will not be reviewed further in this series.
This post is one of the last big pieces in PART 3.
REFERENCES
See this “AI for Music” page for links to all posts related to this series, as well as bonus articles mentioned above on voice cloning and other music-related topics.