Part 2 on the ethics of generative AI for music: how the three stakeholder groups from Part 1 are affected by ethical concerns (including model biases), and the “pro-choice” view of ownership rights
Interesting post by Johan Cedmar-Brandstedt on artists’ rights as humans, not just their IP ownership rights. To me, the 1966 covenant feels like a universal agreement on the relevant ethics, even if our laws are still a lagging global patchwork.
https://www.linkedin.com/posts/johan-cedmar-brandstedt-a77b311_artist-rights-are-human-rights-activity-7189354525999185920-g-0I
“… a good international agreement to consider first is the International Covenant on Economic, Social and Cultural Rights that was ratified by the United Nations General Assembly on December 16, 1966. It is important to remember that human rights are fundamental, inalienable and universal entitlements belonging to individuals, individual artists in our case. As a legal matter, human rights can be distinguished from intellectual property rights as intellectual property rights are arguably subordinate to human rights and actually implement at the national level the human rights recognized as transcending international and national intellectual property laws.”
This note offers fresh insights into what musicians think, and into the proposed US bill on disclosing the use of copyrighted music to train AI models:
https://substack.com/@luizajarovsky/note/c-53711552?r=3ht54r
Thank you for the thoughtful comment, Quentin! Traceability is super interesting and I completely agree that having a standard for embedding metadata is needed. Adobe is pushing an industry initiative in this area that we’ll cover in the article. :)
Wow, that was a great read. But very difficult to comment on, since every angle I could think of was covered in the article.
You mentioned traceability briefly, and I look forward to your thoughts on it in subsequent articles. One idea I have been thinking about is embedding machine-readable metadata into media that could be used as input to an LLM or other AI tool. That could potentially mitigate some of the IP issues you identified by maintaining a strong mapping between source and derived material.
Steganography is a mature technology that allows embedding of data that is imperceptible to humans but readable by machines in certain types of media, including music and graphics.
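To make the steganography idea concrete, here is a minimal sketch of least-significant-bit (LSB) embedding, the simplest audio steganography technique, assuming the audio is available as a list of signed 16-bit PCM sample values. The function names `embed_lsb` and `extract_lsb` are illustrative, not from any library. Each hidden bit changes one sample by at most one quantization step, which is inaudible, but LSB data does not survive lossy compression or resampling, so a practical provenance scheme would need a more robust watermarking method.

```python
import math


def embed_lsb(samples: list[int], payload: bytes) -> list[int]:
    """Hide payload bits in the least significant bit of 16-bit PCM samples."""
    # Expand the payload into bits, most significant bit of each byte first.
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("payload needs more samples than are available")
    marked = list(samples)
    for idx, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        # This changes a sample by at most 1 and keeps it in 16-bit range.
        marked[idx] = (marked[idx] & ~1) | bit
    return marked


def extract_lsb(samples: list[int], n_bytes: int) -> bytes:
    """Read n_bytes back out of the sample LSBs."""
    bits = [s & 1 for s in samples[: n_bytes * 8]]
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start : start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


# Usage: a short 440 Hz tone at 44.1 kHz stands in for real audio.
tone = [int(10000 * math.sin(2 * math.pi * 440 * t / 44100)) for t in range(1000)]
message = b"(c) Joe Bloggs"
marked_tone = embed_lsb(tone, message)
```

Reading the message back requires knowing its length (here `len(message)`); a real format would store a length header or end marker alongside the payload.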
It would be helpful if there were a standard for embedding copyright and origin metadata into media. A human author would encode: "I am Joe Bloggs. I created this work with no help from AI. I retain full copyright, but I will allow reasonable re-sampling by human composers at no charge. AI authors must pay a royalty fee for using this work."
An AI might encode: "I am DeepThought. I created this work based on works by the following artists: [Joe Bloggs, ...]. I paid them royalties as appropriate. Any re-use of this work must credit the artists listed above."
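As a rough illustration of what such a standard might carry, the two statements above can be expressed as a small machine-readable record. This is a minimal sketch: the class `ProvenanceRecord` and all of its field names are hypothetical and not taken from any existing standard, and the AI example lists only the one named artist. The serialized bytes are what an embedding channel, whether steganographic or a file-level metadata tag, would need to carry.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical machine-readable origin/copyright record for one work."""

    creator: str
    ai_generated: bool
    # Artists or works this piece was derived from (empty for original human work).
    sources: list[str] = field(default_factory=list)
    royalties_paid: bool = False
    free_human_resampling: bool = False
    ai_royalty_required: bool = False
    credit_sources_on_reuse: bool = False

    def to_bytes(self) -> bytes:
        """Serialize to compact, deterministic JSON suitable for embedding."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":")).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes) -> "ProvenanceRecord":
        """Rebuild a record from bytes produced by to_bytes()."""
        return cls(**json.loads(data.decode("utf-8")))


# The two statements from the comment, expressed as records.
human_work = ProvenanceRecord(
    creator="Joe Bloggs",
    ai_generated=False,
    free_human_resampling=True,
    ai_royalty_required=True,
)
ai_work = ProvenanceRecord(
    creator="DeepThought",
    ai_generated=True,
    sources=["Joe Bloggs"],
    royalties_paid=True,
    credit_sources_on_reuse=True,
)
```

Sorting keys and fixing the separators makes the encoding deterministic, which matters if the record is ever signed or hashed to detect tampering.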
That would level the playing field and protect everyone. It would also solve a huge problem for the AI industry: it would allow AI-generated works to be identified *by AIs*, which would help prevent the garbage-in, garbage-out threat posed by AI models being trained on AI-generated works.
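Here is a minimal sketch of how a training pipeline might use such metadata to keep AI-generated works out of its corpus. It assumes each work's extracted metadata is a JSON object with a boolean `ai_generated` field; that field name, the track IDs, and the helper functions are hypothetical. Works with missing or unreadable metadata are excluded, a deliberately conservative choice, since the absence of a tag says nothing about how a work was made.

```python
import json
from typing import Dict, List, Optional


def declared_ai_flag(metadata: Optional[bytes]) -> Optional[bool]:
    """Return the work's declared ai_generated flag, or None if it cannot be read."""
    if metadata is None:
        return None
    try:
        record = json.loads(metadata.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and json.JSONDecodeError
        return None
    if not isinstance(record, dict):
        return None
    flag = record.get("ai_generated")
    return flag if isinstance(flag, bool) else None


def select_training_works(corpus: Dict[str, Optional[bytes]]) -> List[str]:
    """Keep only works that explicitly declare themselves human-made."""
    return [work_id for work_id, meta in corpus.items() if declared_ai_flag(meta) is False]


# Usage with hypothetical tracks: one human-made, one AI-made, two without usable metadata.
corpus = {
    "track_a": b'{"creator": "Joe Bloggs", "ai_generated": false}',
    "track_b": b'{"creator": "DeepThought", "ai_generated": true}',
    "track_c": None,
    "track_d": b"not json",
}
selected = select_training_works(corpus)
```

A declared flag is only as trustworthy as whoever wrote it, so in practice it would need to be paired with signing or other verification.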
The tricky bit would be preserving the metadata through all the mangling inside an LLM. Ideally, you want ALL the sources to be credited in the end product. I suspect that adding metadata vectors in the attention block might be the solution.
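The attention-block idea is speculative, and the following is only one hypothetical reading of it, not an established technique. In this toy, single-head scaled dot-product attention, each input token carries a "metadata vector" (a distribution over source artists) that is excluded from the query/key computation but mixed with the same attention weights as the values. Every output token therefore ends up with a source distribution that still sums to 1. The function name and the artist labels (including the placeholder "Artist B") are illustrative. A real model stacks many layers with MLPs and residual connections, and attention weights are at best a rough proxy for influence.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_provenance(queries, keys, values, sources):
    """Single-head attention that also mixes per-token source distributions.

    sources[i] maps a source name to its share in token i (shares sum to 1).
    """
    d = len(keys[0])
    outputs, provenance = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Ordinary attention output: weighted sum of value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        # Metadata channel: the same weights mix the source distributions.
        mix: dict[str, float] = {}
        for w, src in zip(weights, sources):
            for name, share in src.items():
                mix[name] = mix.get(name, 0.0) + w * share
        outputs.append(out)
        provenance.append(mix)
    return outputs, provenance


# Usage: two input tokens from different (hypothetical) artists; the query aligns with the first.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
sources = [{"Joe Bloggs": 1.0}, {"Artist B": 1.0}]
outs, prov = attention_with_provenance([[4.0, 0.0]], keys, values, sources)
```

Because the source shares are carried as a separate channel rather than folded into the values, they never affect what the model computes; they only record where each output came from.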