Hi, The EFF, Creative Commons, Wikimedia, World Leaders, and whoever else,
Do you want to write a license for machine vision models and AI-generated images, but you’re tired of listening to lawyers, legal scholars, intellectual property experts, media rightsholders, or even just people who use any of the tools in question even occasionally?
You need a real expert: me, a guy whose entire set of relevant qualifications is that he owns a domain name. Don’t worry, here’s how you do it:
Given our current system of how AI models are trained and how people can use them to generate new art, which is this:
sequenceDiagram
    Alice->>Model: Hello. Here are N images and<br>text descriptions of what they contain.
    Model->>Model: Training (looks at images, "makes notes", discards originals)
    Model->>Alice: OK. I can try to make similar images from my notes,<br>if you tell me what you want.
    Curio->>Model: Hello. I would like a depiction of this new <br>thing you've never seen before.
    Model->>Curio: OK. Here are some possibilities.
The model and the works produced with the model are both distinct products. The model is more like processing software or tooling, while the artistic works created with the model are distinctly artistic/creative output.
Models do not keep the original images they were trained on in any capacity. They only keep mathematical notes about the images’ properties. You (almost always) cannot retrieve the original image data from the model after training.
sequenceDiagram
    Curio->>Model: Send me a copy of one of the images you were trained on
    Model->>Curio: Sorry, I do not remember any of them exactly,<br>only general ideas on how to make art.
There is a lot of misinformation about this, but it is simply, literally the case that a model does not include the training material, and cannot reproduce its training material. While the training data is not trivial (you can’t have a model if you can’t train it at all), when training is done properly, the specific training data is effectively incidental.
AI-generated art should be considered new craftsmanship — specifically, under copyright law, it is new creative output with its own protections — and not just a trivial product of its inputs.
The fact that AI art is new creative output doesn’t mean AI art can’t be plagiarism.
Just like with traditional art, it’s completely possible for specific works to be copies, but that doesn’t make it the case for all works in the medium. You can trace someone else’s artwork, but that doesn’t make all sketches automatically meritless works.
The inner workings of tools used in the creation of an artistic work are not what determines if a given product is plagiarism, or if it infringes on a copyright. Understanding the workings of the tool can be used in determining if a work is an infringement, but it is not the deciding factor.
To use a trivial example, if I copy an image to use in an advertisement, the copyright violation is in the use of the material, and the fact that the material is, in practice, a replica of existing copyrighted work. The “copy” program isn’t the infringement, it just informs our understanding of the infringement. Monkeys on typewriters can make something that infringes copyright too.
Is using an AI model as a step in the artistic process prima facie sufficient evidence that any work generated by it is an infringement of someone else’s copyright? The answer — based on an understanding of the tools and the range of the output space — is no.
Like all new and more efficient tools, AI art tools can be used to create new work more efficiently or to copy old work more efficiently. Both of those cases worry certain groups, but the fact is that the technology can do both.
Don’t break everything
It would be monumentally terrible for the general “right for someone to use their experience of a published work” to be codified as an idiosyncratic property right that is assumed to be reserved to the copyright holder unless they specifically license it out.
Using “an experience of a published work of art to infer what art looks like” is exactly how the AI model training that people are worried about works, and that model training runs as a user-agent, so an attempt to differentiate “tool-assisted learning” from “unassisted human learning” is also a dangerous avenue. (I reject the idea that there is a meaningful distinction between “natural” and “technologically assisted” human action, in favour of network theory.)
Creating implicit or explicit “style rights” that would give artists/companies/rightsholders legal leverage against people (AI assisted or otherwise) who make works that “feel similar”, even if aspects like the subject are materially different from anything the rightsholder has copyright to, is an even worse, and still monumentally terrible, idea.
Possibly good goals
So what do actual AI artists (like the fine folks over at the AWAY collective) want to see in copyright? I think the following are safe to describe as goals:
- Ensuring that artists — both “traditional” and tool-assisted — are free to create and share their work without endangering themselves in the process.
- Preventing the mass-replacement of traditional artists with systems that output cheap, mass-produced works, especially if those works are derived in part from the artists this system harms.
- Preventing a fear-induced expansion of copyright that creates new rights that ultimately only benefit corporations that stockpile the new rights and use them against artists, the way music sampling rights work today.
These seem at odds with each other.
How can you retain meaningful control over your work if making it publicly visible on the internet grants corporations rights over most of its value? How can copyright distinguish between what we consider “constructive” educational use of public information (human education, as the most trivial example) and uses we would see as exploitative, like training an AI on the works made by a particular author in order to produce facsimiles of their work without compensating the original artist?
I believe mass and corporate use of AI-generated work exploiting the creative output of humans is a real danger in a way that individual artists using AI for individual works isn’t. But how do we make that distinction in a meaningful way within the framework of copyright? What, specifically, is the distinction that makes the former a serious threat to the wellbeing of both real humans and the creative market, but the latter actively beneficial to the artistic community?
The distinction cannot simply be “commercial” use, because restrictions on commercial use penalize the independent artist as much as the would-be exploiter. An artist (again, tool-assisted or human) needs to retain creative rights over their work and be able to sell it without being permanently indentured to their educators.
Nor should it be based on some arbitrary threshold like the income of the artist, or their incorporation status. Those are empty distinctions; that’s fitting the available data points to the “model” of how I feel the world should look instead of drilling down and finding what the real distinction is.
This is a hard problem, and not one I’ve solved (yet). The above are some thoughts I’ve been chewing on — I have another article I’m working on where I go into more detail on that. But there are some moves that seem like clear steps in the right direction, like licenses incorporating Creative Commons-style share-alike principles.
Possibly good ideas
Licensing models (“understandings” of art) with a requirement that art generated using that model must be attributed back to the model (and, transitively, the model’s source information) is probably a good idea and something that people (model-creators) should be able to do if they want.
Another licensing requirement that makes for a CC-type AI work is applying the principle of share-alike to the prompt settings: you could license a model such that works generated with the model must be shared with both a reference to the model and the prompt/settings used in creation (usually about a sentence of plain text).
This would not allow people to scientifically recreate exactly the same output, but it is a significant step towards identifying which source images in the data set used to train the model impacted the final product.
This “prompt sharing” is a thing AI artists are already doing, with the explicit intent of sharing insight into their work and making it easier to build on creatively; so this would not be a new invention of a license, but rather a codification of what is already the best practice for knowledge sharing.
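As a rough illustration of what a shared "model reference plus prompt/settings" record could look like, here is a minimal Python sketch. The `GenerationRecord` class, its field names, and the example model name and license string are all hypothetical, invented for this example; no existing license or tool defines this format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class GenerationRecord:
    """Attribution record shared alongside a generated image (hypothetical schema)."""
    model_name: str     # reference to the model used (placeholder value below)
    model_license: str  # license the model was released under (placeholder)
    prompt: str         # the plain-text prompt, usually about a sentence
    settings: dict = field(default_factory=dict)  # e.g. seed, step count

    def to_json(self) -> str:
        # sort_keys keeps the output stable, so two records are easy to compare
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "GenerationRecord":
        # Rebuild the record from the JSON text produced by to_json
        return cls(**json.loads(text))


record = GenerationRecord(
    model_name="example-model-v1",
    model_license="CC-BY-SA-4.0",
    prompt="a lighthouse made of stained glass, at dusk",
    settings={"seed": 1234, "steps": 50},
)
print(record.to_json())
```

Publishing a small text record like this next to the image is one way to satisfy both the attribution and the prompt-sharing requirement at once; the exact fields would be up to whoever writes the license.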
It is also possible to create models by merging/processing existing models instead of images.
sequenceDiagram
    Alice->>Model: Hello. Here are N models, instead of images.
The share-alike principle should apply here. CC-ish licensed models should require that any model made from them is licensed under the same license (or a more permissive one) to ensure the work is shared alike and to prevent trivial laundering.
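The "same license or a more permissive one" rule for merged models can be sketched as a simple check. This is only a toy under stated assumptions: the license names and the `PERMISSIVENESS` ranking are made up for the example, real licenses do not fall on a single line from strict to permissive, and every parent model is assumed to carry the share-alike term.

```python
# Hypothetical license names, ranked from least to most permissive.
PERMISSIVENESS = {
    "model-sa-nc": 0,   # share-alike, non-commercial
    "model-sa": 1,      # share-alike
    "model-by": 2,      # attribution only
    "model-public": 3,  # no conditions
}


def merge_is_compliant(derived_license: str, parent_licenses: list[str]) -> bool:
    """Return True if the derived model's license satisfies every parent.

    Assumes each parent requires that derived models use the same license
    as the parent or a more permissive one.
    """
    derived_rank = PERMISSIVENESS[derived_license]
    return all(derived_rank >= PERMISSIVENESS[p] for p in parent_licenses)


# A merge released under "model-sa" from parents under "model-sa" and "model-sa-nc"
print(merge_is_compliant("model-sa", ["model-sa", "model-sa-nc"]))
# A merge released under a stricter license than its parent
print(merge_is_compliant("model-sa-nc", ["model-sa"]))
```

The second call is the "laundering" case in miniature: taking a share-alike model and re-releasing a derivative under tighter terms fails the check.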
Other interfaces, tooling
There is also software that provides an interface to an existing model so people can more easily use it. This can range from from-scratch Python code to Google Colab notebooks to polished mobile apps.
There isn’t much novel about these tools from a copyright perspective: they’re pieces of interface software, and shouldn’t have much to do with the copyright status of the models they use or the outputs they generate unless they’re actively violating an existing license.