blogs by Gio

Tagged: technical

cyber So you want to write an AI art license

  • Posted in cyber

Hi, The EFF, Creative Commons, Wikimedia, World Leaders, and whoever else,

Do you want to write a license for machine vision models and AI-generated images, but you’re tired of listening to lawyers, legal scholars, intellectual property experts, media rightsholders, or even just people who use any of the tools in question even occasionally?

You need a real expert: me, a guy whose entire set of relevant qualifications is that he owns a domain name. Don’t worry, here’s how you do it:

This is an extremely condensed set of notes, designed as a high-level overview for thinking about the problem.

Given our current system of how AI models are trained and how people can use them to generate new art, which is this:

    Alice->>Model: Hello. Here are N images and<br>text descriptions of what they contain.
    Model->>Model: Training (looks at images, "makes notes", discards originals)
    Model->>Alice: OK. I can try to make similar images from my notes,<br>if you tell me what you want.
    Curio->>Model: Hello. I would like a depiction of this new <br>thing you've never seen before.
    Model->>Curio: OK. Here are some possibilities.

The works🔗

The model and the works produced with the model are distinct products. The model is more like processing software or tooling, while the works created with the model are distinctly artistic/creative output.

Models do not keep the original images they were trained on in any capacity. They only keep mathematical notes about their properties. You (almost always) cannot retrieve the original image data used from the model after training.

    Curio->>Model: Send me a copy of one of the images you were trained on
    Model->>Curio: Sorry, I do not remember any of them exactly,<br>only general ideas on how to make art.

There is a lot of misinformation about this, but it is simply, literally the case that a model does not include the training material, and cannot reproduce its training material. While not trivial (you can’t have a model if you can’t train it at all), when done properly, the specific training data is effectively incidental.

AI-generated art should be considered new craftsmanship — specifically, under copyright law, it is new creative output with its own protections — and not just a trivial product of its inputs.


The fact that AI art is new creative output doesn’t mean AI art can’t be plagiarism.

Just like with traditional art, it’s completely possible for a specific work to be a copy, but that doesn’t make it the case for every work in the medium. You can trace someone else’s artwork, but that doesn’t make all sketches automatically meritless works.

The inner workings of tools used in the creation of an artistic work are not what determines if a given product is plagiarism, or if it infringes on a copyright. Understanding the workings of the tool can be used in determining if a work is an infringement, but it is not the deciding factor.

To use a trivial example, if I copy an image to use in an advertisement, the copyright violation is in the use of the material, and the fact that the material is, in practice, a replica of existing copyrighted work. The “copy” program isn’t the infringement, it just informs our understanding of the infringement. Monkeys on typewriters can make something that infringes copyright too.

Is using an AI model as a step in the artistic process prima facie sufficient evidence that any work generated by it is an infringement of someone else’s copyright? The answer — based on an understanding of the tools and the range of the output space — is no.

Like all new and more efficient tools, AI art tools can be used to create new work more efficiently or to copy old work more efficiently. Both of those cases worry certain groups, but the fact is the technology can both create new work and copy existing work.

Don’t break everything🔗

It would be monumentally terrible for the general “right for someone to use their experience of a published work” to be codified as an idiosyncratic property right that is assumed to be reserved to the copyright holder unless they specifically license it out.

Using “an experience of a published work of art to infer what art looks like” is exactly how the AI model training that people are worried about works, and that model training runs as a user-agent, so an attempt to differentiate “tool-assisted learning” from “unassisted human learning” is also a dangerous avenue. (I reject the idea that there is a meaningful distinction between “natural” and “technologically assisted” human action, in favour of network theory.)

Creating implicit or explicit “style rights” that would give artists/companies/rightsholders legal leverage against people (AI assisted or otherwise) who make works that “feel similar”, even if aspects like the subject are materially different from anything the rightsholder has copyright to, is an even-worse-but-still monumentally terrible idea.

Possibly good goals🔗

So what do actual AI artists (like the fine folks over at the AWAY collective) want to see in copyright? I think the following are safe to describe as goals:

  • Ensuring that artists — both “traditional” and tool-assisted — are free to create and share their work without endangering themselves in the process.
  • Preventing the mass-replacement of traditional artists with systems that output cheap, mass-produced works, especially if those works are derived in part from the artists this system harms.
  • Preventing a fear-induced expansion of copyright that creates new rights that ultimately only benefit corporations that stockpile the new rights and use them against artists, the way music sampling rights work today.

These seem at odds with each other.🔗

How can you retain meaningful control over your work if making it publicly visible on the internet grants corporations rights over most of its value? How can copyright distinguish between what we consider “constructive” educational use of public information (human education, as the most trivial example) and uses we would see as exploitative, like training an AI on the works made by a particular author in order to produce facsimiles of their work without compensating the original artist?

I believe mass and corporate use of AI-generated work exploiting the creative output of humans is a real danger in a way that individual artists using AI for individual works isn’t. But how do we make that distinction in a meaningful way within the framework of copyright? What, specifically, is the distinction that makes the former a serious threat to the wellbeing of both real humans and the creative market, but the latter actively beneficial to the artistic community?

The distinction cannot simply be “commercial” use, because restrictions on commercial use penalize the independent artist as much as the would-be exploiter. An artist (again, tool-assisted or human) needs to retain creative rights over their work and be able to sell it without being permanently indentured to their educators.

Nor should it be based on some arbitrary threshold like the income of the artist, or their incorporation status. Those are empty distinctions; that’s fitting the available data points to the “model” of how I feel the world should look instead of drilling down and finding what the real distinction is.

This is a hard problem, and not one I’ve solved (yet). The above are some thoughts I’ve been chewing on — I have another article I’m working on where I go into more detail on that. But there are some moves that seem like clear steps in the right direction, like licenses incorporating Creative Commons-style share-alike principles.

Possibly good ideas🔗

Licensing models (“understandings” of art) with a requirement that art generated using that model must be attributed back to the model (and, transitively, the model’s source information) is probably a good idea and something that people (model-creators) should be able to do if they want.

Another licensing requirement that makes for a CC-type AI work is applying the principle of share-alike to the prompt settings: you could license a model such that works generated with the model must be shared with both a reference to the model and the prompt/settings used in creation (usually about a sentence of plain text).

This would not allow people to scientifically recreate exactly the same output, but it is a significant step towards identifying which source images in the data set used to train the model impacted the final product.

This “prompt sharing” is a thing AI artists are already doing, with the explicit intent of sharing insight into their work and making it easier to build on creatively; so this would not be a new invention of a license, but rather a codification of what is already the best practice for knowledge sharing.

Derivative models🔗

It is also possible to create models by merging/processing existing models instead of images.

    Alice->>Model: Hello. Here are N models, instead of images.

The share-alike principle should apply here. CC-ish licensed models should require that any model made from them be licensed under the same license (or one more permissive) to ensure the work is shared alike and to prevent trivial laundering.

Other interfaces, tooling🔗

There is also software that provides an interface to an existing model so people can use it more easily. This can range from from-scratch Python code to Google Colab notebooks to polished mobile apps.

There isn’t much that’s novel about them from a copyright perspective: they’re pieces of interface software, and shouldn’t have much to do with the copyright status of the models they use or the outputs they generate unless they’re actively violating an existing license.


dev Jinja2 as a Pico-8 Preprocessor

  • Posted in dev

Pico-8 needs constants🔗

The pico-8 fantasy console runs a modified version of lua that imposes limits on how large a cartridge can be. There is a maximum size in bytes, but also a maximum count of 8192 tokens. Tokens are defined in the manual as:

The number of code tokens is shown at the bottom right. One program can have a maximum of 8192 tokens. Each token is a word (e.g. variable name) or operator. Pairs of brackets, and strings each count as 1 token. commas, periods, LOCALs, semi-colons, ENDs, and comments are not counted.

The specifics of how exactly this is implemented are fairly esoteric and end up quickly limiting how much you can fit in a cart, so people have come up with techniques for minimizing the token count without changing a cart’s behaviour. (Some examples in the related reading.)
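To make the counting rules concrete, here's a rough sketch of a token counter in Python. It only follows the manual text quoted above and is not pico-8's actual implementation: `count_tokens`, `TOKEN_RE`, and `UNCOUNTED_WORDS` are names made up for this example, and it ignores multi-line strings and block comments.

```python
import re

# Words the quoted manual text says are not counted.
UNCOUNTED_WORDS = {"local", "end"}

# Order matters: comments before "-", numbers before words.
# Commas, periods, semicolons, closing brackets and whitespace match nothing
# here, so finditer() simply skips them (they cost no tokens).
TOKEN_RE = re.compile(
    r"""
    (?P<comment>--[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<number>0x[0-9a-fA-F]+|\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<open>[(\[{])
  | (?P<op>\.\.|[=~!<>+\-*/%^]=|[+\-*/%^\#=<>])
    """,
    re.VERBOSE,
)


def count_tokens(source: str) -> int:
    """Approximate the token count of a Lua snippet (assumed rules only)."""
    count = 0
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        if kind in ("string", "number", "op", "open"):
            # A string, a number, an operator, or a bracket pair (counted
            # once, at its opening bracket) is one token each.
            count += 1
        elif kind == "word" and match.group() not in UNCOUNTED_WORDS:
            count += 1
    return count


print(count_tokens("sfx_ding = 024"))  # name, assignment, value
```

Under these assumed rules, a bare assignment like `sfx_ding = 024` costs three tokens, which is the overhead that matters for the rest of this post.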

But, given these limitations on what is more or less analogous to the instruction count, it would be really handy to have constant variables, and here’s why:

    -- 15 tokens (clear, expensive)
    sfx_ding = 024
    function on_score()
      sfx(sfx_ding)
    end

    function on_menu()
      sfx(sfx_ding)
    end

    -- 12 tokens (unclear, cheap)
    function on_score()
      sfx(024)
    end

    function on_menu()
      sfx(024)
    end

The first excerpt is a design pattern I use all the time. You’ll probably recognize it as the simplest possible implementation of an enum, using global variables. All pico-8’s data — sprites and sounds, and even builtins like colors — are keyed to numerical IDs, not names. If you want to draw a sprite, you can put it in the 001 “slot” and then make references to sprite 001 in your code, but if you want to name the sprite you have to do it yourself, like I do here with the sfx.

Using a constant as an enumerated value is good practice; it allows us to adjust implementation details later without breaking all the code (e.g. if you move an sfx track to a new ID, you just have to change one variable to update your code) and keeps code readable. In the second excerpt you have no idea what sound 024 was supposed to map to unless you go and play the sound, or label every sfx call yourself with a comment.

But pico-8 punishes you for that. That’s technically a variable assignment with three tokens (name, assignment, value), even though it can be entirely factored out. That means you incur the 3-token overhead every time you write clearer code. There needs to be a better way to optimize variables that are known to be constant.
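One way to get there is to substitute the values in before pico-8 ever sees the code. The sketch below is a stand-in for that idea, assuming a plain regex substitution from Python's standard library rather than Jinja2; `inline_constants` and the `constants` mapping are hypothetical names, and it will happily rewrite matches inside strings and comments too.

```python
import re
from typing import Mapping


def inline_constants(source: str, constants: Mapping[str, str]) -> str:
    """Replace whole-word constant names in Lua source with literal values."""
    if not constants:
        return source
    # Longest names first so the alternation prefers the most specific match.
    names = sorted(constants, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return pattern.sub(lambda m: constants[m.group(0)], source)


cart = """\
function on_score()
  sfx(sfx_ding)
end
"""

# The assignment never reaches the cart; the name only exists at build time.
print(inline_constants(cart, {"sfx_ding": "024"}))
```

The readable name stays in the source you edit, while the cart that gets built only contains `sfx(024)`, so the three-token assignment is never paid for.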

What constants do and why they’re efficient in C🔗

I’m going to start by looking at how C handles constants, because C sorta has them and lua doesn’t at all. The “sorta” part in “C sorta has them” is also really important: the C language doesn’t exactly support constants, and C’s trick is how I do the same for pico-8.

In pico-8 what we’re trying to optimize here is the token count, while in C it’s the instruction count, but it’s the same principle. (Thinking out loud, a case could be made that assembly instructions are just a kind of token.) So how does C do it?

dev Gio Flavoured Markdown

  • Posted in dev

“How can I show someone how my blog articles actually render?”

It sounds like it should be super easy, but it turns out it really isn’t. I write in Markdown (and attach the source to all my posts if you’re interested), which then gets rendered as HTML on-demand by Pelican. (More on this on the thanks page.) But that means there’s no quick way to demo what any given input will render as: it has to run through the markdown processor every time. Markdown is a fairly standard language, but I have a number of extensions I use, some of which I wrote myself, which means to get an authoritative rendering, it has to actually render.

But I want to be able to demo the full rendered output after all the various markdown extensions process. I want a nice simple way to render snippets and show people how that works, like a live editor does. The CSS is already portable by default, but the markdown rendering is done with python-markdown, which has to run server-side somewhere, so that’s much less portable.

So I spent two evenings and wrote up a tool that does exactly that. You can view the live source code here if you want to follow along.


dev ACNH Printer - a writeup!

  • Posted in dev

This is a writeup of a project I did in April but never released. Well, I’ve definitely released it now, if you want to give it a try!

Instead of a real introduction, here’s a video demo, with camcorder LP technology from 2005:

I am not going to buy a capture card

Ever since Wild World, Animal Crossing has had a pattern system, where players can design their own textures and use them as clothes or decoration. New Horizons has one, but since it doesn’t have a stylus you have to either use the directional pad to mark individual pixels or draw with your fingertip.

I thought it would be fun to find a way to automate that. Now, granted, it takes a while, but it’s still much faster than trying to copy pixels over by hand.