Tagged: enforcement

Identity Verification is as Bad as It Can Be

  • Posted in cyber

This is an addendum to OS-Level Age Attestation is the Good One, where I talk about the potential of legal standards for age attestation as an alternative to age verification. Not already convinced of the dangers of age verification? The extent of the evil waiting behind identification systems and deanonymization is unspeakably vast, and fortunately it’s getting extensive coverage. Here’s a quick look to get you up to speed.

Direct digital censorship

A lot of the energy behind age verification comes from authoritarians eager to censor political dissent, promote propaganda and retaliate against critics. This is a power grab, with bills designed to seize control over specific content the government objects to.

Governments are, of course, trying to claim control over “public discourse”. Like any seizure of arbitrary power, the risks here are volatile and unbounded, because they depend on who holds power at any given moment in a political system where power is expected to rotate.

Discord

As a case study, let’s take a look at one of the latest major services to attempt age verification: Discord. At the time of writing, Discord is in the process of switching to a “Teen Default” system, where every user is assumed to be a minor unless they can prove their age to Discord. Discord is a communications platform used widely by adults, and during COVID Discord very intentionally expanded its market beyond gaming to focus on being a global platform, so the assumption that all spaces are for kids is clearly incorrect.1 But Discord is sometimes used by children, and since it’s a communications platform people can use it to communicate horrible things. Boomers have learned they can be insane about this, so Discord is under significant pressure to balance its goal of being a universal communications platform with child safety.

OS-Level Age Attestation is the Good One

  • Posted in cyber

There’s a coordinated effort to use the “child safety” euphemism to cripple the internet with identity verification mandates. That’s bad. But buried in the mix there’s a genuinely good idea with enough political capital that it might stick around and do some good.

Every time I’ve tried to write an article on the topic of child internet safety my energy has fizzled into depression, because as one researches the topic it becomes obvious that everyone with any relevant power is refusing to solve the problem on purpose. It’s demoralizing and it’s been mostly useless for me to do any thought work in this area.

But California’s age attestation bill might be an exception to this. Because it’s age attestation, not age verification, it looks like a significant political step in the right direction, and with the right focus it could do a lot of good. A lot of people have (fairly!) assumed attestation is age verification, or at least lays the groundwork for it, but I think this isn’t the case. There is always the danger of future bad legislation, but OS attestation doesn’t pave the way for it; it provides a strong defense against it. We need a good idea to win the child safety war, not because we’re in dire need of more online child safety, but because addressing the real concerns correctly blocks a whole slew of impossibly dangerous policies.

My ideal age filtering tool is a system of client attestation with trust rooted in the adult administrator, exposed as a preemptive signal through an OS-level API, and enforced by compliant browsers and application stores. And we’re shockingly close to that.
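
To make that concrete, here’s a minimal sketch of what that could look like from a website’s side. Everything in it is hypothetical: no browser exposes a navigator.deviceAgeAttestation API or anything like it today. The point is the shape of the flow: the administrator sets a flag once at the OS level, the browser forwards it preemptively, and a compliant site honors it without ever learning who the user is.

```typescript
// Hypothetical sketch only: `navigator.deviceAgeAttestation` and this flow do
// not exist in any browser or OS today. The shape is what matters: the adult
// administrator sets the flag once at the OS level, the browser forwards it,
// and a compliant site honors it without learning anything else about the user.

interface AgeAttestation {
  // "minor" means the device administrator marked this profile as a child's.
  // Everything else is "unspecified" by design, so the absence of the flag
  // reveals nothing about adults or unconfigured devices.
  status: "minor" | "unspecified";
}

async function gateContent(showFull: () => void, showFiltered: () => void) {
  // Hypothetical API call; falls back to "unspecified" where it's unsupported.
  const attestation: AgeAttestation =
    (await (navigator as any).deviceAgeAttestation?.()) ?? { status: "unspecified" };

  if (attestation.status === "minor") {
    showFiltered(); // the site's whole obligation: honor the preemptive signal
  } else {
    showFull();     // no identity check, no documents, no account required
  }
}
```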

There is room for improvement

People on the privacy side of the age verification war — my side — will argue that parents already have everything they need for comprehensive web filtering if they want to use it. I think this isn’t quite true; there’s one notable architectural gap that a technical solution could meaningfully fill.

There are many existing content filtering tools geared toward child safety, but their weakness is that they’re reactive. Traffic filters can identify and block traffic to known websites, and on-device content filters can try to detect and block specific content. But this requires the user to react and defend against every possible source and behavior. It’s the same cat-and-mouse game as adblockers. And like adblockers, the more locked down the system is — like iOS or gaming consoles — the harder it is for developers to make exactly the right product.
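
To show what “reactive” means in practice, here’s a minimal sketch of the blocklist model these tools are built on (the hostnames are invented). The filter can only act on sources someone has already discovered and listed, which is exactly why it’s a cat-and-mouse game:

```typescript
// A sketch of the reactive model described above (hostnames are made up).
// The filter only knows about sources that have already been discovered,
// reported, and added to the list; anything new passes until the list catches up.

const blockedHosts = new Set<string>([
  "known-bad-site.example",
  "another-known-source.example",
]);

function isBlocked(url: string): boolean {
  try {
    return blockedHosts.has(new URL(url).hostname);
  } catch {
    return false; // not a parseable URL, nothing to match against
  }
}

console.log(isBlocked("https://known-bad-site.example/page"));        // true
console.log(isBlocked("https://freshly-registered-mirror.example/")); // false, until someone adds it
```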

The internet sometimes assumes minors are supervised — since they have parental consent to have the device in the first place — but this often isn’t the case. It’s very common for minors to have their own phones or tablets with unsupervised access. When they’re online or downloading apps, they’re not sitting with a parent; they’re unsupervised, roaming children. Parents are dropping their kids off in the city.

This isn’t inherently bad; it seems like parents and children both want children to be able to exist independently without granular supervision, and so there’s a desire to make that situation safer. That shouldn’t come at the cost of any adult liberty or even the liberty of children with parental consent; it just means we want an ecosystem that allows for unsupervised children to exist within it.

Right now the burden is on parents to be active defenders, protecting their children from a vast ecosystem of companies investing research and capital into optimizing how efficiently they can extract money and data from everyone in the world. It would be a meaningful improvement if there were a safe way to prevent some of this exploitation by putting reasonable requirements on providers, so long as this can be done in a way that doesn’t cause more problems.

Political pressure for “child safety” is exploitable

But the lack of a perfect parental control system isn’t the main problem here. The real danger is the push for online identity verification using child safety as a justification.

Smart and privacy-conscious people demand “No age verification” (quite reasonably!), but that doesn’t offer the quick fix people are looking for. More importantly, it doesn’t relieve the political pressure and so doesn’t take away the excuses of tyrants.

Normally “do nothing” would be the safest option here, but the danger of uninformed and reactionary voters means there is a great deal to gain by satisfying the concerns safely instead of letting the solution be evil. A technical standard for parents to somehow identify their children as children is the relief valve for dangerous political pressure. This doesn’t appease the fascists and censors. This doesn’t cede them any ground and it’d be wrong to try to; there’s no satisfying that hunger and it’s a dangerous mistake to feed it. What it does is actually improve the material conditions for the people they’re trying to trick.

A proactive system that puts some of the burden for protecting children on those companies is real relief from that pressure, and it would be a meaningful improvement if it could be built without causing bigger problems.

Taxonomy

There are three basic categories of age filtering: nothing, client attestation, and client verification. These give services varying levels of confidence in their knowledge of users. (It’s tempting to simplify confidence to labels like “strong” or “weak”, but it’s important to think about what’s actually being secured, and from whom.) Different people call these different things, but here’s my taxonomy with the labels I’ll use.
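
As a rough illustration of the difference, here’s a sketch of what a service actually receives under each model. The type names and fields are mine, not any standard or proposal; the point is how much the service learns about the user in each case:

```typescript
// A rough sketch of the three categories as the data a service receives in
// each model. The names and fields are mine, not a standard or a proposal.

type AgeSignal =
  // Nothing: the service learns nothing and assumes whatever it likes.
  | { kind: "none" }
  // Client attestation: a claim configured on the client and controlled by the
  // adult administrator. It protects the child from the service, not the
  // service from a dishonest client.
  | { kind: "attestation"; minor: boolean }
  // Client verification: the service (or its vendor) confirms age or identity
  // against documents or a third party, which is where deanonymization starts.
  | { kind: "verification"; verifiedBirthdate: string; verifiedBy: string };

function whatTheServiceKnows(signal: AgeSignal): string {
  switch (signal.kind) {
    case "none":
      return "nothing about the user";
    case "attestation":
      return "only what the administrator chose to declare";
    case "verification":
      return "who the user actually is";
  }
}
```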

a wholesome plane has hit the second cozy tower

  • Posted in rp

Here’s an advertisement I got from a game company named Rogue Duck Interactive.

The game they’re advertising here — which they neglect to name outside the screenshot — is “Nothing to Declare.”1 And it caught my eye, because there are problems.

papers please

If you’ve been living under a rock for the last ten years you might not recognize this as the gameplay from Papers, Please.

Papers, Please (2013), of course, is the multi-million-selling dystopian bureaucracy simulator game where you work as an immigration enforcement officer for a despotic regime.

Papers is known as one of the games of all time. It uses the mechanics of rote bureaucracy — checking correctness of paperwork, matching dates, enforcing documentation requirements — to connect the player to a cruel and miserable world. The message and mechanics perfectly intertwine: the dystopia is entwined with the nature of the policing, which is both the setting and the game mechanic.

It’s an intense, profound piece that prompts the player to think about the way political structures affect real human lives. It prompts introspection about the role and agency of the individual within a system and how morality responds when someone is faced with a hard reality: a political and economic moment where harming others for profit may be the only way to feed your own family. Papers is “video games as true art”, “brilliantly written”, “grim yet affecting”.

Rogue Duck hasn’t been living under a rock. They know their game “takes inspo” from Papers, Please, but it has its own “original take and ideas.”

but cozy

Now, I don’t care that Rogue Duck is iterating on Papers. What’s hooked me here is this original take they’re so excited about. Because Declare is more than a shameless clone: it has its own identity and it does have something to say. Nothing to Declare comes on stage following Papers, turns to the audience, and what it has to say is: “man, that guy was a downer, am I right?”

That fun new original idea Rogue Duck adds to the equation is that now the bureaucracy of immigration is fluffy and wholesome. A fun little action parallel to making postcards and pouring coffee.

This isn’t even an interpretation; they shoehorn it right into their store description.

A Hack is Not Enough

  • Posted in cyber

Recently we’ve seen sweeping attempts to censor the internet. The UK’s “Online Safety Act” imposes broad restrictions on speech and expression. It’s disguised as a child safety measure, but its true purpose is (avowedly!) control over “services that have a significant influence over public discourse”. And similar trends threaten the US, especially as lawmakers race to categorize ever more speech as broadly harmful.

A common response to these restrictions has been to dismiss them as unenforceable: that’s not how the internet works, governments are foolish for thinking they can do this, and you can just use a VPN to get around crude attempts at content blocking.

But this “just use a workaround” dismissal is a dangerous, reductive mistake. Even if you can easily defeat an attempt to impose a restriction right now, you can’t take that for granted.

Dismissing technical restrictions as unenforceable

There is a tendency, especially among technically competent people, to use the ability to work around a requirement as an excuse to avoid dealing with it. When there is a political push to enforce a particular pattern of behavior — discourage or ban something, or make something socially unacceptable — there is an instinct for clever people with workarounds to respond with “you can just use my workaround”.

I see this a lot, in a lot of different forms:

  • “Geographic restrictions don’t matter, just use a VPN.”
  • “Media preservation by the industry doesn’t matter, just use pirated copies.”
  • “The application removing this feature doesn’t matter, just use this tool to do it for you.”
  • “Don’t pay for this feature, you can just do it yourself for free.1”
  • “It’s ‘inevitable’ that people will use their technology as they please regardless of the EULA.”
  • “Issues with digital ownership? Doesn’t affect me, I just pirate.”

Why training AI can't be IP theft

  • Posted in cyber

AI is a huge subject, so it’s hard to boil my thoughts down into any single digestible take. That’s probably a good thing. As a rule, if you can fit your understanding of something complex into a tweet, you’re usually wrong. So I’m continuing to divide and conquer here, eat the elephant one bite at a time, etc.

Right now I want to address one specific question: whether people have the right to train AI in the first place. The argument that they do not1 goes like this:

When a corporation trains generative AI they have unfairly used other people’s work without consent or compensation to create a new product they own. Worse, the new product directly competes with the original workers. Since the corporations didn’t own the original material and weren’t granted any specific rights to use it for training, they did not have the right to train with it. When the work was published, there was no expectation it would be used like this, as the technology didn’t exist and people did not even consider “training” as a possibility. Ultimately, the material is copyrighted, and this action violates the authors’ copyright.

I have spent a lot of time thinking about this argument and its implications. Unfortunately, while I think this identifies a legitimate complaint, the argument is dangerously wrong, and the consequences of acting on it (especially enforcing a new IP right) would be disastrous. Let me work through why:

The complaint is real

Artists wanting to use copyright to limit the “right to train” isn’t the right approach, but not because their complaint isn’t valid. Sometimes a course of action is bad because the goal is bad, but in this case I think people making this complaint are trying to address a real problem.

I agree that the dynamic of corporations making for-profit tools using previously published material to directly compete with the original authors, especially when that work was published freely, is “bad.” This is also a real thing companies want to do. Replacing labor that has to be paid wages with capital that can be owned outright increases profits, which is every company’s purpose. And there’s certainly a push right now to do this. For owners and executives production without workers has always been the dream. But even though it’s economically incentivized for corporations, the wholesale replacement of human work in creative industries would be disastrous for art, artists, and society as a whole.

So there’s a fine line to walk here, because I don’t want to dismiss the fear. The problem is real and the emotions are valid, but that doesn’t mean every reaction is sound; some reactions are reactionary and dangerous. And the idea that corporations training on material is copyright infringement is just that.

The learning rights approach

So let me focus in on the idea that one needs to license a “right to train”, especially for training that uses copyrighted work. Although I’m ultimately going to argue against it, I think this is a reasonable first thought. It’s also a very serious proposal that’s actively being argued for in significant forums.

Copyright isn’t a stupid first thought. Copyright (or creative rights in general) intuitively seems like the relevant mechanism for protecting work from unauthorized uses and plagiarism, since the AI models are trained using copyrighted work that is licensed for public viewing but not for commercial use. Fundamentally, the thing copyright is “for” is making sure artists are paid for their work.

This was one of my first thoughts too. Looking at the inputs and outputs, as well as the overall dynamic of unfair exploitation of creative work, “copyright violation” is a good place to start. I even have a draft article where I was going to argue for this same point myself. But as I’ve thought through the problem further, that logic breaks down. The more I work through it, the more every IP-based argument I’ve seen for supporting artists turns out to have massively harmful implications that make the cure worse than the disease.

Definition, proposals, assertions

The idea of a learning right is this: in addition to the traditional reproduction right copyright reserves to the author, authors should be able to prevent people from training AI on their work by withholding the right.

This learning right would be parallel to other reservable rights, like reproduction: it could be denied outright, or licensed separately from both viewing and reproduction rights at the discretion of the rightsholder. Material could be published such that people were freely able to view it but not able to use it as part of a process that would eventually create new work, including training AI. The mechanical ability to train on a work is not severable from the ability to view it, but the legal right would be.

This is already being widely discussed in various forms, usually as a theory of legal interpretation or a proposal for new policy.

Asserting this right already exists

Typically, when the learning rights theory is seen in the wild, it’s being pushed by rightsholders asserting that the right to restrict others from training on their works already exists.

A prime example of this is the book publishing company Penguin Random House, which asserts that the right to train an AI from a work is already a right that they can reserve:

Penguin Random House Copyright Statement (Oct 2024):

“No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems. In accordance with Article 4(3) of the Digital Single Market Directive 2019/790, Penguin Random House expressly reserves this work from the text and data mining exception.”

In the same story, the Society of Authors affirms the idea that AI training cannot be done without a license, especially if that right is explicitly claimed.

The ambiguous "use"

I keep seeing people make this error, especially in social media discourse. Somebody wants to “use” something. Except obviously, it’s not theirs, and so it’s absurd for them to make that demand, right?

Quick examples

I’m not trying to pick on this person at all: they’re not a twitter main character, they’re not expressing an unusual opinion here, they seem completely nice and cool. But I think this cartoon they drew does a good job of capturing this sort of argument-interaction, which I’ve seen a lot:

I’ve also seen the exact inverse of this: people getting upset at artists because once the work is “out there” anyone should be able to “use” it. (But I don’t have a cartoon of this.)

There is an extremely specific error being made in both cases here, and if you can learn to spot it, you can save yourself some grief. What misuse is being objected to? What are the rights to “certain things” being claimed?

The problem is that “use” is an extremely ambiguous word that can mean anything from “study” to “pirate” to “copy and resell”. It can also cover particularly sensitive cases, like creating pornography or editing it to make a political argument.

webcomicname: beliefs you do not agree with

But everything people do is “using” something. By itself, “use” is not a meaningful category or designation. Say you buy a song — listening to it, sampling it, sharing it, performing it, discussing it, and using it in a video are all “uses”, but the conversations about whether each is appropriate or not are extremely distinct. If you have an objection, it matters a lot what specific use you’re talking about.

But if you’re not specific, there are unlimited combinations of “uses” you could be talking about, and you could mean any of them. And when people respond, they could be responding to any of those interpretations. There’s no coherent argument in any sweeping statement about “use”; the only things being communicated are frustration and a team-sports-style siding with either “artists” or “consumers” (which is a terrible distinction to make!).

Formal logic

This is not a new problem. This is the Fallacy of Equivocation, which is a subcategory of Fallacies of Ambiguity. This is when a word (in this case, “use”) has more than one meaning, and an argument uses the word in such a way that the entire position and its validity hinge on which definition the reader assumes.

The example of this that always comes to my mind first is “respect”, because this one tumblr post from 2015 said it so well:

flyingpurplepizzaeater:

Sometimes people use “respect” to mean “treating someone like a person” and sometimes they use “respect” to mean “treating someone like an authority”

and sometimes people who are used to being treated like an authority say “if you won’t respect me I won’t respect you” and they mean “if you won’t treat me like an authority I won’t treat you like a person”

and they think they’re being fair but they aren’t, and it’s not okay.

See, here the “argument” relies on implying a false symmetry between two clauses that use the same word but with totally different meanings. And, in disambiguating the word, the problem becomes obvious.

Short-form social media really exacerbates the equivocation problem by encouraging people to be concise, which leads to accidental ambiguity. But social media also encourages people to take offense at someone else being wrong as the beginning of a “conversation”, which encourages people to use whatever definition of other people’s words makes them the wrongest.

Since I’m already aware that copyright is a special interest of mine, I try to avoid falling into the trap of modeling everything in terms of copyright by default, Boss Baby style. But this is literally the case of a debate over who has the “right” to various “uses” of things that are usually intangible ideas, so I think it’s unavoidably copyright time again.

Game patent grab bag

This was originally something I was going to talk about in Corporations have Rejected Copyright, back when that series was going to just be one long post (really!). But since I saw that Nintendo apparently sued Palworld today, I wanted to put this up as background information.

You should definitely read You’ve Never Seen Copyright first, particularly the explanation of what patents are, because this conversation directly follows from that. The most important thing to pick up on is how the Doctrine of Equivalents lets companies use patents that are supposedly very specific to threaten other implementations that are similar, even if they aren’t using the patented design.

Game patents are revelatory, because game rules as a category explicitly do not fall within the realm of patent rights, but companies have managed to file and defend fraudulent patents anyway.

Copyright abusers lost their claim

or, the many people who said movies like Coyote v. Acme that were killed for a tax write-off should be forced into the public domain were right, and here’s why

A healthy system of creative rights, including a balanced form of copyright, is a reciprocal arrangement between creators, consumers, and the commons. Creators are granted some temporary exclusive rights by the government over qualifying intellectual work in order to incentivize creativity. These privileges are granted in exchange for creating valuable new information — the existence of which is a contribution to the public good — and for providing it in such a way that others will be able to build on it in the future. It’s an incentive for providing a specific social good, one which the market alone might not reward otherwise. Fortunately, this is actually how US copyright was designed; see You’ve Never Seen Copyright.

The takeaway from that, though, is not just that there’s a fair version of copyright, but that copyright must look like that fair model. The fact that such a thing as “good copyright” exists as a sound philosophy is not a broad defense of the word “copyright” itself, it’s an imperative requirement for the legitimacy of any system of power that claims to enforce copyright. The soundness of the philosophy doesn’t legitimate the system of power that shares its name, it damns it for failing its requirements.

When they invoke the philosophy of copyright to justify thuggery, it matters that they’re wrong.

The requirements for reciprocity intrinsic in copyright are how the system must work, but that’s not what actually happens today. In practice, corporations regularly violate the fundamental principles of creative rights — both in letter and in spirit — and use copyright protections to profit without showing the required reciprocity.

I can’t possibly list all the stories of what these violations look like. Seriously, just the thought of me having to give a representative sample of companies abusing IP law made me dread writing this series, it’s such a prolific problem. But I have shown a sample: Nintendo using copyright to kill new creative work, Apple using trademarks to keep competitors from conducting trade at all, book publishers trying to destroy the idea of buying and selling books… they’re all examples of how companies do everything they possibly can to get out of fulfilling their side of the bargain.

Case studies are fun, but just listing out a bunch of horrors isn’t what I set out to do; that’s just groundwork for thinking about the problem. What’s important is that they’re a representative sample of a kind of behavior. With all that established, you can read this with the knowledge that yes, they violate the purpose of the law as written and yes, violations are so regular they seem to define the practice.

So what does it all add up to?

Here’s what I say: If you want out of the deal, so be it. When someone won’t participate constructively — if they don’t work in good faith, or at least begrudgingly accept the limits the system of copyright puts on them — we stop respecting their claim to special privileges within it as legitimate, and understand it as the double-dealing overreach it is.

As self-evident as it sounds when I say it out loud, this argument is my nuclear option. This is what I would have to say if it ever got this bad; if, between the two of them, the courts and the corps ever broke the system beyond my last bit of tolerance. And I’ll be damned if they haven’t done just that.

Legitimacy

In You’ve Never Seen Copyright, I talked about how the word “copyright” can refer to two very different things: either a philosophical basis that justifies copyright as a legal doctrine, or the system of power that describes how copyright is actually enforced, what enforcement looks like, and who it benefits.

But the fact that the power structure has diverged from the original philosophical intent doesn’t just create a communication issue. Yes, it becomes increasingly unclear what people who say “copyright” are talking about, but the legitimacy of the power structure depends entirely on being an implementation of a sound legal doctrine.

CDL: The AAP is Wrong About Everything

In going through these arguments, I’ll also be drawing from a few other sources, in order to give a more comprehensive description of the arguments being made.

The Authors Guild Amici Curiae Brief is a document submitted to the court by The Authors Guild in support of the plaintiffs’ argument.

Reflections from the Association of American Publishers on Hachette Book Group v. Internet Archive: An Affirmation of Publishing is a victory-lap publication from the AAP, published after the summary judgment in favor of the plaintiffs.

And there’s also EFF, Redacted Memorandum of Law In Support of Defendant Internet Archive’s Motion for Summary Judgment, written by the EFF in support of the Internet Archive, and whose arguments overlap a lot with mine.

Alright, there’s never anything more damning than their own words, so let’s just look at what it is they said here.

CDL: Publishers Against Books

Combining lending with digital technology is tricky to do within the constraints of copyright. But it’s important to still be able to lend, especially for libraries. With a system called Controlled Digital Lending, libraries like the Internet Archive (IA) made digital booklending work within the constraints of copyright, but publishers still want to shut it down. It’s a particularly ghoulish example of companies rejecting copyright and instead pursuing their endless appetite for profit at the expense of everything worthwhile about the industry.