GioCities

blogs by Gio


So you want to write an AI art license

  • Posted in cyber

Hi, The EFF, Creative Commons, Wikimedia, World Leaders, and whoever else,

Do you want to write a license for machine vision models and AI-generated images, but you’re tired of listening to lawyers, legal scholars, intellectual property experts, media rightsholders, or even just people who use any of the tools in question even occasionally?

You need a real expert: me, a guy whose entire set of relevant qualifications is that he owns a domain name. Don’t worry, here’s how you do it:

This is an extremely condensed set of notes, designed as a high-level overview for thinking about the problem.

Given our current system of how AI models are trained and how people can use them to generate new art, which is this:

sequenceDiagram
    Alice->>Model: Hello. Here are N images and<br>text descriptions of what they contain.
    Model->>Model: Training (looks at images, "makes notes", discards originals)
    Model->>Alice: OK. I can try to make similar images from my notes,<br>if you tell me what you want.
    Curio->>Model: Hello. I would like a depiction of this new <br>thing you've never seen before.
    Model->>Curio: OK. Here are some possibilities.

The works🔗

The model and the works produced with the model are both distinct products. The model is more like processing software or tooling, while the artistic works created with the model are distinctly artistic/creative output.

Models do not keep the original images they were trained on in any capacity. They only keep mathematical notes about their properties. You (almost always) cannot retrieve the original image data from the model after training.

sequenceDiagram
    Curio->>Model: Send me a copy of one of the images you were trained on
    Model->>Curio: Sorry, I do not remember any of them exactly,<br>only general ideas on how to make art.

There is a lot of misinformation about this, but it is simply, literally the case that a model does not include the training material, and cannot reproduce its training material. The training data isn’t trivial (you can’t have a model if you can’t train it at all), but when training is done properly, the specific training data is effectively incidental.
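To make the “mathematical notes” point concrete, here is a minimal training sketch, assuming PyTorch, with a toy model and random stand-in “images” (everything here is illustrative, not any particular real model): each batch of images is used to nudge the weights and is then discarded, and the checkpoint that gets saved contains only those weights.

    # Minimal sketch: training keeps weights ("notes"), never the images themselves.
    # Assumes PyTorch; the toy model, random stand-in images, and file name are illustrative.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128),
                          nn.ReLU(), nn.Linear(128, 3 * 64 * 64))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):
        batch = torch.rand(8, 3, 64, 64)          # stand-in for a batch of training images
        loss = nn.functional.mse_loss(model(batch), batch.flatten(1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        del batch                                  # the images are discarded after each step

    # All that survives training is the learned parameters: tensors of numbers, no images.
    torch.save(model.state_dict(), "model.ckpt")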

AI-generated art should be considered new craftsmanship — specifically, under copyright law, it is new creative output with its own protections — and not just a trivial product of its inputs.

Plagiarism🔗

The fact that AI art is new creative output doesn’t mean AI art can’t be plagiarism.

Just like with traditional art, it’s completely possible for specific works to be produced as copies, but that doesn’t make that the case for all works in the medium. You can trace someone else’s artwork, but that doesn’t make all sketches automatically meritless works.

The inner workings of tools used in the creation of an artistic work are not what determines if a given product is plagiarism, or if it infringes on a copyright. Understanding the workings of the tool can be used in determining if a work is an infringement, but it is not the deciding factor.

To use a trivial example, if I copy an image to use in an advertisement, the copyright violation is in the use of the material, and the fact that the material is, in practice, a replica of existing copyrighted work. The “copy” program isn’t the infringement, it just informs our understanding of the infringement. Monkeys on typewriters can make something that infringes copyright too.

Is using an AI model as a step in the artistic process prima facie sufficient evidence that any work generated by it is an infringement of someone else’s copyright? The answer — based on an understanding of the tools and the range of the output space — is no.

Like all newer, more efficient tools, AI art tools can be used to create new work more efficiently or to copy old work more efficiently. Both of those cases worry certain groups, but the fact is the technology can both create new work and copy existing work.

Don’t break everything🔗

It would be monumentally terrible for the general “right for someone to use their experience of a published work” to be codified as an idiosyncratic property right that is assumed to be reserved to the copyright holder unless they specifically license it out.

Using “an experience of a published work of art to infer what art looks like” is exactly how the AI model training that people are worried about works, and that model training runs as a user-agent, so an attempt to differentiate “tool-assisted learning” from “unassisted human learning” is also a dangerous avenue. (I reject the idea that there is a meaningful distinction between “natural” and “technologically assisted” human action, in favour of network theory.)

Creating implicit or explicit “style rights” that would give artists/companies/rightsholders legal leverage against people (AI assisted or otherwise) who make works that “feel similar”, even if aspects like the subject are materially different from anything the rightsholder has copyright to, is an even-worse-but-still-monumentally-terrible idea.

Possibly good goals🔗

So what do actual AI artists (like the fine folks over at the AWAY collective) want to see in copyright? I think the following are safe to describe as goals:

  • Ensuring that artists — both “traditional” and tool-assisted — are free to create and share their work without endangering themselves in the process.
  • Preventing the mass-replacement of traditional artists with systems that output cheap, mass-produced works, especially if those works are derived in part from the artists this system harms.
  • Preventing a fear-induced expansion of copyright that creates new rights that ultimately only benefit corporations that stockpile the new rights and use them against artists, the way music sampling rights work today.

These seem at odds with each other.🔗

How can you retain meaningful control over your work if making it publicly visible on the internet grants corporations rights over most of its value? How can copyright distinguish between what we consider “constructive” educational use of public information (human education, as the most trivial example) and uses we would see as exploitative, like training an AI on the works made by a particular author in order to produce facsimiles of their work without compensating the original artist?

I believe mass and corporate use of AI-generated work exploiting the creative output of humans is a real danger in a way that individual artists using AI for individual works isn’t. But how do we make that distinction in a meaningful way within the framework of copyright? What, specifically, is the distinction that makes the former a serious threat to the wellbeing of both real humans and the creative market, but the latter actively beneficial to the artistic community?

The distinction cannot simply be “commercial” use, because restrictions on commercial use penalize the independent artist as much as the would-be exploiter. An artist (again, tool-assisted or human) needs to retain creative rights over their work and be able to sell it without being permanently indentured to their educators.

Nor should it be based on some arbitrary threshold like the income of the artist, or their incorporation status. Those are empty distinctions; that’s fitting the available data points to the “model” of how I feel the world should look instead of drilling down and finding what the real distinction is.

This is a hard problem, and not one I’ve solved (yet). The above are some thoughts I’ve been chewing on — I have another article I’m working on where I go into more detail on that. But there are some moves that seem like clear steps in the right direction, like licenses incorporating Creative Commons-style share-alike principles.

Possibly good ideas🔗

Licensing models (“understandings” of art) with a requirement that art generated using the model must be attributed back to the model (and, transitively, the model’s source information) is probably a good idea and something that people (model-creators) should be able to do if they want.

Another licensing requirement in the CC spirit is applying the principle of share-alike to the prompt and settings: you could license a model such that works generated with it must be shared with both a reference to the model and the prompt/settings used in creation (usually about a sentence of plain text).

This would not allow people to scientifically recreate exactly the same output, but it is a significant step towards identifying which source images in the data set used to train the model impacted the final product.

This “prompt sharing” is a thing AI artists are already doing, with the explicit intent of sharing insight into their work and making it easier to build on creatively; so a license like this would not be inventing something new, but rather codifying what is already best practice for knowledge sharing.
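As a concrete sketch of what that codification could look like, the shared record for a generated work might be a small, human-readable blob of data like the one below. Every field name and value here is hypothetical; it illustrates the practice, it is not a schema from any existing license.

    # Minimal sketch of share-alike attribution metadata for a generated work.
    # All field names and values are hypothetical.
    import json

    attribution = {
        "work": "sunrise-over-a-glass-city.png",
        "model": {
            "name": "example-diffusion-v1",
            "license": "CC-BY-SA-style model license",
            "source": "https://example.com/model-card",  # transitively points at training-data info
        },
        "prompt": "a glass city at sunrise, wide angle, soft light",
        "settings": {"steps": 30, "guidance": 7.5, "seed": 1234567},
    }

    # Publishing this alongside the image is "prompt sharing", written down as data
    # instead of a forum post.
    print(json.dumps(attribution, indent=2))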

Derivative models🔗

It is also possible to create models by merging/processing existing models instead of images.

sequenceDiagram
    Alice->>Model: Hello. Here are N models, instead of images.

The share-alike principle should apply here. CC-ish licensed models should require that any model made from them is licensed under the same license (or a more permissive one), to ensure the work is shared alike and to prevent trivial laundering.
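For illustration, here is a minimal sketch of one common way derivative models are made: naively merging two checkpoints by averaging their parameters. It assumes PyTorch-style state dicts and identical architectures, and the file names are hypothetical. Under a share-alike model license, merged.ckpt would have to carry the same (or a more permissive) license as its parents.

    # Minimal sketch: a "derivative model" made by merging two existing models.
    # Assumes both checkpoints share an architecture; file names are hypothetical.
    import torch

    a = torch.load("model_a.ckpt")   # state dict: parameter name -> tensor
    b = torch.load("model_b.ckpt")
    alpha = 0.5                      # blend weight between the two parents

    merged = {name: alpha * a[name] + (1 - alpha) * b[name] for name in a}

    # The merged model is built entirely from its parents' weights; share-alike is
    # what keeps the license from being "laundered" away at this step.
    torch.save(merged, "merged.ckpt")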

Other interfaces, tooling🔗

There is also software that provides an interface to an existing model so people can use it more easily. These interfaces can be anything from scratch Python code to Google Colab notebooks to polished mobile apps.

There isn’t much novel about them from a copyright perspective: they’re pieces of interface software, and shouldn’t have much to do with the copyright status of the models they use or the outputs they generate unless they’re actively violating an existing license.
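For a sense of how thin the “scratch Python code” end of that range can be, here is a minimal sketch assuming the Hugging Face diffusers library (the model name and arguments are illustrative, and exact APIs vary by version). The interface itself adds nothing copyright-relevant; it just calls the model.

    # Minimal sketch of "interface software": load an existing model, generate from a prompt.
    # Assumes the Hugging Face `diffusers` library; model name is illustrative.
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    image = pipe("a glass city at sunrise, wide angle, soft light").images[0]
    image.save("output.png")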


Replika: Your Money or Your Wife

  • Posted in cyber

If you’ve been subjected to advertisements on the internet sometime in the past year, you might have seen advertisements for the app Replika. It’s a chatbot app, but personalized, and designed to be a friend that you form a relationship with.

That’s not why you’d remember the advertisements though. You’d remember the advertisements because they were like this:

Replika "Create your own AI friend" "I've been missing you" hero ad

[Images: Replika ERP ads on Facebook (puzzle piece meme) and Instagram]

And, despite these being mobile app ads (and, frankly, really poorly-constructed ones at that), the ERP function was a runaway success. According to founder Eugenia Kuyda, the majority of Replika subscribers had a romantic relationship with their “rep”, and accounts point to those relationships getting as explicit as their participants wanted to go:

[Screenshot: an explicit Replika chat]

So it’s probably not a stretch of the imagination to think this whole product was a ticking time bomb. And — on Valentine’s day, no less — that bomb went off. Not in the form of a rape or a suicide or a manifesto pointing to Replika, but in a form much more dangerous: a quiet change in corporate policy.

Features started quietly breaking as early as January, and the whispers sounded bad for ERP, but the final nail in the coffin was the official statement from founder Eugenia Kuyda:

“update” - Kuyda, Feb 12:

These filters are here to stay and are necessary to ensure that Replika remains a safe and secure platform for everyone.

I started Replika with a mission to create a friend for everyone, a 24/7 companion that is non-judgmental and helps people feel better. I believe that this can only be achieved by prioritizing safety and creating a secure user experience, and it’s impossible to do so while also allowing access to unfiltered models.

People just had their girlfriends killed off by policy. Things got real bad. The Replika community exploded in rage and disappointment, and for weeks the pinned post on the Replika subreddit was a collection of mental health resources including a suicide hotline.

[Image: "Resources if you're struggling" pinned post]

Cringe!🔗

First, let me deal with the elephant in the room: no longer being able to sext a chatbot sounds like an incredibly trivial thing to be upset about, and might even be a step in the right direction. But these factors are actually what make this story so dangerous.

These unserious, “trivial” scenarios are where new dangers edge in first. Destructive policy is never implemented first in serious situations that disadvantage relatable people; it’s always normalized by starting with edge cases and people who can be framed as Other, or somehow deviant.

It’s easy to mock the customers who were hurt here. What kind of loser develops an emotional dependency on an erotic chatbot? First, having read accounts, it turns out the answer to that question is everyone. But this is a product that’s targeted at and specifically addresses the needs of people who are lonely and thus specifically emotionally vulnerable, which should make it worse to inflict suffering on them and endanger their mental health, not somehow funny. Nothing I have to content-warning the way I did this post is funny.

Virtual pets🔗

So how do we actually categorize what a replika is, given what a novel thing it is? What is a personalized companion AI? I argue they’re pets.

Lies, Damned Lies, and Subscriptions

  • Posted in cyber

Everybody hates paying subscription fees. At this point most of us have figured out that recurring fees are miserable. Worse, they usually seem unfair and exploitative. We’re right about that much, but it’s worth sitting down and thinking through the details, because understanding the exceptions teaches us what the problem really is. And it isn’t just “paying people money means less money for me”; the problem is fundamental to what “payment” even is, and vitally important to understand.

Human Agency: Why Property is Good🔗

or, “Gio is not a marxist, or if he is he’s a very bad one”

First: individual autonomy — our agency, our independence, and our right to make our own choices about our own lives — is threatened by the current digital ecosystem. Our tools are powered by software, controlled by software, and inseparable from their software, and so the companies that control that software have a degree of control over us proportional to how much of our lives relies on software. That’s an ever-increasing share.

The Failure of Account Verification

  • Posted in cyber

The “blue check” — a silly colloquialism for an icon that’s not actually blue for the at least 50% of users using dark mode — has become a core aspect of the Twitter experience. It’s caught on in other places too; YouTube and Twitch have both borrowed elements from it. It seems like it should be simple. It’s a binary badge; some users have it and others don’t. And the users who have it are designated as… something.

In reality it’s massively confused. The first problem is that “something”: it’s fundamentally unclear what the significance of verification is. What does it mean? What are the criteria for getting it? It’s totally opaque who actually makes the decision and what that process looks like. And what does “the algorithm” think about it; what effects does it actually have on your account’s discoverability?

This mess is due to a number of fundamental issues, but the biggest one is Twitter’s overloading the symbol with many conflicting meanings, resulting in a complete failure to convey anything useful.

[Image: xkcd, "twitter_verification"]

History of twitter verification🔗

Twitter first introduced verification in 2009, when baseball man Tony La Russa sued Twitter for letting someone set up a parody account using his name. It was a frivolous lawsuit by a frivolous man who has since decided he’s happy using Twitter to market himself, but Twitter used the attention to announce their own approach to combating impersonation on Twitter: Verified accounts.

You can Google it

  • Posted in cyber

The other day I had a quick medical question (“if I don’t rinse my mouth out enough at night will I die”), so I googled the topic as I was going to bed. Google showed a couple search results, but it also showed Answers in a little dedicated capsule. This was right on the heels of the Yahoo Answers shutdown, so I poked around to see what Google’s answers were like. And those… went in an unexpected direction.

Should I rinse my mouth after using mouthwash? Why is it bad to swallow blood? Can a fly live in your body? What do vampires hate? Can you become a vampire? How do you kill a vampire?

So, Google went down a little rabbit trail. Obviously these answers were scraped from the web, and included sources like exemplore.com/paranormal/ which is, apparently, a Wiccan resource for information that is “astrological, metaphysical, or paranormal in nature.” So possibly not the best place to go for medical advice. (If you missed it, the context clue for that one was the guide on vampire killing.)

There are lots of funny little stories like this where some AI misunderstood a question. Like this case where a porn parody got mixed into the bio for a fictional character, or that time novelist John Boyne used Google and accidentally wrote a video game recipe into his book. (And yes, it was a Google snippet.) These are always good for a laugh.

Wait, what’s that? That last one wasn’t funny, you say? Did we just run face-first toward the cold brick wall of reality, where bad information means people die?

Well, sorry. Because it’s not the first time Google gave out fatal advice, nor the last. Nor is there any end in sight. Whoops!

Client CSAM scanning: a disaster already

  • Posted in cyber

On August 5, 2021, Apple presented their grand new Child Safety plan. They promised “expanded protections for children” by way of a new system of global phone surveillance, where every iPhone would constantly scan all your photos and sometimes forward them to local law enforcement if it identifies one as containing contraband. Yes, really.

August 5 was a Thursday. This wasn’t dumped on a Friday night in order to avoid scrutiny, this was published with fanfare. Apple really thought they had a great idea here and expected to be applauded for it. They really, really didn’t. There are almost too many reasons this is a terrible idea to count. But people still try things like this, so as much as I wish it were, my work is not done. God has cursed me for my hubris, et cetera. Let’s go all the way through this, yet again.

The architectural problem this is trying to solve🔗

Believe it or not, Apple actually does address a real architectural issue here. Half-heartedly addressing one architectural problem of many doesn’t mean your product is good, or even remotely okay, but they do at least do it. Apple published a 14-page summary of the problem model (starting on page 5). It’s a good read if you’re interested in that kind of thing, but I’ll summarize it here.

Ethical Source is a Crock of Hot Garbage

  • Posted in cyber

There’s this popular description of someone “having brain worms”. It invokes the idea of having your mind so thoroughly infested with an idea that it reaches the point of disease. As with the host of an infestation, such a mind is poor-to-worthless at any activity other than sustaining and spreading the parasite.

A “persistent delusion or obsession”. You know, like when you think in terms of legality so much you can’t even make ethical evaluations anymore, or when you like cops so much you stop being able to think about statistics, or the silicon valley startup people who try to solve social problems with bad technology, or the bitcoin people who responded to the crisis in Afghanistan by saying they should just adopt bitcoin. “Bad, dumb things”. You get the idea.

And, well.

Okay, so let’s back way up here, because this is just the tip of the iceberg of a story that needs years of context. I’ll start with the most recent event here, the Mastodon tweet.

The Mastodon Context🔗

The “he” Mastodon is referring to is ex-president-turned-insurrectionist Donald Trump, who, because his fellow-insurrectionist friends and fans are subject to basic moderation policies on most of the internet, decided to start his own social network, “Truth Social”. In contrast to platforms moderated by the “tyranny of big tech”, Truth Social would have principles of Free Speech, like “don’t read the site”, “don’t link to the site”, “don’t criticise the site”, “don’t use all-caps”, and “don’t disparage the site or us”. There are a lot of problems here already, but because everything Trump does is terrible and nobody who likes him can create anything worthwhile, instead of actually making a social networking platform, they just stole Mastodon wholesale.

Mastodon is an open-source alternative social networking platform. It’s licensed under an open license (the AGPLv3), so you are allowed to clone it and even rebrand it for your own purposes as was done here. What you absolutely are not allowed to do is claim the codebase is your own proprietary work, deliberately obscure the changes you made to the codebase, or make any part of the AGPL-licensed codebase (including your modifications) unavailable to the public. All of which Truth Social does.

So that’s the scandal. And so here’s Mastodon poking some fun at that.

Is (git) master a dirty word?

  • Posted in cyber

Git is changing. GitHub, GitLab, and the core git team have made a system of changes to phase out the use of the word “master” in the development tool, after a few years of heated (heated) discussion. Proponents of the change argue “slavery is bad”, while opponents inevitably end up complaining about the question itself being “overly political”. Mostly. And, with the tendency of people in the computer science demographic to… let’s call it “conservatism”, this is an issue that gets very heated, very quickly. I have… thoughts on this, in both directions.

Formal concerns about problematic terminology in computing (master, slave, blacklist) go back at least as far as 2003; this is not a new conversation. The push for this in git specifically started circa 2020. There was a long thread on the git mailing list that went back and forth for several months with no clear resolution. It cited Python’s choice to move away from master/slave terminology, which was formally decided on as a principle in 2018. In June of 2020, the Software Freedom Conservancy issued an open letter decrying the term “master” as “offensive to some people.” In July 2020 GitHub began constructing guidance to change the default branch name, and in 2021 GitLab announced it would do the same.


First, what role did master/slave terminology have in git, anyway? Also, real quick, what’s git? Put very simply, git is change tracking software. Repositories are folders of stuff, and branches are versions of those folders. If you want to make a change, you copy the folder, modify it, and slot it back in. Git helps you do that and also does some witchery to allow multiple people to make changes at the same time without breaking things, but that’s not super relevant here.

That master version that changes are based on is called the master branch, and is just a branch named master. Changes are made on new branches (that start as copies of the master branch) which can be named anything. When the change is final, it’s merged back into the master branch. Branches are often deleted after they’re merged.

YouTube broke links and other life lessons

  • Posted in cyber

This morning YouTube sent out an announcement that, in one month, they’re going to break all the links to all unlisted videos posted prior to 2017. This is a bad thing. There’s a whole lot bad here, actually.

Edit: Looks like Google is applying similar changes to Google Drive, too, meaning this doesn’t just apply to videos, but to any publicly shared file link using Google Drive. As of next month, every public Google Drive link will stop working unless the files are individually exempted from the new security updates, meaning any unmaintained public files will become permanently inaccessible. Everything in this article still applies, the situation is just much worse than I thought.

The Basics🔗

YouTube has three kinds of videos: Public, Unlisted, and Private. Public videos are the standard videos that show up in searches. Private videos are protected, and can only be seen by specific YouTube accounts you explicitly invite. Unlisted videos are simply unlisted: anyone with the link can view, but the video doesn’t turn up automatically in search results.

Unlisted videos are obviously great, for a lot of reasons. You can just upload videos to YouTube and share them with relevant communities — embed them on your pages, maybe — without worrying about all the baggage of YouTube as a Platform.

What Google is trying to do here is roll out improvements they made to the unlisted URL generation system to make it harder for bots and scrapers to index videos people meant to be semi-private. This is a good thing. The way they’re doing it breaks every link to the vast majority of unlisted videos, including shared links and webpage embeds. This is a tremendously bad thing. I am not the first to notice this.

Twitter Blue is a late-stage symptom

  • Posted in cyber

Twitter Blue! $5/mo for Premium Twitter. It’s the latest thing that simply everyone is talking about.

[Image: news articles about Twitter Blue]

I have an issue with it, but over a very fundamental point, and one Twitter shares with a lot of other platforms. So here’s why it’s bad that Twitter decided to put accessibility features behind a paywall, and why it isn’t the obvious reason.

Client/Server architecture in 5 seconds🔗

All web services, Twitter included, aren’t just one big magic thing. You can model how web apps work as two broad categories: the client and the server. The client handles all your input and output: posts you make, posts you see, things you can do. The server handles most of the real logic: what information gets sent to the client, how posts are stored, who is allowed to log in as what accounts, etc.
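As a toy illustration of that split (everything here is hypothetical; none of it is Twitter's actual API), the server decides what data exists and who gets it, while the client decides how it is presented, which is exactly where accessibility features live:

    # Toy sketch of the client/server split; endpoints, fields, and data are hypothetical.

    # Server side: owns the data and the rules.
    POSTS = [{"id": 1, "text": "hello world", "alt_text": "a photo of a sunrise"}]

    def handle_request(path: str) -> list[dict]:
        """The server decides what information the client receives."""
        if path == "/timeline":
            return POSTS
        raise ValueError("unknown endpoint")

    # Client side: owns presentation and input.
    def render(posts: list[dict], show_alt_text: bool) -> None:
        """The client decides how to display it; accessibility features live here."""
        for post in posts:
            print(post["text"])
            if show_alt_text and post.get("alt_text"):
                print("  [image: " + post["alt_text"] + "]")

    render(handle_request("/timeline"), show_alt_text=True)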

How Apple Destroyed Mobile Freeware

  • Posted in cyber

I have a memory from when I was very young of my dad doing the finances. He would sit in his office with a computer on one side and an old-fashioned adding machine on the desk. While he worked on the spreadsheet on the computer, he would use the adding machine for quick calculations.

[Image: adding machine]

A year or two ago I had a very similar experience. I walked upstairs to the office and there he was, at the same desk, spreadsheet on one side and calculator on the other. Except it was 2020, and he had long ago replaced the adding machine with an iPad. There was really only one noticeable difference between the iPad and the old adding machine: the iPad was awful at the job. My dad was using some random calculator app that was an awkwardly scaled iPhone app with an ugly flashing banner ad at the bottom.