Category: cyber

Identity Verification is as Bad as It Can Be

  • Posted in cyber

This is an addendum to OS-Level Age Attestation is the Good One, where I talk about the potential of legal standards for age attestation as an alternative to age verification. Not already convinced of the dangers of age verification? The extent of the evil waiting behind identification systems and deanonymization is unspeakably vast, and fortunately it’s getting extensive coverage. Here’s a quick look to get you up to speed.

Direct digital censorship

A lot of the energy behind age verification comes from authoritarians eager to censor political dissent, promote propaganda and retaliate against critics. This is a power grab, with bills designed to seize power over specific content the government objects to:

Governments are, of course, trying to claim control over “public discourse”. Like all seizing of arbitrary power, the risks associated with this are volatile and unbounded, because they depend on who holds power at any given moment in a political system where power is expected to rotate.

Discord

As a case study, let’s take a look at one of the latest major services to attempt age verification: Discord. At the time of writing, Discord is in the process of trying to switch to a “Teen Default” system, where every user is assumed to be a minor unless they can prove their age to Discord. Discord is a communications platform used widely by adults, and during COVID Discord very intentionally expanded their market domain beyond gaming to focus on being a global platform, so the assumption that all spaces are for kids is clearly incorrect. But Discord is sometimes used by children, and since it’s a communications platform people can use it to communicate horrible things. Boomers have learned they can be insane about this, so Discord is under significant pressure to balance its goal of being a universal communications platform with child safety.

OS-Level Age Attestation is the Good One

  • Posted in cyber

There’s a coordinated effort to use the “child safety” euphemism to cripple the internet with identity verification mandates. That’s bad. But buried in the mix there’s a genuinely good idea with enough political capital that it might stick around and do some good.

Every time I’ve tried to write an article on the topic of child internet safety my energy has fizzled into depression, because as one researches the topic it becomes obvious that everyone with any relevant power is refusing to solve the problem on purpose. It’s demoralizing and it’s been mostly useless for me to do any thought work in this area.

But California’s age attestation bill might be an exception to this. Because it’s age attestation, not age verification, it looks like a significant political step in the right direction, and with the right focus it could do a lot of good. A lot of people have (fairly!) assumed attestation was age verification or at least lays the groundwork, but I think this isn’t the case. There is always the danger of future bad legislation, but OS attestation doesn’t pave the way for it, it provides a strong defense against it. We need a good idea to win the child safety war, not because we’re in dire need of more online child safety, but because addressing the real concerns correctly blocks a whole slew of impossibly dangerous policies.

My ideal age filtering tool is a system of client attestation with trust rooted in the adult administrator, exposed through an OS-level API as preemptive verification, and enforced by compliant browsers and application stores. And we’re shockingly close to that.
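As a rough illustration of that shape, here is a minimal Python sketch. Every name in it (`AgeBracket`, `DeviceAttestation`, `set_bracket`, `age_signal`, `store_allows`) is hypothetical and invented for this example; it is not the API of any real operating system, nor anything specified by the California bill. It assumes that only the device administrator can attest an account's age bracket, that unmarked accounts default to adult, and that apps only ever see a coarse bracket rather than an identity.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgeBracket(Enum):
    # Coarse brackets only; no birthdate or identity is stored (assumption).
    CHILD = "child"
    TEEN = "teen"
    ADULT = "adult"


@dataclass
class DeviceAttestation:
    """Hypothetical OS-level store; trust is rooted in the device administrator."""
    admin: str
    _brackets: dict = field(default_factory=dict)

    def set_bracket(self, actor: str, account: str, bracket: AgeBracket) -> None:
        # Only the adult administrator may attest an account's age bracket.
        if actor != self.admin:
            raise PermissionError("only the device administrator can attest age")
        self._brackets[account] = bracket

    def age_signal(self, account: str) -> AgeBracket:
        # Assumption: accounts nobody has marked are treated as adult.
        return self._brackets.get(account, AgeBracket.ADULT)


def store_allows(signal: AgeBracket, minimum: AgeBracket) -> bool:
    """A compliant store or browser compares the signal to a listing's rating."""
    order = [AgeBracket.CHILD, AgeBracket.TEEN, AgeBracket.ADULT]
    return order.index(signal) >= order.index(minimum)


device = DeviceAttestation(admin="parent")
device.set_bracket("parent", "kid_account", AgeBracket.CHILD)
print(store_allows(device.age_signal("kid_account"), AgeBracket.TEEN))     # False
print(store_allows(device.age_signal("parent_account"), AgeBracket.TEEN))  # True
```

The point of the sketch is where the trust sits: the service never verifies anyone, it only reads a signal that the adult who administers the device chose to set.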

There is room for improvement

People on the privacy side of the age verification war — my side — will argue that parents already have everything they need for comprehensive web filtering if they want to use it. I think this isn’t quite true; there’s one notable architectural gap that a technical solution could meaningfully fill.

There are many existing content filtering tools geared toward child safety but their weakness is that they’re reactive. Traffic filters can identify and block traffic from known websites and on-device content filters can try to detect and block specific content. But this requires the user reacting and defending against every possible source and behavior. It’s the same cat-and-mouse game as adblockers. And like adblockers, the more closed down the system is — like iOS or gaming consoles — the harder it is for developers to make exactly the right product.
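To make the "reactive" weakness concrete, here is a small Python sketch of a blocklist-style traffic filter. The `ReactiveFilter` class and the `.example` hostnames are made up for illustration and do not describe any particular product; the only assumption is that the filter blocks hosts it already knows about and allows everything else.

```python
from urllib.parse import urlsplit


class ReactiveFilter:
    """Hypothetical blocklist filter: it only knows the hosts it has been told about."""

    def __init__(self, blocked_hosts):
        self.blocked_hosts = set(blocked_hosts)

    def allows(self, url: str) -> bool:
        host = urlsplit(url).hostname or ""
        return host not in self.blocked_hosts


filt = ReactiveFilter({"known-bad.example"})
print(filt.allows("https://known-bad.example/page"))  # False
print(filt.allows("https://brand-new.example/page"))  # True: unknown, so it slips through
filt.blocked_hosts.add("brand-new.example")           # the defender reacts after the fact
print(filt.allows("https://brand-new.example/page"))  # False
```

Every new source gets through until someone notices it and updates the list, which is the cat-and-mouse dynamic described above.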

The internet sometimes assumes minors are supervised — since they have parental consent to have the device in the first place — but this often isn’t the case. It’s very common for minors to have their own phones or tablets with unsupervised access. When they’re online or downloading apps, they’re not sitting with a parent, they’re unsupervised, roaming children. Parents are dropping their kids off in the city.

This isn’t inherently bad; it seems like parents and children both want children to be able to exist independently without granular supervision, and so there’s a desire to make that situation safer. That shouldn’t come at the cost of any adult liberty or even the liberty of children with parental consent; it just means we want an ecosystem that allows for unsupervised children to exist within it.

Right now the burden is on parents to be active defenders protecting their children from a vast ecosystem of companies investing research and capital into optimizing how efficiently they can exploit money and data out of everyone in the world. It would be a meaningful improvement if there were a safe way to prevent some of this exploitation by putting reasonable requirements on providers, so long as this can be done in a way that doesn’t cause more problems.

Political pressure for “child safety” is exploitable

But the lack of a perfect parental control system isn’t the main problem here. The real danger is the push for online identity verification using child safety as a justification.

Smart and privacy-conscious people demand “No age verification” (quite reasonably!), but that doesn’t offer the quick fix people are looking for. More importantly, it doesn’t relieve the political pressure and so doesn’t take away the excuses of tyrants.

Normally “do nothing” would be the safest option here, but the danger of uninformed and reactionary voters means there is a great deal to gain by satisfying the concerns safely instead of letting the solution be evil. A technical standard for parents to somehow identify their children as children is the relief valve for dangerous political pressure. This doesn’t appease the fascists and censors. This doesn’t cede them any ground and it’d be wrong to try to; there’s no satisfying that hunger and it’s a dangerous mistake to feed it. What it does is actually improve the material conditions for the people they’re trying to trick.

A proactive system that puts some of the burden for protecting children on those companies is a real relief to this, and it would be a meaningful improvement if something could address this without causing bigger problems.

Taxonomy

There are three basic categories of age filtering: nothing, client attestation, and client verification. These provide services varying levels of confidence in their knowledge of users. (It’s tempting to simplify confidence to labels like “strong” or “weak” but it’s important to think about what’s actually being secured, and from who.) Different people call these different things, but here’s my taxonomy with the labels I’ll use.

A Hack is Not Enough

  • Posted in cyber

Recently we’ve seen sweeping attempts to censor the internet. The UK’s “Online Safety Act” imposes sweeping restrictions on speech and expression. It’s disguised as a child safety measure, but its true purpose is (avowedly!) intentional control over “services that have a significant influence over public discourse”. And similar trends threaten the US, especially as lawmakers race to more aggressively categorize more speech as broadly harmful.

A common response to these restrictions has been to dismiss them as unenforceable: that’s not how the internet works, governments are foolish for thinking they can do this, and you can just use a VPN to get around crude attempts at content blocking.

But this “just use a workaround” dismissal is a dangerous, reductive mistake. Even if you can easily defeat an attempt to impose a restriction right now, you can’t take that for granted.

Dismissing technical restrictions as unenforceable

There is a tendency, especially among technically competent people, to use the ability to work around a requirement as an excuse to avoid dealing with it. When there is a political push to enforce a particular pattern of behavior — discourage or ban something, or make something socially unacceptable — there is an instinct for clever people with workarounds to respond with “you can just use my workaround”.

I see this a lot, in a lot of different forms:

  • “Geographic restrictions don’t matter, just use a VPN.”
  • “Media preservation by the industry doesn’t matter, just use pirated copies.”
  • “The application removing this feature doesn’t matter, just use this tool to do it for you.”
  • “Don’t pay for this feature, you can just do it yourself for free.”
  • “It’s “inevitable” that people will use their technology as they please regardless of the EULA.”
  • “Issues with digital ownership? Doesn’t affect me, I just pirate.”

Why training AI can't be IP theft

  • Posted in cyber

AI is a huge subject, so it’s hard to boil my thoughts down into any single digestible take. That’s probably a good thing. As a rule, if you can fit your understanding of something complex into a tweet, you’re usually wrong. So I’m continuing to divide and conquer here, eat the elephant one bite at a time, etc.

Right now I want to address one specific question: whether people have the right to train AI in the first place. The argument that they do not goes like this:

When a corporation trains generative AI they have unfairly used other people’s work without consent or compensation to create a new product they own. Worse, the new product directly competes with the original workers. Since the corporations didn’t own the original material and weren’t granted any specific rights to use it for training, they did not have the right to train with it. When the work was published, there was no expectation it would be used like this, as the technology didn’t exist and people did not even consider “training” as a possibility. Ultimately, the material is copyrighted, and this action violates the authors’ copyright.

I have spent a lot of time thinking about this argument and its implications. Unfortunately, while I think this identifies a legitimate complaint, the argument is dangerously wrong, and the consequences of acting on it (especially enforcing a new IP right) would be disastrous. Let me work through why:

The complaint is real

Artists wanting to use copyright to limit the “right to train” isn’t the right approach, but not because their complaint isn’t valid. Sometimes a course of action is bad because the goal is bad, but in this case I think people making this complaint are trying to address a real problem.

I agree that the dynamic of corporations making for-profit tools using previously published material to directly compete with the original authors, especially when that work was published freely, is “bad.” This is also a real thing companies want to do. Replacing labor that has to be paid wages with capital that can be owned outright increases profits, which is every company’s purpose. And there’s certainly a push right now to do this. For owners and executives production without workers has always been the dream. But even though it’s economically incentivized for corporations, the wholesale replacement of human work in creative industries would be disastrous for art, artists, and society as a whole.

So there’s a fine line to walk here, because I don’t want to dismiss the fear. The problem is real and the emotions are valid, but that doesn’t mean none of the reactions are reactionary and dangerous. And the idea that corporations training on material is copyright infringement is just that.

The learning rights approach

So let me focus in on the idea that one needs to license a “right to train”, especially for training that uses copyrighted work. Although I’m ultimately going to argue against it, I think this is a reasonable first thought. It’s also a very serious proposal that’s actively being argued for in significant forums.

Copyright isn’t a stupid first thought. Copyright (or creative rights in general) intuitively seems like the relevant mechanism for protecting work from unauthorized uses and plagiarism, since the AI models are trained using copyrighted work that is licensed for public viewing but not for commercial use. Fundamentally, the thing copyright is “for” is making sure artists are paid for their work.

This was one of my first thoughts too. Looking at the inputs and outputs, as well as the overall dynamic of unfair exploitation of creative work, “copyright violation” is a good place to start. I even have a draft article where I was going to argue for this same point myself. But as I’ve thought through the problem further, that logic breaks down. And the more I work through it, the clearer it becomes that every IP-based argument I’ve seen to try to support artists has massively harmful implications that make the cure worse than the disease.

Definition, proposals, assertions

The idea of a learning right is this: in addition to the traditional reproduction right copyright reserves to the author, authors should be able to prevent people from training AI on their work by withholding the right.

This learning right would be parallel to other reservable rights, like reproduction: it could be denied outright, or licensed separately from both viewing and reproduction rights at the discretion of the rightsholder. Material could be published such that people were freely able to view it but not able to use it as part of a process that would eventually create new work, including training AI. The mechanical ability to train on data is not severable from the ability to view it, but the legal right would be.

This is already being widely discussed in various forms, usually as a theory of legal interpretation or a proposal for new policy.

Asserting this right already exists

Typically, when the learning rights theory is seen in the wild it’s being pushed by copyright rightsholders who are asserting that the right to restrict others from training on their works already exists.

A prime example of this is the book publishing company Penguin Random House, which asserts that the right to train an AI from a work is already a right that they can reserve:

Penguin Random House Copyright Statement (Oct 2024)

No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems. In accordance with Article 4(3) of the Digital Single Market Directive 2019/790, Penguin Random House expressly reserves this work from the text and data mining exception.

In the same story, the Society of Authors explicitly affirms the idea that AI training cannot be done without a license, especially if that right is explicitly claimed:

Is AI eating all the energy? Part 2/2

  • Posted in cyber

Part 2: Growth, Waste, and Externalities

The AI tools are efficient according to the numbers, but unfortunately that doesn’t mean there isn’t a power problem. If we look at the overall effects in terms of power usage (as most people do), there are some major problems. But if we’ve ruled out operational inefficiency as the reason, what’s left?

The energy problems aren’t coming from inefficient technology, they’re coming from inefficient economics. For the most part, the energy issues are caused by the AI “arms race” and how irresponsibly corporations are pushing their AI products on the market. Even with operational efficiency ruled out as a cause, AI is causing two killer energy problems: waste and externalities.

Is AI eating all the energy? Part 1/2

  • Posted in cyber

Recent tech trends have followed a pattern of being huge society-disrupting systems that people don’t actually want. Worse, it then turns out there’s some reason they’re not just useless, they’re actively harmful. While planned obsolescence means this applies to consumer products in general, the recent major tech fad hypes — cryptocurrency, “the metaverse”, artificial intelligence… — all seem to be comically expensive boondoggles that only really benefit the salesmen.

[Image: Simpsons monorail screencap. Caption: Monorail!]

The most recent tech-fad-and-why-it’s-bad pairing seems to be AI and its energy use. This product-problem combo has hit the mainstream as an evocative illustration of waste, with headlines like “Google AI Uses Enough Electricity In 1 Second To Charge 7 Electric Cars” and “ChatGPT requires 15 times more energy than a traditional web search”.

It’s a narrative that’s very much in line with what a disillusioned tech consumer expects. There is a justified resentment boiling for big tech companies right now, and AI seems to slot in as another step in the wrong direction. The latest tech push isn’t just capital trying to control the world with a product people don’t want, it’s burning through the planet to do it.

But, when it comes to AI, is that actually the case?

What are the actual ramifications of the explosive growth of AI when it comes to power consumption? How much more expensive is it to run an AI model than to use the next-best method? Do we have the resources to switch to using AI on things we weren’t before, and is it responsible to use them for that? Is it worth it?

These are really worthwhile questions, and I don’t think the answers are as easy as “it’s enough like the last thing that we might as well hate it too.” There are proportional costs we have to weigh in order to make a well-grounded judgement, and after looking at them, I think the energy numbers are surprisingly good, compared to the discourse.

Fake Twitter accounts

  • Posted in cyber

Remember when Elon Musk was trying to weasel out of overpaying for Twitter? During this very specific May 2022-Jul 2022 period, there was a very artificial discourse manufactured over the problem of “fake accounts” on Twitter.

The reason it was being brought up was very stupid, but the topic stuck with me, because it’s deeply interesting in a way that the conversation at the time never really addressed.

So this is a ramble on it. I think this is all really worth thinking about, just don’t get your hopes up that it’s building to a carefully-constructed conclusion. ;)

Argument is stupid

First, to be clear, what was actually being argued at the time was exceedingly stupid. I’m not giving that any credit.

After committing to significantly overpay to purchase Twitter with no requirements that they do due diligence (yes, really!) Elon Musk tried to call off the deal.

This was a pretty transparent attempt to get out of the purchase agreement after manipulating the price, and it was correctly and widely reported as such.

Scott Nover, “Inside Elon Musk’s legal strategy for ditching his Twitter deal”

Elon Musk has buyer’s remorse. On April 25, the billionaire Tesla and SpaceX CEO agreed to buy Twitter for $44 billion, but since then the stock market has tanked. Twitter agreed to sell to Musk at $54.20 per share, a 38% premium at the time; today it’s trading around $40.

That’s probably the real reason Musk is spending so much time talking about bots.

I don’t want to get too bogged down in the details of why Elon was using this tactic, but fortunately other people wrote pages and pages about it, so I don’t have to.

Reddit: Your API *IS* Your Product

  • Posted in cyber

Reddit is going the same route as Twitter by making “API access” prohibitively expensive. This is something they very famously, very vocally said they would not do, but they’re doing it anyway. This is very bad for Reddit, but what’s worse is it’s becoming clear that companies think that this is a remotely reasonable thing to do, when it’s very critically not.

It’s the same problem we see with Twitter and other late-capitalist hell websites: Reddit’s product is the service it provides, which is its API. The ability for users to interact with the service isn’t an auxiliary premium extra, it’s the whole caboodle!

I’ll talk about first principles first, and then get into what’s been going on with Reddit and Apollo. The Apollo drama is very useful in that it directly converts the corporate bullshit that sounds technical enough to make sense into something very easy to understand: a corporation hurting them, today, for money.

The API is the product

Reddit and all these other companies who are making user-level API access prohibitively expensive have forgotten that the API is the product. The API is the interface that lets you perform operations on the site. The operations a user can do are the product, they’re not auxiliary to it!

“Application programming interface” is a very formal, internal-sounding term for a system that is none of those things. The word “programming” in the middle comes from an age where using a personal computer at all was considered “programming” it.

What an API really is is a high-level interface to the web application that is Reddit. Every action a user can take — viewing posts, posting, voting, commenting — goes from the app (which interfaces with the user) to the API (which interfaces with the Reddit server), gets processed by the server using whatever-they-use-it-doesn’t-matter, and the response is sent back to the user.

The API isn’t a god mode and it doesn’t provide any super-powers. It doesn’t let you do anything you can’t do as a user, as clearly evidenced by the fact that all the actions you do on the Reddit website go through the API too.

The Reddit website, the official Reddit app, and the Apollo app all interface with the user in different ways and on different platforms, but go through the same API to interact with what we understand as “Reddit”. The fact that the API is the machine interface without the human interface should also concisely explain why “API access” is all Apollo needs to build its own app.

Right now, you can view the announcement thread at https://www.reddit.com/r/apolloapp/comments/144f6xm/apollo_will_close_down_on_june_30th_reddits/, and you can view the “API” data for the same thread at https://www.reddit.com/r/apolloapp/comments/144f6xm/apollo_will_close_down_on_june_30th_reddits.json. It’s not very fun to look at, but it’s easy to tell what you’re looking at: the fundamental representation of the page without all the trappings of the interface.
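If you would rather poke at that from a script than a browser, here is a minimal Python sketch using only the standard library. The helper names (`json_url`, `fetch_thread`, `thread_title`) and the User-Agent string are made up for illustration. The layout that `thread_title` expects (a list whose first element wraps the post itself) is an assumption about the JSON response, and Reddit may rate-limit or refuse unauthenticated requests, so treat the network call as illustrative rather than guaranteed.

```python
import json
from urllib.request import Request, urlopen


def json_url(thread_url: str) -> str:
    """Turn a thread's page URL into its JSON counterpart, as in the two URLs above."""
    return thread_url.rstrip("/") + ".json"


def fetch_thread(thread_url: str, user_agent: str = "example-script/0.1"):
    # Network call; not exercised here because the request may be throttled or blocked.
    request = Request(json_url(thread_url), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


def thread_title(payload) -> str:
    # Assumed shape: a list whose first element is a listing containing the post itself.
    return payload[0]["data"]["children"][0]["data"]["title"]


page = "https://www.reddit.com/r/apolloapp/comments/144f6xm/apollo_will_close_down_on_june_30th_reddits/"
print(json_url(page))
```

Calling `thread_title(fetch_thread(page))` would then pull the post title out of the same data the website renders, with no human interface involved.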

Public APIs are good for both the user and the company. They’re a vastly more efficient way for people to interact with the service than by automating interaction (or “scraping”). Having an API cuts out an entire layer of expense that, without an API, Reddit would pay for.

The Reddit service is the application, and you interface with it through WHATEVER. Whatever browser you want, whatever browser extensions you want, whatever model phone you want, whatever app you want. This is fundamentally necessary for operability and accessibility.

The API is the service. The mechanical ability to post and view and organize is what makes Reddit valuable, not its frontend. Their app actually takes the core service offering and makes it less attractive to users, which is why they were willing to pay money for an alternative!

So you want to write an AI art license

  • Posted in cyber

Hi, The EFF, Creative Commons, Wikimedia, World Leaders, and whoever else,

Do you want to write a license for machine vision models and AI-generated images, but you’re tired of listening to lawyers, legal scholars, intellectual property experts, media rightsholders, or even just people who use any of the tools in question even occasionally?

You need a real expert: me, a guy whose entire set of relevant qualifications is that he owns a domain name. Don’t worry, here’s how you do it:

Given our current system of how AI models are trained and how people can use them to generate new art, which is this:

sequenceDiagram
    Alice->>Model: Hello. Here are N images and<br>text descriptions of what they contain.
    Model->>Model: Training (looks at images, "makes notes", discards originals)
    Model->>Alice: OK. I can try to make similar images from my notes,<br>if you tell me what you want.
    Curio->>Model: Hello. I would like a depiction of this new <br>thing you've never seen before.
    Model->>Curio: OK. Here are some possibilities.

Replika: Your Money or Your Wife

  • Posted in cyber

If you’ve been subjected to advertisements on the internet sometime in the past year, you might have seen advertisements for the app Replika. It’s a chatbot app, but personalized, and designed to be a friend that you form a relationship with.

That’s not why you’d remember the advertisements though. You’d remember the advertisements because they were like this:

[Image: Replika "Create your own AI friend" / "I've been missing you" hero ad]

[Images: Replika ERP ad, Facebook (puzzle piece meme); Replika ERP ad, Instagram]

And, despite these being mobile app ads (and, frankly, really poorly-constructed ones at that) the ERP function was a runaway success. According to founder Eugenia Kuyda the majority of Replika subscribers had a romantic relationship with their “rep”, and accounts point to those relationships getting as explicit as their participants wanted to go:

[Image: erp1]

So it’s probably not a stretch of the imagination to think this whole product was a ticking time bomb. And — on Valentine’s day, no less — that bomb went off. Not in the form of a rape or a suicide or a manifesto pointing to Replika, but in a form much more dangerous: a quiet change in corporate policy.

Features started quietly breaking as early as January as Replika began to filter conversations, and the whispers sounded bad for ERP. But the final nail in the coffin was the official statement from founder Eugenia Kuyda:

“update” - Kuyda, Feb 12

These filters are here to stay and are necessary to ensure that Replika remains a safe and secure platform for everyone.

I started Replika with a mission to create a friend for everyone, a 24/7 companion that is non-judgmental and helps people feel better. I believe that this can only be achieved by prioritizing safety and creating a secure user experience, and it’s impossible to do so while also allowing access to unfiltered models.

People just had their girlfriends killed off by policy. Things got real bad. The Replika community exploded in rage and disappointment, and for weeks the pinned post on the Replika subreddit was a collection of mental health resources including a suicide hotline.

[Image: "Resources if you're struggling" post]

Cringe!

First, let me deal with the elephant in the room: no longer being able to sext a chatbot sounds like an incredibly trivial thing to be upset about. But these factors are actually what make this story so dangerous.

These unserious, “trivial” scenarios are where new dangers edge in first. Destructive policy is never just implemented in serious situations that disadvantage relatable people first, it’s always normalized by starting with edge cases and people who can be framed as Other, or somehow deviant.

It’s easy to mock the customers who were hurt here. What kind of loser develops an emotional dependency on an erotic chatbot? First, having read accounts, it turns out the answer to that question is everyone. But this is a product that’s targeted at and specifically addresses the needs of people who are lonely and thus specifically emotionally vulnerable, which should make it worse to inflict suffering on them and endanger their mental health, not somehow funny. Nothing I have to content-warn the way I did this post is funny.

Virtual pets

So how do we actually categorize what a replika is, given what a novel thing it is? What is a personalized companion AI? I argue they’re pets.