GioCities

blogs by Gio

You can Google it

  • Posted in tech tech

The other day I had a quick medical question (“if I don’t rinse my mouth out enough at night will I die”), so I googled the topic as I was going to bed. Google showed a couple search results, but it also showed Answers in a little dedicated capsule. This was right on the heels of the Yahoo Answers shutdown, so I poked around to see what Google’s answers were like. And those… went in an unexpected direction.

Should I rinse my mouth after using mouthwash? Why is it bad to swallow blood? Can a fly live in your body? What do vampires hate? Can you become a vampire? How do you kill a vampire?

So, Google went down a little rabbit trail. Obviously these answers were scraped from the web, and included sources like exemplore.com/paranormal/, which is, apparently, a Wiccan resource for information that is “astrological, metaphysical, or paranormal in nature.” So possibly not the best place to go for medical advice. (If you missed it, the context clue for that one was the guide on vampire killing.)

There are lots of funny little stories like this, where some AI misunderstood a question. Like this case where a porn parody got mixed into the bio for a fictional character, or that time novelist John Boyne used Google and accidentally wrote a video game recipe into his book. (And yes, it was a Google snippet.) These are always good for a laugh.

Wait, what’s that? That last one wasn’t funny, you say? Did we just run face-first toward the cold brick wall of reality, where bad information means people die?

Well, sorry. Because it’s not the first time Google has given out fatal advice, and it won’t be the last. Nor is there any end in sight. Whoops!

Trying to use algorithms to solve human problems

But I’m not going to be too harsh on Google’s sourcing algorithm. It’s probably a very good information sourcing algorithm. The problem is that even a very good information sourcing algorithm can’t possibly be anything approaching fit for purpose here.

The task here is “process this collection of information and determine both which sources are correct and credible and what those sources’ intended meanings are.” This isn’t an algorithmic problem. Even just half of that task (“understanding intended meaning”) is not only something computers aren’t equipped to do, it isn’t even something humans are good at!

By its very nature, Google can’t help but “take everything it reads on the internet at face value” (and knowing not to do that is, for humans, just basic operating knowledge). And so you get the garbage-in, garbage-out problem, at the very least.

Google can’t differentiate subculture context. Even people are bad at this! And, of course, Google can “believe” wrong information, or just regurgitate terrible advice.

But the problem is deeper than that, because the whole premise that all questions have single correct answers is wrong. There exists debate on points of fact. There exists debate on points of fact! We haven’t solved information literacy yet, but we haven’t solved information yet either. The interface that takes in questions and returns single correct answers doesn’t just need a very good sourcing function, it’s an impossible task to begin with. Not only can Google not automate information literacy, the fact that they’re pretending they can is itself incredibly harmful.

But the urge to solve genuinely difficult social, human problems with an extra layer of automation pervades tech culture. It essentially is tech culture. (Steam is one of the worst offenders.) And, of course, nobody in tech ever got promoted for replacing an algorithm with skilled human labor. Or even for pointing out that they were slapping an algorithm on an unfit problem. Everything pulls the other direction.

Dangers of integrating these services into society

Of course, there’s a huge incentive to maximize the number of queries you can respond to. Some of that can be done with reliable data sourcing (like Wolfram|Alpha does), but there are a lot of questions whose answers aren’t in a feasible data set. And, according to Google, 15% of daily searches are new queries that have never been made before, so if your goal is to maximize how many questions you can answer (read: $$$), human curation isn’t feasible either.

But if you’re Google, you’ve already got most of the internet indexed anyway. So… why not just pull from there?

Well, I mean, we know why. The bad information, and the harm, and the causing of preventable deaths, and all that.

And this bad information is at its worst when it’s fed into personal assistants. Your Alexas, Siris, Cortanas all want to do exactly this: answer questions with actionable data.

The problem is that the human/computer interface is completely different with voice assistants than it is with traditional search. When you search and get a featured snippet, it’s on a page with hundreds of other articles and a virtually limitless number of second opinions. The human has the agency to do their own research using the massively powerful tools at their disposal.

Not so with a voice assistant. It only has the opportunity to give zero answers or one answer, which you can either take or leave. You lose the ability to engage with the information or do any follow-up research, and so it becomes much, much more important for those answers to be good.

And they’re not.

Let’s revisit that “had a seizure, now what” question, but this time without the option to click through to the website to see context.

Oh no.

And of course these aren’t one-off problems, either. We see these stories regularly. Like just back in December, when Amazon’s Alexa told a 10-year-old to electrocute herself on a wall outlet. Or Google again, but this time killing babies.

The insufficient responses

Let’s pause here for a moment and look at the response to just one of these incidents: the seizure one. Google went with the only option they had (other than discontinuing the ill-conceived feature, of course): case-by-case moderation.

Now, just to immediately prove the point that case-by-case moderation can’t deal with a fundamentally flawed problem like this, they couldn’t even fix the seizure answers.

See, it wasn’t fixed by ensuring future summaries of life-critical information would be written and reviewed by humans, because that’s a cost. A necessary cost for the feature, in this case, but Google is still unwilling to pay it.

And, of course, the only reason this got to a Google engineer at all is that the deadly advice wasn’t followed, the person survived, and the incident blew up in the news. Even if humans could filter through the output of an algorithm that spits out bad information (and they can’t), best-case scenario we have a system where Google only lies about unpopular topics.

COVID testing

And then we get to COVID.

Now, with all the disinformation about COVID, sites like Twitter and YouTube have taken manual steps to try to specifically provide good sources of information when people ask, which is probably a good thing.

But even with those manual measures in place, when Joe Biden told people to “Google COVID test near me” in lieu of a national program, it raised eyebrows.

Now, apparently there was some effort to coordinate manual sourcing of reliable information for COVID testing, but it sounds like that might have some issues too.

So now, as Kamala Harris scolds people for even asking about Google alternatives in an unimaginably condescending interview, we’re back in the middle of it. People are going to use Google itself as the authoritative source of information, because the US Federal government literally gave that as the only option. And so there will be scams, and misinformation, and people will be hurt.

But at least engineers at Google know about COVID. At least, on this topic, somebody somewhere is going to try to filter out the lies. For the infinitude of other questions you might have? You’ll get an answer, but whether or not it kills you is still luck of the draw.
