What are AI hallucinations?

If you’ve used ChatGPT, Claude, Gemini or any other chatbot for more than a few minutes, you have been lied to. Not maliciously, and not even deliberately, but you have almost certainly been given a confident, fluent, plausible answer that is, in fact, wrong. Maybe it was a citation to a paper that never existed. A quote attributed to the wrong person. A case study invented out of nothing. Or even just a few lines of code that seem legit on the surface, but aren’t anchored to reality. This is the phenomenon the industry has settled on calling “hallucination,” and it is a surprisingly misunderstood aspect of how these technologies work.

Hallucinations, confabulations, bullshit and more

The standard definition is straightforward: a hallucination is output from a language model that is fluent and plausible but factually wrong, unsupported, or fabricated. The most cited academic survey of the phenomenon, published in ACM Transactions on Information Systems, splits this into two categories: intrinsic hallucinations, where the output contradicts the source material you gave the model, and extrinsic hallucinations, where the output can’t be verified against the source at all, because the model has produced something from nowhere. A related distinction the literature uses is between faithfulness, whether the output sticks to the input you provided, and factuality, whether it matches the real world. A model can be unfaithful to your document while being factually true, or perfectly faithful to your document while repeating something false.

But the technical definitions of the term “hallucination” also hide some problematic quirks of language. The linguist Emily M. Bender, co-author of the famous “stochastic parrots” paper, argues the word encourages the wrong mental model. To hallucinate is to perceive something that isn’t there, and perception is something minds do; a language model doesn’t perceive anything, so describing its errors as hallucinations grants it a kind of inner experience it does not have.

Bender and her collaborators prefer the more neutral term “undesirable output,” and they make another point: on the system’s side there is no real difference between a desirable and an undesirable output, because all of it is “probabilistically produced synthetic text”. The difference between a true answer and a made-up one exists only for us, the people reading it. The machine is doing the same thing in both cases.

In a paper with the blunt title “ChatGPT is bullshit,” the philosophers Michael Townsen Hicks, James Humphries and Joe Slater argue that the right frame isn’t hallucination at all but Harry Frankfurt’s technical notion of bullshit: speech produced with no concern for truth one way or the other. A liar knows the truth and works to obscure it; a bullshitter simply doesn’t care, and is indifferent to whether what they say is true or false. Their argument is that a language model is, at minimum, what they call a “soft bullshitter,” because it isn’t designed to track the truth at all. An LLM is designed to produce text that looks like the kind of text a person would produce. When it happens to be right, that’s incidental.

This isn’t just academics enjoying the word “bullshit” in journal article titles (though, it’s probably a little bit that). The word you choose tells you what kind of correction to look for. If the machine is “hallucinating,” you might think the solution is to fix its perceptions and show it real sources. If it’s bullshitting, you understand that getting it right and getting it wrong are the same underlying process, and no amount of better sources changes what the process fundamentally is.

I’ll use “hallucination” through the rest of this article, because it’s the term everyone searches for and the term the technical literature still mostly uses, but hold the criticism in mind, because it matches the technical reality very well.

Under the hood

Hallucinations are not bugs that snuck into a truth-telling system. They are a direct consequence of what these models are built to do.

A large language model is a system for predicting the next chunk of text. Given a string of words, it produces a probability distribution over what might come next, and then samples from that distribution. The model learned those probabilities by being trained on an enormous quantity of text, adjusting billions of internal parameters until it got good at prediction. But there is no separate store of facts inside it, no database it consults, and no internal model of the world that it checks its statements against. Unlike a traditional search engine, a language model cannot “look up” information since it has no ground truth to search. So, when the model tells you the capital of France is Paris, it is not looking up a fact – it is producing the most probable continuation of the text, and the most probable continuation happens to be correct because the training data was overwhelmingly consistent.

The trouble starts when the most probable continuation and the true continuation drift apart. For a question about a well-documented, frequently-repeated fact, those two things line up, and the model is reliable. For a question about something rare, obscure, or where the training data was thin or contradictory, the model still produces a confident, fluent, probable-looking answer; it just has nothing solid underneath it. The output looks exactly the same either way. That’s the point Bender makes from a linguistics perspective and the bullshit paper makes from philosophy: the mechanism that produces your correct answers is the same mechanism that produces the fabricated ones.
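To make that concrete, here is a toy sketch in Python. The two probability tables are invented for illustration and do not come from any real model, and `sample_next` and `top_probability` are hypothetical helper names. The point is only that sampling hands back one fluent-looking token whether the distribution is sharply peaked (a well-documented fact) or nearly flat (thin training data).

```python
import random

# Hypothetical next-token distributions; the numbers are invented for illustration.
# Prompt A: "The capital of France is" -> the training data overwhelmingly agrees.
well_documented = {"Paris": 0.97, "Lyon": 0.02, "Marseille": 0.01}
# Prompt B: an obscure question -> thin training data, so probability is spread out.
obscure = {"1987": 0.28, "1991": 0.26, "1979": 0.24, "2003": 0.22}


def sample_next(distribution, rng):
    """Sample one token in proportion to its probability."""
    tokens = list(distribution)
    weights = list(distribution.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


def top_probability(distribution):
    """Probability mass on the single most likely token."""
    return max(distribution.values())


rng = random.Random(0)
for name, dist in [("well documented", well_documented), ("obscure", obscure)]:
    token = sample_next(dist, rng)
    # The reader only ever sees `token`; the confidence behind it is not part of the text.
    print(f"{name}: sampled {token!r} (top probability {top_probability(dist):.2f})")
```

In both cases the caller gets a single token back. Nothing in the returned text signals that the second answer was close to a four-way coin toss.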

One interesting study of why this happens comes, perhaps surprisingly, from OpenAI. In a 2025 paper, “Why Language Models Hallucinate,” Adam Kalai and colleagues make two arguments. First, hallucination is partly a statistical inevitability. They show that even if you trained a model on perfectly clean, error-free data, the mathematics of fitting a model to a language distribution guarantees a non-zero error rate. Some hallucination falls out of the statistics no matter how good your data is, and it’s worse for facts that appear rarely in training.

The second argument is that hallucination persists because evaluation methods reward it. Almost every benchmark used to rank models scores answers as right or wrong, with no credit for “I don’t know.” Under that scoring system, a model that guesses when uncertain will, on average, outperform a model that abstains, in exactly the way a student sitting a multiple-choice exam with no penalty for wrong answers should always guess rather than leave a blank. We have spent years building benchmark leaderboards that punish honesty about uncertainty and reward confident guessing, and then we act surprised that the models guess confidently.
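The incentive is easy to see with a little arithmetic. The sketch below is my own illustration, not the paper’s method: `expected_score` is a hypothetical helper, and the one-point penalty for a wrong answer is an assumed alternative grading scheme chosen for simplicity.

```python
def expected_score(p_correct, guess, wrong_penalty=0.0):
    """Expected score on one question.

    p_correct: chance that a guess is right.
    guess: True to answer, False to abstain (abstaining always scores 0).
    wrong_penalty: points deducted for a wrong answer (0 means right/wrong grading).
    """
    if not guess:
        return 0.0
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


p = 0.25  # the model is only 25% sure of its answer

# Right/wrong grading: a wrong answer costs nothing, so guessing beats abstaining.
binary_guess = expected_score(p, guess=True)
binary_abstain = expected_score(p, guess=False)

# Assumed alternative: a wrong answer costs one point, so abstaining now wins.
penalised_guess = expected_score(p, guess=True, wrong_penalty=1.0)

print(binary_guess, binary_abstain, penalised_guess)
```

Under right/wrong grading the expected score of a guess is never below zero, so a model loses nothing by guessing. With a penalty of w points for a wrong answer, guessing only pays when the chance of being right exceeds w / (1 + w), which is 50 percent in the example above.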

Mitigating hallucinations

There are several methods for mitigating hallucinations, though it could reasonably be argued that reducing the rate of hallucinations creates a problem of its own, by making the ones that remain harder to spot.

The most widely attempted mitigation is retrieval-augmented generation, usually shortened to RAG. The idea is to stop relying on the model’s smeared-out internal parameters as a source of facts, and instead retrieve relevant documents from a trusted database at the moment of the query, feed them into the model’s context, and instruct it to answer from those documents. This turns an open-ended memory test into something closer to open-book comprehension, and it does seem to help; an industry study found RAG significantly reduced hallucination and improved reliability on out-of-domain queries. Many enterprise deployments now use some form of RAG.
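A minimal sketch of the retrieve-then-prompt pattern looks something like this. Everything in it is an assumption made for illustration: the documents are invented, `retrieve` and `build_prompt` are hypothetical names, and word overlap stands in for the embedding or search index a real system would normally use. No model is called; the sketch stops at the prompt that would be sent.

```python
import re

# A tiny stand-in for a trusted document store (contents invented for illustration).
DOCUMENTS = [
    "Refund policy: customers may return items within 30 days of purchase.",
    "Shipping policy: orders are dispatched within 2 business days.",
    "Warranty policy: hardware is covered for 12 months from delivery.",
]


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query.

    Word overlap is a deliberately crude substitute for a real retriever,
    used here only to keep the sketch self-contained.
    """
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Put the retrieved text into the context and tell the model to rely on it."""
    context = "\n".join(retrieve(query, documents, k))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}"
    )


prompt = build_prompt("How many days do customers have to return items?", DOCUMENTS)
print(prompt)  # this string is what would be sent to the language model
```

Note that the model at the end of this pipeline is still predicting text. The instruction to answer only from the sources is itself just more text in the prompt, which is part of why retrieval reduces errors without ruling them out.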

But RAG is not a panacea. Stanford’s RegLab looked at the commercial legal research tools built by LexisNexis and Thomson Reuters, the kind of purpose-built, RAG-powered, legally-trained systems marketed to lawyers as reliable, with at least one provider having advertised “hallucination-free” citations. In their study in the Journal of Empirical Legal Studies, the researchers found that these tools, while substantially better than general-purpose chatbots, still hallucinated between 17 and 33 percent of the time.

At the top of that range, a lawyer running three research queries could expect, on average, about one answer containing an error or a misgrounded citation; at the bottom, roughly one query in every six. RAG mitigated some hallucinations but it did not remove the problem. Retrieval can return documents that are topically relevant but don’t actually answer the question, the model can misread or contradict the documents it was given, and in a longer answer there are more individual claims that can each go wrong.

Beyond retrieval, the other broad strategy is to use the language model as just one part of the system, and to connect it to other, deterministic tools. This is the direction the bullshit paper gestured at and the direction most of the practical engineering has gone. Hook the model up to a calculator or a code interpreter so it isn’t “predicting” arithmetic. Connect it to a search engine so it can retrieve current facts. Let it call external tools and APIs that are deterministic and verifiable, rather than asking it to reproduce their outputs from memory.

Each of these layers catches a class of error the bare model would have made, and the combination is more reliable than any language model on its own. The pattern that works is consistent: surround the probabilistic, unreliable text generator with deterministic, checkable systems, and use it for what it’s actually good at, which is producing fluent language, not for what it’s bad at, which is being a trustworthy oracle.
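As a rough sketch of the calculator case, assuming a made-up request format in which the model asks for a tool instead of writing the number itself: `calculate` and `answer` are hypothetical names, not part of any real framework, and the evaluator only handles basic arithmetic.

```python
import ast
import operator

# Only these arithmetic operations are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression):
    """Deterministically evaluate a plain arithmetic expression such as '17 * 23'."""

    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


def answer(request):
    """Hypothetical dispatcher: if the model asked for the calculator, run it."""
    if request.get("tool") == "calculator":
        return str(calculate(request["input"]))
    # Otherwise fall back to whatever text the model produced.
    return request.get("text", "")


# A made-up tool request of the kind a model might emit instead of guessing the product.
print(answer({"tool": "calculator", "input": "1234 * 5678"}))
```

The arithmetic is now computed rather than predicted, so it is right every time. Whether the model asks for the tool, and whether it passes the right expression, is still down to the text generator.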

What none of these approaches do is change the nature of the thing at the centre. Every one of them is a guardrail, a verification layer bolted onto a system that is still, underneath, generating probable text rather than surfacing truth.

It doesn’t matter how many tools you hand a language model: it is still just predictive text under the hood.

So, are hallucinations solvable?

I don’t think the problem is solvable, at least not within the large language model paradigm as it currently exists.

I am not saying the technologies will stay as unreliable as they are now. Error rates have already come down a great deal and will keep coming down, with better data, better retrieval, better verification layers and smarter evaluation all helping. Future models will learn to say “I don’t know” when the incentives start to reward abstaining.

But hallucinations are not an accidental flaw sitting on top of an otherwise truthful system that we can eventually patch out; they are a direct expression of how the system works, and the same process that produces the right answers produces the wrong ones.

That’s why I agree with Bender and others that we need a philosophical and linguistic aspect to our understanding of hallucinations, and to go beyond the idea that “LLMs are sometimes wrong”. If you believe the model is hallucinating, you believe it is trying to tell the truth and occasionally failing, and you’ll keep waiting for the version that finally gets it right. If you understand that it was never producing the truth in the first place, even when the answers are correct, and that producing a true sentence and producing a false one are the same act performed on the same probabilities, then you stop waiting for a fix and start building the verification habits you needed all along.

The practical upshot for a school, a business, or anyone putting this technology in front of people who will trust it is the same either way: the “human in the loop” is not a temporary inconvenience to be automated away in the next release. Authentication is the job, and on the evidence we have, it is going to remain one of the most important factors in working with GenAI.

Dr Leon Furze