Why Do Chatbots Dream Up Answers?

On April 2nd, the World Health Organization (WHO) added a health advice chatbot to its website. SARAH (Smart AI Resource Assistant for Health) would provide tips on healthy eating, stress reduction, quitting smoking, and similar topics in eight languages at any time of day or night. Sounds like a great idea, right?
However, SARAH, like its AI counterparts, isn’t infallible. Advice seekers soon discovered that it could serve up inaccurate information alongside good advice. In one case, SARAH fabricated a list of non-existent health clinics in San Francisco, complete with fictitious names and addresses. WHO quickly took the chatbot offline.
We’ve grown accustomed to alarming errors made by chatbots, now a recurring theme in tech news and internet memes. We now know that fabrications like SARAH’s are common to many chatbots; these “hallucinations” are frequent and persistent. Take Meta’s Galactica, a science-focused chatbot that Meta shut down because it often fabricated academic papers and invented wiki entries on topics such as bears in space. More recently, a Canadian tribunal ordered Air Canada to honor a non-existent refund policy dreamed up by its AI customer service agent. And in New York City, a lawyer faced sanctions for submitting legal documents filled with fake opinions and citations generated by ChatGPT; he told the judge he had no idea a chatbot would make things up.
These aren’t just amusing anecdotes. As AI becomes increasingly integrated into our daily lives, these hallucinations can have serious consequences. To understand why they happen, we need to look at large language models (LLMs), the technology that powers these chatbots.
Contrary to popular belief, LLMs don’t work like a super-smart search engine pulling pre-existing information from a vast database. Instead, when you ask a question, the LLM doesn’t retrieve an answer—it creates one from scratch.
LLMs predict the next word in a sequence based on patterns they’ve learned from analyzing massive amounts of text data. It’s like an ultra-advanced version of the predictive text on your phone. The model sees “The cat sat…” and might guess “on.” Then it feeds that back into itself, guesses the next word, and so on until it has generated a full response.
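To make that loop concrete, here is a deliberately tiny Python sketch. The word table and its probabilities are invented purely for illustration; a real LLM conditions on the entire preceding text and chooses among tens of thousands of tokens, but the idea of feeding each guess back in is the same.

```python
import random

# Toy "language model": for a given last word, the probabilities of the next word.
# These numbers are made up for illustration only.
NEXT_WORD_PROBS = {
    "sat": {"on": 0.7, "quietly": 0.2, "down": 0.1},
    "on": {"the": 0.9, "a": 0.1},
    "the": {"mat": 0.5, "sofa": 0.3, "floor": 0.2},
    "mat": {".": 1.0},
    "sofa": {".": 1.0},
    "floor": {".": 1.0},
}

def generate(prompt_words, max_words=10):
    """Repeatedly guess the next word and feed it back in, the way an LLM does."""
    words = list(prompt_words)
    for _ in range(max_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if options is None:  # no pattern for this word: stop generating
            break
        choices, weights = zip(*options.items())
        next_word = random.choices(choices, weights=weights)[0]
        words.append(next_word)
        if next_word == ".":
            break
    return " ".join(words)

print(generate(["The", "cat", "sat"]))  # e.g. "The cat sat on the mat ."
```

Notice that nothing in this loop checks whether the sentence is true; it only asks which word is likely to come next. That, scaled up by a factor of billions, is the whole story.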
In this sense, everything an LLM produces is a “hallucination”; we just don’t call it that when the guess happens to be right. And because LLMs are remarkably good at mimicking human-like text, their responses often look very convincing, whether or not they are true.
So, can we fix this problem? Researchers are trying. Some believe that training models on even more data will help reduce errors. Others are exploring techniques like “chain-of-thought prompting,” where the AI is asked to show its work step by step. There’s even hope that future models might learn to fact-check themselves.
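Here is a rough sketch of what chain-of-thought prompting looks like in practice. The clinic and pharmacy questions are made up for illustration, and `ask_model` is a hypothetical stand-in for whatever chat API you happen to be using; the point is only the shape of the prompt.

```python
# A sketch of chain-of-thought prompting: instead of asking for a bare answer,
# the prompt includes a worked example and asks the model to reason step by step.
# The questions are invented for illustration.

plain_prompt = (
    "Q: A pharmacy fills 23 prescriptions an hour for 8 hours. How many in total?\n"
    "A:"
)

chain_of_thought_prompt = (
    "Q: A clinic sees 17 patients a day, 5 days a week. How many per week?\n"
    "A: Let's think step by step. 17 patients a day times 5 days is 17 * 5 = 85. "
    "So the answer is 85.\n"
    "\n"
    "Q: A pharmacy fills 23 prescriptions an hour for 8 hours. How many in total?\n"
    "A: Let's think step by step."
)

print(chain_of_thought_prompt)
# answer = ask_model(chain_of_thought_prompt)  # hypothetical API call; the model is
#                                              # nudged to show its reasoning, which
#                                              # makes bad leaps easier to spot
```

Showing the work doesn’t guarantee the answer is right, but it gives a human reader something concrete to check.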
However, as long as LLMs remain probability games, eliminating hallucinations entirely may be impossible. Even if we could cut chatbot errors to one in a million, people send chatbots many millions of queries every day, so the mistakes still pile up, and in critical settings even a few are too many.
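To put rough numbers on that claim (the figures below are assumptions for the sake of illustration, not measurements):

```python
# Back-of-the-envelope arithmetic with assumed numbers, just to show the scale.
error_rate = 1 / 1_000_000      # hypothetical one-in-a-million hallucination rate
queries_per_day = 100_000_000   # hypothetical number of daily chatbot queries

wrong_answers_per_day = error_rate * queries_per_day
print(wrong_answers_per_day)    # 100.0 confidently wrong answers, every single day
```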
AI hallucination isn’t just a technological challenge—it’s a reminder of the complex relationship between artificial and human intelligence. As we continue to push the boundaries of what AI can do, we must also sharpen our critical thinking skills and maintain a healthy skepticism of these powerful, yet imperfect, tools. While chatbots can be incredibly useful, they’re not infallible sources of truth.
SARAH is now back up on the WHO website, and I wanted to try it out. I declined, however, when the site asked for access to my camera and microphone so it could collect data on my facial expressions and vocal inflections. Maybe later. The next time you interact with an AI, keep in mind that it might sound convincing, but it’s always worth double-checking before you trust what it says. After all, even the smartest AI can sometimes dream.
Next post on July 19th