The Human Problem With LLMs
The bar for machine-grade quality is above that of single-human delivery.
Traditionally, we view technology as a boolean: correct or incorrect. If something doesn't work, we try a different tool or service. If a significant investment has been made, we may consider adjustments, fixes, or a shift in how we use the technology. Changing what we do to accommodate better outcomes can be annoying, but we do it all the time.
Human failures are learning opportunities. Taking ownership of shortcomings or poor decision-making is seen as the adult response, rather than shifting blame onto others. In a business context, there are costs and risks associated with repeated failure, so accountability is key. Repeated mistakes signal bad investments and/or an inability to learn.
That said, how do we address situations where intelligent machines fail us?
Large Language Failover
Large Language Models (LLMs) don't perform like other technology. They aren't correct 100% of the time. An LLM can deliver a wrong answer or a series of wrong answers, just like the humans it was designed to emulate. The issue is that the incorrect answers are offered with the same confidence as the correct ones.
The technical term for an incorrect LLM answer is 'hallucination,' partly because saying that a machine is lying to us sounds really bad.
Yet if someone gives you the wrong answer to a question, knowing it's incorrect or otherwise made up, that's a lie. Trust is earned over time, and lying erodes trust. It's that simple--for humans. Consider the following:
The scientific method dictates learning from mistakes.
Mistakes are human; technology is designed to avoid error.
To learn from mistakes, they must be identified.
Accountability for making mistakes is key to building trust.
Reduction of errors drives trust.
The more we use an LLM, the better it performs. Like humans, it learns by making mistakes. But an LLM doesn't own up to them, and it isn't held accountable, since (today) it's a super-bright question-and-answer machine that can tackle general-purpose problems and do a stellar yet inconsistent job.
If we don't see performance improvement, trust decreases. How does a machine take accountability for its errors and show improvement? Humans demonstrate personal and professional growth through sincerity, clarity, and behavioral change. These are difficult for machines to emulate.
They try. Machines can be programmed to be artificially sincere--just thank ChatGPT/Claude/Copilot/Gemini for giving you an answer and see how it responds.
Adventures in Large Language Translation
But does an LLM apologize? Kind of. Does it tell you when its confidence in an answer is high rather than low? Sometimes. LLMs can tell you they can't answer your direct question but offer adjacent information, which is remarkable, since context is challenging to establish. Since trust is subjective, it's hard to tell whether or not we trust LLMs more with each product release.
Synthetic data and world models are where progress is being made toward broader and better context, but that progress will never arrive fast enough for the market. Few things do. Either humans' perceptions of technology's value shift, and we continue changing how we do things to suit the technology, or the technology improves. Both will happen quickly.
Agents are an example of improving technology, as narrow, task-centric capabilities drive more business value than general knowledge queries. The LLM provides general information and capabilities while working in concert with other tools. The next stage of AI is achieving outcomes using multiple types of automation.
Agentive technology, used with LLMs and automation, will result in more ROI from AI.
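To make that pattern concrete, here is a minimal sketch of an LLM acting as a general-purpose router while narrow, deterministic tools do the task-specific work. The llm stub, tool names, and routing logic are hypothetical illustrations, not any vendor's agent framework.

```python
# A minimal, illustrative sketch of the "LLM in concert with other tools" pattern.
# The llm() function is a stand-in stub, not a real API call; the tool names and
# routing logic are hypothetical examples.

def llm(prompt: str) -> str:
    """Stub for a general-purpose LLM call; returns a canned 'plan' for the demo."""
    return "lookup_invoice" if "invoice" in prompt.lower() else "answer_directly"

# Narrow, deterministic tools that each do one task reliably.
TOOLS = {
    "lookup_invoice": lambda task: f"Invoice data pulled from the billing system for: {task}",
    "answer_directly": lambda task: f"General answer drafted by the LLM for: {task}",
}

def agent(task: str) -> str:
    """Ask the LLM which narrow tool fits the task, then run that tool."""
    choice = llm(f"Pick a tool for this task: {task}")
    tool = TOOLS.get(choice, TOOLS["answer_directly"])  # fall back to the general path
    return tool(task)

if __name__ == "__main__":
    print(agent("Find the overdue invoice for ACME Corp"))
    print(agent("Summarize our refund policy"))
```

The point of the sketch: the general-purpose model handles interpretation and routing, while the automation that actually touches business systems stays narrow, testable, and accountable.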
Humans tend to look for simple, silver-bullet solutions to complex problems. Instead of looking for answers and shortcuts, we could admit that sometimes, we just need a little help.
That feels more human.