
Trust will be the KPI to compete on in the #AI hot summer currently under way, and hallucinations in the LLMs driving chatbots are a key problem. If users still have to get a “second opinion” from web search and authoritative sources, the job isn’t quite done.
There are many solutions under way (e.g., better RAG pipelines), but I was curious to see how well the models self-inspect for their own hallucinations once prompted. A good use case is getting summarized information about a new piece of knowledge, such as the events surrounding Meta’s Llama 3 release today. Note that Claude and ChatGPT did not warn users about the cutoff date in their knowledge base (Aug 2023 as of this writing), which would have been useful. Once fed authoritative sources (e.g., Meta’s PR/blogs), many models did self-identify their own discrepancies. We need to reduce user complexity in this area, and increase trust.
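For what it’s worth, the “feed it the source and ask it to check itself” step is easy to script. Here is a minimal sketch, assuming an OpenAI-style chat API; the model name, the earlier summary, and the source text are illustrative placeholders, not what I actually ran.

```python
# Minimal self-check sketch: ask the model to compare its earlier summary
# against an authoritative source and flag unsupported or outdated claims.
# Model name and text contents are placeholders (assumptions, not the real run).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

earlier_summary = "...the chatbot's original summary of the Llama 3 release..."
authoritative_source = "...text pasted from Meta's PR / blog post..."

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You fact-check your own earlier output against a provided source."},
        {"role": "user",
         "content": (
             "Here is a summary you produced earlier:\n\n"
             f"{earlier_summary}\n\n"
             "Here is an authoritative source:\n\n"
             f"{authoritative_source}\n\n"
             "List every claim in the summary that the source contradicts or does not "
             "support, and flag anything that may fall after your knowledge cutoff."
         )},
    ],
)

print(response.choices[0].message.content)
```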