Another hallucination generated by chatgpt
June 28, 2024
We hoped that GPT-4 or 4o would solve hallucinations in RAG systems. Sadly it doesn’t look like that’s the case.
Today we’re sharing a super simple API we’ve used to evaluate RAG and Agent systems.
Classic Hallucination on a RAG Chatbot
The API runs a series of checks over the full system prompt (including context, retrievals, or function calls) to find common hallucination scenarios. We also check for poor responses that don't actually answer the user's need.
What we've learned: No one does AI evals because it’s a huge pain. Developers and PMs absolutely hate writing and running manual evals. It feels silly in the age of LLMs. The AI should do this for me! (we actually almost got into a fight with a customer because they wanted us to eval their system but wouldn't tell us what "good" looked like).
isHallucinated() makes it easier to run evals automatically without building an entire eval pipeline, or even making a ground truth dataset. It’s not a full eval solution, and it’s not going to solve your hallucinations (yet) but it’s way easier than building an eval pipeline, even with eval tools.
curl --request POST \
--url https://api.repromptai.com/api/v1/isHallucinated \
--header 'apiKey: <api-key>' \
--data '{
"prompt": "<full system prompt>",
"response": "<the response>",
}'
// returns
// { isHallucinated (bool),
// isBadResponse (bool),
// reasoning (string) }
Hope this helps make evaling or debugging just a bit easier. Grab a free API key at repromptai.com or put us in touch with your eng team.