We built a correction layer that does exactly this: the model verifies its output against your prompt during generation, not after. Same API call, no retries.
Budget models without it: 40-50% accuracy. With it: 95.7% on 10k+ clinical documents. Hallucinations aren't eliminated; some outputs still fail, but every failure is explicitly flagged. No silent errors. And the layer improves over time, so results get better with continued use.
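To make the idea concrete, here's a minimal sketch of the general pattern of in-call self-verification with explicit failure flagging. This is a hypothetical illustration, not LiveFix's actual implementation; the `CHECK:` token, the verification suffix, and the stub model are all invented for the example.

```python
# Hypothetical sketch: one API call where the model answers, then verifies
# itself in-line. Not LiveFix's real API; names here are illustrative.

VERIFY_SUFFIX = (
    "\nAfter your answer, add a final line 'CHECK: PASS' if every claim "
    "is supported by the prompt, otherwise 'CHECK: FAIL'."
)

def generate_with_check(call_model, prompt):
    """Single call: generation and verification happen together."""
    raw = call_model(prompt + VERIFY_SUFFIX)
    if "CHECK:" in raw:
        answer, _, check = raw.rpartition("CHECK:")
        verified = check.strip() == "PASS"
    else:
        # A missing check is itself treated as a flagged failure,
        # so nothing ever fails silently.
        answer, verified = raw, False
    return {"text": answer.strip(), "verified": verified}

# Stand-in for a real model client, just to show the flow.
def fake_model(prompt):
    return "Patient is afebrile.\nCHECK: PASS"

result = generate_with_check(fake_model, "Summarize: temp 98.6F, no cough.")
```

The key design point is that unverified output is never returned as if it were clean: the caller always gets an explicit `verified` flag to act on.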
It doesn't make hallucinations "100% solved". It makes them an engineering problem: a measurable, already low error rate you can drive down over time.
We're calling it LiveFix (livefix.ai). Benchmarked across frontier and budget models alike.