I assume it’s not possible to get the same results by fine tuning a model with t...

notglossy · 2026-03-26T12:20:59 1774527659

You will still get hallucinations. With RAG, you use the vectors to find the passages that are relevant, and you typically store the raw text alongside them. In theory, this lets you ground the LLM's output in what the documents actually say. Depending on the implementation, you can also have the LLM cite its sources (filename, chunk, etc.).
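
A rough sketch of that flow, in case it helps. This is a toy, not a real pipeline: the bag-of-words `embed` function stands in for an actual embedding model, and the `Chunk` type, the helper names, and the sample documents are all made up for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    filename: str  # where the raw text came from
    index: int     # position of the chunk within that file
    text: str      # raw text kept alongside the vector


def embed(text):
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    # Rank stored chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    # Put the raw text in the prompt, tagged so the model can cite it.
    sources = "\n".join(f"[{c.filename}#{c.index}] {c.text}" for c in hits)
    return (
        "Answer using only the sources below and cite them as [filename#chunk].\n"
        f"{sources}\n\nQuestion: {query}"
    )


CHUNKS = [
    Chunk("policy.txt", 0, "Refunds are issued within 30 days of purchase."),
    Chunk("policy.txt", 1, "Shipping takes five business days."),
    Chunk("faq.txt", 0, "Support is available by email on weekdays."),
]

question = "refunds within how many days"
hits = retrieve(question, CHUNKS, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

The vectors are only used to pick which chunks go into the prompt. The model still reads the raw text, and the `[filename#chunk]` tags are what make citing possible. None of this stops the model from ignoring the sources, which is why hallucinations can still happen.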

thisischayan · 2026-03-30T00:56:20 1774832180

The approach that has worked for us in production is correction during generation, not after.

The model verifies its output against the rules in the prompt as it generates and corrects itself within the same API call, with no retries and no external validator. Any failures the model cannot fix at runtime are explicitly flagged, so they do not silently become wrong output.

This does not mean hallucinations are completely solved. It turns them into a measurable engineering problem. You know your error rate, you know which outputs failed, and you can drive that rate down over time with better rules. The system can also self-learn and self-improve over time to deliver better accuracy.
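
To make the "measurable" part concrete, here is a minimal sketch of the bookkeeping side only. It assumes, purely for illustration, that each response comes back as JSON with an `answer` field and a `flags` list naming the rules the model could not satisfy. That format, the rule names, and the sample batch are hypothetical, not our actual schema.

```python
import json
from collections import Counter


def summarize(raw_outputs):
    """Return (error_rate, failures_per_rule) for a batch of model outputs."""
    failures = Counter()
    failed = 0
    for raw in raw_outputs:
        try:
            record = json.loads(raw)
            flags = record.get("flags", [])
        except (json.JSONDecodeError, AttributeError):
            # Output that is not a JSON object counts as a failure too.
            flags = ["unparseable_output"]
        if flags:
            failed += 1
            failures.update(flags)
    rate = failed / len(raw_outputs) if raw_outputs else 0.0
    return rate, failures


# Hypothetical batch: two clean outputs, one flagged, one malformed.
BATCH = [
    '{"answer": "Refund within 30 days.", "flags": []}',
    '{"answer": "", "flags": ["missing_citation"]}',
    '{"answer": "Ships in 5 days.", "flags": []}',
    "not json",
]

rate, failures = summarize(BATCH)
print(f"error rate: {rate:.0%}")
for rule, count in failures.most_common():
    print(f"{rule}: {count}")
```

Tracking the per-rule counts over time is what tells you which rules to tighten next.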

tren_hard · 2026-03-26T17:16:09 1774545369

I’m still learning the advantages of each and the differences between them. Would there be benefits to combining SFT and RAG, or does RAG make SFT redundant?

notglossy · 2026-03-27T00:33:32 1774571612

I think, generally, SFT is like giving the LLM better intuition in specific areas. If you combine this with RAG, it should improve performance and accuracy. It's sort of like being a lawyer who knows by intuition that something is against the law, but still needs the library to cite the specific case or statute that says why.

tren_hard · 2026-03-27T18:25:55 1774635955

Thank you, I appreciate the reply, and that analogy helps make sense of this.