You will still get hallucinations. With RAG, you use vector embeddings to find relevant passages, and you typically store the raw text alongside them. In theory, this lets the LLM ground its output in what the documents actually say. Depending on the implementation, you can also have the LLM cite its sources (filename, chunk, etc.).
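To make the retrieve-then-cite idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the toy corpus, the `embed` function (a bag-of-words stand-in for a real embedding model), and the `[file#chunk]` citation format.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each chunk keeps its raw text plus source metadata.
CHUNKS = [
    {"file": "policy.txt", "chunk": 0, "text": "Refunds are issued within 30 days of purchase."},
    {"file": "policy.txt", "chunk": 1, "text": "Shipping is free for orders over 50 dollars."},
    {"file": "faq.txt", "chunk": 0, "text": "Support is available by email on weekdays."},
]

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words count vector.
    cleaned = text.lower().replace(".", "").replace("?", "")
    return Counter(cleaned.split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=2):
    # Rank chunks by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, k=2):
    # Put the raw text of the retrieved chunks in the prompt, tagged with
    # their source, so the model can ground and cite its answer.
    hits = retrieve(query, k)
    context = "\n".join(f"[{c['file']}#{c['chunk']}] {c['text']}" for c in hits)
    return (
        "Answer using only the sources below and cite them as [file#chunk].\n"
        f"{context}\n\nQuestion: {query}"
    )
```

The vectors are only used to pick which chunks go in; what the LLM actually sees is the stored raw text plus its source tag. That is what makes citing possible, but nothing here forces the model to stay within the sources.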
The approach that has worked for us in production is correction during generation, not after.
The model verifies its output against the rules in the prompt as it generates and corrects itself within the same API call: no retries, no external validator. Any failures the model cannot fix at runtime are explicitly flagged rather than silently passed through as wrong output.
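A rough sketch of what that pattern could look like, not the actual production setup. The rules, the JSON reply shape with an `unresolved` list, and the `call_llm` callable (any function that takes a prompt string and returns the model's text) are all hypothetical.

```python
import json

# Hypothetical rules; a real system would have its own.
RULES = [
    "Dates use ISO 8601 (YYYY-MM-DD).",
    "Every claim cites a source id.",
]

def build_prompt(task, rules=RULES):
    # Rules go into the prompt so the model can check itself while generating.
    numbered = "\n".join(f"{i}. {r}" for i, r in enumerate(rules, 1))
    return (
        f"Task: {task}\n\nRules:\n{numbered}\n\n"
        "While writing, check your draft against each rule and fix any "
        "violation before answering. Reply with JSON: "
        '{"answer": str, "unresolved": [numbers of rules you could not satisfy]}.'
    )

def generate(task, call_llm):
    # One call, no retry loop. Anything the model reports as unresolved, or
    # any reply that cannot be parsed, is flagged instead of passed through.
    raw = call_llm(build_prompt(task))
    try:
        data = json.loads(raw)
        answer = data["answer"]
        unresolved = list(data.get("unresolved", []))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return {"answer": None, "flagged": True, "unresolved": [], "reason": "unparseable output"}
    return {"answer": answer, "flagged": bool(unresolved), "unresolved": unresolved}
```

The important design choice is that the flag travels with the output, so downstream code never has to guess whether an answer passed the rules.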
This does not mean hallucinations are solved. It turns them into a measurable engineering problem: you know your error rate, you know which outputs failed, and you can drive that rate down over time with better rules. The system can also self-improve over time to deliver better accuracy.
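The "measurable" part can be as simple as the sketch below. It assumes each output is logged as a record with hypothetical `id`, `flagged`, and `unresolved` (list of rule numbers) fields; those names are mine, not from any particular system.

```python
from collections import Counter

def error_report(results):
    # results: list of dicts with "id", "flagged", and "unresolved" keys.
    total = len(results)
    failed = [r for r in results if r["flagged"]]
    # Count how often each rule shows up among the failures, to see which
    # rules are worth tightening first.
    by_rule = Counter(rule for r in failed for rule in r.get("unresolved", []))
    return {
        "error_rate": len(failed) / total if total else 0.0,
        "failed_ids": [r["id"] for r in failed],
        "by_rule": dict(by_rule),
    }
```

Tracking the per-rule counts over time is what tells you whether a rule change actually moved the error rate.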
I think, generally, SFT (supervised fine-tuning) is like giving the LLM better intuition in specific areas. Combined with RAG, it should improve accuracy. It's sort of like being a lawyer who knows by intuition that something is against the law, but still needs the library to cite the specific case or statute that says why.