But are LLMs the right models to even be able to learn such long horizon goals a...

antonvs · 2025-07-23T05:06:11 1753247171

I was including RLHF in "training". And even the system prompt, really.

If it's true that models can be prevented from spiraling into dead ends with "proper prompting" as the comment above claimed, then it's also true that this can be addressed earlier in the process.

As it stands, this behavior isn't likely to be useful for any normal user, and it's certainly a blocker to "agentic" use.

samrus · 2025-07-24T07:31:27 1753342287

The RLHF is happening too late, I think. The reinforcement learning needs to happen during the initial next-token prediction. On that note, we need something richer than just language to represent a complex world state.