
Model RLHFed to follow instructions follows instructions, even when we might not want it to.

But alignment is easy, folks, nothing to worry about :)



I think people might have forgotten that, before InstructGPT came around, LLMs could be weirdly opinionated jerks. There was a whole effort to train them so that we could actually give them instructions. It's probably a hell of a lot more useful to have an LLM that will just go with whatever weird stuff the human says than one that tries to fight them on it.

https://openai.com/research/instruction-following



