
There's evidence that performance increases with compute, but not that it scales proportionally with compute, e.g. linearly or exponentially. SOTA models are already seeing diminishing returns with respect to parameter count, training time, and engineering effort in general. Doubling parameter count, say, does not double benchmark performance.
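To put rough numbers on the diminishing-returns point, here is a minimal sketch. It assumes loss follows a power law in parameter count, L(N) = scale * N^(-alpha), which is a common way to frame scaling but is not something claimed in this comment. The function names and the exponent alpha = 0.1 are hypothetical, picked only for illustration, and loss is not the same thing as benchmark score.

```python
def power_law_loss(params: float, scale: float = 1.0, alpha: float = 0.1) -> float:
    """Hypothetical loss curve L(N) = scale * N**(-alpha).

    alpha = 0.1 is an illustrative value, not a measured exponent.
    """
    return scale * params ** (-alpha)


def relative_gain_from_doubling(alpha: float = 0.1) -> float:
    """Fractional loss reduction from doubling N under the assumed power law.

    L(2N) / L(N) = 2**(-alpha), so the gain does not depend on N itself.
    """
    return 1.0 - 2.0 ** (-alpha)


if __name__ == "__main__":
    for n in (1e9, 2e9, 4e9, 8e9):
        print(f"N = {n:.0e}  assumed loss = {power_law_loss(n):.4f}")
    print(f"gain per doubling: {relative_gain_from_doubling():.1%}")
```

Under these assumptions each doubling of parameters cuts loss by only about 6.7%, nowhere near a 2x improvement, and every further doubling costs twice as much as the last.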

Would love to see evidence to the contrary. My assertion comes from what I've seen with Claude, Gemini and o1.

If anything, I feel performance is more a function of data quality than of anything else.



The biggest recent increase in model performance came from training models to do chain-of-thought reasoning properly; that is why DeepSeek is as good as it is. Reasoning this way requires a lot more tokens, though, which means the model needs a lot more compute at inference time even without a massive increase in parameter count.
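A back-of-the-envelope sketch of that token cost, assuming the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per generated token. That approximation ignores attention over the context and is not specific to any one model. The parameter count and token counts below are hypothetical numbers, not measurements of DeepSeek or anything else.

```python
def inference_flops(active_params: float, generated_tokens: int) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter per token.

    This is a rule-of-thumb assumption that ignores attention cost.
    """
    return 2.0 * active_params * generated_tokens


if __name__ == "__main__":
    params = 7e9             # hypothetical active parameter count
    answer_tokens = 200      # hypothetical direct answer length
    reasoning_tokens = 4000  # hypothetical chain-of-thought length

    direct = inference_flops(params, answer_tokens)
    with_cot = inference_flops(params, answer_tokens + reasoning_tokens)
    print(f"direct answer:         {direct:.2e} FLOPs")
    print(f"with chain-of-thought: {with_cot:.2e} FLOPs")
    print(f"ratio: {with_cot / direct:.0f}x")
```

With these made-up numbers the same model spends about 21x more compute per answer, with no change in parameter count at all.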



