there's evidence that performance increases with compute, but not that it *scale...

int_19h · on Jan 28, 2025

The biggest increase in model performance recently came from training them to do chain-of-thought properly - that is why DeepSeek is as good as it is. This requires a lot more tokens for the model to reason, though. Which means that it needs a lot more compute to do its thing even if it doesn't have a massive increase in parameter size.