
To me, the value of such a benchmark would be in answering "what is peak performance", not just "what is mid-tier performance", and possibly also "what's the per-dollar performance". Time and money permitting, I'd really like to see your benchmark extended to the large reasoning models.

