The value of such a benchmark, to me, would be, "what is peak performance", not ...

The value of such a benchmark, to me, would be, "what is peak performance", not just "what is mid-tier performance". Also, possibly, "what's the per-dollar performance". Time and money permitting, I'd really want to see your benchmark extended to the large reasoning models.