Okay but does anyone actually _want_ a reasoning model at such low tok/sec spee...

ripped_britches · on Feb 15, 2025

Lots of use cases don’t require low latency. Background work for agents. CI jobs. Other stuff I haven’t thought of.

behnamoh · on Feb 15, 2025

If my "automated" CI job takes more than 5 minutes, I'll do it myself.

bee_rider · on Feb 15, 2025

I bet the raspberry pi takes a smaller salary though.

chickenzzzzu · on Feb 15, 2025

There are tasks that I don't want to do whose delivery sensitivity is 24 hours, aka they can be run while I'm sleeping.

baq · on Feb 15, 2025

Where I’ve been doing CI, 5 minutes was barely enough to warm caches on a cold runner.

Xeoncross · on Feb 15, 2025

No, but the alternative in some places is no reasoning model at all. People don't want old cell phones or new phones with old chips either, but often that's all that is affordable in some places.

If we can get something working, then improving it will come.

rvnx · on Feb 15, 2025

You can have questions that are not urgent. It's like Cursor: I'm fine with the slow version up to a certain point. I launch the request, then I alt-tab to something else.

Yes it's slower, but well, for free (or cheap) it is acceptable.

deadbabe · on Feb 15, 2025

Only interactive use cases need high tps. If you just want a process running somewhere ingesting and synthesizing data, it’s fine. It’s done when it’s done.

baq · on Feb 15, 2025

Having one running in the background, constantly looking at your Home Assistant instance, might be an interesting use case for a smart home.

JKCalhoun · on Feb 15, 2025

It's local?