
gdevenyi · 2026-05-27T12:53:42 1779886422

People have been noticing the effects of this in local LLM inference. Power limiting seems to improve overall performance!

Aurornis · 2026-05-27T14:43:57 1779893037

This is not observable from LLM inference, where you would not encounter uniform matrices.

Power limiting does not improve performance but it does improve efficiency. You might be able to get 90% of the performance for only 70% of the power usage, for example. It does not make the card go faster though.
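To make the arithmetic concrete, here is a minimal sketch using the illustrative 90% / 70% figures from the comment above. They are example numbers, not measurements from any particular card, and the helper name `perf_per_watt_gain` is made up for this illustration.

```python
def perf_per_watt_gain(perf_fraction: float, power_fraction: float) -> float:
    """Performance-per-watt relative to the uncapped card (1.0 means unchanged)."""
    if power_fraction <= 0:
        raise ValueError("power_fraction must be positive")
    return perf_fraction / power_fraction


# Illustrative figures only: 90% of stock throughput at 70% of stock power.
perf_fraction = 0.90
power_fraction = 0.70
gain = perf_per_watt_gain(perf_fraction, power_fraction)

print(f"throughput: {perf_fraction:.0%} of stock")
print(f"perf per watt: {gain:.2f}x stock")
```

With these numbers, throughput is still below stock while work done per watt goes up, which is the distinction between performance and efficiency being drawn here.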

Ardon · 2026-05-28T21:54:21 1780005261

This isn't necessarily true, especially with consumer GPUs. Some actually can clock higher with less voltage. It's pretty rare, and mostly comes from factory overclocked cards. For it to help, you need a card that (1) thermal throttles and (2) can sustain its max OC with less voltage than was set at the factory. In that rare case you are removing thermal pressure, which allows you to clock higher for longer. It comes down to the silicon lottery, though, and (often) to lazy board partners just smashing the voltage up as high as the chip maker allows. You definitely won't get more performance from a datacenter GPU this way.

Lerc · 2026-05-27T15:41:03 1779896463

When thermal throttling occurs you can perform faster by running slower.

This is precisely because of the efficiency. The lower efficiency at the higher speed triggers throttling, and with it much lower performance, sooner.

Aurornis · 2026-05-27T16:19:19 1779898759

> When thermal throttling occurs you can perform faster by running slower.

This is not true unless the throttling algorithm is so broken that it's oscillating between extremes.

The parts have a curve of clock speed versus voltage. More clock speed means higher performance. That goes further up the voltage curve, meaning more power.

Throttling just moves the card further down the voltage to clock speed curve. It reduces clock speed, reducing performance.

The cards don't "perform faster by running slower". If you run the card slower, it performs slower.
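A rough sketch of the curve being described, assuming the textbook CMOS approximation that dynamic power scales with C·V²·f. The linear voltage/frequency curve and the constant `c_eff` are invented for illustration and do not describe any real GPU.

```python
def volts_for(freq_ghz: float) -> float:
    """Hypothetical linear V/f curve: 0.70 V at 1.5 GHz, rising to 1.05 V at 2.5 GHz."""
    return 0.70 + (freq_ghz - 1.5) * 0.35


def dynamic_power(freq_ghz: float, volts: float, c_eff: float = 100.0) -> float:
    """Textbook approximation P ~ C * V^2 * f, in arbitrary units (c_eff is made up)."""
    return c_eff * volts ** 2 * freq_ghz


# Walk up the curve: each step raises both the clock and the required voltage.
points = [(f, dynamic_power(f, volts_for(f))) for f in (1.5, 2.0, 2.5)]

for freq, power in points:
    # Performance is assumed proportional to clock speed, which is a simplification.
    print(f"{freq:.1f} GHz  {volts_for(freq):.3f} V  power {power:6.1f}  perf/W {freq / power:.4f}")
```

Under these assumptions, moving down the curve always lowers the clock (and so performance), while power falls faster than the clock does. That is why throttling costs speed yet improves efficiency.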

Lerc · 2026-05-27T20:01:26 1779912086

>This is not true unless the throttling algorithm is so broken that it's oscillating between extremes.

That algorithm is doing exactly the task I described. If it could temporarily run faster but in a way that would cause oscillation, that literally means it can run faster but is choosing not to, in order to preserve overall performance.

PcChip · 2026-05-27T16:43:14 1779900194

With a lower power cap set, it runs cooler, which sometimes allows the GPU to reach higher boost speeds. This is a real effect on gaming GPUs; however, I have no idea if it applies to datacenter GPUs.

gchamonlive · 2026-05-27T13:03:36 1779887016

In general, constraints require optimizations and rearchitectures. I'd also expect the RAM shortage, for instance, to have a big impact on the software industry as a whole, especially in games. They will need to make do with what people have: a PS5/Pro or a PC of similar power.

aNoob7000 · 2026-05-27T13:48:30 1779889710

I actually think it is a good thing to introduce constraints to AI and the overall tech industry. Hopefully everyone will have to look at improving performance without having to add RAM or increase CPU/GPU performance.

gchamonlive · 2026-05-27T15:38:46 1779896326

As long as these constraints are for everyone, not just for thee and not for me, and don't become an instrument for big tech to keep consumers dependent on their infra.

gdevenyi · 2026-05-02T20:21:26 1777753286

I'm very interested in whether more VRAM-limited setups (8-16GB) show the same performance loss or react differently.

gdevenyi · 2026-04-24T12:03:24 1777032204

This.

gdevenyi · 2026-04-22T21:16:23 1776892583

Mine does https://github.com/gdevenyi/huggingface-estimate

zargon · 2026-04-22T21:51:58 1776894718

Excellent job with this! I tried a few combinations that completely fail on other calculators, and yours gets VRAM usage pretty much spot on. Even the performance estimate is in the ballpark of what I see with mixed VRAM / RAM workloads.

It's a shame that search is so polluted these days that it's impossible to find good tools like yours.

gdevenyi · 2026-04-22T21:13:25 1776892405

You can point at the GGUF files and figure it out with your hardware here.

https://github.com/gdevenyi/huggingface-estimate

gdevenyi · 2026-04-04T12:36:23 1775306183

> put the lungs into a constant state of readiness, allowing fast responses to almost any invading germ

Pretty sure we call this "autoimmune disorder"

gdevenyi · 2026-03-10T00:16:19 1773101779

https://github.com/yjeanrenaud/yj_nearbyglasses

gdevenyi · 2026-03-04T23:50:34 1772668234

The medieval guilds will return.

gdevenyi · 2026-02-27T01:40:33 1772156433

The programmers were the users. They asked. They said it was ok.

gdevenyi · 2026-02-22T18:45:19 1771785919

A PWA for mobile would be most welcome

cr125rider · 2026-02-22T19:28:15 1771788495

I love PWAs these days. They’re very polished now. I’m going that direction with a few of my projects.

Second this.