More

mv4 · 2026-06-15T18:22:30 1781547750

I've been using MiniMax M2.7 with vllm on my dual Nvidia Spark cluster. Slow (<20 tps) but functional for most of my use cases.

cmrdporcupine · 2026-06-16T02:02:25 1781575345

I was just looking and it should be possible to run this one on 3bit quant on my single Spark? Maybe? Depending on context size? Assuming 3-bit doesn't totally lobotomize it.

mv4 · 2026-06-11T22:27:49 1781216869

Translation: now I can FIRE.

mv4 · 2026-06-03T16:49:53 1780505393

What the hell did I just read.

2b3a51 · 2026-06-03T18:15:42 1780510542

Around 5000 words on the impact of tourism on de-industrialised cities and how property prices and rents drive new businesses. Some of it seemed reasonable to me (we have some of the pastiche 'experiences' being built in my city and I actually rather like the new mock hutong with the glazed green roof tiles and lions), and some of it seemed a bit clumsy and patronising.

cholantesh · 2026-06-03T19:15:59 1780514159

I gave up a few paragraphs in, partly because of the formatting, partly because the author seemed very impressed with themselves for being the nth person to articulate this sentiment in a particularly ostentatious way.

mv4 · 2026-06-02T14:25:08 1780410308

Intel has already tried this (CPU+GPU+NPU) in partnership with Asus [1]. Big announcement at CES back in January. Not seeing much traction. Let's see how this one is received.

[1] https://www.intel.com/content/www/us/en/ai-pc/overview.html

mv4 · 2026-06-01T14:05:09 1780322709

I have an old 192GB DDR4 Dell Precision with dual Intel Xeon Gold 6130 that I've considered spinning up. What's giving me pause is 250W at idle.

mtoner23 · 2026-06-01T14:11:26 1780323086

Surely that number can go lower with some tweaks

mv4 · 2026-06-01T15:20:13 1780327213

I am sure it can. It will still generate a lot of heat when under load.

Are you telling me I should go for it? :)

I do have a dual DGX Spark cluster running MiniMax M2.7 already so I am all for on-prem. But will be interesting how this old machine will perform!

mv4 · 2026-06-01T13:42:18 1780321338

You just described the absolute nightmare scenario for the newly minted trillion-dollar companies whose only hope is for enterprises and SMB to move all their business processes to the cloud, with employees competing at token maxxing.

mv4 · 2026-05-27T19:45:25 1779911125

I configured a dual DGX Spark cluster, and it's certainly "good enough" for my agentic and coding needs.

datadrivenangel · 2026-05-27T19:56:48 1779911808

what models are you using on that? My experiences with apple hardware have convinced me that it is not really good enough for coding locally.

girvo · 2026-05-27T22:06:57 1779919617

DeepSeek v4 Flash, various quantised versions of Kimi K2.6, MiniMax 2.7, Qwen 3.5 “full sized, with a dual spark setup you can fit some decent setups on here

My single spark has me running Qwen 3.6 27B and antirez’s specially quantised DeepSeek v4 Flash (which is shockingly impressive)

Tepix · 2026-05-28T03:41:48 1779939708

Kimi K2.6 does not run well on 256GB.

zozbot234 · 2026-05-28T09:23:30 1779960210

Have you tried it? It would be slow for sure, but the main limitation AIUI would actually be storing the context in RAM - models like Kimi and GLM have high demands there which limit your ability to get meaningful aggregate throughput via large batches.

Tepix · 2026-05-30T07:39:39 1780126779

No need to try really. 1100b weights with 256GB RAM that‘s less than 1.8 bits per weight if you want a little bit of context.

How is that supposed to give good results?

girvo · 2026-05-28T07:23:19 1779952999

True, I might be thinking of some of the communities four-Spark clusters for it; it’s already int4 right?

Tepix · 2026-05-30T07:43:48 1780127028

Yeah, the default quants are 595GB. Even four Sparks would require a quant lower than 4bit

irishcoffee · 2026-05-27T20:07:49 1779912469

It isn’t the models, it’s the closed api and the tooling associated with it. It’s driving me crazy how not-talked-about this is.

klausa · 2026-05-28T10:49:43 1779965383

You can point both Codex and Claude Code at a local model and they'll work just fine. Codex even explicitly supports that as a feature! [1]

With a nice UI on top, for the desktop app too: [2]

[1]: https://developers.openai.com/codex/config-advanced#custom-m...

[2]: https://docs.ollama.com/integrations/codex-app

datadrivenangel · 2026-05-27T20:20:32 1779913232

As in the coding harnesses?

irishcoffee · 2026-05-27T23:02:33 1779922953

If I could leverage the same closed api VSCode uses, the entire moat is drained.

What you call harnesses I call… bullshit?

mv4 · 2026-05-28T19:13:48 1779995628

Minimax M2.7.

mv4 · 2026-05-27T19:43:44 1779911024

If people figure out how to run agents on-prem (already becoming feasible for both agentic tasks and coding on consumer hardware like Mac Studio 128GB+ or DGX Spark with some models) these companies will be in deep trouble.

Privacy is also a huge issue.

mv4 · 2026-05-26T18:11:09 1779819069

Only took 19 years!

mv4 · 2026-05-14T21:54:15 1778795655

Can someone recommend an IDE that can be used with a self-hosted model (via OpenAI or similar)?

suyash · 2026-05-15T00:01:34 1778803294

Look up OpenCode

aiscoming · 2026-05-14T23:20:29 1778800829

vs code supports local models (bring your own key/model)

you need a model server - ollama/llama.cpp/lm studio

no-name-here · 2026-05-15T00:50:51 1778806251

> bring your own key

Do you mean supporting oai-compatible api URLs in copilot? If so then you need either VS Code Insiders, or a VS Code extension I believe?