Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I configured a dual DGX Spark cluster, and it's certainly "good enough" for my agentic and coding needs.
 help



what models are you using on that? My experiences with apple hardware have convinced me that it is not really good enough for coding locally.

DeepSeek v4 Flash, various quantised versions of Kimi K2.6, MiniMax 2.7, Qwen 3.5 “full sized, with a dual spark setup you can fit some decent setups on here

My single spark has me running Qwen 3.6 27B and antirez’s specially quantised DeepSeek v4 Flash (which is shockingly impressive)


Kimi K2.6 does not run well on 256GB.

Have you tried it? It would be slow for sure, but the main limitation AIUI would actually be storing the context in RAM - models like Kimi and GLM have high demands there which limit your ability to get meaningful aggregate throughput via large batches.

No need to try really. 1100b weights with 256GB RAM that‘s less than 1.8 bits per weight if you want a little bit of context.

How is that supposed to give good results?


True, I might be thinking of some of the communities four-Spark clusters for it; it’s already int4 right?

Yeah, the default quants are 595GB. Even four Sparks would require a quant lower than 4bit

It isn’t the models, it’s the closed api and the tooling associated with it. It’s driving me crazy how not-talked-about this is.

You can point both Codex and Claude Code at a local model and they'll work just fine. Codex even explicitly supports that as a feature! [1]

With a nice UI on top, for the desktop app too: [2]

[1]: https://developers.openai.com/codex/config-advanced#custom-m...

[2]: https://docs.ollama.com/integrations/codex-app


As in the coding harnesses?

If I could leverage the same closed api VSCode uses, the entire moat is drained.

What you call harnesses I call… bullshit?


Minimax M2.7.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: