I was just looking, and it should be possible to run this one at a 3-bit quant on my single Spark? Maybe? Depending on context size? Assuming 3-bit doesn't totally lobotomize it.
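For a rough sanity check, here is a back-of-envelope sketch of weight memory at a given quant width. The parameter count is a made-up placeholder, not the real size of any model in this thread, and the function name is mine. Real quant formats also carry scales and keep some layers at higher precision, so treat the result as a lower bound and leave headroom for context.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GiB, ignoring quantization metadata."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    # Hypothetical 200B-parameter model; substitute the real count.
    hypothetical_params = 200e9
    for bits in (16, 8, 4, 3):
        print(f"{bits}-bit: {weight_gib(hypothetical_params, bits):.1f} GiB")
```

Whatever is left of the machine's memory after the weights is what you get for the KV cache and the OS.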
Around 5000 words on the impact of tourism on de-industrialised cities and how property prices and rents drive new businesses. Some of it seemed reasonable to me (we have some of the pastiche 'experiences' being built in my city and I actually rather like the new mock hutong with the glazed green roof tiles and lions), and some of it seemed a bit clumsy and patronising.
I gave up a few paragraphs in, partly because of the formatting, partly because the author seemed very impressed with themselves for being the nth person to articulate this sentiment in a particularly ostentatious way.
Intel has already tried this (CPU+GPU+NPU) in partnership with Asus [1]. Big announcement at CES back in January. Not seeing much traction. Let's see how this one is received.
You just described the absolute nightmare scenario for the newly minted trillion-dollar companies whose only hope is for enterprises and SMBs to move all their business processes to the cloud, with employees competing at token maxxing.
DeepSeek v4 Flash, various quantised versions of Kimi K2.6, MiniMax 2.7, Qwen 3.5 “full sized”. With a dual Spark setup you can fit some decent setups on here.
My single Spark has me running Qwen 3.6 27B and antirez’s specially quantised DeepSeek v4 Flash (which is shockingly impressive).
Have you tried it? It would be slow for sure, but the main limitation AIUI would actually be storing the context in RAM - models like Kimi and GLM have high demands there which limit your ability to get meaningful aggregate throughput via large batches.
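To put rough numbers on the context cost, here is a sketch of KV cache size as a function of context length and batch size. It assumes a plain grouped-query attention layout with a 16-bit cache; architectures that compress the cache will come out differently. The layer, head, and dimension values are hypothetical placeholders, not the config of Kimi, GLM, or any other real model.

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 tokens: int, batch: int = 1, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB for a GQA-style attention stack."""
    # Factor of 2 covers keys and values, stored per layer, per KV head, per token.
    total_bytes = 2 * layers * kv_heads * head_dim * tokens * batch * bytes_per_elem
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical model shape; replace with values from the model's config.
    cfg = dict(layers=60, kv_heads=8, head_dim=128)
    for batch in (1, 8, 32):
        size = kv_cache_gib(tokens=32_768, batch=batch, **cfg)
        print(f"batch={batch}: {size:.1f} GiB")
```

The cache grows linearly with both context length and batch size, which is why large batches run out of RAM long before compute becomes the bottleneck.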
If people figure out how to run agents on-prem (already becoming feasible for both agentic tasks and coding on consumer hardware like Mac Studio 128GB+ or DGX Spark with some models) these companies will be in deep trouble.