I built this Clojure lib for robust, high-scale LLM calls where the consumer is usually an HTTP request waiting on an SSE stream. https://github.com/jhancock/aimee
The article states: "Most applications are built on an architecture like the one above, where there are a number of stateless horizontally scaleable server replicas that can handle client requests."
Using the library I built, I have yet to worry about this. Clojure core.async, the HTTP libs, and the JVM are so rock solid that I don't have a fragile set of stateless servers. Sure, at some point there are rare edge cases, but it's nice to get very far along without worrying about them.
Why would any company release open weights once the investment money stops?
Releasing open weights has basically been a PR move. The moment those companies need to actually make money, they will cut it out, as it reduces their client base.
They DO NOT want you to run AI. They want you to pay them to do it.
Minimax just released a new model yesterday. You're conflating one company with a country's entire industry. There's more than just Qwen coming out of China.
Two years ago a lot of people thought GPT-4o was usable for software development. I didn’t really find that to be the case in general but certainly it could do a lot of useful things. And now Qwen3.5-8B is just as capable and runs fine on an M2 MacBook Air.
QWEN3.5 coder next runs to ~84k context before it poops out on AMD395+ w/128GB. Most of what it's good at is boilerplate find/replace/copy/paste; but being able to scaffold things out and touch up 20-30% of the code is pretty sweet.
z.ai did go public on the HK exchange. They are under pressures similar to other public companies.
I know that Chinese models are increasingly being trained and run using Huawei chips instead of Nvidia. I know China has a surplus of electricity from renewables (wind, solar, hydro).
People keep repeating this without any real thought behind it because of the high-profile resignations on the Qwen team. Meanwhile the Minimax team just released a new open weights version of their 229B model yesterday. So much for that narrative.
The AI landscape in China is larger than just Qwen and Alibaba.
The statement was that China was giving up on open weights; they didn't say anything about licensing. Licensing on these models has always been hit or miss depending on which lab and which release.
But the context of the statement is a discussion about corporations pursuing a grab-and-rent strategy. My understanding is that the referenced Chinese model can't serve as an argument in this context, and there are no recent Chinese models with 200B+ parameters and a friendly license.
That license is more like a business source license than an open source license.
Of course, but for how long? Do you think that companies will keep giving away valuable assets for free forever, or do you think that in the near future there's going to be an open weights model that's so good that people keep using it indefinitely instead of going back to frontier model providers?
The first one is just incredibly naive, the second might be true for some people, for some tasks, but it's not going to capture the majority who're chasing the latest and greatest to "keep up".
> Do you think that companies will keep giving away valuable assets for free forever
If China is forced to choose between giving the entire AI market to the US or releasing free models, they'll be releasing free models as long as it's necessary.
> do you think that in the near future there's going to be an open weights model that's so good that people keep using it indefinitely instead of going back to frontier model providers?
We are almost at that point now, where the harnesses and tools are more important drivers of functionality and performance than the model weights themselves. We'll get there.
Every time you release the models you even the playing field out for the competition, which ruins a lot of the advantage your bigger competitors had. It also lets smaller players work on the latest tech and then you can make deals with them.
z.ai models are open weights. GLM-5.1 is very close to Opus, with the obvious exception of session length.
Only academic models will be true open source as companies can't legally afford to disclose learning inputs.
Regarding "They want to train models on our engineering to replace us": some software engineers in China can run circles around some of the best teams in Silicon Valley. The days of U.S. hegemony are over. I recommend you make peace and make friends.
I've been using z.ai and codex latest models since last September.
Each release has been an improvement.
codex handles longer sessions, but the quality seems to decline and it tends to over-engineer and lose focus. It will happily add slop on top of slop...which may pass immediate tests of "code works" but doesn't pass my criteria of "code as craft".
I'm using z.ai GLM with opencode. It's obvious when GLM loses its mind when the session gets too long.
I've been using AI to support programming for around 3 years now. The models have gotten amazing. However, unless there is a significant breakthrough I have determined that it's best for me to focus on short sessions.
I a) organize my work, b) improve my AGENTS.md and ensure the source has appropriate comments to guide the models toward the patterns and separation of concerns, c) use shorter sessions, and d) review and test without AI. This approach means I still own my code. The AI is just an assistant.
With this approach GLM-5.1 is an excellent model. I never run out of token allotment on z.ai or codex plans. At this point, I only keep my OpenAI subscription as the ChatGPT desktop app is excellent at long web research tasks and I get codex with it.
AST of what? Will it read my Clojure code's forms as such? What if my source file has a paren-balancing error? I feel I'm thinking of this at the wrong level/angle.
I cannot remember a case, in the last 10 years at least, when I committed code that does not compile. Why should I share that? Also, tree-sitter sort of handles that.
> code that does not compile. Why should I share that?
If you collect test cases for compilers, for example.
> tree-sitter sort of handles that
My worry is that the stability of committed ASTs would depend on tree-sitter being stable, and it might be difficult to guarantee that for languages that are still in flux. Even most well-established languages gain new grammar once every few years, sometimes in backward-incompatible ways.
Maybe you meant tree-sitter itself will also be versioned inside this repository?
Also, there is an option to pick a codec for a particular file. Might use tree-sitter-C, might use general-text. The only issue here is that you can't change the codec and keep nice diffs.
I usually want the codex approach for "shaping" code/product iteratively with the AI.
Once things are shaped and common "scaling patterns" are well established, letting the autonomous approach run wild can *sometimes* be useful for things like adding a front end (which is constantly changing, with more views).
I have found that codex is better at remembering when I ask to not get carried away...whereas claude requires constant reminders.
Depends on your app cache needs. If they're moderate, I'd start with postgres...i.e. not having to operate another piece of infra and the extra code. If you are doing the shared-nothing app server approach (rails, django), where the app server remembers nothing after each request, Redis can be a handy choice. I often go with a fat, long-lived server process (jvm) that also serves my live caching needs. #tradeoffs
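To make the long-lived-process option concrete, here is a minimal sketch of an in-process cache with a per-entry time-to-live. It assumes Java 16+ on the JVM; `TtlCache`, its constructors, and the injectable clock are hypothetical names made up for illustration, not part of any library mentioned above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical in-process cache: entries live in the server's own heap and expire after a fixed TTL. */
class TtlCache<K, V> {
    // A cached value plus the wall-clock time (in ms) after which it is stale.
    private record Entry<T>(T value, long expiresAtMillis) {}

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    TtlCache(long ttlMillis) {
        this(ttlMillis, System::currentTimeMillis);
    }

    /**
     * Returns the cached value for key, calling loader only when the entry is
     * missing or expired. compute() is atomic per key, so concurrent callers
     * for the same key do not load twice. The loader must not touch this cache.
     */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAtMillis() > now)
                        ? old
                        : new Entry<>(loader.apply(k), now + ttlMillis));
        return entry.value();
    }

    int size() {
        return entries.size();
    }
}
```

Usage would look something like `new TtlCache<String, String>(60_000).get(userId, id -> loadFromDb(id))`, where `loadFromDb` stands in for whatever slow lookup you are caching. The tradeoff is that the cache dies with the process and is not shared across replicas, and this sketch never evicts expired keys that are not requested again, which is exactly where postgres or Redis start to earn their keep.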
I've started using a container (podman) which is just for the AI tools. I start it up for Codex etc. and give it access to the appropriate code directory outside the container.
Anyone else using this approach? Ideas on improvements?