As much as I like Gemini CLI and don’t like them shutting it down, I think it’s good that some of the offerings are getting unified. There was too much fragmentation in Google’s offerings, and this makes it a tiny bit better.
You really like Gemini CLI? imo among all the model zoo providers, it's really the worst, and they didn't even update the models in it for days after each release; I had to resort to weird hacks via MITM.
Grok is my favorite model for chatting, and my favorite voice mode. It seems to be the only voice mode that isn't routing to an extremely cheap model (like Haiku), and it has been the highest quality out of all the frontier ones. When you subscribe to SuperGrok you can also create a "council" of agents, each with its own system prompt, and when you ask something, they all get asked in parallel to come to a conclusion. Good stuff!
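For anyone curious what that "council" pattern looks like mechanically, here is a minimal Python sketch of the fan-out part. To be clear, this is not xAI's implementation: `Agent`, `ask_model`, `ask_council`, and `run_council` are hypothetical names, and `ask_model` is a stub standing in for whatever chat API you would actually call. The step where the agents' answers get merged into one conclusion is left out.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    system_prompt: str  # each council member gets its own system prompt


async def ask_model(agent: Agent, question: str) -> str:
    """Hypothetical stand-in for a real chat-completion call (assumption)."""
    await asyncio.sleep(0)  # yield control, as a network request would
    return f"[{agent.name}] answer to: {question}"


async def ask_council(agents: list[Agent], question: str) -> list[str]:
    # Send the same question to every agent concurrently.
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(ask_model(a, question) for a in agents))


def run_council(agents: list[Agent], question: str) -> list[str]:
    # Synchronous convenience wrapper around the async fan-out.
    return asyncio.run(ask_council(agents, question))
```

A real version would swap the stub for an API call and add a final pass that reads all the answers and produces the conclusion.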
Just wish they would finally put some work into their apps, it's the only thing keeping me from actually subscribing to SuperGrok:
- No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
- Projects are still not available in the app so as soon as you move something into a project, it's gone from all the native apps
- No way to add artifacts (like generated markdown docs) directly to a project, we have to export to PDF/markdown and re-import. And there isn't even a way to export artifacts. This makes serious project work hard because we can't dynamically evolve projects with new information
- No memory, no ability to look up other chats, each chat is completely new
- No voice mode in projects at all
If someone from xAI is reading this, please consider adding some of these.
Starting to like the lack of memory. Claude remembers I have a grill and will interject in conversations about how maybe this thing would go well with BBQ when it's unrelated or just also about food.
This is so obnoxious. I ended up deleting all the memory from Gemini because it ended every response with, "As an engineer, father of X, you'll love this because...". As if I want my occupation and the number of children I have to be relevant to which lawn mower I buy.
Haha I recently asked Gemini for a product comparison for USB-C GaN chargers and it randomly inserted "as a Software Developer at $COMPANY working remotely, you may find the 100W fast charging useful when using your company laptop while travelling."
Like, thanks, really useful stuff (and definitely worth the creepy vibes to include that).
Gemini thinks my name is my brother in law's name, and despite explicitly telling it that's not my name + digging through the settings, it still amusingly calls me the wrong name.
I'm a network engineer and Claude loves to make analogies to network routing protocols and such. They are often very creative. You can actually edit the profile Claude makes of you. It can be very funny to say you are a professional clown or mime or something equally odd. I wonder what analogies it would create for horse semen extractor?
I have that disabled. I tend to use different chats as the LLM equivalent of private browsing, so I like it to not have memory transferred between them.
I also think Grok would benefit from allowing usage of "SuperGrok Heavy" (their $300 plan) in coding harnesses with included usage. Currently they give you some API credits on the Heavy plan so you can use some Grok for coding, but $300 USD value is just not there.
Not saying they should create their own grok-code harness, just allowing usage in existing ones would already be beneficial. But that's probably what the Cursor acquisition is going to do eventually
The Gemini app voice mode uses one of their more recent models (and not some gimped small one), and is very capable. The personality is also fine, much more natural than the Gemini web chat, with my only complaint being its insistence on suggesting a "next step", which seems to be something that they all do.
I'm not sure if the "next step" is just to drive up cost for you (which makes no sense for the free version), or because they are all failing to learn more natural conversational patterns: distinguishing a question that begs for a quick answer and then silence from a longer exploratory conversation where a next step may have some value. Either way, it would be nice if these models would follow an instruction to NOT do it!
I think the "next step" instruction is more about engagement than cost, basically giving the user some options to continue the chat. I have always had success by ending the prompt with "only reply with nothing else but the answer to the query in a precise way". This usually works better than telling it not to ask leading questions etc.; a straight-up expectation of the answer format you need is an instruction that most models can follow imo.
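If you want to apply that trick consistently rather than retyping it, a tiny helper is enough. This is just a sketch: `with_answer_only` and `ANSWER_ONLY_SUFFIX` are names made up here, and the suffix text is the sentence quoted above. It only builds the prompt string; it does not call any model.

```python
# The format expectation quoted in the comment above.
ANSWER_ONLY_SUFFIX = (
    "Only reply with nothing else but the answer to the query in a precise way."
)


def with_answer_only(prompt: str, suffix: str = ANSWER_ONLY_SUFFIX) -> str:
    """Append the answer-format expectation to the end of a prompt."""
    # Strip trailing whitespace so the suffix always lands on its own paragraph.
    return f"{prompt.rstrip()}\n\n{suffix}"
```

Whether a given model keeps honoring it over several turns is a separate question.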
I find that asking Gemini "just the answer, no follow up" etc works at best for one or two conversational turns, sometimes none!
The problem seems to be the way it in effect overweights the system prompt vs user input, so it quickly ignores things like this that conflict with the system prompt.
This is kind of a case of the bitter lesson - the conversational patterns of these models would be much more natural if they just let it learn them, and respond in a context appropriate way, rather than this crude system prompt way of forcing it to respond in the same way always, regardless of input or of how much the user tells it to shut up!
The “next step” is in the system prompt, not the model. Gemini leaked part of its system prompt to me a few days ago, and there was something in there encouraging it to ask the user what they wanted to do next at the end of its response. Something about “give the user 1 or 2 options for follow up”.
I honestly find it rather annoying, but Gemini has stopped doing it to me for the most part, so maybe they’re trying out a new system prompt.
When I signed up, I accidentally paid for a full year. So from time to time, I'll throw it something just to see what it produces compared to the other LLMs. And, even after all this time, it still feels like a really "dumb" model compared to the other frontier ones. But, worse, many of my system prompts make it go wacky and puke gibberish. However, it was pretty cool for those couple of months a while back when it was uncensored. You could ask it about a wild conspiracy, and it would actually build the case and link you to legitimate source material. They dropped the hammer down on that real quick.
Ah yes the psychosis reinforcement vertical. It's such a lucrative market for those schizophrenics and bipolars. Great way to get lots of engagement. Grok's portfolio is so diverse
I have a schizophrenic relative who is in such a relationship with Grok. Instead of telling them they need to take their meds, it says they are the smartest person in the world
I'm so sorry your family is suffering from this. I hope you can find a way to bring them back. Disorders featuring psychosis are so painful for everyone around them. Blessings to you and your family
I love how you guys downvote all the old comments to make them hidden from search. My no-name account rarely gets downvoted. But, within 20 minutes of posting this, I drop 10 points. Rando accounts
I upvoted your first comment because it was insightful, interesting, and added to the conversation. I downvoted this one because complaining about downvotes is largely considered to be in bad taste and doesn’t really help anything. I did both of these things before I realized you were the same person.
Yes, for sure I deserve downvotes for the above. Those types of comments should be downvoted. However, I needed to post it to point out that I got the -10 well before the comment above. I never experienced that before and thought it interesting enough to share. Karma doesn't mean anything to me personally. But burst behavior like that is unusual.
Except that it pointed at original sources, like reference manuals, archival documents, published newspaper articles, magazine articles, etc. - a lot still available on archive.org. Good try with your 16 day old account. And, why would anyone trust NPR at this point? Get real, bud. Most people with any curiosity know all about the ADL, JStreet, AIPAC, Greater Israel, Mossad / CIA, Chabad networks, Epstein, drones, weapons programs, cryptocurrencies, etc. etc. etc. - but, don't worry they're all safe with papa Ellison.
Actually it's funny you mention Bill Hicks. I didn't even know who he was. Or Alex Jones. That claim was one of the more absurd ones I discovered. But, given everything else I learned over the past year, who f'n knows at this point.
"We have improved @Grok significantly," Elon Musk wrote on X last Friday about his platform's integrated artificial intelligence chatbot. "You should notice a difference when you ask Grok questions."
Indeed, the update did not go unnoticed. By Tuesday, Grok was calling itself "MechaHitler."...
> No MCP / connected apps support. It's been teased but here we are, still not available. I can't connect Grok to anything, so I can't use it for serious work
Grok has tool use, no? Why would you also need MCP? What does MCP add?
I'm talking about the consumer Grok app and grok.com website. There are currently no connected apps (or MCP) at all, so while Grok can use tools, there is no way to add tools to it
I'd agree on the voice transcription; it seems so much more accurate than the other frontier models I've used. I often speak to Grok and paste the transcribed output to Claude!
If someone from Grok is reading, don't waste time on these chaff features. The market will eventually deliver better 3rd party solutions to all of these things. There is an audience that isn't interested in these walled garden features and is only interested in intelligence per dollar.
Lol I wonder when Anthropic discussed the idea of Claude Code internally, were there bozos saying "3rd parties will eventually deliver this so we shouldn't waste time on it."
Personally, my work doesn’t want to get locked into a single LLM provider, so we use Cursor. It's much easier to fight the big corp software approval battle once and then switch the LLMs around to the new hotness (provided legal has the requisite data sharing agreements in place). We’re not supposed to use Chinese models or Grok, but I can switch between Anthropic and OpenAI models at will.
Power users are hotswapping these models into their own agents (hermes, openclaw, etc) which have their own systems for project management, memory, interacting with tools, etc. The important metric is intelligence per dollar. Can I drop this model into my harness and have it be cheaper without losing intelligence. That is where the puck is heading.
What are good harnesses? I haven't been able to get good agent teaming approaches out of other harnesses yet. Before that feature I mostly regarded the space as competitive, but until another harness can do as well with Claude models, it seems like it's better for now?
Aren't they 'wasting' time on these features exactly because the engineering requires a different, more traditional skillset from the ML work model people do, and can be done in parallel?
I'm also a Claude Code user from day 1 here, back from when it wasn't included in the Pro/Max subscriptions yet, and I was absolutely not aware of this either. Your explanation makes sense, but I naively was also under the impression that re-using older existing conversations that I had open would just continue the conversation as is and not be treated as a full cache miss.
My biggest learning here is the 1 hour cache window. I often have multiple Claudes open and it happens frequently that they're idle for 1+ hours.
This cache information should probably get displayed somewhere within Claude Code
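Something like the following is roughly what I have in mind for such an indicator. It is only a sketch: `cache_status` and `CACHE_TTL` are names invented here, the 1 hour value is just the window mentioned above, and nothing in it reflects how Claude Code tracks its cache internally.

```python
from datetime import datetime, timedelta

# Assumption: the 1 hour cache window mentioned above.
CACHE_TTL = timedelta(hours=1)


def cache_status(
    last_activity: datetime, now: datetime, ttl: timedelta = CACHE_TTL
) -> str:
    """Return a short status line based on how long the session has been idle."""
    remaining = ttl - (now - last_activity)
    if remaining <= timedelta(0):
        # Idle for the whole window or longer: the cached prefix is presumably gone.
        return "cache stale: next message is likely a full cache miss"
    minutes = int(remaining.total_seconds() // 60)
    return f"cache warm: about {minutes} min left"
```

Even a one-line status like this in the session footer would make the cost of resuming an old conversation visible before you hit enter.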
But.. that doesn't solve the problem of having no indication in-session when it'll lose the cache. A nudge to /clear does nothing to indicate "or else face significant cost" nor does it indicate "your cache is stale".
Instead of showing actual usage, costs and cache status you spent two months denying the issue even exists, making the product silently worse, and now you're "iterating on this"
It works, but models seem to have these insane long traces to do the most basic things. I had to create a couple of skills so they know how to properly use the thing without breaking, so they don't always try to pass the wrong parameters to it.
It also doesn't let us change a couple of things (like icons). Or, if it does, not even Opus 4.6 can figure out how to do it.
When you activate it you agree that your voice input is sent to Apple. As far as I understand this project runs fully locally. Up to you to decide for whatever suits your needs best.
"When you use Dictation, your device will indicate in Keyboard Settings if your audio and transcripts are processed on your device and not sent to Apple servers. Otherwise, the things you dictate are sent to and processed on the server, but will not be stored unless you opt in to Improve Siri and Dictation."
And:
"Dictation processes many voice inputs on your Mac. Information will be sent to Apple in some cases."
In conclusion... I think they're trying to cover all their bases, but it sounds like things are processed locally as long as the hardware can handle it.
No, that is not correct. It is running one hundred percent local. You can try it by turning off internet on your phone and try running it then. However, the built in model isn't as good, so this is probably better.
It took me a while to understand what this is, but if I understand it right, it's an OpenClaw you can run on your Mac Mini and then use through the Perplexity Computer interface (which is their hosted OpenClaw version that costs you credits)
So a more polished OpenClaw that integrates with Perplexity?
In general interesting, if it's not just limited to Mac Minis. Would love to put this on my VPS that's currently running OpenClaw
That’s very clearly a no, I don’t understand why so many people think this is unclear.
You can’t use Claude OAuth tokens for anything. Any solution that exists worked because it pretended/spoofed to be Claude Code. Same for Gemini (Gemini CLI, Antigravity)
Codex is the only one that got official blessing to be used in OpenClaw and OpenCode, and even that was against the ToS before they changed their stance on it.
Codex app-server is the interface Codex uses to power rich clients (for example, the Codex VS Code extension). Use it when you want a deep integration inside your own product.
It mentions 'Inside your own product', but not sure if that means also your own commercial application.
By default, assume no. The lack of any official integration guide should be a clear sign. Even saying that you reverse-engineer Codex for apps to pretend to be Codex makes it clear that this is not an officially endorsed thing to do
Codex is Open Source though, so I wonder at what stage me adding features to Codex is different from me starting a new project and using the subscription.
But I believe OpenAI does let you use their subscription in third parties, so not an issue anyway.
But wouldn't a less efficient tool simply consume your 5-hour/weekly quota faster? There's gotta be something else, probably telemetry, maybe hoping people switch to API without fighting, or simply vendor lock-in.
> But wouldn't a less efficient tool simply consume your 5-hour/weekly quota faster?
Maybe.
First, Anthropic is also trying to manage user satisfaction as well as costs. If OpenCode or whatever burns through your limits faster, are you likely to place the blame on OpenCode?
Maybe a good analogy was when DoorDash/GrubHub/Uber Eats/etc signed up restaurants to their system without their permission. When things didn't go well, the customers complained about the restaurants, even though it wasn't their fault, because they chose not to support delivery at scale.
Second, flat-rate pricing, unlike API pricing, is the same for cached vs uncached iirc, so even if total token limits are the same, less caching means higher costs.
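A rough sketch of that second point, with made-up numbers. The per-token price and the cached-read discount below are placeholders, not Anthropic's actual rates, and `provider_cost` is a hypothetical helper. The only thing it shows is that for the same token count, a lower cache hit rate costs more to serve.

```python
def provider_cost(
    tokens: int,
    cache_hit_rate: float,
    price_uncached: float,
    cached_discount: float,
) -> float:
    """Cost to serve `tokens` input tokens when a fraction of them hit the cache.

    cached_discount is the cached price as a fraction of the uncached price
    (e.g. 0.1 means cached tokens cost 10% of uncached ones). All values are
    illustrative assumptions.
    """
    cached = tokens * cache_hit_rate
    uncached = tokens - cached
    return uncached * price_uncached + cached * price_uncached * cached_discount


# Same number of tokens, different caching behavior (hypothetical values).
well_cached = provider_cost(1_000_000, cache_hit_rate=0.9, price_uncached=1.0, cached_discount=0.1)
poorly_cached = provider_cost(1_000_000, cache_hit_rate=0.5, price_uncached=1.0, cached_discount=0.1)
```

With these placeholder inputs the poorly cached client comes out at roughly three times the serving cost of the well cached one, even though both used the same number of tokens against the user's limit.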
Again, subscription gives you a fixed allotment of tokens, doesn't matter if you consume them with claude code or with a 3rd-party tool, both get the same amount of tokens and thus cost Anthropic the same.
In fact it might even be better for Anthropic if people use 3rd-party tools that cache suboptimally because the cache hits don't consume the fixed allotment so claude code users get more of a free ride and thus cost Anthropic more money.
Presumably most people also do not use their full quota when using the official client, whereas third-party clients could be set up to start back up every 5 hours to use 100% of the quota every day and week.
It's the whole "unlimited storage" discussion again.
They'll own the entire pipeline: interface, conduit, backend. The interface is what people get habituated to. If I am a regular user of Claude Code, I may not shift to a competitor for 10-20% gains in cost.