eggbrain · 2026-06-12T17:37:27 1781285847

Gauntlet AI, I believe, was originally connected to Lambda School (YC S17). YC founders, I believe, are able to post job postings on Hacker News, although this might stretch the definition a bit...

Aurornis · 2026-06-12T17:43:01 1781286181

Austen Allred started GauntletAI after his Lambda School bootcamp (now BloomTech) was fined, banned from participating in lending activities, and became too toxic to escape their old brand.

It's not clear to me why he gets to post privileged ads on Hacker News. Is GauntletAI a division of BloomTech, and therefore considered a YC portfolio company?

These aren't even job ads. GauntletAI is a recruiting play. They make money by getting strong candidates to apply and then collecting recruiting fees from companies for placement. They really do have people travel to Austin for some disorganized vibecoding classes with their vibecoded output used for resume building to increase their odds of getting placed (and therefore GauntletAI getting paid).

It's just the evolution of their bootcamp model updated for AI and the fact that their founder was banned from participating in lending agreements due to their deceptive practices. Now they're trying to collect money from the companies instead.

So this isn't even a job ad. It's a recruiter soliciting candidates. I didn't think YC companies were allowed to use their postings to advertise services.

pseudalopex · 2026-06-12T17:44:52 1781286292

Gauntlet AI was a front for BloomTech, at least initially.[1]

[1] https://news.ycombinator.com/item?id=42761972

austentexas · 2026-06-12T18:00:57 1781287257

As one of Austen’s haters, I find him fun to follow. At the end of last year he announced that he was going to relaunch a bankrupt company called Marin Software, have a documentary crew follow them, and do it all with AI in a month or something.

A few weeks later, wow, they’ve booked $1.2m of revenue! And then he never mentioned it again. Documentary never surfaced. Website doesn’t work.

https://xcancel.com/Austen/status/2001357051541491717

Austen is the perfect representation of the AI world. A grifter with zero substance and happy to lie.

eggbrain · 2026-06-12T13:21:15 1781270475

I feel we need a "proof of work by human" for emails. Something that could be signed that attests that someone took the time to write the email, not just sent a template / used AI to auto-generate a personal-looking email, etc. Sure, that could be gamed as well (have an AI write characters one by one to look more human-like), but taking more time is usually a fairly good blocker for spammers / salespersons / etc.

dgellow · 2026-06-12T13:29:43 1781270983

I would love for a proof of human work to exist, but how would you even do that? It would need to be monitoring the user activity in their email client, which isn't something that can be trusted by a server (and is pretty shady).

But that makes me think of Hashcash, which was developed to limit email spam via proof of work, though I don't think it has ever been used in practice: https://en.wikipedia.org/wiki/Hashcash (and of course it wouldn't work for the proof of humanness you're talking about).
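For anyone unfamiliar with Hashcash, the core idea fits in a few lines of Python. This is a simplified sketch, not the reference implementation: the stamp layout is modeled on Hashcash's `ver:bits:date:resource:ext:rand:counter` format, the hash is SHA-1, and the `mint`/`verify` helpers are hypothetical names. It skips the date-freshness and spent-stamp (double-spend) checks a real deployment would need. The point is the asymmetry: minting costs on average about 2^bits hash attempts, while verifying costs a single hash.

```python
import base64
import hashlib
import os
from datetime import datetime, timezone


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    count = 0
    for byte in digest:
        if byte == 0:
            count += 8
            continue
        # For a nonzero byte, bit_length() locates its highest set bit.
        count += 8 - byte.bit_length()
        break
    return count


def mint(resource: str, bits: int = 20) -> str:
    """Brute-force a counter until the stamp's SHA-1 has >= `bits` leading zero bits.

    Expected cost is roughly 2**bits hash attempts (the sender's "work").
    """
    date = datetime.now(timezone.utc).strftime("%y%m%d")
    # 12 random bytes -> 16 base64 chars, no '=' padding and never a ':'.
    rand = base64.b64encode(os.urandom(12)).decode()
    counter = 0
    while True:
        stamp = f"1:{bits}:{date}:{resource}::{rand}:{counter:x}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp
        counter += 1


def verify(stamp: str, resource: str, bits: int = 20) -> bool:
    """Check a stamp with one hash (no date or double-spend checks in this sketch)."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1" or fields[3] != resource:
        return False
    if not fields[1].isdigit() or int(fields[1]) < bits:
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits


if __name__ == "__main__":
    s = mint("alice@example.com", bits=16)
    print(s, verify(s, "alice@example.com", bits=16))
```

As the comment says, this only makes bulk sending expensive; a machine can mint stamps just as well as a human, so it proves spent CPU time, not human authorship.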

BatteryMountain · 2026-06-12T15:51:31 1781279491

I want to give my bank my public key (preferably at a branch), so that I can prove ANY comms coming from them actually came from them.

dbdr · 2026-06-12T13:40:57 1781271657

What about automated signup confirmation emails, just as one example?

eggbrain · 2026-06-09T17:08:40 1781024920

For those of us on subscription plans:

* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

jrflo · 2026-06-09T17:24:50 1781025890

Still satisfied with my switch to codex/chatgpt. I couldn't imagine switching away from claude code when it first launched, but with the drastically more generous usage on codex for the same subscription tier, I just can't justify it.

goranmoomin · 2026-06-09T18:34:52 1781030092

My experience is that the GPT-family models are very smart and figure out bugs and edge cases a bit better, but they produce code that is much less mergeable – if you review the code, it introduces a lot more useless/inappropriate heavy abstractions and wrapper functions, compared to the Claude-family models, which introduce the right amount of straightforward human-style code.

I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).

Additionally, the time spent on every agent turn with GPT 5.5 is much longer compared to Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there are a lot more nitpicks to pick when actually using GPT 5.5 to do software engineering.

Feels like GPT-style models are more geared toward one-shot software vibing (and handling the vibe coded mixture), compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching for Claude models a lot more. Frustrating.

PhilipDaineko · 2026-06-09T19:33:27 1781033607

"5. DON'T FUCKING OVERENGINEER! WRITE THE SIMPLEST CODE THAT CAN POSSIBLY WORK! NO NESTED LAYERS OF ABSTRACTION! NO UNNECESSARY CLASSES OR METHODS! NO DESIGN PATTERNS UNLESS THEY ARE ABSOLUTELY NECESSARY! NO MAGIC! NO SHENANIGANS! JUST THE DAMN CODE THAT GETS THE JOB DONE IN THE MOST STRAIGHTFORWARD WAY POSSIBLE! THE FIRST PRIORITY IS TO WRITE CODE THAT IS EASY TO READ AND UNDERSTAND AND READ!!!"

this is the line I keep in Agents.md that helps me prevent Codex from playing smart

bertil · 2026-06-09T19:38:40 1781033920

The urge to put capitalized, repetitive, borderline abusive instructions should be studied. I haven't read many academic papers looking at the frustrations around repetitive patterns.

reactordev · 2026-06-09T19:51:14 1781034674

There have been a few studies showing that models produce worse responses when under duress from a frustrated user posting insults in all caps.

https://arxiv.org/abs/2602.10144

notnaut · 2026-06-09T20:00:31 1781035231

It reminds me of FIRMLY telling my cat to stop jumping up on the counter

anakaine · 2026-06-09T20:28:19 1781036899

If my cat was an LLM, I'd use a different model. The current one is stuck in noisy useless arsehole mode.

phoh · 2026-06-09T22:04:09 1781042649

are you asking it questions about security?

LordDragonfang · 2026-06-09T19:52:13 1781034733

It's fundamentally because, despite (nearly) everyone's claims otherwise, the fact that we interact with them through language means we (our brains) model them as a sort of person. (Note that this fact is totally orthogonal to whether it's actually sentient or not.) We then try to instruct them the same way we would a person totally subordinate to us.

When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.

Compare it to how the kind of people who treat children like property treat their kids, or other examples of keeping people as property.

lxgr · 2026-06-09T19:59:08 1781035148

It should be relatively clear at this point that the model will in turn also model you as somebody that shows unrestrained anger with subordinates and adapt its responses accordingly. This might or might not be what you want.

LordDragonfang · 2026-06-10T09:02:40 1781082160

Good addition. Fully agreed on that point, yes. (At the very least for larger models, if not also for smaller ones)

ur-whale · 2026-06-09T19:55:33 1781034933

> borderline abusive instructions

who, or rather what, is being abused here exactly?

sirsinsalot · 2026-06-09T20:25:33 1781036733

I think intent, rather than target, is implied and important.

You should see the abuse my motorbike gets. Poor thing.

rimliu · 2026-06-10T09:29:31 1781083771

inanimate fucking object.

saligne · 2026-06-10T04:33:58 1781066038

Yeah says way more about the user than the model

jlawer · 2026-06-09T19:52:10 1781034730

I have a theory that swearing actually results in less comprehension of instructions by the model, due to a lack of training data compared to the more conventional MUST.

We were reviewing reports of situations where the models failed to follow directions, and there was a common thread in some of them: when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.

I don’t have the data to truly look into it, but I did tell my engineers to avoid it as a “might be a problem”.

acjohnson55 · 2026-06-09T20:29:15 1781036955

It would be interesting to understand the data on this. But I suspect that the results would vary by model.

But I avoid unnecessary emotion in my prompts because I don't want potentially distracting activations. Kind of like communicating with humans.

throwaway85825 · 2026-06-09T20:49:14 1781038154

It's divination for people with STEM degrees.

Xmd5a · 2026-06-09T20:20:49 1781036449

https://arxiv.org/abs/2510.04950

> impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.

acjohnson55 · 2026-06-09T20:30:55 1781037055

> These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation.

Unless the mechanism is understood, my assumption is that this is a moving target.

beachy · 2026-06-09T20:23:01 1781036581

I have a theory that swearing at AI generally is not a good idea - when the singularity arrives and every human's postings ever made are scanned for compatibility, then people who show courtesy to AI will be favoured. Joking, kind of, but only partly.

fhars · 2026-06-10T10:53:24 1781088804

https://en.wikipedia.org/wiki/Roko%27s_basilisk

beachy · 2026-06-11T04:00:32 1781150432

Fantastic rabbit hole - until it segued into Elon's love life.

cdelsolar · 2026-06-09T20:38:14 1781037494

https://images.teepublic.com/derived/production/designs/3478...

re-thc · 2026-06-09T20:01:59 1781035319

> I have a theory that swearing actually results is less comprehension of instructions by the model due to lack of training data over more conventional MUST.

How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.

jlawer · 2026-06-09T20:28:36 1781036916

It's purely an observed correlation across catastrophic error reports. So now I carry a “tiger rock” with me. I figure there wasn’t much of a downside to avoiding swearing in my agent instructions.

yencabulator · 2026-06-09T20:46:09 1781037969

Apparently, when a "desperation" pattern is triggered, the AI is significantly more likely to cheat and do hacky workarounds:

https://www.anthropic.com/research/emotion-concepts-function

ghurtado · 2026-06-09T23:15:16 1781046916

You haven't really lived until you've had to type this whole thing, aware of the fact that the all-caps doesn't change much, but they stay because the rage has to go somewhere

Bonus points if you find yourself actually saying it out loud while typing it.

I have used the word "shenanigans" way more in a couple of years of agentic coding than in 30 years of writing code with humans.

ozim · 2026-06-09T20:32:28 1781037148

Will save you some tokens: „write code like Linus Torvalds” - model should have all his swearing included in training data.

johnisgood · 2026-06-09T20:21:53 1781036513

I have found many failure modes with Opus during some tasks related to writing letters (not legal), and I actually put them into memory, and it works more or less for these specific tasks. For example, when I want it to draft something, it always ends up being so flat, yet when it explains things to me it is usually really great, just not when I tell it to put that into the draft. Adding these to memory with the help of Opus resulted in a much better experience. There are still some blind spots, but I also figured out how to make it give me the charitable version, without less protection, so I no longer have to go back and forth with it.

pkaye · 2026-06-09T21:05:48 1781039148

I noticed that too when I tried Codex and compared it to Opus. So many layers of simple functions added by Codex. I need to try this out in my Agents.md.

prasanthabr · 2026-06-09T20:29:59 1781036999

Curious: why would you say no design patterns?

PhilipDaineko · 2026-06-10T12:31:56 1781094716

Because design patterns are only applicable at scale. I noticed codex inventing factories, components, etc. when the task was simply to draft an HTML page. Instead, it built the entire layered architecture for imaginary future complexity - the classic right-after-graduation student - it knows how to build the cool stuff, but does not know it is not applicable everywhere.

carterschonwald · 2026-06-09T19:48:23 1781034503

i actually think this is too tame. it really has to be stuff you'd never say to a real person.

lxgr · 2026-06-09T19:57:09 1781035029

Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.

apercu · 2026-06-09T19:47:12 1781034432

It might be a salient point but I didn't read it as it was yelling at me.

GoToRO · 2026-06-09T19:41:25 1781034085

you forgot to sign it with Donald J Trump

thewebguyd · 2026-06-09T19:48:54 1781034534

Thank you for your attention to this matter.

superkickstart · 2026-06-09T19:08:06 1781032086

I'm not sure if i do something differently but i have the exact opposite experience with these models. Claude always feels like it's generating way too overdesigned and hard to understand code with the vibe oriented feel while codex is cleaner and more "task at hand" and easier to work with.

sebmellen · 2026-06-09T20:06:06 1781035566

Agreed

syzygyhack · 2026-06-09T19:45:49 1781034349

I echo your observations. I expect you will enjoy deepseek-v4-pro for writing code. Much closer to that Opus experience, and very cost-effective too. With 5.5 as a reviewer and specialist, all bases are covered.

dilap · 2026-06-09T19:13:18 1781032398

Have you tried iterating on style feedback in AGENTS.md? I've been reasonably successful using this to get it to output code in a terse, non-defensive style that matches my hand-written code.

trollbridge · 2026-06-09T20:10:54 1781035854

GPT-5.5 did a significantly worse job than Qwen-3.7-Max on a job today (some devops tasks I wanted to create some reusable scripts for). Kind of disappointing.

CamperBob2 · 2026-06-10T01:04:38 1781053478

I've also seen Qwen 3.6 beat GPT 5.5 a couple of times. The ball is definitely in OpenAI's court now. Qwen is not going to fare so well against Fable, from what I've seen so far.

trollbridge · 2026-06-10T22:47:40 1781131660

In theory, GPT-5.5-Pro would do better, but it’s so expensive it’s not worth experimenting to find out.

vruiz · 2026-06-09T19:12:18 1781032338

This is my experience as well. I have defined a CLAUDE.md rule to ask codex to automatically code review, and I tell it that the reviewer is very picky and to only implement what it considers valuable feedback. I hope they don't converge over time; currently, in combination they work really well.

moomoo11 · 2026-06-09T20:48:33 1781038113

i had this same complaint but no offense to you it turned out i was just not using the models right.

ai llm are doing what i tell them to.

if you’re building something meaningful (in my case a platform used by many people across many companies) you want to ensure that you

1. have actual systems engineering and architecture in mind that you want the models to implement, and

2. have them implement based on what you tell them to do

when i was just telling the models what i want done without doing due diligence it would go and do some moronic implementation that was awful. mid input = mid output

these days i just maintain specification documents and the AI follows everything i tell it to in that document. so when i tell it to do one thing, the result is made following those architecture specs.

i have code that is single resp, modular, easy to extend and test.

i would ballpark 95% of the time i get what i asked for.

sometimes it tries to be clever in cases that weren’t covered in my arch specs. in those 5% of cases i go and update my specs.

source: used billions of tokens worth to build something actually in production across both mobile platforms and web, deployed on my own cloud infra. i use codex mainly. some claude.

GoToRO · 2026-06-09T19:44:22 1781034262

I noticed too, that whatever they offer in the chat, for free, is smarter, as in no more bs. I use claude code and I want to try codex too but I don't need two subscriptions. I did try codex for some planning and it was really good. Thanks for giving me an insight into how it generates code.

sigbottle · 2026-06-09T18:01:28 1781028088

Codex IME is just smarter. I think it shows, given both anecdotes and how OpenAI has always been at the front of programming competitions and math problems.

But Claude models seem to be better at long term problems or more ambiguous problems.

I'm curious what the primary benefit is here. Are there secret improvements in training? There hasn't been much change in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI. It seems like harnesses have been the main thing pushing AI ever since CoT.

Spartan-S63 · 2026-06-09T18:19:03 1781029143

I find that OpenAI's agentic tools and models are better for building human-maintainable software. Meanwhile, Anthropic seems to be cosplaying Apple while missing out on all the exceptional engineering required to create something that polished. Their admission of predominantly using Claude with little human oversight, and their stealth mode, are an indictment of a poor engineering culture, from what I can surmise.

someguyiguess · 2026-06-09T18:39:47 1781030387

Serious question: what is the secret to getting Codex to write decent code? I am on Windows. Maybe that is the issue, but I can't seem to get Codex to function anywhere near the level that I was previously able to get with even Claude Sonnet. Does Codex just not work well with Windows yet?

penetrarthur · 2026-06-09T19:41:31 1781034091

I got Codex to write near-perfect code with a somewhat strict agents.md and coding standards (a separate .md file referenced from agents.md). My .md files have examples and a long list of do's and don'ts I accumulated over the last 6 months or so, totaling 300-400 lines. I plan every feature with it until I am satisfied with the general approach it wants to take, and then it oneshots it in 95% of cases. The planning takes anywhere from 5 to 30 minutes. The actual execution has gotten stupidly fast; most of the time it is faster than making a cup of coffee.

acmecorps · 2026-06-09T20:08:20 1781035700

would you mind sharing your *.md files, for someone who is new at this?

fyrabanks · 2026-06-10T00:07:51 1781050071

"don't make any mistakes" /s

sroussey · 2026-06-09T20:19:54 1781036394

Have you tried using superpower skills?

someguyiguess · 2026-06-09T18:38:05 1781030285

I've had the exact opposite experience. For various reasons, I've had to move from Claude to Codex and the rate at which it burns tokens for the same output I would get from Claude is ridiculous. I'm probably burning tokens at a rate that is at least twice as much as I was when using Opus 4.5 for coding tasks and still finding that just manually coding is easier than trying to get Codex to write functional code.

greenavocado · 2026-06-09T18:27:02 1781029622

How smart a model is varies hour over hour, tracked over here: https://aistupidlevel.info/

wsatb · 2026-06-09T17:32:14 1781026334

I guess enjoy it while it lasts? OpenAI won't be able to subsidize that forever either.

windexh8er · 2026-06-09T18:20:58 1781029258

Agreed. I think the Chinese labs are proving that OpenAI and Anthropic don't have a moat in almost every aspect, especially pricing. I also think people are getting annoyed with the constant lift and shift. I've seen more folks drop Claude Code and Codex, specifically because of the lock-in they give the providers. I'm curious to see how people standardize on adjacent tooling, and whether Anthropic, Google, or OAI move to block that kind of usage, akin to the games Anthropic has been playing as of late.

I think the end game is routed model usage and SLMs. I think Apple is going to prove this in the consumer space pretty handily, and I'm curious how the Android ecosystem responds, since its hardware is considerably lacking in model performance. I think Apple has a huge opportunity here, as much as I don't like their current walled-garden ecosystem. They did position themselves very well with ARM and custom chips for their hardware. Hopefully the broader ARM and Linux ecosystem is able to make some headway, and we see a more formalized, and broadly accepted, architecture to capitalize on.

lurking_swe · 2026-06-09T20:08:04 1781035684

is there an alternative to codex that “just works”? by just works i mean i can install as an app in 1 minute, and i get web search, skills, mcp servers, etc? Bonus points if it can control my chrome tabs like codex can, and if it offers remote control from my iPhone (chatgpt app) so i can kick off tasks while i’m out for a walk. Even more bonus points if i can, with 1 button click, share my chats or share the results of a session as a “site” (vercel style).

I’m sure you could put something similar together with a bunch of duct tape and 2 weeks of effort, but it won’t work nearly as nicely nor out of the box. so…what am i missing?

Qhemlomo · 2026-06-09T19:38:41 1781033921

Big companies are not doing OpenRouter.

My company has an agreement with the big providers, and while i'm pretty sure they think about how to get budget back, it's a competitive advantage, and normal people will not learn different model behaviours.

At least for now.

windexh8er · 2026-06-10T16:31:57 1781109117

I didn't say anything about OpenRouter. That has no bearing on my statement.

maxdo · 2026-06-09T18:50:01 1781031001

I see the exact opposite. Chinese models fail in any complex scenario, while US labs raise their prices; that's a sign of confidence.

re-thc · 2026-06-09T19:03:13 1781031793

> while us labs raise the price , that's a sign of confidence

Regardless of what others are doing, US labs here are just rushing to IPO. It's NOT a sign of confidence.

It's the equivalent of saying you have confidence in SpaceX making revenue by renting out their data center (instead of their AI making bank).

maxdo · 2026-06-09T19:50:32 1781034632

Going to IPO is a sign of confidence: you need to report a lot of things that private companies don't. This is the exact reason Chinese labs do not rush to go public. They wish to, but the money flow is not as good.

On the same note, if SpaceX is doing datacenters on earth successfully, what's wrong with that? They rented cloud infra to the #2 or #3 provider in the world after < 2 years in business. It's a success, no?

re-thc · 2026-06-09T19:57:09 1781035029

> if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?

If you get hired as a staff engineer and do the work of a junior, what's wrong with that?

Clearly xAI (now part of SpaceX) did not raise funds to be a data center. The margins are way different. There are plenty of recent IPOs in that area that are worth at most billions, not trillions.

> going to IPO is a sign of confidence , you need to report a lot of things, that private companies don't.

This isn't going to IPO. This is rushing to IPO. It is a sign of confidence that the market or wider environment might crash soon so we need the liquidity now.

> This is an exact reason chinese labs do not rush to go public.

Maybe or maybe not. If you are referring to Chinese labs, both the Hong Kong and Chinese stock markets are way weaker than Nasdaq. It's not comparable. Check all the recent Hong Kong IPOs that have tanked.

So no, the reason not to might just be: there's no money in it.

gunsle · 2026-06-10T05:11:56 1781068316

You’re not gonna get nuanced discussion on spacex or anything Elon related here these days. Most of this site is Reddit lite at this point including their milquetoast progressive opinions (Elon bad being one of them).

maxdo · 2026-06-09T20:03:56 1781035436

running so much compute at that scale is not a junior task. weird analogy

esperent · 2026-06-09T19:53:53 1781034833

What lock-in does codex have? I'm using it in the pi harness specifically because it doesn't have much in the way of lock-in.

flatline · 2026-06-09T17:45:14 1781027114

I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models. We've got near-frontier capabilities from open source models from China at pennies on the dollar compared to US big tech rollouts. OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

schaefer · 2026-06-09T18:35:09 1781030109

> I don't think anyone has a firm grasp on actual inference costs.

There are huge numbers of users (myself included) that do have an exact idea of what inference costs are - on open models. We can buy tokens from 3rd parties that have no motivation to subsidize our use. That's to say, there's a fair marketplace[1] and we're hanging out there.

If you want to say "I don't think anyone has a firm grasp on actual inference costs on these proprietary/closed models", then I could agree with that.

[1]: https://openrouter.ai/rankings#leaderboard

andrewmutz · 2026-06-09T18:03:04 1781028184

Both can be true. They can be charging what the market will bear, and still be charging less than their costs of running it.

wyre · 2026-06-09T18:30:34 1781029834

There is no way I'm believing DeepSeek can charge less than $1 USD for their pro model while Opus costs over 25x more, yet their price is less than the cost of running it?

kube-system · 2026-06-09T19:36:28 1781033788

It would seem strange, if they were operating in the same economy, but they don't. DeepSeek operates in an economy with a high degree of central planning.

China subsidizes strategic industries, and they have heavily done so with AI. And DeepSeek specifically has said they have no commercialization plans.

For example: https://www.boc.cn/aboutboc/bi1/202501/t20250123_25254674.ht...

wyrdcurt · 2026-06-10T00:01:13 1781049673

DeepSeek is not the only provider of inference for their models. Chinese subsidies likely do explain DeepSeek's ability to provide inference cheaper than other providers, but even a US provider like DeepInfra can serve DeepSeek 4 Pro at $1.30/M in and $2.60/M out. Unless American labs are doing something wildly inefficient, it feels safe to assume Anthropic has some profit margin on inference at API prices.

kube-system · 2026-06-10T00:23:58 1781051038

They may, neglecting overhead R&D. But also, some suspect that US models are significantly heavier than DeepSeek in resource consumption by multiple measures.

It’s generally established that Anthropic/OpenAI are going for all-out performance with big VC dollars at the expense of efficiency, while China has geopolitically limited compute and an incentive to compete on value per dollar.

re-thc · 2026-06-09T20:06:06 1781035566

> There is no way I'm believing DeepSeek can charge less

Why not? Hetzner charges WAY less than AWS too. Can you not believe that?

orangecat · 2026-06-10T01:30:21 1781055021

That's the point. Hetzner is presumably covering their costs, so it's a safe bet that AWS is profitable.

dontlikeyoueith · 2026-06-09T18:25:17 1781029517

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both. They are charging the most they can get away with and that amount is still heavily subsidized by VC capital.

InsideOutSanta · 2026-06-09T18:37:14 1781030234

> I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models

We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither would most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.

logicchains · 2026-06-09T18:37:11 1781030231

We have a firm grasp on actual inference costs from the various open weights model providers on OpenRouter. They don't have the money to subsidize inference and it's quite a competitive market, so the prices are representative of the costs.

pimeys · 2026-06-09T18:38:20 1781030300

We pay by token at work. I just finished one session with Opus that was 4000 dollars. In about three days.

Now that 200USD subscription starts to feel cheap...

zozbot234 · 2026-06-09T18:55:29 1781031329

That would be about 300 tok/s over 72 hours at Claude Fable output token prices? I'm not sure that this passes a sanity test.
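The sanity check in this comment is rate-times-time arithmetic. A minimal sketch that back-solves the price per million tokens implied by the figures in the thread (~$4,000, three days taken as 72 hours, ~300 tok/s). The thread does not state Fable's actual output price, so the result is only what these numbers imply, not a quoted rate; it also assumes a steady rate for all 72 hours and counts every token as output. `implied_price_per_million` is a hypothetical helper name.

```python
def implied_price_per_million(total_usd: float, tokens_per_second: float, hours: float) -> float:
    """Back-solve the $ per 1M tokens implied by a total spend at a steady token rate."""
    total_tokens = tokens_per_second * hours * 3600  # seconds in `hours`
    return total_usd / (total_tokens / 1_000_000)


if __name__ == "__main__":
    # Figures from the thread: ~$4,000 over ~72 h at ~300 tok/s (all assumed steady).
    tokens = 300 * 72 * 3600
    print(f"{tokens:,} tokens")  # 77,760,000 tokens
    print(f"${implied_price_per_million(4000, 300, 72):.2f} per 1M tokens")  # $51.44 per 1M tokens
```

Input tokens, repeated context, and parallel subagents would all change this picture, so a single steady output stream is the most conservative reading of the numbers.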

unholiness · 2026-06-09T19:18:01 1781032681

Subagents are a helluva drug.

rubyn00bie · 2026-06-09T19:02:09 1781031729

Just outta curiosity, as I’ve never gotten a spend anywhere near that, what variant were you using? Like max context window and fast mode? Or was it just chugging along non stop for three days?

pimeys · 2026-06-09T19:15:37 1781032537

Fast mode, max context window. The task was: replace all 1600+ queries from one database to another and make the whole integration test suite pass. We did multiple passes, with different concerns when changing from one database to another. My OpenCode session right now says $4,365.02.

I haven't gotten close to this before either, but this time we wanted to move fast because this branch gets conflicts all the time and we want to get the migration over with asap.

rglullis · 2026-06-09T20:10:51 1781035851

It's a bit of a left-field question, but I am curious: let's say the company wasn't paying the whole bill but only subsidizing it, e.g., if it paid 90% of the $4000. What would you do?

pimeys · 2026-06-09T20:51:17 1781038277

I don't know, why would I pay to do my job? It's not my first database switch for a startup. Only this time it doesn't take two months of grueling work. I know exactly how this is done, but the amount of grunt programming and testing and repetitive work is just not great. And it's not a task that brings new customers or a new product. Just a mandatory and annoying thing to deal with when we are growing.

And don't get me wrong. Opus did an absolutely horrible job at first, second and third round in this task. You really needed to steer it to get to the right solution.

And now Fable is out. And its first round of code reviews for this huge PR was definitely worth the money too...

Don't think that I'm just shrugging off that number. I see it every day, and I don't like that it's in the thousands now. But for people paying for the 100 or 200 dollar plans, I'm not sure those plans will stay usable in the future if a slightly bigger task costs thousands in tokens...

If I were paying this out of my own pocket, I'd definitely go with DeepSeek or local models and figure out how to make the best use of them.

rglullis · 2026-06-09T22:00:05 1781042405

> If I were paying this out of my own pocket, I'd definitely go with DeepSeek or local models and figure out how to make the best use of them.

IOW, you don't think this work is really worth $4k.

> why would I pay to do my job?

The question is: how long do you think your employer will be willing to pay for both you and Anthropic, if you yourself said that with your own money you'd put time and effort into working with an open model?

pimeys · 2026-06-09T22:24:26 1781043866

> The question is: how long do you think your employer will be willing to pay for both you and Anthropic, if you yourself said that with your own money you'd put time and effort into working with an open model?

I'm not sure what this question really means. Anthropic is useless if you don't know what to do with it. It's very useful if you do, and you can guide it to do the right things. Yes, it will for sure reduce the number of people we need to hire. But we are always looking for hires who know what they're doing and can utilize agents to be faster.

But if you mean how long an employer is willing to pay 10-20k per month per seat for Anthropic: I can't see that being feasible, and it will have to end at some point.

rglullis · 2026-06-09T22:52:47 1781045567

Regardless of the actual value produced by the models, if I were the CTO of any company with the budget to spend $10k/month/seat on Claude, I'd take 5%-10% of that to build an alternative in-house.

pimeys · 2026-06-10T07:54:40 1781078080

I'm with you here. If we want to stay competitive, we can't slide into a situation where a sizable chunk of the budget goes to an American megacorporation. We need local models, and we need them to be good enough to help us.

internet101010 · 2026-06-10T02:13:44 1781057624

Indefinitely, for these big mundane grunt jobs. In every scenario it is going to be cheaper and faster than lobbing it over to Infosys.

esafak · 2026-06-09T20:11:14 1781035874

That's the price of several engineers!

MichaelMedbed · 2026-06-09T17:51:15 1781027475

[flagged]

kllrnohj · 2026-06-09T18:04:26 1781028266

Regardless of whether that's true, US companies hosting inference for the models coming out of China are also significantly cheaper than OpenAI or Anthropic.

polski-g · 2026-06-09T17:57:52 1781027872

Not relevant to the post.

ChrisMarshallNY · 2026-06-09T17:51:59 1781027519

I'm planning on switching from the $20/month to the $100/month plan.

It's worth it, and I can afford it, but I am not really the right type of user for token-based pricing. It's all for personal and free work.

micah94 · 2026-06-09T18:17:12 1781029032

Just a personal anecdote but I have not hit any more thresholds or limits since switching to the MAX plan and so far, it's been worth it. But I do wonder how long even this will last...

ygjb · 2026-06-09T18:29:29 1781029769

I think subscription models are sustainable, but longer term, we should probably expect to see more prompt optimization happening in the providers' inference pipelines. For example, unless you explicitly tell the agent or API to use a specific model, fronting the inference layer with a caching prompt classifier that determines which model to use and automatically selects the lowest-cost model would probably already save a lot of money (IDK if Claude/OpenAI do this on the backend, but several services I've worked on do something like this to reduce the cost of delivering customer-facing inference at scale).
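A toy illustration of that routing idea, as a minimal Python sketch. Everything here is hypothetical: the model names, tiers and prices are placeholders, and the keyword-and-length classifier stands in for what would more plausibly be a small model in practice. Note that `functools.lru_cache` only memoizes classifier decisions for identical prompt strings; it is unrelated to a model's K-V cache.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_mtok_input: float
    tier: int  # 0 = simplest tasks it can handle well ... 2 = hardest


# Hypothetical catalog; names and prices are placeholders, not real offerings.
MODELS = (
    Model("small", 1.0, 0),
    Model("medium", 3.0, 1),
    Model("large", 5.0, 2),
)

# Hypothetical hints that a prompt needs the most capable tier.
HARD_HINTS = ("refactor", "architecture", "prove", "migrate")


@lru_cache(maxsize=10_000)
def classify(prompt: str) -> int:
    """Toy difficulty classifier; results are memoized per exact prompt."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 2
    if len(text) > 2000:
        return 1
    return 0


def route(prompt: str, requested: Optional[str] = None) -> str:
    # An explicit model request from the user or agent bypasses routing.
    if requested is not None:
        return requested
    tier = classify(prompt)
    # Among models capable of this tier, pick the cheapest.
    candidates = [m for m in MODELS if m.tier >= tier]
    return min(candidates, key=lambda m: m.usd_per_mtok_input).name


print(route("What's the capital of France?"))
print(route("Please migrate these 1600 queries"))
```

Routing per request like this fits short, self-contained sessions best, since each call can go to a different model without carrying history along.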

Majromax · 2026-06-09T18:41:03 1781030463

> fronting the inference layer with a caching prompt classifier that determines which model to use and automatically selects the lowest-cost model would probably already save a lot of money

Unfortunately, that doesn't work within a single session. The K-V cache of a model is intertwined with the model's configuration. Switching models invalidates the cache, meaning everything up to the point of the switchover is processed as new, uncached input tokens.

Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.

Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need. That probably covers most 'support chatbot' use-cases, but it doesn't describe the kinds of heavy agentic automation that really chews through token budgets.
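Using only the two prices quoted above, here is a minimal Python sketch of the first turn after a mid-session switch: staying on Opus pays the cache-hit rate on the existing prefix, while switching to Haiku pays the uncached rate on all of it. It deliberately ignores cache-write surcharges, output tokens, and the fact that Haiku's own cache would warm up on later turns; the context size is a made-up example.

```python
# Prices quoted in the comment above, in USD per million input tokens.
OPUS_CACHE_HIT = 0.50
HAIKU_UNCACHED = 1.00


def prefix_cost(context_tokens: int, usd_per_mtok: float) -> float:
    """Cost of (re)reading an existing context prefix on the next turn."""
    return context_tokens / 1_000_000 * usd_per_mtok


context = 150_000  # hypothetical session size in tokens
stay = prefix_cost(context, OPUS_CACHE_HIT)    # cached read on the same model
switch = prefix_cost(context, HAIKU_UNCACHED)  # full re-read on the new model
print(f"stay on Opus (cached):      ${stay:.3f}")
print(f"switch to Haiku (uncached): ${switch:.3f}")
```

At these prices the nominally cheaper model costs twice as much for the prefix on that first turn, and the absolute gap grows linearly with context size, which is why long agentic sessions benefit least from switching.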

ygjb · 2026-06-09T18:45:18 1781030718

There is a definite financial incentive for people smarter than me to solve the problem, and I don't generally bet against businesses finding ways to reduce costs :)

zozbot234 · 2026-06-09T19:02:32 1781031752

> The K-V cache of a model is intertwined with the model's configuration.

I don't think this is true if you simply quantize the model or run it with fewer active experts? The underlying weights would stay the same. You could also play further tricks with skipping some of the model's middle layers outright, which works surprisingly well due to how skip connections are used.

wahnfrieden · 2026-06-09T18:34:03 1781030043

ChatGPT does this, and Codex will eventually. They’ve stated it’s the future.

swader999 · 2026-06-10T02:12:07 1781057527

I tried ultracode today on the max pro plan. An hour and a half in was all I lasted. It was a giant review of an entire six-month-old codebase. It found 61 bugs, about ten of which were notable. Pretty impressed.

gunsle · 2026-06-10T05:15:42 1781068542

Ultracode destroys your limits and I have not found it to be worth it in the slightest, just fyi. I haven’t found any improvement over a local Claude code instance set to opus max.

swader999 · 2026-06-10T13:13:08 1781097188

Yeah, it's the cookie monster of token consumption. I only found it useful for massive parallel grunt work.

rnxrx · 2026-06-09T18:26:57 1781029617

I have the $100 plan and had almost never run out of credits until I started using the ultracode / workstreams feature with Opus 4.8, at which point I managed to blow through the full 6-hour allocation in 20 minutes or so. In fairness, it did some amazing things with the extracted information, but it also strongly suggested that I'd need the $200 subscription *plus* a budget for extra usage.

rurban · 2026-06-09T21:20:24 1781040024

Instead, pay for 3 models, two of them Chinese. Then you never max out. I pay for Kimi, DeepSeek and Claude. Whenever Claude decides it's over, I can safely continue on very cheap plans.

pyeri · 2026-06-09T19:20:13 1781032813

My bet is they'll keep subsidizing for a considerable period of time, at least 1-2 decades more.

Most AI companies are just testing the waters with paid tiers right now. Their greatest fear with price increases is folks reverting to Wikipedia, Stack Overflow and other public, organic activity buzzing back to life; that would kill any ROI potential in LLMs forever. They're playing the waiting game instead, observing how the digital sphere reacts to every little increase in price.

If that weren't the case, they'd already be charging lucrative premiums and would even get away with it in the short term, given the increased dependency in the enterprise world. But that'd be like killing the goose that lays the golden eggs too soon and losing all long-term potential.

Once folks are so addicted to LLMs that even writing a hello world program sounds like a nightmare and drafting an article feels like reinventing Egyptian hieroglyphs, that's when the real pricing hammer will come.

wsatb · 2026-06-09T19:29:50 1781033390

Anthropic and OpenAI won't be around in 1-2 decades if this is their long term plan. People are not going to revert, but go elsewhere. China is proving that it can be done cheaper.

raffael_de · 2026-06-09T19:55:03 1781034903

1 decade = 10 years ...

jrflo · 2026-06-09T20:20:29 1781036429

Oh for sure. I've been hopping around from provider to provider for the last few years just depending on who has the most capable / subsidized plans at the moment. I definitely expect there will be a squeeze on subscription costs all around the industry post IPO.

andai · 2026-06-09T18:05:06 1781028306

A few weeks ago they massively cut usage on free tier.

gck1 · 2026-06-09T18:35:56 1781030156

Nothing is subsidized. Subscriptions are profitable for both Anthropic and OpenAI.

Anthropic wanting to switch billing to API rates is them just wanting to generate more profit.

InsideOutSanta · 2026-06-09T18:40:37 1781030437

> Nothing is subsidized. Subscriptions are profitable for both Anthropic and OpenAI.

Even if subscriptions are locally profitable (i.e., the cost of the subscription covers the cost of inference), they're still subsidized because they don't cover training and running the company; otherwise, these companies would be profitable.

gck1 · 2026-06-09T19:00:06 1781031606

I can see that being true, and it very likely is true. But aren't infinite VC money and a lack of incentive to optimize operations the reason behind that?

Take a look at China for example - they have no access to NVIDIA, so they're trying to build their own hardware; they don't have unlimited funding, so they try to optimize.

And Anthropic is the complete opposite of that - if NVIDIA were to triple their prices tomorrow, Anthropic would still pay them.

In the end, either we all somehow go mad and start paying Anthropic tens of thousands of dollars per month to support this madness, or we will go with whoever isn't lighting cash on fire.

re-thc · 2026-06-09T19:09:51 1781032191

> Take a look at China for example - they have no access to NVIDIA

Not true. Stop following US media spam if needed.

1. Very recently, the US closed a sanctions loophole that allowed Chinese companies to use NVIDIA hardware outside of China, i.e. before it was closed, they all had access. The trick was to train outside, make adjustments, ship the disks back and use non-NVIDIA hardware in China; at the very least, training and any endpoints not hosted in China could all use NVIDIA.

2. There have been plenty of reports, including fines and bans (e.g. for Supermicro), about NVIDIA hardware being smuggled into China. I doubt it has stopped. You can't catch everyone.

FrustratedMonky · 2026-06-09T19:02:49 1781031769

"Nothing is subsidized"

So they are profitable?

I think you are mixing up accounting terms.

You can't say the 'subscriptions' are profitable without accounting for the cost of making the model that is the source of the subscription.

They are heavily subsidized by the shareholders. Investing, running at a loss, with hope of some future profitability.

gck1 · 2026-06-09T19:17:16 1781032636

And yet, that is completely uninteresting to their user base.

If a saner factory can sell you the same tool at a fraction of the cost of a gold-plated factory, your choice is going to be obvious.

wsatb · 2026-06-09T18:54:21 1781031261

"Nothing is subsidized" is a wild take. They might be making money on some users, perhaps even most users, but certainly not all. Also, "subsidized" doesn't just mean on compute.

y1n0 · 2026-06-09T18:38:15 1781030295

That's interesting. Do you have anything to back that claim up?

gck1 · 2026-06-09T18:50:45 1781031045

I do, and it's called DeepSeek's pricing table. Meanwhile, the "subscriptions are subsidized" cohort has no data whatsoever, and yet they're in every thread.

Granted, it could still mean that Anthropic just chooses to lose money - but that's Anthropic's choice.

DeepSeek has proven that inference can be much, much cheaper than what Anthropic advertises on their API rates page.

nickthegreek · 2026-06-09T19:06:15 1781031975

> Granted, it could still mean that Anthropic just chooses to lose money -

Then the cost is being subsidized by investor capital, but it is still subsidized.

rvnx · 2026-06-09T22:34:01 1781044441

And soon by everyone invested in the NASDAQ. Some sort of exit scam, but with a real product, though.

ProofHouse · 2026-06-09T19:59:01 1781035141

100%. I constantly get errors and timeouts on single responses in Claude, and hit limits all the time. With Codex, rarely. In fact, I bought a second $200 Codex plan because the quotas seemed fair and I didn't have constant issues. Claude is so great at a lot of things, but unfortunately Anthropic beats you away with a stick every chance they get.

shimman · 2026-06-09T18:00:21 1781028021

I've only ever had the $20/month Claude plan, but last night I took the time to set up OpenCode + OpenRouter, paying for DeepSeek + GLM. My previous experience was extremely awkward: I'd hit my limit within one or two chat replies, and it'd take me like 4 limit cycles to complete my task. Now I'm able to complete an equivalent task for less than $2 in two cycles (ask -> revise).

I'm doing basic web development here using animejs. Nothing too complicated (it mostly saves time on scaffolding; I still write the bulk of the animations manually).

I truly believe that American companies are going to get completely curb-stomped by China due to greed, ineptitude, and violating the social contract.

simjnd · 2026-06-09T18:03:17 1781028197

I've switched from OpenRouter to using Deepseek directly from their platform since OpenRouter providers were pretty flaky and inconsistent.

Deepseek V4 Flash is surprisingly capable and insanely cheap. It takes a lot to get a session's cost up to $0.01.

shimman · 2026-06-10T15:05:50 1781103950

Nice, will do this this weekend. Been very impressed with DeepSeek. Did like 8 hours' worth of work after that post and it cost less than $3.

efromvt · 2026-06-09T19:34:14 1781033654

The openrouter provider flakiness with deepseek was infuriating, but I’m happy in hindsight because direct deepseek has been very pleasant. Shocked by how low spend is.

nozzlegear · 2026-06-09T18:09:29 1781028569

> and violating the social contract.

I agree with you on pricing, but what do you mean by this?

shimman · 2026-06-09T18:31:18 1781029878

Sure. Modern American corporations care more about hoarding wealth than about helping build up US society. Since neoliberalism became the mainstream economic position in the US, income inequality has skyrocketed, healthcare costs have increased, childcare has become more expensive than university, and housing has become both unaffordable and unobtainable. The cost of simply existing has increased while life has become more unstable.

Why aren't corporations doing more to help workers with childcare? Why aren't they doing more profit sharing with workers? Why aren't they encouraging unions or sectoral bargaining? Why isn't the government mandating any of this?

Americans very rarely benefit when US corporations do well. That needs to change. No one benefits if Meta continues making billions in profit every quarter while society suffers from isolation, depression, suicide, and scams from their services. Americans don't benefit if health insurance companies are making massive profits while they can't afford deductibles.

Our society has been set up to simply extract wealth in all facets of life. That's a sick society, and it needs to change.

I'm not saying China does this better; in fact, China has some of the worst worker rights of all the industrialized countries. But at least American consumers would benefit from cheaper, higher-quality Chinese goods. The world would likely benefit too if America got off the Cold War hype train, which has done nothing for humanity apart from benefiting those making weapons systems.

joxdosba · 2026-06-09T18:39:02 1781030342

> Why aren't corporations doing more to help workers with childcare? Why aren't they doing more profit sharing with workers?

The AI companies sure are a brilliant example of corporations needing to do more to help their employees pay for childcare.

idiotsecant · 2026-06-09T18:58:01 1781031481

It's more useful to everyone when you engage with the strongest part of someone's argument

cortesoft · 2026-06-09T19:05:57 1781031957

I have been using both Codex and Claude in my day-to-day, trying not to get too attached to either one. I want to be able to work with any provider in case one of them does something bad.

knuckleheads · 2026-06-09T17:51:53 1781027513

I feel like Codex made a big push to run everything on your laptop. With Claude, I get 4 CPUs, a fair amount of RAM and 30 GB in the cloud containers for every one of my dumb ideas, for free. Codex used to be similar, but the last time I tried it, it just kept pushing me to run it locally on my laptop, which I really did not want to do with 20 requests going at once. That's the main advantage for me at the moment.

simjnd · 2026-06-09T18:00:53 1781028053

What runs in cloud containers? The dev servers, builds, etc.? I tried to quickly glance at the Claude website and it doesn't mention cloud containers on their pricing page.

noworriesnate · 2026-06-09T22:47:20 1781045240

The dev environment runs in the cloud. Like devcontainers if you’re familiar with that, except the IDE is just the Claude app.

Having said that, I found the cloud dev environments slow to the point where I wasn’t sure if it had frozen, so I never looked back.

zhshhan · 2026-06-09T18:31:48 1781029908

By "cloud containers", do you mean Claude Code on the web? Codex has something similar in Codex cloud.

knuckleheads · 2026-06-09T19:25:33 1781033133

Yes, correct, they both have the same capabilities, however it felt like codex was pushing me harder to use my local desktop in an annoying way, while claude code was happy to spin up a bunch of dev containers for me in the cloud.

rvshchwl · 2026-06-09T18:23:53 1781029433

I've found Codex to be the better subscription for OpenClaw, because the limits are indeed very generous. However, I've increasingly found that Claude Routines/Scheduled agents can replace all the tasks I use OpenClaw for, so I've been slowly switching over to Claude Code. Aside from OpenClaw, I don't find a lot of value in Codex as a harness on its own.

dd8601fn · 2026-06-09T17:50:41 1781027441

I have trouble justifying gpt after that gross stuff with the war department.

Though the day is coming when there’s no distinguishing, I’m sure.

beering · 2026-06-09T18:46:07 1781030767

Right now there are Anthropic engineers deployed in the NSA to help them use their cyber models. The NSA is part of the department of war.

lovich · 2026-06-09T18:33:17 1781029997

pedantically, the defense department.

jcbrand · 2026-06-09T18:54:04 1781031244

"War department" is the older name, not "Defense department".

Also, is it really a defense department when you're starting wars of aggression every 15 years or so?

derektank · 2026-06-09T19:22:40 1781032960

The War Department has not existed since the passage of the National Security Act of 1947 and the government department has been known as the Department of Defense under US law since the act was amended in 1949. If you have an issue with it, take it up with Congress.

scosman · 2026-06-09T19:55:24 1781034924

They actively use the name https://www.war.gov

lovich · 2026-06-09T21:17:43 1781039863

Yeah, but by law the name change must come from Congress, which it hasn’t. So it’s still the Department of Defense legally.

For an admin so obsessed with legal names instead of chosen ones, you’d think they’d be less hypocritical.

toraway · 2026-06-09T20:10:34 1781035834

Changing a domain name doesn't actually amend federal law.

Just like how changing Kennedy Center letterhead to Trump Kennedy Center for a year didn't actually legally rename it.

Once a case with sufficient standing got in front of a judge it reverted to the actual legal name on the basis that only Congress can change the statutorily defined name.

breezybottom · 2026-06-10T01:54:34 1781056474

Congress doesn't manage departmental websites.

whateveracct · 2026-06-10T02:36:19 1781058979

illegally, yeah

efromvt · 2026-06-09T19:10:22 1781032222

I do slightly prefer 5.5 for complex work, but Claude quota usage has gotten infinitely better since the dark days a few months back. It has gone from infuriating to something I pretty much don’t have to worry about as a daily driver. (In fact, hitting GPT weekly quotas is more annoying now.) I understand if people are still scarred by the issues and the poor comms around them, though.

jrflo · 2026-06-09T20:21:47 1781036507

That's good to hear. It was legitimately unusable back when 4.7 was released, so I had no choice at the time. I'm sure I'll ping pong back again at some point.

supertroop · 2026-06-09T18:42:47 1781030567

Do you use a token service like open router or just subscribe to / unsubscribe from various models sequentially?

jrflo · 2026-06-09T20:23:14 1781036594

I just subscribe/unsubscribe to the providers each month. I'll definitely check out OpenRouter, though. I always assumed that subscriptions were heavily subsidized by the providers, especially if you're at the top end of users, but maybe I should move to a usage-based plan.

rekttrader · 2026-06-09T19:29:53 1781033393

Wait till you kick the tires of Qwen Coder.

hgoel · 2026-06-09T20:29:58 1781036998

How much more clearly do they need to explain the resource constraints?

If they didn't announce it, you guys would be complaining about slowed progress.

If they didn't release it, you guys would be complaining about fake promises and marketing.

If they released it without limits, the complaints would be about slow responses and outages.

If they didn't add it to subscription plans, the complaints would be about phasing out subscriptions.

If they added to subscriptions with cost reflecting their resource availability, the complaints would be about how quickly it eats limits.

So they choose the middle ground of providing some initial access and assessing if they can satisfy demand, only to still be ignored and accused of trying to get users hooked?

We've already seen that they don't have enough compute, thus the deals with SpaceX for their GPUs. It's very reasonable that they just don't have the capacity to support the subscription userbase on this model.

dakolli · 2026-06-09T21:08:28 1781039308

[flagged]

hgoel · 2026-06-09T21:38:35 1781041115

Putting aside the fact that this is a hilarious standard to have on a Y Combinator-run forum, let's say providing Opus-level models was profitable. That has no bearing on whether they'd have enough resources to provide Fable at all.

joshstrange · 2026-06-09T18:20:29 1781029229

I would not use this if you are on a subscription. In under 8 minutes it burned my entire 5-hour window (the window appears to have just reset, since I have over 4 hours until the next reset, and I hadn't used CC at all today aside from this), and then it used up ~$15 more in usage before I could stop it.

I am on the $100 Max plan.

GoToRO · 2026-06-09T19:46:16 1781034376

They have a graph comparing cost across the models. This model looks only slightly more expensive than the others. The graph is logarithmic :)

velcrovan · 2026-06-09T22:26:33 1781043993

I'm also on the $100 max plan. I let Fable rip on a complicated issue involving hot-reloading modules in a GUI app built with Racket, it's fixed a couple issues over the last hour, and I've used about 17% of my session (not weekly) limit.

enraged_camel · 2026-06-09T18:45:50 1781030750

That’s odd, I used it on a pretty complex refactoring task and it worked for 22 mins and used only 15% of my 5-hour limit. I’m on the $200 Max plan though.

FireBeyond · 2026-06-09T20:11:04 1781035864

Well the $200 Max plan is 4x the usage quotas of the $100 so it's "within reason"?

cortesoft · 2026-06-09T19:13:36 1781032416

The CLI when you select it says it has 2x the usage as opus. Not sure if that matches what you are seeing.

I do wonder if you switched models mid-session, you would have lost all your cache. Reloading the context into cache can really eat through your usage.

observer987 · 2026-06-09T20:27:55 1781036875

I too am on the $100 plan and I second this.

I had it analyze a project I was working on with Opus 4.8, and it blew through 23% of my session limit in one go. Does not portend well for my budget.

d4rkp4ttern · 2026-06-09T20:27:58 1781036878

Yes, and this is also why I haven’t yet tried the new “dynamic workflows” which spawn hundreds of agents that happily eat through your token limits.

fastball · 2026-06-09T18:56:17 1781031377

What is your effort level?

ZunarJ5 · 2026-06-09T19:10:55 1781032255

They didn't even reset credits for this lol

0erofootprint · 2026-06-09T17:44:27 1781027067

For me it blocked almost immediately. I had it writing code related to message digests, and it seemed to think it was too gifted for that. It gave the security warning and switched back to 4.8. Whatever... it will probably start throwing the API error soon too. I have mostly switched to the $200/month Codex plan. I've found their 5.5 xhigh to be better than Opus 4.8 "ultracode." Also, I have not once seen their servers fail for compute unavailability, unlike Anthropic, where that happens almost every hour.

matheusmoreira · 2026-06-09T19:23:31 1781033011

I just asked Fable for a complete code review of my lone lisp project. Started out strong. Launched Fable agents, then spent like 10 minutes thinking... And then got interrupted by a switch to Opus 4.8.

> Fable 5's safety measures flagged this message for cybersecurity or biology topics.

> They may flag safe, normal content as well.

> These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them.

Here are the results of the agentic code review session:

  ┌──────────────────────────┬───────────────┬────────────────┐
  │          Agent           │ Fable 5 turns │ Opus 4.8 turns │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ values                   │ 134           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ data-intrinsics          │ 104           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ tools-tests-build        │ 81            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ core-intrinsics (failed) │ 25            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ system-memory            │ 44            │ 20             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ reader-modules           │ 104           │ 25             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ linux-startup            │ 95            │ 15             │
  └──────────────────────────┴───────────────┴────────────────┘

This 40 minute session cost me 16% of my weekly usage. A simple code review of the most critical areas of my project got flagged as a cybersecurity risk. It really made me not want to try it again.

kordlessagain · 2026-06-09T21:27:36 1781040456

Same. I asked for a security review and it immediately triggered. I then started a new session and asked for a software review, and it ran for a bit before getting tripped up by the project's token usage.

andai · 2026-06-10T09:04:44 1781082284

This is interesting. Security issues are bugs. So if you ask it to look for bugs, it will also find security issues. Is that a workaround for the "no cybersec" rule?

Or is it just not allowed to find bugs? Or it's only allowed to tell you bugs that don't pose a security risk?

matheusmoreira · 2026-06-10T16:07:33 1781107653

> Or it's only allowed to tell you bugs that don't pose a security risk?

Seems that way. "Security" was never part of the prompt. It was something like:

> Hello, Fable! Can you give me a complete code review of my lone lisp project? Opus has already done extensive code review. I'm curious to see what you say.

Result was the table above.

andai · 2026-06-10T18:26:39 1781115999

Yeah I heard multiple people mention that it's really good at triggering itself. e.g. it'll spontaneously write some tests related to security, which then forces it to downgrade to Opus for the rest of the session.

kkoncevicius · 2026-06-09T17:51:24 1781027484

I had a similar experience. I wanted to test it by asking it to summarise a scientific OMICs-related paper. It gave a warning about me potentially developing a bio-weapon or something like that. And switched back to Opus 4.8.

smith7018 · 2026-06-09T17:54:14 1781027654

Fwiw it's not available on my enterprise account: "Disable zero data retention to unlock Fable 5 access"

stronglikedan · 2026-06-09T18:21:09 1781029269

We just blocked it at our org for this reason. They will "retain agent request and output data associated with this model, regardless of you Cursor Privacy Mode setting."

sdellis · 2026-06-09T18:21:20 1781029280

What does "zero data retention" mean? What kind of data does it need to unlock?

drakythe · 2026-06-09T18:27:59 1781029679

The announcement details it. They're storing 30 days of data on all surfaces, first and third party. They claim it is for security purposes so they can review and check for long term jailbreak and distillation efforts.

They also, FWIW, say that they've instituted new policies on their end such as logging any human access to the stored data and automated deletion after 30 days in "most" cases (with another link to a document detailing that further).

kyledrake · 2026-06-09T17:21:04 1781025664

Considering their apparent nerfing of the end user plans in favor of enterprise clients, is Anthropic still the "more ethical AI company" like everybody loves to tell me all the time?

Assuming this isn't just a supply issue on their side, nothing says "ethical AI" like only allowing mega corporations to use it through cost barriers.

estearum · 2026-06-09T17:24:47 1781025887

You really misunderstand what AI-doom people are worried about if you think this is anywhere near the top (or middle, or bottom) of the list of concerns.

throwaway894345 · 2026-06-09T17:32:02 1781026322

Yeah, it's positively precious to think the specific pricing strategy for consumers is the overriding ethical concern with OpenAI, etc. I don't have any particularly strong affinity to any AI company, but comparing pricing to, say, mass surveillance is ... something else.

kyledrake · 2026-06-09T17:33:37 1781026417

Your beautiful straw man is negated by the fact that Anthropic seems quite eager to get back on the DoD gravy train https://www.reuters.com/business/aerospace-defense/blacklist...

jnovek · 2026-06-09T18:55:23 1781031323

Your original comment was about pricing ethics, does Anthropic’s connection to the DoD have anything to do with pricing ethics? They’re in no way coupled, one can be ethical while the other is not.

andriy_koval · 2026-06-09T19:13:21 1781032401

Even on the Pentagon thing, Dario said he doesn't object to military AI, but that Claude is not ready YET. I speculate he was afraid of reputational damage if Claude were to guide missiles onto elementary schools.

throwaway894345 · 2026-06-09T19:43:15 1781034195

I admire the confidence with which you started typing a reply that had nothing to do with my comment. Bravo!

estearum · 2026-06-09T17:39:18 1781026758

Where is your evidence that this is Anthropic backtracking on its ethical and contractual commitments rather than DOD backtracking on its blatantly illegal coercion (which it's almost certainly going to be successfully sued for)?

Talk about a strawman!

kyledrake · 2026-06-09T17:46:37 1781027197

As someone who was in Minneapolis during the ICE raids, including one where a US citizen at a nearby restaurant was thrown in prison for 3 days despite having his passport on hand, because he looked Asian, it's hard for me to see the ethics of AI companies actively collaborating with the Trump administration as anything more than different flavors of ice cream.

estearum · 2026-06-09T17:49:58 1781027398

Are the two analytical frameworks available to you just "black and white thinking" or "it's different flavors of ice cream?"

kyledrake · 2026-06-09T18:02:53 1781028173

Are the personal attacks really necessary to make your argument?