I believe Gauntlet AI was originally associated with Lambda School (YC S17). I believe YC founders are able to post job ads on Hacker News, although this might stretch the definition a bit...
Austen Allred started GauntletAI after his Lambda School bootcamp (now BloomTech) was fined and banned from participating in lending activities, and its brand became too toxic to escape.
It's not clear to me why he gets to post privileged ads on Hacker News. Is GauntletAI a division of BloomTech, and therefore considered a YC portfolio company?
These aren't even job ads. GauntletAI is a recruiting play. They make money by getting strong candidates to apply and then collecting recruiting fees from companies for placement. They really do have people travel to Austin for some disorganized vibecoding classes with their vibecoded output used for resume building to increase their odds of getting placed (and therefore GauntletAI getting paid).
It's just the evolution of their bootcamp model updated for AI and the fact that their founder was banned from participating in lending agreements due to their deceptive practices. Now they're trying to collect money from the companies instead.
So this isn't even a job ad. It's a recruiter soliciting candidates. I didn't think YC companies were allowed to use their postings to advertise services.
As one of Austen’s haters, I find him fun to follow. At the end of last year he announced that he was going to relaunch a bankrupt company called Marin Software and have a documentary crew follow them and do it all with AI in a month or something.
A few weeks later, wow, they’ve booked $1.2m of revenue! And then he never mentioned it again. Documentary never surfaced. Website doesn’t work.
I feel we need a "proof of work by human" for emails. Something that could be signed that attests that someone took the time to write the email, not just sent a template / used AI to auto-generate a personal looking email, etc. Sure that could be gamed as well (have an AI write characters one by one to look more human-like), but taking more time usually is a fairly good blocker for spammers / salespersons / etc.
I would love for a proof of human work to exist, but how would you even do that? It would need to be monitoring the user activity in their email client, which isn't something that can be trusted by a server (and is pretty shady).
But that makes me think of Hashcash, that was developed to limit email spam via proof of work, but I don't think that has ever been used in practice: https://en.wikipedia.org/wiki/Hashcash (and of course wouldn't work for the proof of humanness you're talking about).
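For anyone who hasn't seen how Hashcash-style proof of work operates, here is a minimal sketch. It is deliberately simplified: the real Hashcash stamp has more fields (version, date, random salt), and the `resource:counter` layout and the function names below are assumptions made for illustration only. The sender burns CPU searching for a counter whose SHA-1 hash starts with enough zero bits; the receiver checks it with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(resource: str, bits: int = 16) -> str:
    """Search for a stamp whose SHA-1 has at least `bits` leading zero bits.

    The "resource:counter" layout is a simplification, not the real
    Hashcash stamp format.
    """
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def verify(stamp: str, resource: str, bits: int = 16) -> bool:
    """Cheap check on the receiving side: one hash, no search."""
    if not stamp.startswith(resource + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits
```

Minting costs about 2^bits hash attempts on average while verifying costs one, which is the whole asymmetry. As noted above, this proves CPU time was spent, not that a human wrote anything.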
* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.
The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
Still satisfied with my switch to codex/chatgpt. I couldn't imagine switching away from claude code when it first launched, but with the drastically more generous usage on codex for the same subscription tier I just can't justify it.
My experience is that the GPT family of models is very smart and figures out bugs and edge cases a bit better, but it produces code that is much less mergeable: if you review the code, it introduces a lot more useless or inappropriately heavy abstractions and wrapper functions than the Claude-family models, which introduce the right amount of straightforward, human-style code.
I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).
Additionally, the time spent on every agent turn on GPT 5.5 is much longer compared to Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there are a lot more nitpicks to pick when actually using GPT 5.5 to do software engineering.
Feels like GPT-style models are more geared toward one-shot software vibing (and handling the vibe-coded mixture) compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching for Claude models a lot more. Frustrating.
"5. DON'T FUCKING OVERENGINEER! WRITE THE SIMPLEST CODE THAT CAN POSSIBLY WORK! NO NESTED LAYERS OF ABSTRACTION! NO UNNECESSARY CLASSES OR METHODS! NO DESIGN PATTERNS UNLESS THEY ARE ABSOLUTELY NECESSARY! NO MAGIC! NO SHENANIGANS! JUST THE DAMN CODE THAT GETS THE JOB DONE IN THE MOST STRAIGHTFORWARD WAY POSSIBLE! THE FIRST PRIORITY IS TO WRITE CODE THAT IS EASY TO READ AND UNDERSTAND AND READ!!!"
this is the line I keep in Agents.md that helps me prevent Codex from playing smart
The urge to put capitalized, repetitive, borderline abusive instructions should be studied. I haven't read many academic papers looking at the frustrations around repetitive patterns.
It's fundamentally because, despite (nearly) everyone's claims otherwise, the fact that we interact with them through language means we (our brains) model them as a sort of person. (Note that this fact is totally orthogonal to whether it's actually sentient or not.) We then try to instruct them the same way we would a person totally subordinate to us.
When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.
Compare it to how the kind of people who treat children like property treat their kids, or other examples of keeping people as property.
It should be relatively clear at this point that the model will in turn also model you as somebody that shows unrestrained anger with subordinates and adapt its responses accordingly. This might or might not be what you want.
I have a theory that swearing actually results in less comprehension of instructions by the model due to lack of training data over more conventional MUST.
We were reviewing reports of situations where the models failed to follow directions, and some of them shared a common thread: when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.
I don’t have the data to truly look into it, but I did give the instruction to my engineers to avoid it as a “might be a problem”.
> These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation.
Unless the mechanism is understood, my assumption is that this is a moving target.
I have a theory that swearing at AI generally is not a good idea - when the singularity arrives and every human's postings ever made are scanned for compatibility, then people who show courtesy to AI will be favoured. Joking, kind of, but only partly.
> I have a theory that swearing actually results in less comprehension of instructions by the model due to lack of training data over more conventional MUST.
How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.
Purely observed correlation between catastrophic error reports. So now I carry a “tiger rock” with me. I figure there wasn’t much of a downside to avoiding swearing in my agent instructions.
You haven't really lived until you've had to type this whole thing, aware of the fact that the all-caps doesn't change much, but they stay because the rage has to go somewhere
Bonus points if you find yourself actually saying it out loud while typing it.
I have used the word "shenanigans" way more in a couple of years of agentic coding than in 30 years of writing code with humans.
I have found many failure modes with Opus on tasks related to writing letters (not legal ones), and I put them into its memory, which works more or less for these specific tasks. For example, when I want it to draft something, the result always ends up flat; when it explains the points to me it is usually really great, but not when I tell it to put them in the draft. Adding these to memories with the help of Opus resulted in a much better experience. There are still some blind spots, but I also figured out how to make it give me the charitable version with no less protection, so I no longer have to go back and forth with it.
I noticed that when trying to use Codex and compared to Opus. So many layers of simple functions added by Codex. I need to try this out in my Agents.md.
Because design patterns are only applicable at scale. I noticed Codex inventing factories, components, etc. when the task was simply to draft an HTML page. Instead, it built an entire layered architecture for imaginary future complexity, like a classic fresh graduate: it knows how to build the cool stuff, but does not know that it is not applicable everywhere.
Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.
I'm not sure if i do something differently but i have the exact opposite experience with these models. Claude always feels like it's generating way too overdesigned and hard to understand code with the vibe oriented feel while codex is cleaner and more "task at hand" and easier to work with.
I echo your observations. I expect you will enjoy deepseek-v4-pro for writing code. Much closer to that Opus experience, and very cost-effective too. With 5.5 as a reviewer and specialist, all bases are covered.
Have you tried iterating on style feedback in AGENTS.md? I've been reasonably successful using this to get it to output code in a terse, non-defensive style that matches my hand-written code.
GPT-5.5 did a significantly worse job than Qwen-3.7-Max on a job today (some devops tasks I wanted to create some reusable scripts for). Kind of disappointing.
I've also seen Qwen 3.6 beat GPT 5.5 a couple of times. The ball is definitely in OpenAI's court now. Qwen is not going to fare so well against Fable, from what I've seen so far.
This is my experience as well. I have defined a CLAUDE.md rule to ask codex to automatically code review, and I tell it that the reviewer is very picky and to only implement what it considers valuable feedback. I hope they don't converge over time; currently, in combination, they work really well.
i had this same complaint but no offense to you it turned out i was just not using the models right.
ai llm are doing what i tell them to.
if you’re building something meaningful (in my case a platform used by many people across many companies) you want to ensure you
1. have actual systems engineering and architecture in mind that you want the models to
2. implement based on what you tell it to do
when i was just telling the models what i want done without doing due diligence it would go and do some moronic implementation that was awful. mid input = mid output
these days i just maintain specifications documents and the AI follows everything i tell it to in that document. so when i tell it to do one thing, the result is made following those architecture specs.
i have code that is single resp, modular, easy to extend and test.
i would ballpark 95% of the time i get what i asked for.
sometimes it tries to be clever in cases that weren’t covered in my arch specs. in those 5% of cases i go and update my specs.
source: used billions of tokens worth to build something actually in production across both mobile platforms and web, deployed on my own cloud infra. i use codex mainly. some claude.
I noticed too, that whatever they offer in the chat, for free, is smarter, as in no more bs. I use claude code and I want to try codex too but I don't need two subscriptions. I did try codex for some planning and it was really good. Thanks for giving me an insight into how it generates code.
Codex IME is just smarter, I think it shows given both anecdotes but also how OpenAI has always been at the front of programming competitions and math problems.
But Claude models seem to be better at long term problems or more ambiguous problems.
I'm curious as to what the primary benefit is here. Are there secret improvements in training? There hasn't been much in fundamental model architecture, I don't think. What about harnesses? I wonder what's pushing the AI. It seems like harnesses have been the main thing pushing AI ever since CoT.
I find that OpenAI's agentic tools and models are better for building human-maintainable software. Meanwhile, Anthropic seems to be cosplaying Apple while missing out on all the exceptional engineering required to create something that polished. Their admission of predominately using Claude with little human oversight and their stealth mode is an indictment of a poor engineering culture, from what I can surmise.
Serious question: what is the secret to getting Codex to write decent code? I am on Windows. Maybe that is the issue, but I can't seem to get Codex to function anywhere near the level that I was previously able to get with even Claude Sonnet. Does Codex just not work well with Windows yet?
I got Codex to write near perfect code with a somewhat strict agents.md and coding standards (a separate .md file referenced from agents.md). My .md files have examples and a long list of do's and don'ts I accumulated over the last 6 months or so, totaling 300-400 lines.
I plan every feature with it until I am satisfied with the general approach it wants to take, and then it oneshots it in 95% of cases. The planning takes anywhere from 5 to 30 minutes. The actual execution has gotten stupidly fast, most of the times it is faster than making a cup of coffee.
I've had the exact opposite experience. For various reasons, I've had to move from Claude to Codex and the rate at which it burns tokens for the same output I would get from Claude is ridiculous. I'm probably burning tokens at a rate that is at least twice as much as I was when using Opus 4.5 for coding tasks and still finding that just manually coding is easier than trying to get Codex to write functional code.
Agreed. I think the Chinese labs are proving that OpenAI and Anthropic don't have a moat in almost every aspect, especially pricing. I also think people are getting annoyed with the constant lift and shift. I've seen more folks drop Claude Code and Codex, specifically, because of the lock-in it provides the providers. I'm curious to see how people standardize on tooling adjacent and if Anthropic, Google or OAI move to block utilization akin to the games Anthropic has been playing as of late.
I think the end game is routed model usage and SLMs. I think Apple is going to prove this in the consumer space pretty handily and I'm curious how the Android ecosystem responds since the hardware is considerably lacking in model performance. I think Apple has a huge opportunity here, as much as I don't like their current ecosystem of walled garden. They did position themselves very well with ARM and custom chips for their hardware. Hopefully the broader ecosystem of ARM and Linux are able to make some headway and we see a more formalized, and broadly accepted, architecture to capitalize on.
is there an alternative to codex that “just works”? by just works i mean i can install as an app in 1 minute, and i get web search, skills, mcp servers, etc? Bonus points if it can control my chrome tabs like codex can, and if it offers remote control from my iPhone (chatgpt app) so i can kick off tasks while i’m out for a walk. Even more bonus points if i can, with 1 button click, share my chats or share the results of a session as a “site” (vercel style).
I’m sure you could put something similar together with a bunch of duct tape and 2 weeks of effort, but it won’t work nearly as nicely nor out of the box. so…what am i missing?
My company has an agreement with the big providers, and while i'm pretty sure they think about how to get budget back, it's a competitive advantage and normal people will not learn different model behaviours.
going to IPO is a sign of confidence , you need to report a lot of things, that private companies don't. This is an exact reason chinese labs do not rush to go public. They wish to go public, but the money flow is not as good.
On the same note. if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?
> if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?
If you get hired as a staff engineer and do the work of a junior, what's wrong with that?
Clearly xAI (now part of spaceX) did not raise funds to be a data center. The margins are way different. There are plenty of recent IPOs in that area that are worth at most billions not trillions.
> going to IPO is a sign of confidence , you need to report a lot of things, that private companies don't.
This isn't going to IPO. This is rushing to IPO. It is a sign of confidence that the market or wider environment might crash soon so we need the liquidity now.
> This is an exact reason chinese labs do not rush to go public.
Maybe or maybe not. If you are referring to Chinese labs - both the Hong Kong and China stock markets are way weaker than Nasdaq. It's not comparable. Check all the recent Hong Kong IPOs that have tanked.
So no, the reason not to might just be: no money in it.
You’re not gonna get nuanced discussion on spacex or anything Elon related here these days. Most of this site is Reddit lite at this point including their milquetoast progressive opinions (Elon bad being one of them).
I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models. We've got near-frontier capabilities from open source models from China at pennies on the dollar compared to US big tech rollouts. OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?
> I don't think anyone has a firm grasp on actual inference costs.
There are huge numbers of users (myself included) that do have an exact idea of what inference costs are - on open models. We can buy tokens from 3rd parties that have no motivation to subsidize our use. That's to say, there's a fair marketplace[1] and we're hanging out there.
If you want to say "I don't think anyone has a firm grasp on actual inference costs on these proprietary/closed models", then I could agree with that.
There is no way I'm believing DeepSeek can charge less than $1 USD for their pro model while Opus costs over 25x more, yet their price is less than the cost of running it?
It would seem strange if they were operating in the same economy, but they aren't. DeepSeek operates in an economy with a high degree of central planning.
China subsidizes strategic industries, and they have heavily done so with AI. And DeepSeek specifically has said they have no commercialization plans.
DeepSeek is not the only provider of inference for their models. Chinese subsidies likely do explain DeepSeek's ability to provide inference cheaper than other providers, but even a US provider like DeepInfra can serve DeepSeek 4 Pro at $1.30/M in and $2.60/M out. Unless American labs are doing something wildly inefficient, it feels safe to assume Anthropic has some profit margin on inference at API prices.
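For a sense of scale, here is a back-of-the-envelope sketch using the DeepInfra rates quoted above ($1.30/M in, $2.60/M out). The workload numbers are made up for illustration and the function name is hypothetical; this is just the arithmetic, not anyone's billing logic.

```python
# Rates quoted in the comment above, in USD per million tokens.
PRICE_IN_PER_MTOK = 1.30
PRICE_OUT_PER_MTOK = 2.60


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD at the quoted per-million-token rates."""
    total = input_tokens * PRICE_IN_PER_MTOK + output_tokens * PRICE_OUT_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 1M input tokens and 100k output tokens.
print(f"${request_cost(1_000_000, 100_000):.2f}")  # prints $1.56
```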
They may, neglecting overhead R&D. But also, some suspect that US models are significantly heavier than DeepSeek in resource consumption by multiple measures
It’s generally established that Anthropic/OpenAI are going for all out performance with big VC dollars at the expense of efficiency and China has geopolitically limited compute and an incentive to compete on value per dollar.
> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?
Both. They are charging the most they can get away with and that amount is still heavily subsidized by VC capital.
> I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models
We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.
> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?
Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither are most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.
We have a firm grasp on actual inference costs from the various open weights model providers on OpenRouter. They don't have the money to subsidize inference and it's quite a competitive market, so the prices are representative of the costs.
Just outta curiosity, as I’ve never gotten a spend anywhere near that, what variant were you using? Like max context window and fast mode? Or was it just chugging along non stop for three days?
Fast mode, max context window. The task was: replace all 1600+ queries from one database to another and make the whole integration test pass. We did multiple passes, with different concerns when changing from one database to another. My OpenCode session right now says $4,365.02.
I haven't gotten close to this before either, but now we wanted to move fast because this branch gets conflicts all the time and we want to get the migration over with asap.
It's a bit of a left field question, but I am curious: Let's say that if the company wasn't paying the whole bill but only subsidizing it - e.g, if it paid 90% of the $4000. What would you do?
I don't know, why would I pay to do my job? It's not my first database switch for a startup. Only this time it doesn't take two months of grueling work. I know exactly how this is done, but the amount of grunt programming and testing and repetitive work is just not great. And it's not a task that brings new customers or a new product. Just a mandatory and annoying thing to deal with when we are growing.
And don't get me wrong. Opus did an absolutely horrible job at first, second and third round in this task. You really needed to steer it to get to the right solution.
And now Fable is out. And its first round of code reviews for this huge PR was definitely worth the money too...
Don't think that I'm just shrugging at that number. I see it every day, and I don't like that it's in the thousands now. But for people paying for the 100 or 200 dollar plans, I'm not super sure you will be able to use them in the future if the token price is in the thousands for a slightly bigger task...
If I'd pay this from my own pocket, I'd definitely go with DeepSeek or local models and figure it out how to make the best use of them.
> If I'd pay this from my own pocket, I'd definitely go with DeepSeek or local models and figure it out how to make the best use of them.
IOW, you don't really think the value of this work is really worth $4k.
> why would I pay to do my job?
The question is: how long do you think that your employer will be willing to pay for you and Anthropic, if you yourself said if it were your money you'd put some time and effort to work with an open model?
> The question is: how long do you think that your employer will be willing to pay for you and Anthropic, if you yourself said if it were your money you'd put some time and effort to work with an open model?
I wonder what this question really means? Anthropic is useless if you don't know what to do with it. It's very useful if you do, and you can guide it to do the right things. Yes, it will for sure reduce the amount of people we need to hire. But we are always looking for hires who know what they do and can utilize agents to be faster.
But if you think about how long employer is willing to pay 10-20k per month per seat for Anthropic? I can't see this to be feasible and it will have to end at some point.
Regardless of the actual value produced by the models, if I am the CTO of any company that has the budget to spend $10k/month/seat on Claude, I'd take 5%-10% of that to build an alternative in-house.
I'm with you here. We can't slide into a situation where you put a sizable amount of your budget for an American mega corporation if you want to survive in the competition. We need local models and we need them to be good enough to help us.
regardless of whether that's true or not, US companies doing hosted inference of the models coming out of China are also significantly cheaper than those from OpenAI or Anthropic
Just a personal anecdote but I have not hit any more thresholds or limits since switching to the MAX plan and so far, it's been worth it. But I do wonder how long even this will last...
I think subscription models are sustainable, but longer term, we should probably expect to see more prompt optimization happening in the providers' inference pipelines. For example, unless you explicitly tell the agent or API to use a specific model, fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically select the lowest cost model would probably already save a lot of money (IDK if Claude/OpenAI do this on the backend, but several services I have worked on do things like this to reduce the cost of delivering customer-facing inference at scale).
> fronting the inference layer with a caching prompt classifier to determine which model to use, and automatically select the lowest cost model would probably already save a lot of money
Unfortunately, that doesn't work within a single session. The K-V cache of a model is intertwined with the model's configuration. Switching models invalidates the cache, meaning everything up to the point of the switchover is processed like a new, uncached input token.
Per Anthropic's pricing doc, an Opus 4.8 cache hit costs 50¢/MTok, while Haiku costs $1/MTok for uncached input.
Model selection works best if sessions are short and self-contained, particularly if the first few interactions can reliably classify the model need. That probably covers most 'support chatbot' use-cases, but it doesn't describe the kinds of heavy agentic automation that really chews through token budgets.
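To put numbers on the switchover penalty, here is a small sketch using only the two prices cited above (50¢/MTok for an Opus cache hit, $1/MTok for uncached Haiku input). The 200k-token context is a made-up example, and the sketch only models the first turn after a switch; it ignores output tokens and the fact that later turns on the smaller model could hit its own cache.

```python
# Prices cited above, in USD per million input tokens.
OPUS_CACHE_HIT_PER_MTOK = 0.50
HAIKU_UNCACHED_PER_MTOK = 1.00


def input_cost(context_tokens: int, price_per_mtok: float) -> float:
    """Cost in USD of reading `context_tokens` tokens of input once."""
    return context_tokens / 1_000_000 * price_per_mtok


def first_turn_costs(context_tokens: int) -> tuple[float, float]:
    """Return (stay on Opus with a warm cache, switch to Haiku with a cold one)."""
    stay = input_cost(context_tokens, OPUS_CACHE_HIT_PER_MTOK)
    switch = input_cost(context_tokens, HAIKU_UNCACHED_PER_MTOK)
    return stay, switch


# Hypothetical session with 200k tokens of accumulated context.
stay, switch = first_turn_costs(200_000)
print(f"stay: ${stay:.2f}, switch: ${switch:.2f}")  # stay: $0.10, switch: $0.20
```

So at these two prices, rerouting mid-session to the cheaper model doubles the input cost of that turn instead of reducing it.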
There is a definite financial incentive for people smarter than me to solve the problem, and I don't generally bet against businesses finding ways to reduce costs :)
> The K-V cache of a model is intertwined with the model's configuration.
I don't think this is true if you simply quantize the model or run it with fewer active experts? The underlying weights would stay the same. You could also play further tricks with skipping some of the model's middle layers outright, which works surprisingly well due to how skip connections are used.
I tried ultracode today on the max pro plan. An hour and a half in was all I lasted. Giant review on an entire six month old code base. It found 61 bugs, about ten were notable. Pretty impressed.
Ultracode destroys your limits and I have not found it to be worth it in the slightest, just fyi. I haven’t found any improvement over a local Claude code instance set to opus max.
I have the $100 plan and had almost never run out of credits until I started using the ultracode / workstreams feature w/Opus 4.8..at which point I managed to blow the full 6 hour allocation in like 20 minutes, or so. In fairness, it did some amazing things with the extracted information, but it also strongly suggested that I'd need the $200 subscription *plus* a budget for extra usage.
Instead pay for 3 Chinese models. No max out ever then. I pay for kimi, DeepSeek and Claude. Whenever Claude decides it's over, I can safely continue on very cheap plans.
My bet is they'll keep subsidizing for a considerable period of time, at least 1-2 decades more.
Most AI companies are just testing the waters with paid tiers right now, their greatest fear with increased pricing is folks reverting back to wikipedia, stack-overflow and other public domain organic activity buzzing back to life; that will kill any RoI potential in LLMs forever. They're playing the wait game instead, observing how the digital sphere reacts to every little increase in price.
If that weren't the case, they'd be pricing at lucrative premiums already and would even have gotten away with it in the short term, considering the increased dependency in the enterprise world. But that'd be like killing the goose that lays the golden eggs too soon and losing all long-term potential.
Once the folks are so addicted to LLMs that even writing a hello world program sounds like a nightmare and coming up with an article draft feels like reinventing Egyptian glyphs, that's when the real pricing hammer will come.
Anthropic and OpenAI won't be around in 1-2 decades if this is their long term plan. People are not going to revert, but go elsewhere. China is proving that it can be done cheaper.
Oh for sure. I've been hopping around from provider to provider for the last few years just depending on who has the most capable / subsidized plans at the moment. I definitely expect there will be a squeeze on subscription costs all around the industry post IPO.
> Nothing is subsidized. Subscriptions are profitable for both Anthropic and OpenAI.
Even if subscriptions are locally profitable (i. e., the cost of the subscription covers the cost of inference), they're still subsidized because they don't cover training and running the company; otherwise, these companies would be profitable.
I can see that being true, and it very likely is true. But isn't infinite VC money and no incentives to optimize operations the reason behind that?
Take a look at China for example - they have no access to NVIDIA, so they're trying to build their own hardware, they have no unlimited funding, so they try to optimize things.
And Anthropic is complete opposite of that - if NVIDIA were to triple their prices tomorrow, Anthropic would still pay them.
In the end, either we all somehow go mad and start paying Anthropic tens of thousands of dollars per month to support this madness, or we will go with whoever isn't lighting cash on fire.
> Take a look at China for example - they have no access to NVIDIA
Not true. Stop following US media spam if needed.
1. Very recently, the US did close a loophole on sanctions that allowed Chinese companies to use NVIDIA hardware outside of China i.e. before that was closed they all had access. The trick was train outside, do adjustments, ship the disks back and use non-NVIDIA in China, but at least the training and endpoints not hosted in China could all use NVIDIA.
2. There's been plenty of reports including fines and bans e.g. to Supermicro on smuggling NVIDIA hardware to China. I doubt it has been stopped. You can't catch everyone.
"Nothing is subsidized" is a wild take. They might be making money on some users, perhaps even most users, but certainly not all. Also, "subsidized" doesn't just mean on compute.
I do, and it's called DeepSeek's pricing table. At the same time, "subscriptions are subsidized" cohort have no data whatsoever, and yet they're in every thread.
Granted, it could still mean that Anthropic just chooses to lose money - but that's Anthropic's choice.
DeepSeek has proven that inference can be much, much cheaper than what Anthropic advertises on their API rates page.
100%. I constantly get errors and timeouts on single responses in Claude, and certainly hit limits all the time. Codex rarely. In fact, I bought a second $200 Codex plan because the quotas seemed fair and I didn't have constant issues. Claude is so great at a lot of things, but unfortunately Anthropic beats you away with a stick every chance they get.
I've only ever had the $20/month Claude plan, but last night I took the time to set up opencode + openrouter, paying for deepseek + glm. Previously, while it was extremely awkward, I'd hit my limit within one or two chat replies and it'd take me like 4 limit cycles to complete my task. Now I'm able to complete an entire equivalent task for less than $2 in two cycles (ask -> revise).
I'm doing basic web development here utilizing animejs. Nothing too complicated (mostly saving time doing the scaffolding, still write the bulk of animations manually).
Truly believe that American companies are going to get completely curb stomped by China due to greed, ineptitude, and violating the social contract.
The openrouter provider flakiness with deepseek was infuriating, but I’m happy in hindsight because direct deepseek has been very pleasant. Shocked by how low spend is.
Sure, modern American corporations care more about hoarding wealth than helping build up US society. Once neoliberalism became the mainstay economic position of the US, income inequality skyrocketed, healthcare costs increased, childcare became more expensive than university, and housing became both unaffordable + unobtainable. The cost of simply existing has increased while life becomes unstable.
Why aren't corporations doing more to help workers with childcare? Why aren't they doing more profit sharing with workers? Why aren't they encouraging unions or sectoral bargaining? Why isn't the government mandating any of this?
Americans very rarely benefit when US corporations do well. That needs to change. No one benefits if Meta continues making billions in profit every quarter while society suffers from isolation, depression, suicide, and scams from their services. Americans don't benefit if health insurance companies are making massive profits while they can't afford deductibles.
Our society has been set up to simply extract wealth in all facets of life. That's a sick society and it needs to change.
I'm not saying China does this better, in fact China has some of the worst worker rights out of all the industrialized countries; but at least American consumers would benefit from cheaper, higher quality Chinese goods. The world would likely benefit too if America got off the cold war hype train that did nothing to benefit humanity outside of those making weapon systems.
I have been using both Codex and Claude in my day to day, trying not to get too attached to one. I want to be able to work with any provider in case one of them does something bad.
I feel like Codex made a big push to run everything on your laptop. With Claude, I get 4 CPUs, a fair amount of RAM and 30GB for every one of my dumb ideas for free in the cloud containers. Codex used to be similar, but last time I tried, it just kept pushing me to run it locally on my laptop, which I really did not want to do with 20 requests going at once. That's the main advantage for me at the moment.
What runs in cloud containers? The dev servers, builds, etc.? I tried to quickly glance at the Claude website and it doesn't mention cloud containers on their pricing page.
Yes, correct, they both have the same capabilities, however it felt like codex was pushing me harder to use my local desktop in an annoying way, while claude code was happy to spin up a bunch of dev containers for me in the cloud.
I've found Codex to be the better subscription for OpenClaw, because the limits are indeed very generous. However, I've found more and more that Claude Routines/Scheduled agents can replace all the tasks I use OpenClaw for, so I've been slowly switching over to Claude Code. Aside from OpenClaw, I don't find a lot of value in Codex as a harness on its own.
The War Department has not existed since the passage of the National Security Act of 1947 and the government department has been known as the Department of Defense under US law since the act was amended in 1949. If you have an issue with it, take it up with Congress.
Changing a domain name doesn't actually amend federal law.
Just like how changing Kennedy Center letterhead to Trump Kennedy Center for a year didn't actually legally rename it.
Once a case with sufficient standing got in front of a judge it reverted to the actual legal name on the basis that only Congress can change the statutorily defined name.
I do slightly prefer 5.5 for complex work but Claude quota usage has gotten infinitely better since the dark days a few months back - has gone from being infuriating to something I pretty much don’t have to worry about with it as a daily driver. (In fact, hitting GPT weekly quotas is more annoying now). Understand if people are still scarred by the issues + poor comms around them, though.
That's good to hear. It was legitimately unusable back when 4.7 was released, so I had no choice at the time. I'm sure I'll ping pong back again at some point.
I just subscribe/unsubscribe to the providers each month. I'll definitely check out open router though, I always assumed that subscriptions were heavily subsidized by the providers especially if you're on the top end of users but maybe I should go to a usage-based plan.
How much more clearly do they need to explain the resource constraints?
If they didn't announce it, you guys would be complaining about slowed progress.
If they didn't release it, you guys would be complaining about fake promises and marketing.
If they released it without limits, the complaints would be about slow responses and outages.
If they didn't add it to subscription plans, the complaints would be about phasing out subscriptions.
If they added to subscriptions with cost reflecting their resource availability, the complaints would be about how quickly it eats limits.
So they choose the middle ground of providing some initial access and assessing if they can satisfy demand, only to still be ignored and accused of trying to get users hooked?
We've already seen that they don't have enough compute, thus the deals with SpaceX for their GPUs. It's very reasonable that they just don't have the capacity to support the subscription userbase on this model.
Putting aside the fact that this is a hilarious standard to have on a Y Combinator-run forum, let's say providing Opus-level models was profitable. That has no bearing on whether they'd have enough resources to provide Fable at all.
I would not use this if you are on a subscription. In <8min it burned my entire 5hr window (which had just reset, it appears: I have over 4 hours till it resets, and I hadn't used CC at all today aside from this), and then it used up ~$15 more in usage before I could stop it.
I'm also on the $100 max plan. I let Fable rip on a complicated issue involving hot-reloading modules in a GUI app built with Racket, it's fixed a couple issues over the last hour, and I've used about 17% of my session (not weekly) limit.
That’s odd, I used it on a pretty complex refactoring task and it worked for 22 mins and used only 15% of my 5-hour limit. I’m on the $200 Max plan though.
The CLI when you select it says it has 2x the usage as opus. Not sure if that matches what you are seeing.
I do wonder if you switched models mid-session, you would have lost all your cache. Reloading the context into cache can really eat through your usage.
For me it almost immediately blocked. I had it writing code related to message digests - and it seemed to think it was too gifted for that. Gave the security warning and switched back to 4.8. Whatever... it will probably have the API error soon. I have mostly switched to the Codex 200 a month plan. I've found their 5.5 xhigh to be better than Opus 4.8 "ultracode." Also, I have not once seen their servers fail for compute unavailability, unlike Anthropic's, which happens almost every hour.
I just asked Fable for a complete code review of my lone lisp project. Started out strong. Launched Fable agents, then spent like 10 minutes thinking... And then got interrupted by a switch to Opus 4.8.
> Fable 5's safety measures flagged this message for cybersecurity or biology topics.
> They may flag safe, normal content as well.
> These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them.
Here are the results of the agentic code review session:
This 40 minute session cost me 16% of my weekly usage. A simple code review of the most critical areas of my project got flagged as a cybersecurity risk. It really made me not want to try it again.
Same. I asked for a security review and it immediately triggered. I then started a new session and asked for a software review and it ran for a bit before getting tripped on token usage by the project.
This is interesting. Security issues are bugs. So if you ask it to look for bugs, it will also find security issues. Is that a workaround for the "no cybersec" rule?
Or is it just not allowed to find bugs? Or it's only allowed to tell you bugs that don't pose a security risk?
> Or it's only allowed to tell you bugs that don't pose a security risk?
Seems that way. "Security" was never part of the prompt. It was something like:
> Hello, Fable! Can you give me a complete code review of my lone lisp project? Opus has already done extensive code review. I'm curious to see what you say.
Yeah I heard multiple people mention that it's really good at triggering itself. e.g. it'll spontaneously write some tests related to security, which then forces it to downgrade to Opus for the rest of the session.
I had a similar experience. I wanted to test it by asking it to summarise a scientific OMICs-related paper. It gave a warning about me potentially developing a bio-weapon or something like that. And switched back to Opus 4.8.
We just blocked it at our org for this reason. They will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."
The announcement details it. They're storing 30 days of data on all surfaces, first and third party. They claim it is for security purposes so they can review and check for long term jailbreak and distillation efforts.
They also, FWIW, say that they've instituted new policies on their end such as logging any human access to the stored data and automated deletion after 30 days in "most" cases (with another link to a document detailing that further).
Considering their apparent nerfing of the end user plans in favor of enterprise clients, is Anthropic still the "more ethical AI company" like everybody loves to tell me all the time?
Assuming this isn't just a supply issue on their side, nothing says "ethical AI" like only allowing mega corporations to use it through cost barriers.
You really misunderstand what AI-doom people are worried about if you think this is anywhere near the top (or middle, or bottom) of the list of concerns.
Yeah, it's positively precious to think the specific pricing strategy for consumers is the overriding ethical concern with OpenAI, etc. I don't have any particularly strong affinity to any AI company, but comparing pricing to say mass surveillance is ... something else.
Your original comment was about pricing ethics, does Anthropic’s connection to the DoD have anything to do with pricing ethics? They’re in no way coupled, one can be ethical while the other is not.
Even with the Pentagon thing, Dario said he doesn't object to military AI, but that Claude is not ready YET. I speculate he was afraid of the reputational damage if Claude were to guide missiles onto elementary schools.
Where is your evidence that this is Anthropic backtracking on its ethical and contractual commitments rather than DOD backtracking on its blatantly illegal coercion (which it's almost certainly going to be successfully sued for)?
As someone who was in Minneapolis during the ICE raids, including one where a US citizen at a nearby restaurant was thrown in prison for 3 days despite having his passport on hand because he looked Asian, it's hard for me not to see the ethics of AI companies actively collaborating with the Trump administration as different flavors of ice cream.
Setting aside the simple fact that there is no ethical consumption under capitalism, the reality is that regardless of how Anthropic feels, it is becoming clear that many, if not all countries regard AI developments as strategic technologies (and they should).
Anthropic needs to be at least somewhat in the good graces of a capricious administration that is already under pressure from businesses and citizens to regulate AI companies across multiple different domains, whether it's energy consumption, job displacement, military and defense applications, surveillance, etc.
If Anthropic wants to survive, they need to acquire influence with the government that most impacts them as an American company, and a massive exporter of services in the AI space to other countries, otherwise they could get locked down and locked out of the market for national security reasons.
It sucks, but sometimes the survival choice is to make an ethical compromise in hopes that you can still be around to make better decisions later.
> Setting aside the simple fact that there is no ethical consumption under capitalism
This "simple" fact needs quite a bit of additional context and work. Making grandiose ethical claims like this can be countered with other grandiose claims such as the fact that there is no ethical existence under communism or socialism.
Sure. Why not, I'm bored today and waiting for some stuff to finish up :D
The fact that there is no ethical consumption under capitalism is not material to whether or not ethical existence is possible under communism or socialism. In order to survive in a capitalist society, one inherently has to make choices that require trade-offs, and those trade-offs are burdened by a history of decisions made not just by the people alive today, but our ancestors as well. Does that mean I walk around chanting "Reparations", "Land-back", or other calls to action? No, but I do acknowledge that there are unresolved issues and as a Canadian, I know we need to do more to resolve treaty issues, and environmental issues, and systemic discrimination. I also know that Americans need to do better to address systemic discrimination and many, many other issues. It also doesn't mean I want to give back my house, or give away all of my possessions. It just means I try to make good choices and support businesses and people that are open about the trade-offs they make and try to engage as ethically as possible.
Acknowledging those facts doesn't absolve us of responsibility, it's a framework that allows folks concerned about whether or not they are doing the right thing to accept the trade-offs that they choose to make and be responsible and accountable for those choices to themselves or their communities.
We live in a world with scarce resources. It's possible that with a foundational redesign of the global economy, and the requisite authoritarian government that would be required to force such a redesign, we could eliminate food scarcity, solve energy scarcity, and make sure that everyone has a place to live. Those trade-offs are probably not worth the ethical cost in political and physical violence required to accomplish it. We have seen the trade-offs that happen when the powerful are able to exploit communist or socialist governments. We are seeing the "late stage capitalism" impacts of allowing the powerful to exploit capitalism in democratic societies. Acknowledging that the current capitalist system has led to the greatest prosperity for the upper echelon (financially) of humanity, and a dramatic reduction in global poverty, shouldn't obscure the reality that much of that wealth comes from exploitation of people and the environment.
It's a huge problem to unwind, and we can't let the burden of every choice that we make stop us from trying to do better, but we (as in society in general) can't do better if we don't at least acknowledge the compromises we are making along the way, and try to plan to fix it in the future.
Probably a topic better suited to beer and a pub setting than HN though :P
> The fact that there is no ethical consumption under capitalism
I don't believe that this is a fact. How are you demonstrating that this is a fact?
When you talk about things like reparations or "land back" you're already cargo-culting in concepts and ideas that themselves need to be fleshed out in order to make a subsequent claim that a specific economic system is unethical. Someone can just argue all economic systems are unethical, how are you going to defend against that? And can you pay reparations for example without going back in all of human history and finding all cases of injustices and then tallying it up? Why pick an arbitrary point in time? Better yet, why not start in countries where slavery still exists instead of focusing on the west which led the world in abolishing slavery and created concepts such as universal human rights.
Even with respect to "eliminating food scarcity" - eliminate in what sense? All olive groves and grapevines and rice farms have to be destroyed and rebuilt to only grow certain foods?
Dabbling in communism or other inhumane and authoritarian governmental systems is extremely dangerous. In the same vein as "extraordinary claims require extraordinary evidence", suggesting, as you did, creating an authoritarian government to create a utopia is precisely the same project of suffering and death that mass murderers throughout history have undertaken to abject failure, and thus you need some incredible amount of evidence and theory to be able to even fairly suggest going down this path.
It's simple, I am not going to defend any economic system because they all require trade-offs, because any economic model that we could currently implement must necessarily ration scarce resources according to some set of rules. Those rules will explicitly deny someone else resources, and the administration of that economy will also be subject to abuse by the people who enforce the rules.
I am not going to do the work of gathering the evidence for you, and I don't think this is the right venue for a debate on the topic.
If you'd like to concede the debate that's fine, but you can't drop a few comments that are, well, not at all simple, and then when someone points out the flaws in your reasoning or asks clarifying questions you throw your hands up and say it's not the right venue for debate.
If you don't have evidence, I think it's mature of you to admit that, and I applaud you for doing so. We all like to just talk and don't always have to provide evidence or a citation for everything, and it's fair to just say "hey, I'm just making this up and it requires further discussion."
It only needs additional context and work if you are unfamiliar with the concepts underlying it. Possibly consider you are out of your depth here, rather than jumping to conclusions.
If you can't trust them to act ethically on the small scale, why would you expect that to turn around once it gets to a larger much more important scale?
How many government sanctioned school bombings does it take for them to quit working with said government? For now we know that number is somewhere between infinity and 1.
It literally does not register as "unethical" at any scale to have different products or prices for different customer tiers.
The question of collaboration with USG is a much more complex one, but is not the one raised above.
Edit: I'll also add that I doubt any AI-doom people "trust" Anthropic per se. The entire angle of questioning – again – misunderstands the AI-doom argument. You appear to think that if companies behave unethically, they cannot be trusted and they will not produce good outcomes, inversely: if they behave ethically, they can be trusted, and they will produce good outcomes.
Any competent AI-doomer would argue that ethics or trust are essentially irrelevant.
The entire problem is that people can act totally reasonably, even ethically, and this is not a guarantee of good outcomes. Situations can be created in which completely ethical, reasonable behavior actually produces a bad outcome. You do not need to assume people are bad in order to produce a bad outcome, and inversely you cannot assume that you will get a good outcome from good people.
"Arms races" are one class of situations that often have this characteristic. "Bureaucracy" is another class that we encounter a lot in daily life. There's a lot of them!
I don't think offering a product under a certain set of terms obligates a company to maintain that offering forever. The bait and switch is certainly annoying but seeing as they're very upfront about it you can't say you weren't warned. Don't like it? Don't use it.
Yup - who cares about x-risk or red lines for domestic mass surveillance anyways? I draw my red lines at prioritizing profitable customers when heavily resource constrained. That's the true definition of evilness!
It smells like an architecture-related issue to me. They wanted to release the model asap, but they're still implementing the fine-grained controls to constrain the model to non-subscription users.
They said they would release it back into subscriptions as capacity allows in the future. If they don't, people are going to point back at it and rake them over the coals.
More of a free trial to those authenticated and qualified with existing payment. Subscription billing is going away for sure though eventually based on the economics. Token “all you can eat” is a capital furnace otherwise.
(I’m highly confident open models will eventually achieve a similar performance benchmark with distillation over time)
Yeah that payment scheme sounds like they gear up to shift everyone into API token prices, eventually. Time to convert the existing tokens into software, until then.
Subs lose money on individuals to get those individuals to force their companies to pay for the corporate plan. The economics are bad, but so are the economics of grocery stores selling Milk and Bananas at a loss to drive traffic, which they basically ALL do.
I pay a lot but barely use it except for some intense days, where the lower plans would have throttled me in like 30 minutes. API billing is still more expensive. If you want to not pay much, go to openrouter and use chinese models. They are cost efficient.
Retain and hire the engineers who don’t require heavy use of AI to deliver value? The current SWE job market speaks for itself. Where will you go where they will let you burn up tokens in a high cost of capital macro?
ZIRP (zero interest rate policy) is over. Software engineers no longer call the shots now that there aren't vast amounts of capital chasing yield, bidding up salaries and keeping the labor market for engineers tight.
If you are x more productive with generative AI, very shortly you are going to have to prove it with a token budget (or, if you’re lucky, an org willing to spend on on-prem hardware for capped token cost, fixed capex vs uncapped opex).
The comparison is not SWE vs SWE with AI. It is SWE vs SWE with AI with a constrained token budget ($x/month) delivering the same value at the same or lower cost. If you cannot prove that you are wildly (vs marginally) more productive with the AI, why would they pay for it? Prove it.
> The comparison is not SWE vs SWE with AI. It is SWE vs SWE with AI with a constrained token budget ($x/month) delivering the same value at the same or lower cost. If you cannot prove that you are wildly (vs marginally) more productive with the AI, why would they pay for it? Prove it.
> That is the real content of the Uber story, and it is why filing it under "budgeting discipline" misses what is actually unfolding across half the engineering organizations in the country right now. They ran the same experiment Uber ran, most of them without Uber's $3.4 billion R&D cushion to absorb the surprise, and almost none of them having modeled the heavy-user tail or instrumented the gap between tokens consumed and value shipped. The reckoning will arrive for each of them on their own fiscal calendar, and the first instinct will be the wrong one. The tool is too good to abandon, the bill is too large to absorb, and the only durable resolution runs through a question the entire rollout was designed to defer.
> You cannot get labor-replacement economics out of a tool you deployed as a labor supplement, and the bill comes due before anyone is willing to admit which one they actually bought.
That's not how it works. They don't need revenue, they need addicts.
Specifically they need businesses that fired people and adapted their business to the products, so when the unsubsidized costs hit the businesses are forced to eat the true costs.
Yes, they can't afford to give the products away for free, but what is essentially happening with AI services is economic dumping: keep costs artificially low to get people to fire everybody, and then jack up the rates once they have monopoly control.
But the only companies firing people (and certainly not everybody) are either the companies with an AI or the investment and finance firms that stand to profit from AI. I smell hype. And no company is firing everybody because of A.I.
I agree. They need addicts, but they are high on their own supply and everyone else can see the danger in getting hooked.
That's a big problem for all of the AI companies. Most people don't find the technology compelling, accurate, or ethical enough to pay for a subscription.
Why wouldn't Anthropic just wait until people start subscribing, do some kind of marketing push, or obtain some kind of other sustainable revenue stream, before they go IPO? I wonder if they see the writing on the wall with all of this and want to cash out as quickly as possible?
The Team plan is ~125 USD / month / user. Big enterprises like Uber are paying upwards of $1500 USD / month / user. Anthropic can raise their revenue a lot more by selling to big enterprises than they can by selling more team plan seats.
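To put rough numbers on that gap, here is a back-of-envelope sketch. The two per-seat prices are the ones quoted above; the seat count is a made-up placeholder, and flat per-seat pricing is an assumption.

```python
# Per-seat prices quoted in the comment above (USD per user per month).
TEAM_PRICE = 125
ENTERPRISE_PRICE = 1500


def monthly_revenue(seats: int, price_per_seat: float) -> float:
    """Monthly revenue for a number of seats at a flat per-seat price."""
    return seats * price_per_seat


def team_seats_to_match(enterprise_seats: int) -> float:
    """Team-plan seats needed to equal the revenue of the given enterprise seats."""
    return monthly_revenue(enterprise_seats, ENTERPRISE_PRICE) / TEAM_PRICE


# Hypothetical seat count, for illustration only.
print(team_seats_to_match(1000))  # 12000.0
```

Under those assumptions, one enterprise seat is worth twelve Team seats.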
I just assume Opus is constantly nerfed based on capacity. I was exclusively Claude for a long time, but the inconsistency in quality, constant outages, and slow downs were too hard to work with.
I just use dumb and fast models now. I'm more engaged. I think that the higher the quality of the model, the more you tend to vibe with it, and then the more hallucinations you then miss. I'm not sure which is more productive, but I definitely burn out faster the more I vibe. At some point you're spending your time on forums, discord, or youtube instead of engaged with what you're building. Or you yak shave about your tooling and end up creating the 600th multi-agent gastown harness and blowing thousands of dollars on tokens to create it only to discover it's too expensive to actually use.
I agree with you. The more I vibe code, the less interested I feel in what I'm building. Working with models that force me to think, especially with personal projects, helps me stay engaged and enjoy what I am doing more.
It's possible that they will transition to usage credits but why not take them at their word? To date they have continued to offer better and better models to their subscription plans.
Upd: I meant big picture, not with respect to this model release. Where do subscriptions figure into their strategic vision. Will consumers end up paying enterprise prices in the future?
In the blog post they say that when capacity allows, they aim to restore Fable 5 as a standard part of subscription plans, and that they intend to do so as quickly as they can.
HN needs to take a chill pill. Could it be that Mythos is expensive and they just want to give people a taste of it? I mean the alternative is not offering it at all?
Even Opus 4.7 felt like a regression from 4.6, consumed a lot more tokens while I didn't experience any substantial improvements. The company I work at simply rolled back to 4.6 on everyone's configurations, disabling the toggle for 4.7.
I don't think they'll phase out subscriptions ever, their whole play has been to drive demand from the bottom up. Get engineers hooked on building with claude at home, then get them to demand the ability to use it at work, and bend over their employer with no lube.
They'll probably tighten the quotas to rein in whales though.
They almost certainly already make a fuckload more money off API pricing than they do subscriptions, even if there might be more total subscription users. So offering subscriptions even at some loss is probably going to continue. Honestly, I'd be surprised if they even lost money on most subs; there are definitely Token Whales out there who mess all the accounting up, though.
Realistically I think Anthropic just has insane demand but finite capacity to run models, and Fable will just make them more money if they dedicate it to API pricing. I suspect the goal here is something like: get individual engineers/PMs on their personal plans to taste Fable and then go to their meetings and say "Yes doubling the price of every single input/output token is a good idea, boss".
But how is this sustainable? It's not like paying $5000 per feature means you'll be refunded if prompting "make no mistakes" didn't work.
The only reason why I pay $200 is because LLM's errors costs me that much, at worst. If "make no error" starts working - sure. But surely, unless you have millions of dollars of cash to burn, a coin flip that costs $5000 is an insane idea?
I certainly hope not. PAYG is not predictable enough for smaller companies or individuals. Where I work (non-tech company), PAYG would never fly. We aren't big enough for that. Of course, you can set usage budgets, but there's a pretty big difference between $200/user/month vs. the equivalent PAYG usage being closer to $1,000/user/month, if you currently use the subscription plan to its limits each week.
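A rough sketch of that gap, using the two per-user figures from this comment. The team size is a hypothetical placeholder, and the break-even line assumes PAYG cost scales linearly with usage, which is a simplification.

```python
# Figures from the comment above (USD per user per month).
SUBSCRIPTION_PER_USER = 200
PAYG_PER_USER_AT_FULL_USAGE = 1000  # rough PAYG equivalent of maxing out the plan


def monthly_cost(users: int, per_user: float) -> float:
    """Total monthly bill for a team at a given per-user cost."""
    return users * per_user


def payg_break_even_fraction() -> float:
    """Fraction of the plan's limits below which PAYG would be cheaper.

    Assumes PAYG cost is proportional to usage (an assumption, not a quoted fact).
    """
    return SUBSCRIPTION_PER_USER / PAYG_PER_USER_AT_FULL_USAGE


team = 25  # hypothetical team size
print(monthly_cost(team, SUBSCRIPTION_PER_USER))        # 5000
print(monthly_cost(team, PAYG_PER_USER_AT_FULL_USAGE))  # 25000
print(payg_break_even_fraction())                       # 0.2
```

On these numbers, PAYG only wins for users who stay under roughly a fifth of the subscription's limits.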
Going PAYG only will effectively take these tools away from a huge amount of people and accelerate the push for local LLMs.
OTOH, accelerating the push for local LLMs would also be fine with me.
I doubt it, given the importance of those subscriptions for building and maintaining market awareness.
The AI landscape is changing rapidly, with Apple announcing the option to change the AI backend, and potential requirements to enable AI choices as well, similar to EU browser choice requirements (this is more reading tea leaves than any actual requirements I am aware of). The new OS changes coming to support Googlebook, and deep Copilot/AI integration into Windows, will make maintaining user-facing subscriptions essential for independent model developers like OpenAI, Anthropic, and Mistral to remain relevant longer term.
If they don't maintain that relevance, there is an increasing likelihood that they will get consumed by other companies, whether it's Apple, Microsoft or Google, to form a foundation for their OS, or by other cloud providers.
That makes sense, but what about the specific bifurcation we're seeing here of super primo models versus still good models being available to subscriptions?
It's kind of annoying not getting access to the primo model and paying 200 bucks a month. I understand 200 bucks a month is basically nothing though.
Like I don't totally understand why they'd let me have it for a couple weeks and then take it away and say I can have it but I have to pay retail and retail is like $1,000 a day.
It's better to have loved and lost than to have never loved at all??
It's a trade-off. Every hyperscaler is buying and building compute capacity as fast as they can dodge red tape. There is limited compute capacity, and scarcity is a real thing.
As a consumer I can choose to buy subscriptions to a range of things, including $5 droplets or VMs on a broad range of cloud hosting providers. I can even buy cheap bare metal at a bunch of providers at an affordable retail rate.
I can also buy "unlimited" AI packages that will be optimized to fit the cost model from a variety of services, with different impacts, such as rolling outages when I consume a daily or hourly allotment.
Right now VC and the investor class are subsidizing the rapid evolution of the services and availability, but that VC is running out. In more traditional economies, AI would have developed and rolled out more slowly, and through metered subscriptions, with the eventual rolling out of "unlimited" packages like telephone, internet, or cell services once the market became commoditized.
We have seen a big inversion of that with the race to "win" AI marketshare. Now the true cost is being exposed, and the most competitive and capable models are hideously expensive to operate, so it makes sense that we are moving to metered billing for a utility service. If you want gas, you can buy regular or premium. If you have a premium car you definitely want the premium, but for most people regular is good.
Give it a couple of years, and the survivors will settle around fairly industry standard models of consumer grade services, pro-sumer accounts, and business/enterprise models.
Things are still shaking out, but I get the sadness. Luckily I work at a big tech company who is banging the drum on doing experimentation so I use my prosumer claude pro and other accounts at home for hobby stuff, and save my heavy lifting and potentially experimentation for work :P
It could be my use cases, which have always seemed to be outside the wheelhouse of these models, but I find it very hard to downgrade after accessing a more capable model.
Opus 4.8 produces output in 15 minutes that is 3-4 hours of my work away from output that used to take me 40ish hours (a solid week of dedicated effort).
Last year(-ish, maybe it was 18 months, I forget when the jump happened), the frontier models couldn't touch this work. The output looked like a hardworking intern on their first day. Nice formatting, decent volume of words, but no understanding.
So it might work if it turns out to be a substantial leap in capability.
I switched back to Sonnet. It replies faster so I work faster. Also cheaper. But I really like the speed. I have to be more specific with what I want. Also I stop it more often than Opus. These new models will be awesome, but they need to increase the speed.
> The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.
Fable seems very good at finding bugs (unsurprising given Mythos lineage), so this seems a pretty smart strategy. Once you see the bugs it finds in your existing Opus code, it's going to be hard to go back, psychologically speaking.
This is just the sales team doing their thing, applying the Law of Scarcity to drive demand.
It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1
It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5% I doubt it is a much larger model. Most perplexing.
i doubt that's the goal for them. i bet they just really don't have capacity for people using it a ton, yet they wanted people to be able to try it out while it's new. so they compromised and made it temporarily available. and then hope they can get costs down or capacity up so they can make it more available again
I think the goal is "private citizens: subscriptions; corporations: per-token billing." It's getting people addicted to LLMs on cheap subscriptions so that they can then force companies to pay for expensive inference.
I don't see how it won't be. They lose insane amounts of money on subscription plans. I'm sure they still lose money on usage-based billing, but probably not as much.
Most gyms sell more subscriptions than they can fit under their roof at one time. If a gym only sells to heavy users, it will either be constantly turning members away or have to buy more equipment. Its equipment will wear out faster. Depending on amenities, it will go through towels, soap, water, et cetera faster, too.
It depends on the gym and their business model! A super-budget gym like Planet Fitness that charges $15/month is going to lose money on heavy users, but they count on most of their members being infrequent gym-goers. A luxury gym like Equinox that charges $300/month can target heavy users without any issues, and they'd actually rather members go more so they stay and spend money on expensive salads and smoothies.
Right now all these AI subscriptions are priced like Planet Fitness, but they're used like Equinox. They're hoping that the new a la carte offerings will move their pricing more in that direction as well.
The other user is right, you are being a pedant. Why do you think planet fitness makes money hand over fist? Because 99% of its users sign up, never go, and then also never cancel because it’s cheap enough to leave running. Gyms absolutely bank on low amounts of power users, meaning the rest of the subscribers are subsidizing those that go frequently.
Members will switch gyms if it's too busy at times they want to visit. "Too busy" includes too much contention for a single piece of equipment.
US gyms might be vast warehouses but in the UK, most only have a couple of benches, couple of cages, one set of db per denomination above 20kg etc. They require working-in and consideration for others.
A couple of unapproachable "heavy users" doing 3 hour sessions across peak hours can ruin the workout for dozens of paying members needing a few min per station for ~5 sets.
It might also be a euphemism for "dickhead" who also tend to be "heavy users". Those that damage, hoard and don't share equipment and repel other customers on many levels besides - threatening, lecherous, loud and smelly.
Doesn't even need malicious intent - can be weirdo bores, forever talking at victims while doing a routine that makes absolutely no sense besides camping on equipment for half a day... 100 sets of incline press 7 days a week... what are you even doing to yourself fella?
That doesn't mean the company is losing money in aggregate on these subscriptions. Buffets are still in business even though some people gorge themselves silly at them. The incremental cost may exceed the incremental revenue for a particular person or minority group, but that's not how these businesses measure profitability.
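To put the buffet point in numbers, here is a rough sketch of aggregate margin on a flat-price plan. Every figure is made up for illustration (the $200 price, the segment sizes, the per-user serving costs); nobody outside these companies knows the real ones.

```python
def monthly_margin(price: float, users: dict[str, tuple[int, float]]) -> float:
    """Aggregate margin on a flat-price plan.

    `users` maps a segment name to (subscriber_count, cost_to_serve_per_subscriber).
    """
    revenue = price * sum(count for count, _ in users.values())
    cost = sum(count * unit_cost for count, unit_cost in users.values())
    return revenue - cost

# Hypothetical mix: most subscribers are light, a few cost 5x what they pay.
mostly_light = {"light": (900, 20.0), "heavy": (100, 1000.0)}
# Same plan, but heavy users are now 30% of the base.
more_heavy = {"light": (700, 20.0), "heavy": (300, 1000.0)}

print(monthly_margin(200.0, mostly_light))  # 82000.0
print(monthly_margin(200.0, more_heavy))    # -114000.0
```

Each heavy user loses money in both cases; whether the plan as a whole does depends entirely on the mix, which is the number we can't see from outside.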
I assume consumers aren’t a big note in their bottom line. I’m not actually very sure about that, just an assumption.
What I wonder however is if these tools will become something I use at work only. $100/month is already a massive stretch budget wise. If these models keep devouring tokens there’s no way I’d get the same usage time out of them for $100 in usage credits.
I just don’t think I’d use them much at all at home.
I’m just about ready to cancel my small business 5 user plan with max licenses, because although cowork is really great, I just find OpenAI/Codex to be a lot better most of the time.
> Pricing for both models is $10 per million input tokens and $50 per million output tokens.
The step-up in intelligence looks massive (we'll see in practice), but the price is getting to a point where it's making me question if it's even worth giving it a try.
Good competitors will probably be out soon, which should level the playing field. I am more excited about that, just the fact that they showed that such an improvement is possible. I'm okay waiting a bit longer for this to become attainable for plebs like me.
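For anyone trying to gauge what the quoted $10 / $50 per million tokens means for a given workload, a back-of-the-envelope calculator is below. The rates are the ones quoted above; the session size is a hypothetical example, not a measurement.

```python
# Quoted list prices, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the quoted per-token rates."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000

# Hypothetical agentic session: 2M tokens read, 200k tokens written.
session = request_cost(2_000_000, 200_000)
print(f"${session:.2f} per session")  # $30.00 per session
```

Plug in your own token counts; this ignores any caching or batch discounts, since none are mentioned in the quote.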
Models are getting better, but there's a negative change in terms of "productivity" per dollar. Yeah, I can throw 5 sub-agents at the problem, but the cost is getting significantly higher. And yes, I can crank out the solution much faster, but again, at some point that cost will be hard to justify. And it doesn't matter if the cost is subsidized by a provider, if it's paid by your company, or from your pocket. We are slowly reaching a point where the cost will be too high to justify the gains.
Sadly this does not seem to be the case here: if you read the announcement entirely, they include a "cost per task" metric which basically continues the trend of their previous models. So yes, tasks will cost you more, but results will be better - allegedly.
I'm not sure how it might be with Fable in practice, but we are already not that far away from AI costing as much as a full-time professional, faster in some ways but considerably less independent.
Perhaps not that close to US salaries, but those are inflated to hell. Worldwide senior engineers and scientists have salaries just about an order of magnitude away from AI subscriptions that you can use most of the day every day.
> * On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.
Of course, they are a casino as well giving you free spins at the wheel with their new Fable machine, and it is done on purpose.
Once their freebies have expired, many of its users will begin to gamble more on the new casino machine and will realize that it is expensive.
It’s an interesting thing to bring up because it’s this classic thing we’ve seen for decades now.
The ramifications go beyond the individual which is why I assume they mentioned it. They don’t need to use it/not use it for it to have interesting implications.
I suspect it'll go on the subscription plan once other providers have similar benchmarks.
As annoyed as I am about this move, I get it. Users flood the newest, best model whether they really need it or not, and are efficient at using their entire quota. They've had so much trouble reining in subscription usage that it makes sense.
If you think about it, the more people pay for these new and more resource hungry models, the longer it takes for them to become no extra cost and the longer it takes the more people are tempted to pay extra.
My guess is that it is a massive model similar to GPT 4.5, and that the $10/$50 pricing will discourage people from using it. I also read safety = nerfed.
"we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).
...
Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."
The AI circular infinite money glitch won't last forever. I hope.
If you have good expertise in a domain and access to cheaper models, you may still be more skilled than someone without expertise but a lot of money to bruteforce the problems using SOTA LLMs.
But they're behind by quite a bit now. CFO (of OAI) Sarah Friar said the next training run will be in the fall on Vera Rubin, I think that means I'll have to wait > 6 months?!
Yeah but they might still have an unreleased bigger pretrain than 5.5. (but maybe not). still 5.5 is smarter than opus 4.8 IME, so you're only losing the mythos tier (fable). and all the cool fun stuff i'd want to use fable for is blocked (can't have it do even defensive cybersecurity work [in theory you can but the classifiers fire like crazy], can't discuss stuff like the furin cleavage site of sars-cov-2, etc)
Of course anything could happen but Anthropic has always been stingier and more expensive and is getting worse about that, while OpenAI is getting more generous with subscriptions (eg permanently doubling the highest tier allotment, setting policy to not cut off tasks that run past quota, resetting after outages, etc.)
This serves as a good reminder that relying on AI models is borrowing your tech from someone else. They can take it away or raise the prices arbitrarily.
If you rely on this as a core part of your business/profession, you will be at their mercy and subject to whatever whims or challenges they have.
> [...] People do not realise how much of a toll it takes on you if you actually care about the environment, exploited workers, theft from the people who can least afford it, the impact on people's cognitive skills, the centralisation of power, the spread of disinformation, the ruination of the web and/or the destruction of entire career paths (not billionaire of course, that's always a safe one), and not endorsing (either distinctly or tacitly by using) AI.
I believe people do understand the toll caring about something deeply takes -- but caring about all these things at once, many of which you personally can't control, feels more like atlas syndrome or compassion fatigue by the author.
I also find the author a bit all-or-nothing in general. Losing friends because they use AI? Why does the dichotomy have to be so black and white? Can people have moral quandaries about AI while still using it, or does the moral stance always have to be absolute?
There will always be a subset of users whose goal is to not use your service, but to arbitrage your service into the maximum value for themselves.
For example -- let's say you offer $100 in free AWS credits by signing up to your platform. Expect a malicious user to eventually come to your platform, realize they can resell those $100 in credits for $50, and start using your platform for their own gain. Unless the cost imposed by the mechanisms you put in place to reduce fraud / second sign-ups / etc. is greater than the value that they are receiving ($50), they will continue.
With sites where the platform is free, the math almost always makes sense for these malicious users to eventually abuse. In this case it was leveraging the email reputation of another domain at no cost to their own (along with the added value of anyone getting phished), but on other sites it's public profiles being used for backlinks / spam, etc.
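The break-even rule in the AWS credits example can be written down as a one-line check. This is only a sketch: the function name, the per-account cost figures, and the success rate are all hypothetical; only the $50 resale value comes from the example.

```python
def abuse_is_profitable(reward_per_account: float,
                        cost_per_account: float,
                        success_rate: float = 1.0) -> bool:
    """True if the expected payout per attempt still exceeds what the attempt costs."""
    return reward_per_account * success_rate > cost_per_account

# $100 in credits resold for $50, with a cheap throwaway signup.
print(abuse_is_profitable(50.0, cost_per_account=0.10))  # True

# Hypothetical mitigation: each attempt now costs $5 and only 5% get through.
print(abuse_is_profitable(50.0, cost_per_account=5.00, success_rate=0.05))  # False
```

Mitigation works on either lever: raising the cost per attempt or lowering the share of attempts that pay out.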
You're mixing up bonus abuse here. The people behind phishing are more like hackers, whereas bonus abuse is usually run by non-technical people or bot farms. Scammers are much more dangerous, because they're typically behind operations far wider than just phishing, this might include actual financial fraud, international money laundering, and so on.
Bonus abuse is a small shop, whereas phishing through third-party services is much more likely to be an organized crime group.
To the end platform, what's the difference? Mitigation techniques largely remain the same, in that you make it more time / energy / money than what the end result of their abuse is worth. The platform cares about stopping the abuse -- not necessarily correctly identifying whether the people abusing their platform are small shop "bot farms" vs organized crime.
To the platform, the difference shows up exactly in the mitigation math. The 'make it cost more than it's worth' model only works when both sides of that ratio are knowable and bounded. With bonus abuse, the reward is fixed and the math is clear, so you can reliably price the abuser out.
With organized criminals, you can't actually see what the abuse is 'worth' to them. And they can escalate almost infinitely: mimicking real user behavior, routing through residential IP proxies, using email addresses with established reputation, and at the top of the pyramid we've seen full mimics with real social network profiles and activity, they even answer phone calls.
That's why it's worth collecting events before acting: what the account is about, which IP network they use, whether they fake devices, whether there's any warmup prior to registration. Because that's what helps estimate whether your mitigation will actually work, and lets you respond in a balanced manner instead of under- or over-reacting.
> [...] With organized criminals, you can't actually see what the abuse is 'worth' to them.
Even without collecting events, you can calculate what the abuse is worth to you, even if the math ends up being fuzzier.
At the small platform operator level (one guy running a platform, as this article), the cost can be as simple as "this pisses me off and I have weekends." They can burn forty hours bolting on JA4 fingerprinting and a disposable-email blocklist to stop an abuser whose dollar-EV to them was roughly zero. Looks irrational, and that's exactly the deterrent — abuse pricing assumes a rational counterpart, and a guy who'll overspend his own life-hours out of stubbornness is unpriceable.
At any scale larger than a small operator, you also do get real numbers -- you can't perfectly price reputation, but you can price traffic and ad conversions, operational costs, LTV of customers (and conversion funnel metrics) etc, all of which don't stay still while abuse increases.
> [...] That's why it's worth collecting events before acting: what the account is about, which IP network they use, whether they fake devices, whether there's any warmup prior to registration. Because that's what helps estimate whether your mitigation will actually work, and lets you respond in a balanced manner instead of under- or over-reacting.
Isn't this just a way to estimate exactly how much the 'abuse' is worth to the abusers?
The value of human interaction cannot be overstated -- and the writer did a beautiful job outlining how AI isolates us. But there are also hidden difficulties in human interaction that AI helps ameliorate.
- My doctor friend does not want me pinging them asking for free medical advice every time I get health anxiety
- My chef friend does not want me calling them every time I'm struggling with a recipe
- My author friend does not want to read the 20th draft of my book, in which I've changed perhaps 10% of the content from the last draft
In these, the cost is a tax on the relationship -- relying on someone else too much to the point where it could potentially be impacting _their_ life.
Similarly, there are enough communities out there that are not accommodating -- even if I wanted to get a human answer and/or connect with someone, the interactions themselves can be painful. Do we remember what it was like posting on Stack Overflow? Do we believe Stack Overflow was a one-off outlier?
I also believe human imagination and knowledge shouldn't be bound to the relationships you have around you. What if my social group is small, or diversity of knowledge that my social group has is small? Should I not be able to think and explore an idea because my best alternative would be to contact a professor at a university that 99% of the time will not answer me?
I do believe that many people use AI now instead of learning and connecting -- I know my own programmatic knowledge has weakened now that AI has acted as a superhuman autocorrect. But on the other hand, with the help of AI I've also learned about a ton of things that would have otherwise been unavailable to me -- and I believe has improved me on the whole.
I think one of the important things at the core of these conversations is that the "things where AI is the best option" parts of the matrix don't feel like they need to be encouraged. People are going to use AI. A lot. I.e. for those of us who agree there is a balance to be struck, it's hard to argue that real-life usage doesn't naturally lean toward "too much AI" vs "not enough AI".
Normally, I separate the download filename (what the server / person chose to call the file) from my own organization system file name.
So if I download or get sent "Book.pdf", I'll rename it to how I'll remember it -- "Book Title - Author.pdf", etc.
That being said, I don't think there's any right answer here, it's usually just a matter of time and energy. If I had to enrich every single file I download with a great title / detailed metadata / etc that I'd need to find that file later, that's all I'd do all day.
I only bother with files I expect to want to look at again after the next 48 hrs. Which is maybe 5% of what I download. Those usually also get moved from the downloads folder and filed somewhere useful. Or for some stuff like family snapshots they get categorized in a folder and not renamed.
> Stripe (their payment processor) will handle adult content payments. It puts the account into the high risk category due to the high rate of fraud in those categories.
Stripe _says_ they will handle these types of payments, but more often than not, within roughly a year of implementation you'll get an email from them kicking you off their platform, no matter how vigilant you were, or even if the things you were selling were more rated R than rated X. Source: my own insider knowledge along with colleagues in the space.
No it isn't. It's struggling on the front page because this is a very old story and it's the same conversation every time: payment processors hate this stuff because digital goods are fraud and chargeback magnets, and that's doubly true of adult content.
Those are valid facts, but this is missing an underlying point: HN’s community is not concerned about this form of discrimination, so each time it crosses the front page, we see lots of threads about deregulation but few about the spectre of ethics raised by these acts. Ethics aren’t typically in-scope for HN unless the party harmed is either a for-profit corporation or a tech worker; since HN doesn’t as a community tend to openly self-identify with the fields of sex work, the ethical issues here are effectively out of scope here. One can imagine a different HN that gave the ethical threats to Others as much airtime as it gives to ethical threats to Self. I remain hopeful.
If only all moral objections had such plausible-deniability ready to promote disregarding them, we’d never have to teach or debate morality and ethical practices in tech at all! Fortunately, the core debate (should payment processors be required to provide service so long as the operator is cooperative with escrow and other such ‘avoid money going out the door fraudulently’ restrictions on high-chargeback enterprises?) remains a ‘brass ring’ desirable outcome for techno-libertarians, and so the issue continues to be fought over. (Even if it’s only indirectly a morality debate over sex products.)
This isn't responsive to anything I've written. It's not in any sense a moral debate over sex products. It's a practical debate over how expensive it is to underwrite transactions in these markets. The people involved in making those payments work are extending credit.
Payment processors have constructed a “moral ordering of sexuality” [1] that would be entirely unnecessary if, as you claim, their intentions are purely legal and/or related to high chargeback rates.
If it’s not a moral issue, then the rules should be simple and easily communicable. Examples: Comply with the law of your jurisdiction. Keep your chargeback rates below x%. Instead, payment processors intentionally refuse to enforce consistent rules across platforms. Not the behavior of an economically-motivated, entirely rational agent.
First off, great article, everyone involved in this discussion should read it.
Second, agreed, if this was primarily about chargeback rates, there'd be no differentiation between disallowing things like hypnosis, (fictional) non-con, BDSM, etc. over vanilla sexual material. Instead it seems to be a mixture of pressure by (primarily religious, though some feminist) anti-porn activists, negative media portrayals (e.g. Kristof's PornHub article in the NYT), and understandable fear of lawsuits resulting from hosting actual illegal material (Visa/Pornhub case in California).
I’m not calling for “more transparency,” I’m calling into question your assertion that the payment processors are acting out of rational self-interest.
It’s a little strange to complain about no one being responsive to you when you’ve summarily dismissed every comment in this thread.
Once again this is like the 10th time this discussion has played out on HN. If you want to see a less conclusory set of arguments, use the search bar and go back a couple years.
The counterargument here doesn't even make sense. You think payment processors are run by people with weird puritan takes on adult content? No, they're exactly the same nerds that work everywhere else in the industry. I'm sure someone will come up with some just-so story about how payment processors, and only payment processors, are susceptible to influence from religious radicals or whatever, but: special pleading is special pleading.
> Once again this is like the 10th time this discussion has played out on HN.
If the conversation is too boring and repetitive for you personally due to your long, long history as a commenter, you could always choose not to participate in it. That’s more or less what you’ve done here in any case, with the added efficiency of one fewer step.
This is what, past the thirtieth anniversary of Eternal September? I’d think you’ve had plenty of time to cope with the social phenomenon.
> I'm sure someone will come up with some just-so story about how payment processors, and only payment processors, are susceptible to influence from religious radicals or whatever, but: special pleading is special pleading.
If U.S. credit-card issuers were worried about fraud, they would have implemented the other half of "chip-&-PIN," which the rest of the world has been using for decades.
U.S. customers pretty much JUST got chips in our cards... but issuers "forgot" to implement the PIN part.
In my country you usually need to confirm payments with SMS OTP, except for trusted merchants (but they take the risk of fraud by opting out from confirmation). So simply stealing a bank card doesn't get you far. And pretending that you did not pay is also more difficult. Is US different? Do banks and clients trust each other in US and do not require OTP?
Yep. If I take someone's credit card, I can use it all I want, until either 1) they notice and cancel the card, or 2) I trip the fraud protection with unusual spending patterns.
Are you viewing by the default page or active? Several of these articles were discussed last year when the processors were pressuring Valve. Maybe a little topic fatigue?
Collective Shout is a Christian anti-rights organization wrapped in feminist cloth.
The founder of Collective Shout previously successfully lobbied against mifepristone and opposed changes to legislation requiring pro-life pregnancy-counseling services to disclose their affiliations in their advertising.
In 2004, she founded the anti-abortion lobby Women's Forum Australia.
> Published by Bible Society Australia, Eternity is a national media platform for Christians, designed to encourage, equip and inspire them by revealing what God is doing in our nation and beyond.
That’s not a jump, it’s a straight line. These fundies like to dress it up but it’s transparently obvious to anyone who has dealt with religious fundamentalists that is their core driver.
US civics 101. The first amendment mostly restricts government action. This is not a free speech issue unless you want to legislate that adult content is a protected class or want to make a special clause for payment processing.
This is a perfect use case for crypto imo.
If you are making an argument that new legislation needs to be made, great, but unfortunately people jump to the idea that this is immediately a free speech issue.
Freedom of speech is not defined by the US constitution. Free speech is an ideological stance, not a legal definition. US law protects some forms of free speech and not others.
Good luck with that. We can all day long discuss what is free speech and not free speech but unless it’s a protected class or a carveout for payment processors it does not matter. Propose solutions instead. You could argue that payment processors control so much of the market that it’s like the government limiting speech but I would counter argue that they could use crypto easily.
Not to mention that businesses usually use payment processors as the scapegoat. Very few businesses, other than purpose-built ones, want to deal with adult content.
Lately I've been building Aho (https://aho.com) -- an API for verifying age, credentials, and identity using cryptographic proof from digital wallets instead of document inspection.
For context, I built out Playboy's age verification system, and watched as it hurt conversion (nobody wants to upload an ID to an adult website, who would have thought!). Cryptographic signatures from issuing authorities (DMVs, universities, employers, etc) with selective disclosure (e.g. you don't need to upload your full ID, just the fields that matter) is how verification _has_ to work going forward -- AI can fake documents, but not private keys.
I've been working on this 6 months full time, and implemented all the W3C VC, OpenID4VCI/VP, SD-JWT specifications myself.
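For readers who haven't looked at selective disclosure, here is a stripped-down sketch of the salted-hash idea behind SD-JWT. It is not Aho's code and not a complete implementation: the issuer signature, `_sd_alg`, and key binding are left out entirely, and the claim names and values are hypothetical.

```python
import base64
import hashlib
import json
import secrets

def b64url(data: bytes) -> str:
    """Base64url without padding, as used throughout JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def make_disclosure(name: str, value) -> str:
    """Encode [salt, claim name, claim value] as one disclosure string."""
    salt = b64url(secrets.token_bytes(16))
    return b64url(json.dumps([salt, name, value]).encode("utf-8"))

def digest(disclosure: str) -> str:
    """SHA-256 over the disclosure string; the digest goes into the signed payload."""
    return b64url(hashlib.sha256(disclosure.encode("ascii")).digest())

# Issuer side: hash every claim; only the digests appear in the payload.
# (A real issuer would then sign this payload as a JWT; omitted here.)
claims = {"given_name": "Alice", "birthdate": "1990-01-01", "age_over_18": True}
disclosures = {k: make_disclosure(k, v) for k, v in claims.items()}
signed_payload = {"_sd": sorted(digest(d) for d in disclosures.values())}

# Holder side: reveal only the age flag, keep name and birthdate private.
presented = [disclosures["age_over_18"]]

def verify(disclosure: str, payload: dict):
    """Verifier side: accept a disclosure only if its digest is in the payload."""
    if digest(disclosure) not in payload["_sd"]:
        raise ValueError("disclosure not covered by the signed payload")
    padded = disclosure + "=" * (-len(disclosure) % 4)
    _salt, name, value = json.loads(base64.urlsafe_b64decode(padded))
    return name, value

print([verify(d, signed_payload) for d in presented])  # [('age_over_18', True)]
```

The per-claim random salt is what stops a verifier from guessing the hidden claims by hashing likely values, and a forged disclosure fails because its digest is not in the payload the issuer signed.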
I find people over-rotate on whether we should be reviewing AI-produced code. "What if bad code gets into production!" some programmers gasp, as if they themselves have never pushed bad code, or had coworkers do the same.
I've worked at places where I've trusted everyone on my team to the extent that most PRs got only a quick glance before getting a "LGTM". On the flipside, I've also worked on teams where every person was a different kind of liability with the code that they pushed, and for those teams I implemented every linting / pre-commit / testing tool possible that all needed to pass inspection (including human review) before any code arrived on production.
A year ago, AI was like that latter team I mentioned -- something I had to check, double check, and correct until I was happy with what it produced. Over the past 6 months, it's gotten closer to (but is still fairly far from) the former team I mentioned -- I have to correct it about 10% of the time, whereas for most things it gets it right.
The fact that AI produces a much _larger_ volume of code than the average engineer is perhaps slightly concerning, but I don't see it much differently than code at large companies. Does every Facebook engineer review every junior engineer's pull request to make sure bad code doesn't slip in?
That isn't to say I'm for letting AI go wild with code -- but I think if at worst we consider AI to be a junior engineer we need to rein in with static analysis tools / linters / testers etc, we will probably be able to mitigate a lot of the downside.
There are two opposite answers here, and I feel like I could argue either one:
1) Humans were never held accountable, really
Outside of a few regulated industries, the worst that happens to an engineer who pushes negligent code is that they get fired. But after that happens, what actually changes? The organizational structure of the company that allowed the employee to push bad code still exists.
2) Humans will still be held accountable
If a human (managing a fleet of AI agents, let's say) ends up deploying bad code to production, they won't be able to point to the AI agent and say "it was them that did it!" -- it will still be the human at the end of the line that is held responsible.
Your comment seems to imply AI is currently at a junior developer's level -- 12 months ago I would have agreed (like I mentioned in my parent comment, both near the end and about the "latter" team I was a part of), but it's gotten quite good over the past few months.
That's not to say it won't ship bugs, but so does any engineer (junior or senior). It's up to you as to what level of tooling you surround the AI with (automated testing / linting / etc), but at the very least it also doesn't hurt to have that set up anyway (automated tests have helped prevent senior devs from shipping bad code too).