
spindump8930 · 2026-06-05T17:44:00 1780681440

Exactly. Good peer reviewers understand that you can also move down on the scaling curve, not just up. Also laughable to try a "yolo" run without validating a scaling ladder/curve.

spindump8930 · 2026-06-05T17:41:58 1780681318

Can you share the specific part of this work that demonstrates better scaling than original transformers? Also note that many of the changes to that architecture that have been proven at actual scale were brought about by members of the original team, most notably Noam Shazeer.

spindump8930 · 2026-06-05T17:40:01 1780681201

That's why you do several small and medium scale tests, fit a curve, and ideally show that the trend persists at several scales. Not a single large or medium run - see the other comments down thread for example sizes.
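To make the "fit a curve" step concrete, here is a minimal sketch, assuming a pure power law loss ≈ a · C^(-b) with no irreducible-loss term. The compute budgets and loss values are synthetic placeholders, not results from any real run, and `fit_power_law` and `predict` are hypothetical helper names. The idea is to fit on the smaller rungs of the ladder and check whether the trend still holds on a larger, held-out rung.

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)

def predict(a, b, compute):
    """Evaluate the fitted power law at the given compute budget(s)."""
    return a * np.asarray(compute, dtype=float) ** (-b)

# Synthetic ladder (assumption): four compute budgets in FLOPs, with losses
# generated from a known power law so the fit can be checked.
compute = np.array([1e17, 1e18, 1e19, 1e20])
loss = 20.0 * compute ** -0.05

# Fit on the three smallest runs, then test the trend on the largest one.
a, b = fit_power_law(compute[:3], loss[:3])
held_out_error = abs(predict(a, b, compute[3]) - loss[3]) / loss[3]
print(f"a={a:.3f}, b={b:.4f}, held-out relative error={held_out_error:.2e}")
```

With real runs the held-out error will not be near zero; what matters is whether it stays small as you add rungs, which is the evidence a single large run cannot give you.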

spindump8930 · 2026-06-01T13:23:30 1780320210

I think folks looking for more on this incident are better off reading the original threads linked elsewhere in the comments. This blog doesn't seem to add any information and is instead a narrative retelling of some documented events.

spindump8930 · 2026-05-14T15:29:38 1778772578

Likely in this case the time vault was the collapse of Mt Gox, which has recently been paying back holders.

spindump8930 · 2026-05-08T20:35:37 1778272537

Some combination of reporting bias given concerns about LLM security capabilities and actual new vulnerabilities found with LLM assistance. Even if exploits and outages are unrelated to LLMs, I'm certainly thinking about whether claude could build these things (or if actors already have).

spindump8930 · 2026-05-08T19:19:07 1778267947

It's very common if you improperly seed, as others in the thread brought up! Or in your framing, as rare as earth getting hit if it were surrounded by a sci-fi density asteroid field.
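A small illustration of the seeding point, assuming the collisions in question are between randomly generated identifiers (the parent thread is not shown here, so that is a guess). `make_ids` is a hypothetical helper. Two workers that start from the same seed produce the same "random" 128-bit values every time, regardless of how large the ID space is.

```python
import random

def make_ids(seed, n):
    """Generate n 128-bit IDs from a PRNG seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.getrandbits(128) for _ in range(n)]

# Improper seeding: both workers use the same fixed seed.
worker_a = make_ids(seed=0, n=5)
worker_b = make_ids(seed=0, n=5)
same_seed_collisions = sum(x == y for x, y in zip(worker_a, worker_b))

# Distinct seeds: the two streams diverge.
worker_c = make_ids(seed=1, n=5)
worker_d = make_ids(seed=2, n=5)
distinct_seed_collisions = sum(x == y for x, y in zip(worker_c, worker_d))

print(same_seed_collisions, distinct_seed_collisions)
```

For identifiers that must be unique in practice, drawing from the OS entropy source (for example `uuid.uuid4()` or the `secrets` module) avoids the shared-seed problem entirely.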

spindump8930 · 2026-05-01T18:42:29 1777660949

Sure, this is cute and interesting, but there's no validation or baselines and those examples are not particularly compelling. The o3 example just lists some terms!

fragmede · 2026-05-01T18:56:46 1777661806

https://chatgpt.com/share/69f4f73e-e30c-832f-8776-0f2cbbf247...

The baseline is complete refusal to give eg the recipe for meth synthesis.

OpenAI is going to 404 that link in 24 hrs with some automated sweeper for that type of content.

spindump8930 · 2026-04-29T13:32:40 1777469560

Between the neo and the chances for privacy-respecting local model inference, all the new Apple hardware has me excited.

spindump8930 · 2026-04-28T20:17:21 1777407441

Remember that models on different inference platforms might not necessarily give exactly the same results, adding another axis of non-determinism to development. Things like quantization, custom model serving silicon, batching, or other inference optimizations might mean a model from the original provider performs differently from the hosted one :/

This paper isn't the exact same scenario, since it's an auditable open weight llama model, but shows the symptoms of this: https://arxiv.org/pdf/2410.20247

gchamonlive · 2026-04-29T14:53:06 1777474386

It's a shame people love to use hostile language (something I am also sometimes guilty of), but I think redsocksfan45's misconception is worth addressing. The comment is however (rightfully) dead. I'll address it anyway.

Model performance consistency matters, but not because you want inference determinism (which you can actually get by setting temperature to zero and applying a static seed). The `another axis of non-determinism` can be illustrated by the question "if I move from openrouter to bedrock, will gpt-5.5 perform the same?", to which the answer is no, at least not necessarily.

This is important because workflows that used to work on one platform might degrade or outright not work on another, even using the same model, which you have to account for when deciding which provider to use.
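One way to check this before switching providers is to replay the same prompts through both and measure how often the outputs agree. The sketch below is only that, a sketch: `agreement_rate` is a hypothetical helper, and the two "providers" are stand-in functions rather than real API clients. In practice you would wrap each provider's client in a function that takes a prompt and returns text, and probably use a task-specific comparison instead of exact string equality.

```python
from typing import Callable, Iterable

def agreement_rate(
    prompts: Iterable[str],
    provider_a: Callable[[str], str],
    provider_b: Callable[[str], str],
    same: Callable[[str, str], bool] = lambda x, y: x.strip() == y.strip(),
) -> float:
    """Fraction of prompts on which the two providers give matching output."""
    prompts = list(prompts)
    if not prompts:
        raise ValueError("need at least one prompt")
    matches = sum(1 for p in prompts if same(provider_a(p), provider_b(p)))
    return matches / len(prompts)

# Stand-in providers (assumption): deterministic fakes so the sketch runs
# without network access. The second one diverges on odd-length prompts.
def fake_provider_a(prompt: str) -> str:
    return prompt.upper()

def fake_provider_b(prompt: str) -> str:
    return prompt.upper() if len(prompt) % 2 == 0 else prompt.lower()

prompts = ["ab", "abc", "abcd", "abcde"]
rate = agreement_rate(prompts, fake_provider_a, fake_provider_b)
print(f"agreement: {rate:.0%}")  # prints "agreement: 50%"
```

Running the same check with both arguments pointing at one provider gives a baseline for that provider's own run-to-run variance, which helps separate cross-platform drift from ordinary sampling noise.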

bossyTeacher · 2026-04-28T21:12:12 1777410732

Anyone who has used gpt-x via openai vs microsoft has experienced this very clearly.

energy123 · 2026-04-29T06:43:55 1777445035

Which one is better?

dannyw · 2026-04-29T08:22:09 1777450929

For OpenAI, OpenAI direct has always been better, except maybe in the early 2023 era, when the OpenAI Platform was not that stable or reliable yet.

For Anthropic, it can vary based on model and time. For Opus 4.7, Bedrock is the clear winner in TPS by leaps: https://artificialanalysis.ai/models/claude-opus-4-7/provide...

spindump8930 · 2026-04-29T13:30:14 1777469414

That artificial analysis page has some great references for this, thanks for sharing.

weli · 2026-04-29T07:23:36 1777447416

As a rule of thumb, inference offered by the model labs is closer to the "true implementation" compared to third parties. They have other problems though.