astrange · 2026-06-10T18:23:54 1781115834

Bad pelicans are in the training set because it's read his blog post. Including a good pelican in midtraining wouldn't help the problem because you'd just produce that every time.

astrange · 2026-06-10T18:12:31 1781115151

Query streams aren't really useful data in any sense. Just like nobody else is actually profiting from "selling your data".

astrange · 2026-06-09T18:10:43 1781028643

It is not a sponsored article, and he writes one of these every time a new model releases. Why would a professor at Wharton need to write sponsored Substack articles?

astrange · 2026-06-09T18:05:27 1781028327

> Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences.

They're originally named after the blends at a nearby coffee shop.

https://postscript.co/pages/brew-guide

I've noticed nobody at HN knows what "marketing" is or how to do it. It's not just naming things, and being evil and cynical is not the most successful method.

…also frontier models are a superhuman life changing experience. If they aren't, what possibly could be?

ValentineC · 2026-06-09T20:58:04 1781038684

Found a tweet from a year ago about this:

https://twitter.com/brian_a_burns/status/1866987688794132816

Well, TIL.

chroma_zone · 2026-06-09T19:58:29 1781035109

My life has changed, but not necessarily for the better.

bitpush · 2026-06-09T18:11:49 1781028709

This is interesting. Do you have any source?

astrange · 2026-06-02T15:50:04 1780415404

How can a patch be "reproducible"? The testcases are reproducible.

cmxch · 2026-06-03T03:00:55 1780455655

How Mythos’s mysterymeat got there from front to back.

astrange · 2026-05-29T07:46:04 1780040764

Scraping the internet isn't a copyright violation. Using it for LLM training is much more transformative than Google and Internet Archive, which are legal.

alfiedotwtf · 2026-05-29T08:39:02 1780043942

To be honest, this is the first time someone has spelt it out in a nicely succinct paragraph.

And just like that, I totally agree with you

estearum · 2026-05-29T10:58:59 1780052339

Except it ignores the entire premise of copyright which is to protect incentives to create original work, which Google does not destroy and which LLMs (very loudly and proudly) try to do.

There are several components of the Fair Use test; "transformation" is just one of them. The most important dimension is the effect on the market, i.e. the effect on incentives.

You probably shouldn't base your legal analysis on pithy internet comments regardless of how succinct or agreeable they are to you.

jazzyjackson · 2026-05-29T14:12:52 1780063972

You're right, scraping is legally protected. It's reproducing verbatim text that's a violation, which is why LLMs still clumsily refuse to produce song lyrics. They are capable of copyright violations and have to be 'aligned' not to get their providers sued.

estearum · 2026-05-29T23:25:28 1780097128

Verbatim reproduction is neither necessary nor sufficient to create a copyright violation.

"Copyright violation" is what we call the set of things that destroy the incentive for people to create original work by unduly benefitting from someone else's original work.

astrange · 2026-05-28T22:32:09 1780007529

They all looked like real CVEs to me.

coldtea · 2026-05-29T01:47:36 1780019256

Nothing that special about finding a real CVE. They're not that different than what non-Mythos could spot.

astrange · 2026-05-26T21:53:36 1779832416

Do you actually have evidence this works and doesn't degrade performance?

Foskya · 2026-05-27T12:04:02 1779883442

I've seen a few studies about it. If I remember correctly, making them talk like a caveman basically increases the information density of every token and decreases the chance that they will hallucinate.

As for sources, these are the ones I found, but I'm sure there are others:

> Persona-based Prompting Has An Effect on Theory-of-Mind Reasoning in Large Language Models

> Text Compression as a Proxy for LLM Reasoning Efficiency

astrange · 2026-05-26T21:49:59 1779832199

That's how a base model would work. An assistant model is simulating a human and behaves the same way a human would if you screamed at them.

https://www.anthropic.com/research/emotion-concepts-function

astrange · 2026-05-23T15:30:58 1779550258

The US doesn't.

"Social standing" in this case means if your girlfriend's parents will let you marry her. Not, like, who likes your LinkedIn posts.