
numeri · 2026-06-17T22:01:33 1781733693

The problem is that what people care about are the "black swan" causes of death, i.e., the cases where the actuarial table is wrong.

numeri · 2026-06-16T22:35:44 1781649344

Prices for training have dropped immensely in terms of research required, code efficiency, algorithmic/sample efficiency, and possibly also hardware (I'm not qualified to say without looking up FLOPS/dollar, or even to be certain that's the right metric here).

numeri · 2026-06-16T22:29:44 1781648984

I mean, it might listen to him. We have no clue, which is the problem.

eightysixfour · 2026-06-16T22:40:58 1781649658

Sure, but my guess is that for "true" superintelligence we won't be able to predict whether that is true or not until it happens. I'm not a doomer, but I also don't really think we can "align" people, much less a "super intelligent" AI.

numeri · 2026-06-16T22:18:04 1781648284

There's a large gap between making up words and an actually native text distribution. LLMs have a clear pattern, clear tells, a "feel" in English, and it's normally even more pronounced in non-English languages.

Lots of bias towards English sentence structure, idioms, etiquette, etc.

tgv · 2026-06-20T10:41:29 1781952089

I didn't notice any of that. Such a bias would be strange, because certainly smaller models don't have the luxury of learning grammar independently: it's still word sequences, and languages are quite well separated.

numeri · 2026-06-15T02:38:56 1781491136

One context I could imagine is a young person with a shaky grasp of English trying to come up with an interesting school/university project via conversations with an LLM set up as an OpenClaw agent.

It's got the right combination of inexperience, cluelessness, panic, expectations that Westerners are rich, and hopes of others being willing to fix their mistake.

numeri · 2026-06-12T22:47:14 1781304434

especially because this is the most painfully glaring flaw in their plan. Their solution is for an inference provider to... store the KV cache (which they can compute!) on-premise, on their own disks, but pay some third party for it?

mistercow · 2026-06-12T23:28:26 1781306906

Well, it’s one flaw. I would argue that the bigger flaw, which you alluded to, is that the cost of computing the cache yourself maxes out in the single-digit dollars even for very large frontier models, and that’s a one-time cost. Even if you imagine all the logistics are free and all the transfers are instant, what are we even talking about here from an economic perspective?

KV caching is a super interesting engineering space, especially when you’re talking about local models where compute and memory bandwidth are highly constrained and you’re trying to trim fractions of a second everywhere you can by flipping between different ICL prefixes. But selling caches for specific documents just makes no sense at all.
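A rough back-of-the-envelope sketch of the economics described above. Everything here is an assumption, not something from the thread: a hypothetical dense model loosely sized at ~400B parameters with grouped-query attention, the common rule of thumb of ~2 FLOPs per parameter per token for a forward pass, and made-up GPU throughput, utilization, and hourly price. `ModelShape`, `prefill_flops`, `kv_cache_bytes`, and `recompute_cost_usd` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions, not any real model's published config."""
    n_params: float           # parameters touched per token (dense model assumed)
    n_layers: int
    n_kv_heads: int           # grouped-query attention: few KV heads
    head_dim: int
    bytes_per_value: int = 2  # fp16/bf16 cache entries


def prefill_flops(model: ModelShape, n_tokens: int) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token for one forward pass.
    # Ignores the attention-score term, which grows with context length.
    return 2.0 * model.n_params * n_tokens


def kv_cache_bytes(model: ModelShape, n_tokens: int) -> int:
    # One key vector and one value vector per layer, per KV head, per token.
    return (2 * model.n_layers * model.n_kv_heads * model.head_dim
            * n_tokens * model.bytes_per_value)


def recompute_cost_usd(flops: float, peak_flops_per_s: float,
                       utilization: float, usd_per_gpu_hour: float) -> float:
    # Total GPU-seconds at the assumed effective per-GPU throughput,
    # priced per GPU-hour.
    seconds = flops / (peak_flops_per_s * utilization)
    return seconds / 3600.0 * usd_per_gpu_hour


if __name__ == "__main__":
    model = ModelShape(n_params=400e9, n_layers=126, n_kv_heads=8, head_dim=128)
    tokens = 100_000  # hypothetical document length
    flops = prefill_flops(model, tokens)
    # Assumed: ~1e15 FLOP/s peak, 40% utilization, $3 per GPU-hour.
    cost = recompute_cost_usd(flops, 1e15, 0.4, 3.0)
    size_gb = kv_cache_bytes(model, tokens) / 1e9
    print(f"prefill: {flops:.2e} FLOPs, ~${cost:.2f} to recompute")
    print(f"KV cache: ~{size_gb:.1f} GB to store and transfer")
```

Under these assumptions, a 100k-token document costs about $0.17 of GPU time to prefill but produces roughly 52 GB of KV cache. That is the asymmetry the comment points at: the artifact being sold is large to store and move, while recomputing it is cheap.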

numeri · 2026-06-12T17:16:11 1781284571

I've had it happen. I ran an experiment, taking a couple hours and producing ~2 GiB of files. One of the results looked good, so I told Claude Opus 4.5 (at the time) to commit the code changes, upload the important file to cloud storage, then clean up the rest.

I then saw it run `rm -r results/`, before messaging me: "Now all that's left is for you to upload the successful results, then I'll delete the rest!"

Why did it not upload the files itself, when it had been using the cloud storage CLI during that session? No clue. I do accept that I could have and should have just uploaded the file myself. It would have taken 3 seconds to type.

numeri · 2026-06-11T20:36:59 1781210219

To be fair, it is good to know that it disobeys simple instructions like "don't examine my git history" far more than other models. (It should of course be a different benchmark, so as not to conflate things.)

It's not a great sign for alignment.

bensyverson · 2026-06-11T21:02:47 1781211767

Agreed, alignment is just a separate issue that a vuln fixing benchmark doesn't need to be testing.

numeri · 2026-06-05T05:28:09 1780637289

I would just warn that you may not be able to recognize what is worth learning at your stage.

Intuition for library design and the architecture of software packages/external APIs is something you can only learn by doing.

numeri · 2026-04-15T20:51:40 1776286300

I have DSPD as well, and was pleasantly surprised to see how much of the article discussed DSPD.

That being said, I do think a lot of what the author is saying flies right in the face of traditional advice, esp. the suggestion that we should all just free-sleep and rotate around the clock. I personally find myself happiest when I'm entrained to the 24-hour cycle, but at my own natural offset. Whenever I've been rotating around the clock, it's felt miserable, uncontrollable, and exhausting.

To be fair, the author did claim that you can fully solve this by completely cutting out after-dark electronics, but I've tried pretty intensely to do exactly that for extended periods in the past, and didn't see any progress. I do sleep amazingly when camping, though, and the delay is smaller than normal (though still definitely there).