
> All undefined behavior could become "implementation defined" tomorrow, where the C compiler becomes more like a high-level assembler (again), and you could still jump the instruction pointer into arbitrary program text.

Try to work this through in your head. Imagine how you need to specify the working of the abstract machine in order to allow this. How do we talk about an "instruction pointer" on the abstract machine? What are the instructions it's pointing to? Am I defining an entire bytecode VM?

Nah, instead you're going to do one of two things. One: "Undefined Behaviour" which we explicitly took off the table, or Two: "If this happens the program aborts". And with that the big problem evaporates. Does it make those C programmers happy? I expect not.



Implementation defined means the compiler must specify the behavior, but it has near-total freedom, and it can define the behavior specifically for the target system. There is no abstract machine. If I use GCC on Linux x86-64, then there very much is an instruction pointer.


In the real world, compilers just specify that the behaviour is undefined and tell you to suck it up. But we're talking about a hypothetical where we aren't allowing Undefined Behaviour. Saying "Oh, but we can if we say it's the implementation choosing" is a get-out that is meaningless for the hypothetical. If you don't like the hypothetical, just refuse to engage with it instead.


I'm using specific, standards-defined language that's relatively well known. For example, sizeof(int) is implementation defined, meaning it must have a documented definition, specific to the implementation (e.g., with gcc targeting x86_64-linux-gnu, it's 4).
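To make the sizeof(int) example concrete, here is a minimal sketch in C. The function names int_width_bits and report_int_properties are made up for illustration. The only thing the standard pins down is a floor (int must be able to represent at least the range of a 16-bit value); everything the program prints is whatever your implementation documents.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: storage width of int in bits on this implementation.
 * The exact value is implementation defined; only the 16-bit floor is
 * guaranteed by the C standard. */
int int_width_bits(void) {
    int bits = (int)(sizeof(int) * CHAR_BIT);
    assert(bits >= 16); /* guaranteed minimum, not the typical value */
    return bits;
}

/* Hypothetical helper: print the values this implementation chose.
 * The output differs between compilers and targets. */
void report_int_properties(void) {
    printf("sizeof(int) = %zu bytes\n", sizeof(int));
    printf("width       = %d bits\n", int_width_bits());
    printf("INT_MAX     = %d\n", INT_MAX);
}
```

Calling report_int_properties() from main on gcc for x86_64-linux-gnu should report 4 bytes, matching the figure above, while a different target is free to report something else as long as it documents it.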

In languages like C that are closer to the machine, not everything has to be specified strictly in terms of a generic abstract machine.

I'm not trying to be hostile or evasive or derisive. I'm just genuinely responding to your original comment, which I think missed some important info. My point was that if we imagine a different world from the one we're in right now, where all undefined behavior became implementation defined behavior, then there would still be a need for mitigations like endbr64. So I'm not painting a rosy picture for C. I just think undefined behavior is a red herring. Assembly doesn't have undefined behavior, but obviously you can have all sorts of issues there.


> Assembly doesn't have undefined behavior, but obviously you can have all sorts of issues there.

The machine is in the real world and is thus obliged to have some actual behaviour, but it is not always practical to discern what that behaviour would be, let alone make it reliable across a product line and document it in an understandable way. As a result, your CPU's documentation does in effect include "Undefined Behaviour".


True, when writing my comment I wanted to qualify it to the same effect, but thought it would be an unnecessary subtlety to the general thrust of my point. That is, we can ignore this kind of "undefined behavior in the machine itself" for the purposes of this particular discussion.


I don't see how to ignore it though. If we're defining the behaviour, but our "definition" doesn't specify the actual behaviour because it's written in terms of hardware with no clearly defined behaviour for that situation, then it's just word play. We're not really doing what I set out to do.


If for the purposes of this discussion we can't ignore it at the machine level (I'd ignore it because we're assuming higher level languages, crappy or otherwise, are unlikely to generate machine instructions that exhibit undefined behavior), then why were we discussing higher level languages and their crappiness at all? I'm not saying this to be snarky. I just mean that I really think the likelihood of machine undefined behavior being an issue is on the order of the likelihood of cosmic rays flipping bits: it happens, and can't be ignored (buy ECC memory), but it's more interesting to talk about the things that we are many orders of magnitude more likely to experience, e.g., bugs in C programs, bugs in unsafe Rust, bugs in managed language runtimes, etc. I think those things are not all equally likely, but could all benefit from endbr64 type mechanisms, including in JIT output.

To be clear, unlike the root comment, I don't think this particular hardware mechanism obviates the need for, or the benefits of, related software mechanisms. But in terms of cost/benefit/applicability, endbr64 type mechanisms look pretty good all around.
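For anyone unfamiliar with what endbr64 protects, here is a rough sketch of the kind of code it applies to: an indirect call through a function pointer. The names handler_fn, double_it, negate and dispatch are invented for the example. The assumption is a build with gcc or clang on x86-64 using -fcf-protection=branch, where the compiler places an endbr64 at the entry of functions that may be reached indirectly; on hardware and an OS with indirect branch tracking enabled, an indirect call that lands anywhere else faults, and on older CPUs the instruction executes as a no-op.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical handler type: any function taking and returning int. */
typedef int (*handler_fn)(int);

int double_it(int x) { return 2 * x; }
int negate(int x) { return -x; }

/* The call through fn is an indirect branch. With branch protection
 * enabled, the target is expected to begin with endbr64; a corrupted
 * pointer into the middle of a function would not satisfy that. */
int dispatch(handler_fn fn, int arg) {
    assert(fn != NULL);
    return fn(arg);
}
```

Nothing in the C source changes with the flag on or off. The difference only shows up in the disassembly (for example via objdump -d), which is part of why the cost side of the cost/benefit looks small.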



