The way to do this is to have a formal verification tool that takes the input, t...

natsch · on March 17, 2024

That would require the tool to prove the equivalence of the two programs, which is generally undecidable. Maybe this could be weakened to preserving some properties of the program.

layer8 · on March 17, 2024

No, it would not. It would require the LLM to provide a proof for the program that it outputs, which seems reasonable in the same way that a human decompiling a program would be able to provide a record of his/her reasoning.

The formal verifier would then merely check the provided proof, which is a simple mechanical process.

This is analogous to a mathematician providing a detailed proof and a computer checking it.

What is impossible due to undecidability is for two arbitrary programs, to either prove or disprove their equivalence. However, the two programs we are talking about are highly correlated, and thus not arbitrary at all with respect to each other. If an LLM is able to provide a correct decompilation, then in principle it should also be able to provide a proof of the correctness of that decompilation.

saagarjha · on March 18, 2024

Yes, and then someone needs to check that proof. It’s not particularly clear if that decompilation proof would be any more helpful than just doing the lifting by hand.

logicchains · on March 18, 2024

No, nobody needs to check the proof; that's the whole point of formal theorem proving, the machine checks it for you.

ngruhn · on March 17, 2024

That doesn’t mean that it’s impossible, right? Just that no tool is guaranteed to give an answer in any case. And those cases might be 90%, 10% or it-doesn’t-matter-in-practice %

djinnandtonic · on March 17, 2024

What if there are hallucinations in the verification tool?

thfuran · on March 17, 2024

Then it's not a formal verification tool. Generative models are profoundly unfit for that purpose.

layer8 · on March 17, 2024

There may be bugs, but not hallucinations. Bugs are at least reproducible, and the source code of the verification tool is much, much smaller than an LLM, so has a much higher chance of its finite number of bugs to be found, whereas with an LLM it is probably impossible to remove all hallucinations.

To turn your question around: What if the compiler that compiles your LLM implementation “hallucinates”? That would be the closer parallel.

smellf · on March 17, 2024

I think the idea is that you'd have two independently-develooed systems, one LLM decompiling the binary and the other LLM formally verifying. If the verifier disagrees with the decompiler you won't know which tool is right and which is wrong, but if they agree then you'll know the decompiled result is correct, since both tools are unlikely to hallucinate the same thing.

layer8 · on March 17, 2024

No, the idea is that the verifier is a human-written program, like the many formal-verification tools that already exist, not an LLM. There is zero reason to make this an LLM.

It makes sense to use LLMs for the decompilation and the proof generation, because both arguably require creativity, but a mere proof verifier requires zero creativity, only correctness.

thfuran · on March 17, 2024

Good luck formally proving Linux.

layer8 · on March 17, 2024

The goal is to prove that the source code matches the machine code, not to prove that the code implements some intended higher-level semantics. This has nothing to do with formally proving the correctness of the Linux kernel.