I think this shows why formal verification, while certainly useful in specialized situations, will likely not be a major boost for software productivity [1].
[1] - https://worrydream.com/refs/Brooks_1986_-_No_Silver_Bullet.p...:
> I do not believe we will find the magic here. Program verification is a very powerful concept, and it will be very important for such things as secure operating system kernels. The technology does not promise, however, to save labor. Verifications are so much work that only a few substantial programs have ever been verified
It's widely agreed that formal verification does not boost software productivity in the sense that it doesn't speed up development of "software that compiles and runs and we don't care if it's correct".
The point of formal verification is to ensure that the software meets certain requirements with near certainty (subject to gamma radiation, tool failure, etc.). If mistakes aren't important, formal verification is a terrible tool. If mistakes matter, then formal verification may be what you need.
What this and other articles show is that doing formal verification by hand is completely impractical. For formal verification to be practical at all, it must be supported by tools that can automate a great deal of it.
The need for automation isn't new in computing. Practically no one writes machine code directly; we write in higher-level languages (or, rarely, assembly) and use automated tools to generate the final code. It has been harder to create practical tools for formal verification, but clearly automation is a minimum requirement.
Automation of Hoare logic is quite good these days. Dafny, from MS Research (https://dafny.org), is probably the friendliest formal language of any kind. It's built around Hoare logic and its extension, separation logic. The barrier to entry is low: a seasoned imperative or functional programmer can get going with Dafny in just a few days. Dafny has been used to verify large systems, including many components of AWS.
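To give a flavor of what such a contract looks like, here is a minimal sketch. It's written in Lean rather than Dafny (the names are made up), with the precondition passed as an explicit hypothesis and the postcondition carried in the return type:

    -- Minimal sketch in Lean (not Dafny): the precondition is an explicit
    -- hypothesis and the postcondition lives in the return type, so the
    -- checker rejects any body that fails to establish it.
    def increment (x : Nat) (pre : x < 100) : { y : Nat // y ≤ 100 } :=
      ⟨x + 1, pre⟩  -- in Lean, x < 100 is by definition x + 1 ≤ 100, so `pre` is the proof

In Dafny the same contract is written as requires/ensures clauses on a method, and the proof obligation is discharged automatically by an SMT solver rather than supplied by the programmer.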
I am hoping that LLMs make more advanced languages, such as Liquid Haskell or Agda, less tedious to use. Ideally, lots of code should be autocompleted once a human provides a type signature. The advantage of formal verification is that we can be sure the generated code is correct.
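As a toy sketch of what "human writes the signature, tool fills the body" could look like (hypothetical Lean, not the output of any current tool): the signature already states that the result must equal x + y, so a suggested body is accepted only together with a proof.

    -- Hypothetical sketch: the signature pins the behavior down exactly.
    def add (x y : Nat) : { n : Nat // n = x + y } :=
      ⟨x + y, rfl⟩  -- e.g. a machine-suggested body; `rfl` is its correctness proof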
How do you encode the difference between a method that adds and a method that multiplies two numbers in the type signature?
Software has such huge economies of scale that, for the most part, being useful or cheaper than the alternative outweighs the need to be correct. Only for some categories of software is correctness a priority or a selling point.
True, but the work can also be reduced significantly with better tooling, which is still being developed but has improved markedly over the past decade: e.g. SMT solvers that can output proofs, or tactics in Coq or Lean.
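A small Lean sketch of the kind of labor this saves (assuming a recent Lean 4, where the `omega` decision procedure for linear arithmetic ships with the toolchain):

    -- One tactic call discharges a linear-arithmetic obligation that would
    -- otherwise need a manual chain of lemmas.
    example (a b : Nat) (h : a ≤ b) : 2 * a ≤ b + b := by
      omega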
I'm hoping that this will be a big application of AI, actually. If an AI can be built to do this simple but very tedious work, and your verification tool is capable of catching any errors it makes, then you've addressed a major flaw of formal verification (its tediousness) and of AI (its tendency to output bullshit).
Proving safety is just a small part of the problem to be solved. The hard part is actually structuring the program such that its correctness can even be formulated as a formal property which can be proved. For a lot of software that alone is highly nontrivial.
From "No Silver Bullet":
> More seriously, even perfect program verification can only establish that a program meets its specification. The hardest part of the software task is arriving at a complete and consistent specification, and much of the essence of building a program is in fact the debugging of the specification
On the other hand, proofs sometimes give you more than you'd expect. A proof that the implementation of a function always returns some value automatically proves that there's no arbitrary code execution, for instance.
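As a minimal illustration (a Lean sketch; the function is made up): a total language only accepts a definition after checking that every case is covered and the recursion terminates, so the fact that the definition compiles is already a proof that it returns a value on every input.

    -- Accepted only because all cases are covered and the recursion is
    -- decreasing; that alone guarantees a Nat comes back for every input.
    def sumAll : List Nat → Nat
      | []      => 0
      | x :: xs => x + sumAll xs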
Why should it save labor? What it does is guarantee correctness, which is like the holy grail for programmers. If you know shit's correct, you can just assume stuff and it will actually work on the first try. You don't even need to write tests for it.
> What it does is guarantee correctness, which is like the holy grail for programmers.
Formal verification can never guarantee correctness. For that to happen, you'd need to know that the property you verified is the property you want. If you have the ability to write correct specifications, you also have the ability to write correct programs, and running the verification doesn't add any value.
It guarantees correctness relative to a given requirement.
The real value of formal proofs lies in forcing you to think deeply about the requirement and your implementation of it and to make your assumptions about it explicit.
I have only ever used proof once in my career. We had a problem with an aircraft main computer: it was occasionally failing during start-up and then refusing to start again on all subsequent restarts. It was a multiprocessor computer, and each processor was running start-up tests, some of which would interfere if run at the same time. I was worried that if the start-up was stopped at an arbitrary time, it might leave the control variables in a state that would prevent further start-ups. I used Leslie Lamport's first technique (https://lamport.azurewebsites.net/pubs/pubs.html#proving) in an attempt to prove that it would always start up no matter at what point it had been stopped on the previous run. But I was unable to complete the proof in one situation. So I added a time delay to the start-up of some of the processors to ensure that this situation never occurred. But that didn't fix the original problem. That turned out to be a hardware register being read before it had settled down. I fixed that later.
Correctness is taken to mean "the program matches its specification". Or, more literally, "the program is a correct implementation of its specification".
Taking this to its reductio ad absurdum, the program itself perfectly specifies its own behavior. Yet that's hardly something I'd point to as a holy grail. We want programs that do what we want them to, which is generally a human construct rather than a technical one. (And you'd better hope that two different people don't expect the program to do two different things in the same situation!)
Because it takes so much work.
Good software, even with bugs, released today, can be used today. Perfect software, released two years from now, cannot be used today, or next week, or next year. "Good now" beats "perfect later", at least now, but often it beats it later too.
While someone is working on "provably perfect", someone who isn't using that approach has released four versions that were not perfect, but were good enough to be useful. That person took the market, and when the provably perfect version finally comes out, nobody cares, because they're used to how the other one works, and it has more features. And if the "provably perfect" person wants to implement those features, they have to change the program, which means they have to re-do the proof...
Also "perfect software" doesn't actually mean "perfectly robust systems" - at the end of the day it has to interact with the Real World, and the Real World is messy. You still have to cope with failures from cosmic rays, power brownouts, hardware bugs (or at least a difference in specification of the hardware to the model used for this sort of formal verification), failures communicating with other devices etc.
So critical software already has to deal with failures and recover; no amount of formal verification will remove that requirement.
> You still have to cope with failures from cosmic rays
Much like you have to cope with hash or GUID collisions... that is, you don't, because it statistically never happens. Unless you're speedrunning Super Mario or something.
Besides, if you have a program that's formally verified, you just need to do what NASA did for its Apollo missions and make all the logic redundant and gate it behind a consensus algorithm.
You can argue that all 4 computers might get hit by a cosmic ray in just the right place and at just the right time... But it will never ever happen in the history of ever.
So my point is that the real world is messy. But the systems we design as engineers are not necessarily as messy. And the interface between the real world and our systems can be managed, and the proof of it is that we wouldn't be chatting across an entire ocean by modulating light itself if that weren't the case.
I've definitely dealt with hash/GUID collisions in the context of safety critical systems before. It's not a particularly uncommon requirement either.
"just" is pulling a lot of weight in your comment. Redundant consensus is difficult and expensive, all to address very particular error models (like the one you're assuming). If we expand our error model from localized error sources like cosmic rays to say, EMI, there are entire categories of fault injection attacks well-known to work against modern redundant systems.
That's assuming your specification is comprehensive and correct in the first place, of course. My experience is that all specifications have holes related to their assumptions about the real world, and many of them have bugs or unintended behavior as well.
And unexpected input from other systems. And if it was unexpected in a way that wasn't covered by your proof, then your program is not guaranteed to work correctly on that input. And the real world has a lot of ways for input to do unexpected things...
But you don't have to do the proving. You can let someone who is, like, REALLY good at formal verification do it and then benefit from the libraries they produce.
Verification is useful when you're dealing with low level or very declarative stuff. You don't really have to "verify" a CRUD repository implementation for a random SPA. But the primitives it relies on, it would be good if those were verified.
Same problem. Sure, it would be great if the primitives we use were proven correct. But I've got stuff to write today, and it doesn't look like the proven-correct primitives are going to arrive any time soon. If I wait for them, I'm going to be waiting a long time, with my software not getting written.
This is why connecting formal verification to an infinite (possibly incorrect) labor machine such as an LLM system can indeed save human labor.