22

An important goal of formal methods is to prove the correctness of systems, either by automated or human-directed means. However, it seems that even if you can give a correctness proof, you may NOT be able to guarantee that the system will not fail. For example:

  • The specification may not model the system correctly, or a production system may be too complicated to model, or the system may be flawed inherently due to contradictory requirements. What techniques are known to test whether a specification makes any sense at all?
  • The proof process may be flawed too! Who knows that those inference rules are correct and legitimate? Furthermore, proofs can be very large, and how do we know that they do not contain errors? This is the heart of the critique in de Millo, Lipton, and Perlis's "Social Processes and Proofs of Theorems and Programs". How do modern formal methods researchers respond to this critique?
  • At runtime, there are many nondeterministic events and factors that can seriously affect the system. For example, cosmic rays can alter RAM in unpredictable ways, and more generally we have no guarantee that hardware will not suffer Byzantine faults, which Lamport has shown are very difficult to defend against. So the correctness of the static system does not guarantee that the running system won't fail! Are there any techniques known to account for the fallibility of real hardware?
  • At present, testing is the most important tool we have for establishing that software works. It seems like it should be complementary to formal methods. However, I mostly see research focused either on formal methods or on testing. What is known about combining the two?
Kaveh
  • 21,577
  • 8
  • 82
  • 183
Jenny
  • 337
  • 1
  • 3
  • 4
    Issues 1 and 3 seem to be inherent to system analysis, whatever the method. Formal methods at least make them obvious, unlike other methods. Issue 2 is non-existent as far as I know; the formal systems I have seen used have been proven to be correct; you can mess up every deduction system by modifying it yourself and making mistakes, of course. – Raphael Jan 05 '11 at 20:10
  • 8
    This question is phrased rather subjectively, and in a way as to provoke. I'd recommend rephrasing or closing. – Suresh Venkat Jan 05 '11 at 20:42
  • 4
    I made some major revisions to make the question more useful in my judgment. If you think this was too aggressive an alteration, please post in meta. – Neel Krishnaswami Jan 05 '11 at 21:29
  • Thanks to Suresh for pointing it out. I didn't realize that; I was just eager to know what arguments there are that might set my thinking right. – Jenny Jan 05 '11 at 21:35
  • Thanks, Neel Krishnaswami! I should learn from your revision how to write properly, and it is urging me to think more... looking forward to more thought-provoking comments on these questions. – Jenny Jan 05 '11 at 21:41
  • 1
    @Neel: Nice edit. One thing your change omits is a reference to system security, which was part of the essence of the original question. Perhaps Jenny can put that back in, to again make it her question. – Dave Clarke Jan 05 '11 at 21:41
  • Nice edit: I reverted my downvote. – Suresh Venkat Jan 05 '11 at 22:21
  • 1
    As for new bullet 4: Big misconception: (realistic) testing can never show the absence of errors. – Raphael Jan 05 '11 at 22:55
  • I think the areas don't mingle as much as one would like because of the very different schools of thinking. Engineery minds will test or claim quality by construction process, learning while they go in practice. Sciency minds will want formal proofs -- and realize the restrictions, retreat to the ivory tower and extend their toolset first. Both ways are not very compatible. – Raphael Jan 05 '11 at 23:02
  • 1
    @Raphael: I waffled about that, but decided to go with the stronger claim, since people do use testing to build belief that a program works. Sometimes this is even ok! As a simple example, if your programming language is the language of polynomials, you can test whether an expression correctly implements a function of degree $n$ by testing $n+1$ points. So a question is whether actual programs have any qualities that justify similar (albeit maybe probabilistic/Bayesian) conclusions, and what such properties might be.... – Neel Krishnaswami Jan 05 '11 at 23:14
  • @Raphael: I think that is an oversimplification about formal verification, model checking people (e.g. Edmund M. Clarke and Moshe Y. Vardi) are also very rigorous, while people from proof theoretic side (e.g. Thierry Coquand and Gerard Huet) are also seriously interested in practical issues. – Kaveh Jan 06 '11 at 01:12
  • @Neel: You are right, some programs can be proven to be correct by testing a few inputs. In general, however, this is not true but that fact is often ignored, therefore my statement. @Kaveh: It is a simplification, for sure, since people mix methodologies. I have the feeling, though, that most people are either engineers or scientists at heart, rarely both. You can see this when engineers do proofs or scientists build something. That impression might be wrong, of course. – Raphael Jan 06 '11 at 15:53

7 Answers

11

Regarding 4: There's plenty of work combining formal methods and testing; googling "formal methods testing" turns up a large literature. Although such work has many different goals, one of the key ones is to generate more effective test suites. This talk gives a nice introduction, based on model checking.

Also regarding the issue of software security, which has been edited out of the question: formal methods need to work harder to deliver in that realm. Typically one writes a specification for a piece of software and verifies that the specification is satisfied, namely, that when the input satisfies the precondition, the output satisfies the postcondition. To ensure secure software, one also needs to check that the software behaves sensibly for input that does not satisfy the precondition. (This is of course equivalent to setting the precondition to true for all input.) The space of all inputs is typically much larger than the space of well-formed inputs, and this is one place where even functionally correct software can be compromised, simply by violating its assumptions.
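To make the point concrete, here is a minimal Haskell sketch (the function and its precondition are invented for illustration, not taken from any real verification effort): a function verified only under its precondition may behave arbitrarily on ill-formed input, while a hardened variant checks the assumption explicitly at the boundary.

    -- Hypothetical example: a correctness proof for ageCategory would assume
    -- the precondition 0 <= n <= 130; nothing is promised outside that range.
    ageCategory :: Int -> String
    ageCategory n
      | n < 18    = "minor"
      | otherwise = "adult"

    -- Hardened variant: the precondition is checked explicitly, so ill-formed
    -- input is rejected instead of being silently classified as "adult".
    ageCategorySafe :: Int -> Either String String
    ageCategorySafe n
      | n < 0 || n > 130 = Left ("invalid age: " ++ show n)
      | otherwise        = Right (ageCategory n)

    main :: IO ()
    main = mapM_ (print . ageCategorySafe) [25, -3]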

Regarding point 3: Some work has been done on verifying systems in settings where faults are explicitly modelled, including Faulty Logic: Reasoning About Fault-tolerant Programs, Matthew Meola and David Walker, European Symposium on Programming, 2010. Work on probabilistic model checking and probabilistic verification also addresses the issue of faults to some degree.

Dave Clarke
  • 16,666
  • 3
  • 60
  • 106
9

Your points in order:

  • The correctness of all specifications is ultimately subjective: they are perceived to solve a problem correctly according to their users, or they aren't. There is no getting away from this in software development, and no method has a silver bullet for it.
  • A lot of work goes into proving that the process is correct with respect to its assumptions. Usually there is information available for experts to validate these rules. Any human activity is subject to error, but highly formalized systems using well-studied approaches are much less error-prone than almost all other human processes.
  • At some point, any system has a failure mode outside the scope of that system. You are still better off eliminating all the predictable sources of error, even if you leave some unpredictable ones unaccounted for.
  • Testing and proving can easily coexist. Testing is a less specific, more ad hoc process, so you may find less formal work done on it. But you may be interested in a tool such as QuickCheck, which supplements the guarantees offered by the Haskell type system with randomized tests; see the sketch below this list.
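A minimal QuickCheck sketch (assuming the QuickCheck package is installed; the property itself is just an illustration): the compiler guarantees the type of reverse, while the behavioural property that reversing twice is the identity is probed by random testing.

    import Test.QuickCheck

    -- A behavioural property the type alone cannot express:
    -- reversing a list twice yields the original list.
    prop_reverseInvolution :: [Int] -> Bool
    prop_reverseInvolution xs = reverse (reverse xs) == xs

    main :: IO ()
    main = quickCheck prop_reverseInvolution
    -- prints something like: +++ OK, passed 100 tests.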
Marc Hamann
  • 2,904
  • 1
  • 24
  • 22
9

Formal methods have already been shown to work big-time. Without them, we would not have managed to cope with the complexity of designing modern digital systems (microprocessors).

No microprocessor ships without its logic having been subject to functional equivalence verification, without its FPU having been subject to FV, and without its cache coherence protocols having been subject to FV (this has been the case since 1995).

Barring obvious differences between software and engineered physical systems (e.g. bridges, where one can add safety factors), formal methods play the role of engineering mathematics in CS. However, the use of FM always depends on the benefit/cost ratio. Fuzz testing pays off big-time at companies such as Microsoft (google "SAGE in one slide").

Even within a micro, there are subsystems (e.g. microprocessor pipelines) where FV hasn't had the same impact as elsewhere (e.g. the FPU, where in many cases conventional testing was not done at all once formal symbolic trajectory evaluation had proved the absence of bugs: Kaivola et al., CAV'09).

Formal methods are also being used in the synthesis of artifacts (code, high-quality tests, schedules for optimally draining battery banks, ...). Of course, all of the issues raised in the question are quite valid, and as with any other use of technology, false advertising must be recognized and avoided.

Kris
  • 143
  • 7
Ganesh
  • 91
  • 1
  • 1
8

Regarding 2: if the system is formalized in a proof assistant (e.g., Twelf, Agda, or Coq), the properties of soundness and completeness are verified, and the proofs are done in this encoding, then this is a non-issue. You may have formalized a system that doesn't describe what you intended, but at least you won't have incorrect proofs or a broken system in which you are reasoning.
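For a flavour of what this looks like, here is a tiny example in Lean (standing in for any of the assistants above; the theorem is just an illustration): the kernel accepts the theorem only if the proof term actually has the stated type, so a bogus proof simply fails to typecheck.

    -- If the proof term were wrong (say, projecting the components the wrong
    -- way around), Lean's kernel would reject the development at check time.
    theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
      fun h => ⟨h.right, h.left⟩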

  • 1
    Also, there is something known as "adequacy" of your encoding: that what you've formalized in a proof assistant is in fact the system you've written down on paper (see http://twelf.plparty.org/wiki/Adequacy). This does not, however, address your first point; it is about constructing a bijection. – Jamie Morgenstern Jan 05 '11 at 22:55
6

However, it seems that even if you can give a correctness proof, you may NOT be able to guarantee that the system will not fail.

Yes, perhaps there are no guarantees. Formal methods are meant to find and eliminate errors and to convince humans.

What techniques are known to test whether a specification makes any sense at all?

You might be interested in model checking tools for specifications of formal systems.
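To give a feel for what such tools do at their core, here is a toy Haskell sketch (not any real tool; model checkers like SPIN or NuSMV are vastly more sophisticated): enumerate the states reachable in a finite transition system and check that an invariant holds in every one of them.

    import qualified Data.Set as Set

    -- Toy explicit-state check: explore every state reachable from the
    -- initial states; return a counterexample state if the invariant fails.
    checkInvariant :: Ord s => [s] -> (s -> [s]) -> (s -> Bool) -> Maybe s
    checkInvariant inits next holds = go Set.empty inits
      where
        go _ [] = Nothing
        go seen (s : rest)
          | Set.member s seen = go seen rest
          | not (holds s)     = Just s
          | otherwise         = go (Set.insert s seen) (next s ++ rest)

    -- Example: a counter modulo 4 never reaches the value 5.
    main :: IO ()
    main = print (checkInvariant [0 :: Int] (\n -> [(n + 1) `mod` 4]) (/= 5))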

The proof process may be flawed too! Who knows that those inference rules are correct and legitimate?

I think (because of Gödel's second incompleteness theorem) that showing a system of inference rules to be consistent necessarily appeals to an even more powerful set of inference rules.

Furthermore, proofs can be very large, and how do we know that they do not contain errors?

Hopefully, huge proofs are checked by a small proof checker that can be read and understood by humans in a reasonable amount of time. This is sometimes called the "de Bruijn criterion". If you believe that the proof checker will not certify an incorrect proof, you need only check the checker.
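A toy illustration of the idea (a Haskell sketch, not any real kernel): the checker below validates proofs in a minimal Hilbert-style implicational logic. However enormous a proof object grows, trusting it reduces to trusting these few lines.

    -- Formulas of minimal implicational logic.
    data Formula = Var String | Imp Formula Formula deriving (Eq, Show)

    -- Proof objects: Hilbert axiom schemes K and S, plus modus ponens.
    data Proof = AxK Formula Formula
               | AxS Formula Formula Formula
               | MP Proof Proof

    -- The entire trusted kernel: compute the formula a proof establishes,
    -- failing if a modus ponens step does not fit together.
    check :: Proof -> Maybe Formula
    check (AxK a b)   = Just (Imp a (Imp b a))
    check (AxS a b c) = Just (Imp (Imp a (Imp b c))
                                  (Imp (Imp a b) (Imp a c)))
    check (MP pf px)  = do
      Imp a b <- check pf
      a'      <- check px
      if a == a' then Just b else Nothing

    -- Example: the classic S-K-K derivation of A -> A checks out.
    main :: IO ()
    main = print (check (MP (MP (AxS a (Imp a a) a) (AxK a (Imp a a)))
                            (AxK a a)))
      where a = Var "A"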

Are there any techniques known to account for the fallibility of real hardware?

How about Fault tolerant typed assembly language?

However, I mostly see research which is either focused on formal methods or testing. What is known about combining the two?

"The TAP conference is devoted to the convergence of proofs and tests".

Just googling for "proofs and tests" turns up several good hits above the fold.

jbapple
  • 11,184
  • 2
  • 29
  • 50
5

What techniques are known to test whether a specification makes any sense at all?

It is definitely a judgement call. In software engineering, people have designed very strict methodologies to find, write, and confirm specifications. This is done by real humans using a non-formal (in the sense of non-mathematical) process, so it is still subject to inconsistency, but at the end of the day it corresponds to what people want to verify, no more, no less.

Are there any techniques known to account for the fallibility of real hardware?

Sure, there is a field of verification known as runtime verification. You can also use execution-based model checking on the real system, running in a fully virtual environment subjected to specific scenarios (I contributed to this myself with V-DS + APMC). However, bringing the interaction between the system and its environment into the verification process clearly remains a major problem.
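As a tiny illustration of the monitoring idea (a Haskell sketch with an invented event alphabet, not one of the tools above): a runtime monitor scans the event trace emitted by the running system and reports the position of the first violation of a safety property, here "Send only inside an Open/Close session".

    -- Hypothetical events emitted by the running system.
    data Event = Open | Send | Close deriving (Eq, Show)

    -- Return the index of the first event violating the safety property
    -- "Send only inside an Open/Close session", if any.
    monitor :: [Event] -> Maybe Int
    monitor = go 0 False
      where
        go _ _    []           = Nothing          -- no violation observed
        go i _    (Open  : es) = go (i + 1) True  es
        go i _    (Close : es) = go (i + 1) False es
        go i open (Send  : es)
          | open               = go (i + 1) open es
          | otherwise          = Just i           -- Send outside a session

    main :: IO ()
    main = do
      print (monitor [Open, Send, Close])   -- Nothing
      print (monitor [Send, Open])          -- Just 0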

However, I mostly see research which is either focused on formal methods or testing. What is known about combining the two?

Wow, today I will be totally shameless and cite myself again. With coauthors, we have worked on combining model checking and testing; you can look at the publication list of a former PhD student of our group, or search for "approximate probabilistic model checking" or "statistical model checking" in any good search engine (work done by Younes et al., Sen et al., myself and coauthors, and many others).
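A rough sketch of the statistical flavour (in Haskell; the random-walk model and the sample count are invented for illustration, and this is not the actual APMC algorithm): estimate the probability that a property holds by sampling many independent executions and counting successes; Chernoff-style bounds then tie the number of samples to the approximation error.

    import Control.Monad (replicateM)
    import System.Random (randomRIO)

    -- One random execution of a toy model: a biased 20-step random walk,
    -- satisfying the property if it ever reaches height 3.
    runOnce :: IO Bool
    runOnce = go (20 :: Int) (0 :: Int)
      where
        go 0 _ = return False
        go k h = do
          u <- randomRIO (0 :: Double, 1)
          let h' = if u < 0.6 then h + 1 else h - 1
          if h' >= 3 then return True else go (k - 1) h'

    -- Estimate P(property) from n independent samples.
    main :: IO ()
    main = do
      let n = 10000 :: Int
      hits <- length . filter id <$> replicateM n runOnce
      putStrLn ("Estimated probability: "
                ++ show (fromIntegral hits / fromIntegral n :: Double))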

Sylvain Peyronnet
  • 3,063
  • 1
  • 18
  • 29
  • ad 1: Note that requiring a formal formulation of specifications is supposed to help with the subjective part, as opposed to formulations in natural language. Because one has to be very precise, inconsistencies and uncertainties become visible and have to be resolved. – Raphael Jan 06 '11 at 15:55
  • The process is non formal, but the result is, for model checking, typically a temporal formula (LTL or CTL for example). In my experience (with people from industry), using a formal language does not necessarily improve the consistency of the result :( – Sylvain Peyronnet Jan 06 '11 at 16:47
  • But you can check for inconsistencies at least, can't you? What have your experiences been regarding "getting what you want"? – Raphael Jan 07 '11 at 00:51
  • 2
    Yes, I can check for the inconsistencies, but unfortunately that does not always help to solve them. The problem is that it is very difficult for most engineers/industrial designers to write specifications in classical verification languages. My opinion is that, if you don't have a deep knowledge of a specification language, using it will guide you too much: you end up writing only what you are able to express with your limited familiarity with the language. In summary, it kills your creativity. – Sylvain Peyronnet Jan 07 '11 at 11:39
5

You might be very interested in the work of John Rushby, one of the designers of the PVS theorem prover, who is interested in exactly the points you mention. You might enjoy reading his classic report to the FAA on the use of Formal Methods and the Certification of Critical Systems (1993), and his newer writings on assembling a probabilistic, formal safety case out of the various kinds of evidence available (testing, proofs, analyses, etc.).

Martin Schwarz
  • 5,496
  • 25
  • 41