Can computer programs be understood through reason alone?
Or will we study computer programs empirically, just as we study the universe or the human brain or any natural phenomena or even artificial ones such as the global economy? Will the guarantees that we provide for program behavior be similar to the laws of physics or assertions in biology (statistically-based)?
A lot of program analysis and understanding is already empirical. However, what is lacking is a widely-accepted mathematical foundation for new empirical approaches (statistical or machine-learning?), that compliments the current logic-based foundations of program analysis and formal methods.
|Summary of my conclusions
The question is whether formal methods (such as human-guided theorem proving and various kinds of model-checking), program analysis and programming language design deliver on program correctness and program understanding (just as logic reasoning helps us understand mathematical structures) or will we need new empirical approaches as well that can give guarantees about program behavior and improve program understanding (just as the scientific method gives us guarantees and understanding about physical or biological phenomenon)? It is very important to emphasize that any empirical approach that aims to be in the same intellectual league as formal methods must provide guarantees on program behavior, correctness and understanding. The definition of the word guarantee may change from mathematical theorem to something akin to a physical law or a statistical-valid statement or a complexity-valid statement (?).
Yes, we need both, but empirical approaches will dominate in the future simply because the sheer size and complexity of computer programs will continue to grow. I hope to convince the reader of the merits of this empirical view of “providing guarantees about program behavior and understanding”. Furthermore, it will lead to completely new ways of thinking about program correctness, program understanding, verification and security.
This is the subject of my first post. First, let me expand on the question itself:
Many in computer science departments in academia believe that we (humans) can understand all the properties of computer programs through appropriate partially automatic or computer-aided reasoning. There are entire fields dedicated to this effort, namely, formal methods, program analysis, programming language theory, and logic in computer science. The output of these fields of research are techniques and methods for understanding behavior of computer programs, and guaranteeing the correctness of programs against well-designed specifications. Typically program behavior is understood through the use of mathematical logic. The pioneers of these ideas include giants of computer science such as John McCarthy, Tony Hoare, Robert Floyd, Jay Moore, Robert Constable, Ed Clarke… Many of them won a well-deserved Turing award for their efforts. The Turing is the Nobel of computer science.
The basic idea in these fields is that we must aim to make computer programs behave correctly with respect to some well-designed specification. The idea is that any deviant behavior (bugs) of the computer program from its well-designed specification is bad, and the computer program must be fixed so that the bad behavior is not manifested. Furthermore, it is assumed that it is possible to fully specify all behaviors of any program using mathematical logic and that techniques such as interactive theorem proving/model-checking/program analysis can be used to check the correctness of program with respect to such well-designed specification.
The belief system of many academics today is that it is possible to specify in a sound and complete manner all the behaviors of a computer program using a well-designed specification, and verify fully its correctness against such a specification using interactive formal methods. Furthermore, software testing typically cannot give guarantees about software correctness. Software testing might increase confidence in the reliability and possibly security of a program, but correctness guarantees are a different matter all together. Hence, this focus on formal approaches to understanding, specifying and verifying computer programs. I call this belief system computation rationalism.
Considerable progress has been made in this research agenda of formal approaches to understanding computer programs, thousands of PhDs have been awarded (including mine), and many start-up companies started. The field has had a huge impact on hardware correctness. Intel and AMD routinely use these ideas and techniques to verify critical sub-systems in their microprocessors.
However, many others in the computer systems research community believe that formal approaches to program correctness will not scale. The reasons for such a belief are many. I will not explain the following in detail for the sake of brevity of the post.
- The state space explosion problem: Your typical computer programs have a very large state space.
- The difficulty of specifying program behavior. Do we really always know what we want from a program?
- The difficulty of coming up with good programming language (PL) abstractions
- The issue of making all these techniques useable by the average programmer
- The increasing complexity of computer programs
Of course, it is important to make software more reliable and secure since billions, if not trillions, of dollars worth of our infrastructure and economic activity depend on reliable software. The systems guys say that it is best to keep improving software testing techniques, and think about software reliability in new ways. I call this belief system computation empiricism.
This belief system, in its refined form, does not say that we should stop all research in formal approaches to understanding computer programs. Instead it says that it is possible to understand, specify and even give guarantees with beyond-formal approaches. We need both formal and beyond-formal approaches to giving guarantees. By beyond-formal I mean techniques and approaches that incorporate all the lessons learned from formal approaches and also testing, but go beyond as explained in subsequent posts.
How can we give guarantees about software correctness without formal methods?
Before I describe my ideas on that (in my subsequent posts), let me explain why I believe that very large computer programs are forever beyond human reason. My reasons go beyond the ones mentioned above:
Why Computer Programs are Beyond Logic Reasoning
Is it really true that for any computer program we can always specify its behavior using mathematical logic and then verify using a human-guided computer aided-verification? I will try to philosophically argue that it may not always be possible.
- I believe that computer programs will eventually become, by far, the most complex systems that we will ever investigate, more complex than biological/chemical systems such as the human brain or body, financial systems or complex physical systems (may be with the exception of the universe itself). The universe itself will come to be viewed as a computer program (already some physicists advocate this view). In such a scenario, we will have no choice but to investigate these systems empirically, i.e., use the scientific or statistical methods to make observations and then generalize them inductively, not necessarily in a strict mathematical sense, but in the sense of everyday usage of the word inductive. Completely new approaches to understanding computer programs will need to be developed.
- Computer programs already display dynamical properties that are very difficult to specify in pure logic. One might be able specify them using a statistical language, but it is hard to see how such dynamical properties can be specified using first-order or higher-order logic. This points to another current limitation of formal methods (besides scalability issues related to handling large complex programs).
- Furthermore, computer programs can be obfuscated deliberately to confuse program analysis (I am, in fact, working on such techniques).
- Finally, going beyond formal methods is an opportunity to provide complimentary yet different foundations to program analysis and understanding, than today’s logic-based foundation. I will expand on such foundations in my future posts.