As long as there are programs there will be bugs. Automated bug checkers aren’t a novel idea; there are plenty around, but developers see them as inaccurate, slow and liable to report bugs that don’t exist.
But hope is on the rise if Project Parfait (a multi-layered tool, rather like the dessert, hence the name) is as successful as the woman who developed it says it will be.
According to Cristina Cifuentes, Parfait can check 10 million lines of code in less than 30 minutes. Not hours, not days, not a week – half an hour.
“Systems code is prevalent in today’s world: in operating systems, compilers, virtual machines, database engines, games engines, etc. Any systems programmer should know their C…”
Cristina started on the project after discovering that though Sun (now Oracle, of course) employs legions of programmers writing a huge amount of code, they weren’t using bug checkers, even though a large number of tools were available. The team had tried various tools but came up against a number of problems: some tools would not compile the code; others simply took too long, a week in some cases.
Cristina’s theory was that by getting the easy bugs out of the way quickly, you free up time to find the harder ones. Take buffer overflows, for example. This kind of bug represents a potential security threat, because a hacker might be able to insert malicious code via the overflow. So any location where that could happen goes on a list of possible bug locations.
“We start from a big list; we send it to the first analysis, the cheap one, and that analysis may be able to find some bugs for us to fix, so we can then take those off the list. It may also be able to determine that certain locations contain no bugs whatsoever. In this way the list of all possible bug locations gets smaller and smaller.”
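The narrowing process Cristina describes can be sketched as follows. This is a hypothetical illustration, not Parfait’s actual representation: the site records and the “cheap” analysis are invented, standing in for candidate buffer accesses that a cheap pass can only decide when the buffer bound is statically known.

```python
# Hypothetical sketch of list narrowing: every candidate bug site starts
# on one list; a cheap analysis classifies each as a definite bug,
# provably safe, or unknown, and only the unknowns are passed on to the
# next, more expensive analysis.

def cheap_check(site):
    """Decide a site only when the buffer bound is a compile-time constant."""
    if site["bound"] is None:          # bound not statically known here
        return "unknown"
    return "bug" if site["index"] >= site["bound"] else "safe"

def narrow(sites, analysis):
    """Report bugs found by `analysis`; return the still-undecided sites."""
    bugs, remaining = [], []
    for site in sites:
        verdict = analysis(site)
        if verdict == "bug":
            bugs.append(site)
        elif verdict == "unknown":
            remaining.append(site)     # safe sites simply drop off the list
    return bugs, remaining

sites = [
    {"line": 10, "index": 12, "bound": 8},     # constant bound, overflow
    {"line": 22, "index": 3,  "bound": 8},     # constant bound, in range
    {"line": 35, "index": 5,  "bound": None},  # bound unknown: keep on list
]
bugs, remaining = narrow(sites, cheap_check)
```

Only the site on line 35 survives to the next (more expensive) pass; the overflow on line 10 is reported immediately and line 22 is proven safe, so both leave the list.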
Cristina obtained a PhD in Computer Science from the Queensland University of Technology in 1994 (where she is now an Adjunct Professor) for her work on decompilation of binary programs. She has served on the program committees of conferences in the areas of compilers, virtual machines, program maintenance, program comprehension, and software engineering, and has published in these areas as well as on legal aspects of computing.
- How did the Parfait project start and was there a specific problem you were trying to solve?
- Three years ago, I was evaluating several possible project areas to work in within a research context, problems that had both a research component and a potential practical application within our product divisions at Sun Microsystems. One of those areas related to finding bugs and security vulnerabilities in systems code. At the time there were various commercial bug checking tools available in the market, however, our engineering teams were not using them. So the question from me to them was “why not?” and the result was a list of drawbacks in the then existing tools. These drawbacks formed the list of requirements for a new tool, and Parfait was born.
The drawbacks that we address in Parfait are: scalability, being able to run the tool in time proportional to the size of the code base, especially large code bases of millions of lines of code; precision, mis-reporting a small number of bugs (i.e., have a small “false positive” rate); recall, being able to give users an idea of the ratio of missed bugs (i.e., the “false negative” rate); and usability, being able to easily understand the results of the tool – why did the tool report a bug as a bug?
Results of Parfait running over the 10 million lines of code of the Oracle Solaris operating system’s “ON” consolidation (the Operating system/Networking consolidation) show that Parfait can find common memory-related problems (buffer overflows, memory leaks, use-after-frees, etc.) in a short amount of time: 25 minutes on an AMD Opteron machine (time after parsing), with a false positive rate of less than 10%. Usability has been addressed by means of a web-based graphical user interface that reports the errors to users of the tool.
For calculating the false negative rate, we ended up creating a benchmark tool called BegBunch that has various suites to measure accuracy and scalability of a bug checker. In measuring accuracy, you have to measure the number of bugs correctly reported by the tool (true positives), the number of bugs incorrectly reported (false positives) and the number of bugs missed (false negatives).
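The accuracy measures described above can be written down directly. A minimal sketch with invented counts (the formulas are the standard ones; the article does not give BegBunch’s exact definitions): the false positive rate is the fraction of reports that are wrong, and the false negative rate is the fraction of real bugs that were missed.

```python
# Accuracy of a bug checker from its true positives (bugs correctly
# reported), false positives (incorrect reports) and false negatives
# (bugs missed).  The counts below are invented for illustration.

def rates(tp, fp, fn):
    false_positive_rate = fp / (tp + fp)   # wrong reports / all reports
    false_negative_rate = fn / (tp + fn)   # missed bugs / all real bugs
    return false_positive_rate, false_negative_rate

# Example: 95 correct reports, 5 misreports, 20 missed bugs.
fpr, fnr = rates(tp=95, fp=5, fn=20)       # fpr = 0.05, fnr about 0.174
```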
- Did you have a big architectural picture of how Parfait was going to work before launching it, so you knew what the hard to solve areas were likely to be?
- The architecture of Parfait was derived from its requirements. The need for scalability was critical. In looking at program analysis techniques, it became clear that bugs can be found using different types of program analysis: some are simple techniques, others more complex. A simple analysis does not require much runtime; a complex one does. By trading off simple (cheap) analyses against more complex ones, a range of bugs of the same type can be found: an “easy” bug can be found with a cheap analysis, but the same easy bug can also be found by an expensive one. Why spend runtime on the expensive analysis to find easy bugs? It’s better to use that time to find “hard” bugs. Therefore, easy bugs are found by cheap analyses and hard bugs by expensive analyses. By ordering the analyses from cheap to expensive, the Parfait framework makes the best use of its runtime.
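The cheap-to-expensive ordering can be sketched as follows; the analyses, their costs, and the integer “sites” are all invented for illustration, and stand in for real analyses whose cost grows with their precision.

```python
# A minimal sketch of ordering analyses by cost: each analysis only sees
# the sites the previous (cheaper) analyses could not decide, so the
# costly passes run over as few sites as possible.

def run_ordered(sites, analyses):
    """Apply analyses in increasing cost order; return all bugs found."""
    bugs = []
    for cost, analysis in sorted(analyses, key=lambda a: a[0]):
        undecided = []
        for site in sites:
            verdict = analysis(site)
            if verdict == "bug":
                bugs.append(site)
            elif verdict == "unknown":
                undecided.append(site)
        sites = undecided          # only hard sites reach costly passes
    return bugs

# The cheap pass decides even sites; the expensive pass decides the rest.
cheap     = (1, lambda s: "bug" if s % 2 == 0 else "unknown")
expensive = (9, lambda s: "bug" if s > 10 else "safe")
found = run_ordered([2, 7, 11, 4], [expensive, cheap])
```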
- What was the biggest challenge?
- The biggest challenge to address initially was scalability: how to ensure the tool would end up scaling well to the size of the code. Once a prototype tool was available that was scalable for finding buffer overflows, it became clear that the biggest challenge was to understand how people use this type of tool, what their expectations are in terms of error reporting and how to integrate it into a build system.
Usability became the biggest challenge, and hence some effort has gone into it. The one thing that has helped the most has been the graphical, web-based reporting user interface. The Parfait GUI summarizes bug reports in three forms: an Overview tab that gives an overview of the types of bugs found in the code and their percentages, a Defects tab that shows the individual bug reports with hyperlinks to the code, and an Inspect tab that shows a fish-eye view of the source code with the statement where the bug was reported highlighted in a shade of red, and other statements highlighted in a shade of blue to show the “path to the bug”, i.e., the statements the tool took into account to determine there was a bug at the line it reported.
- How for example does Project Parfait overcome problems in buffer overflows?
- Parfait reports problems with arrays (buffers) and strings (arrays that end with a ‘\0’ in the C/C++ languages), letting users know which access of the array or string is incorrect, namely an access outside the (legal) bounds of the array or string. A Parfait bug report includes information about the array/string: the line of source code where the access happens, the name of the array/string and the name of the index, and a message about what went wrong (e.g., the array is of size N but it was accessed at position N+20). The user then determines how to fix the problem (clearly, the buffer is too short if N+20 elements are needed).
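A sketch of the kind of report described here, given a recorded out-of-bounds access. The function name and the message wording are invented; Parfait’s actual output differs.

```python
# Format a buffer-overflow report from the facts the interview lists:
# the buffer's name and size, the offending index, and the source line.

def overflow_report(name, size, index, line):
    assert index >= size, "only out-of-bounds accesses are reported"
    return (f"{line}: buffer overflow: '{name}' has {size} elements "
            f"but is accessed at position {index}")

msg = overflow_report(name="buf", size=8, index=28, line=42)
```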
- It’s an obvious question, but wouldn’t it have been easier to ensure that people used bug detectors as they went along instead of retrofitting?
- It would have been, had it been computationally feasible to run the kinds of analyses that today’s bug checkers run, but it wasn’t: those analyses would have taken too long on the machines of two or three decades ago. Tools like “lint” were developed at the time, focusing on lightweight analysis. Unfortunately, lint-like tools produce too many false positives, leaving users frustrated with the tool (because of the time it takes to weed out misreports) and deciding not to use it.
- C++ is a language that was designed to cater to everybody’s perceived needs and so the language has become complex, difficult to understand, and likely to contain errors forever. Is simplicity the only answer? Wouldn’t it be easier to design a simpler language?
- C and C++ are languages that were designed for systems programming, allowing developers easy access to the underlying machine. That access comes at the expense of the language being low-level and requiring the developer to understand how (parts of) the underlying machine work: to use the memory management system, for example, a developer needs to know how to use pointers. Simpler languages do help developers write easier-to-maintain programs. In Java, Smalltalk, Self and other programming languages, for example, memory is handled by the underlying virtual machine; the developer does not need to know about pointers or worry about them.
- Do you think C has outlived its usefulness or do you see it as the perfect language for really good systems programmers but not one for the average systems and applications programmers?
- I think that the C language has served, and will continue to serve, systems programmers for years to come. I don’t believe everyone will agree with the statement that C is “a perfect language”, but it is flexible enough to let systems-level developers write efficient code. These days, with simpler languages designed for general-purpose applications, your average applications programmer doesn’t need to know about the C family of languages. But programmers who need to develop systems code need to understand how the language works. Systems code is prevalent in today’s world: in operating systems, compilers, virtual machines, database engines, games engines, etc. Any systems programmer should know their C.
This is not to say that in the future systems-level code could not be written in another language. For example, there are research projects looking into writing a JVM fully in the Java language, with reasonable performance and the ease of maintainability that comes with using Java.
- A few decades ago people thought women would be good programmers because they were thought to be keen on detail. These days the conventional take on this is that men have the ability to focus on detail usually to the detriment of everything else and that’s why most programmers are male. Do you think that’s an accurate assessment?
- I have never heard this argument before. In my experience, cultural differences seem to have an impact on the number of women in Computer Science. I grew up in Colombia, South America, and at the time the CS degree had a 50:50 ratio of women to men. Once I arrived in Australia I noticed a different ratio (20:80), one that also holds in the US and other countries. However, in Asian countries such as Singapore and China, the ratio of women is also high (as it was in Colombia at the time I studied there). Why this is the case, I do not know, but it is argued that primary and high school schooling have something to do with it.
- Can you tell me a little about the ‘Walkabout’ project and why your interest in binary translation and reverse engineering?
- The Walkabout project aimed at automatically translating executable (binary) files from one platform to another at runtime, so that it appears to users as if they were running the original executable on a different platform. This process is known as binary translation. In the compilers community, a retargetable compiler is one that uses specifications of machine instructions to be able to generate code for different machines or targets. By analogy, in Walkabout the aim was to have a retargetable (i.e., generating code for different target machines) and resourceable (i.e., taking as input code from different source machines) framework, with machine specifications of the various instruction sets, for example the i386, SPARC v8, 68000 and PA-RISC instruction sets.
Walkabout abstracted away the handling of source and target machines via the instruction specifications (for the syntax and semantics of instructions) and then performed analyses to lift the decoded instructions from a low level of abstraction to a high one, resembling instructions from a high-level (procedural) language that are machine independent and hence easily translated to the target machine (using traditional compiler code-generation techniques). The novelty of the framework lay in processing the low level of abstraction into a higher, machine-independent one.
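A toy illustration of the spec-driven idea described here: instead of hard-coding a decoder per machine, a table of instruction specifications maps opcodes to a mnemonic and a machine-independent semantic form, so supporting a new machine means writing a new table rather than a new decoder. The opcodes, table format and semantic tuples are all invented; they are nothing like Walkabout’s actual specification languages.

```python
# A table of (mnemonic, semantics) specifications for a made-up ISA.
# The semantics lift each instruction to a machine-independent tuple.
TOY_ISA = {
    0x01: ("add", lambda dst, src: ("assign", dst, ("plus", dst, src))),
    0x02: ("mov", lambda dst, src: ("assign", dst, src)),
}

def decode(spec, opcode, dst, src):
    """Lift one encoded instruction using the machine's spec table."""
    mnemonic, semantics = spec[opcode]
    return mnemonic, semantics(dst, src)

name, ir = decode(TOY_ISA, 0x01, "r1", "r2")
# ir is the machine-independent form ("assign", "r1", ("plus", "r1", "r2"))
```

Retargeting in this sketch means swapping in a different spec table; the `decode` driver itself never changes.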
My interest in binary translation grew out of my PhD work on decompilation, as binary translation can be seen as a “cousin” of decompilation. In my PhD thesis the aim was to determine how much decompilation could be done automatically. Starting from an executable program, can we derive a higher-level program that resembles one written in a high-level (procedural) language? The answer is yes, and in my thesis I described algorithms for reconstructing high-level control structures and data structures. My interest in this area was purely the challenge: how much can be done automatically was the main question I wanted to answer; my supervisor, Prof. John Gough, was a compilers person. The tool dcc was developed as part of my PhD, and its drawbacks, namely that it would only take one type of executable program as input, were addressed by the later UQBT and Walkabout binary translators (that’s where the retargetability idea came from). I remember getting requests at the time to support one or another form of executable code, and I used to think “you can write the code yourself”, but clearly, conceptually you are doing the same thing each time you write a decoder of an executable file format, so why not specify the (executable file) format? And then why not specify the instructions that the machine supports and decode them?