Viet Yen Nguyen

Optimising a High-Performance Computing Tool

22 October 2008

Model Checking

Many computer systems nowadays have their ‘correctness’ checked using sample testing, but this isn't enough where failure can lead to catastrophe. Model Checking techniques are far better because they explore all scenarios and verify whether responses meet expectations in each instance. Now there is a publicly available .NET Model Checker that introduces a new way for .NET developers to test their systems.

Many computer systems nowadays have their ‘correctness’ checked using sample testing: an engineer programs a scenario and runs the test system against it. The system’s actual output is captured and measured against the expected output, and if these match up, then the test is passed. But for some systems, you need to test all scenarios.
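
As a toy illustration of that workflow - the class, the method and the numbers here are invented, not taken from any real test suite - a sample test runs one scenario and compares the actual output against the expected output:

// A minimal, hypothetical illustration of sample testing: one scenario,
// one expected output. Calculator.Add and the expected value are invented
// for the example.
using System;

class Calculator
{
    public static int Add(int a, int b) { return a + b; }
}

class SampleTest
{
    static void Main()
    {
        int actual = Calculator.Add(2, 3);   // run the system against one scenario
        int expected = 5;                    // the output the engineer expects
        Console.WriteLine(actual == expected ? "Test passed" : "Test failed");
    }
}

The problem, of course, is that one passing scenario says nothing about all the scenarios that were never programmed.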

This is the case for highly critical (and complex) systems such as satellites, rockets, railway signals, on-board flight controllers and flood-control barriers, where sample testing alone is simply not enough. They are not much noticed, but they have a huge impact on daily life, especially when they start failing. Imagine space rockets crashing from the sky or trains running into each other because of small bugs in the system. Nobody wants that; failures are too costly in terms of lives and money. Engineers specialising in quality assurance, safety, dependability, reliability and performability put every effort into preventing them from happening, but it's not an easy task. The kind of systems they analyse so thoroughly are generally highly multi-threaded or event-driven, and it is fiendishly difficult to test them exhaustively for correctness because of the huge number of ways they can respond.



In order to test all the possible responses of a given system, the scientific community has for the last few decades been investigating model checking.

Model checking tools explore all scenarios and verify whether responses meet expectations in each instance. These techniques have been successfully applied to verify NASA’s Mars Rover, Lucent Technologies’ PathStar telephone switch and the storm surge barrier control system in the Dutch Delta Works dikes. So far, so good. But past model checking techniques required engineers to create and verify a model of the system, when it is the system itself that ultimately needs to function correctly. That is why, over the past decade, people at NASA started to apply model checking techniques at the system level.

With that in mind, a team comprising myself, my predecessor and our supervisor, Theo Ruys, developed the MoonWalker model checker at the University of Twente. MoonWalker verifies multithreaded .NET programs by inspecting the CIL bytecode instructions and exploring all possible interleavings of the threads. It is the only publicly available .NET model checker to date; it employs a set of efficient algorithms and, in terms of performance, it is on par with other state-of-the-art tools. You can read all about it in Software Model Checking for Mono, which contains an introductory chapter on model checking and an in-depth explanation of the algorithms involved.

Figure 1 - Conceptual overview of using MoonWalker. The user loads a .NET assembly, and the assertions specified within it, into the MoonWalker model checker, which explores the state space of the assembly. If no assertion violations are found, OK is output; otherwise a trace of the CIL instructions that lead to the assertion violation is generated.
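
To make that concrete, here is a small, purely illustrative program of the kind a model checker can be pointed at (it is not taken from the MoonWalker distribution): two threads update a shared counter without synchronisation, and an assertion states the property we expect. Only some interleavings violate the assertion, which is exactly why exploring all of them matters:

// Illustrative example (not from the MoonWalker sources): two threads race on
// a shared counter. Depending on how the CIL instructions interleave, the
// final value may be 1 instead of 2, so the assertion can fail. A model
// checker explores every interleaving and reports a trace that violates it.
using System;
using System.Diagnostics;
using System.Threading;

class Racer
{
    static int counter = 0;

    static void Increment()
    {
        int tmp = counter;   // read
        counter = tmp + 1;   // write - not atomic with the read
    }

    static void Main()
    {
        Thread t1 = new Thread(Increment);
        Thread t2 = new Thread(Increment);
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        Debug.Assert(counter == 2);   // holds only for some interleavings
    }
}

A handful of conventional test runs might never hit the bad interleaving; a model checker systematically will.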

Algorithms



Efficient model checking is all about employing the best algorithms, or else you easily end up hitting the limits of your computer's processing power. We’ve seen many case studies where model checkers completely consume clusters with dozens of gigabytes of memory and dozens of processing cores. It's a well-known problem in the model checking community, and it drives us and our colleagues around the world to eagerly develop better algorithms.

So we recently spent months working out three new algorithms for MoonWalker. On paper they looked much better, but we wanted to see whether they would really speed up MoonWalker in practice. Naturally, we needed to implement them as efficiently as possible to squeeze the most out of them, but this is not easily done in a tool as complex as a model checker, which is built on some of the most advanced algorithms around. So we made heavy use of a profiler to understand the bottlenecks and aggressively optimise the algorithms.

Measure First, Optimise Later

When developing MoonWalker we read some interesting papers on hashing in model checking, and recalled from experience that the hash function is called up to billions of times during an average verification. Here was an opportunity to devise a faster hash function: one based on cyclic polynomials and Knuth hashing, with constant time-complexity instead of the linear time-complexity of traditional algorithms. With growing excitement, this new algorithm was worked out on paper and named incremental hashing. Once we had proved that the theory behind it was sound, we enthusiastically implemented it in MoonWalker.
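
The exact function and its proof are in the paper, but the core idea can be sketched as follows: treat the state vector as an array of slots, keep a running hash, and when the model checker changes one slot, swap out only that slot's contribution instead of rehashing the whole vector. The code below is a simplified illustration of that idea using a plain polynomial scheme - the multiplier and layout are invented for the example, and it is not the cyclic-polynomial function we actually implemented:

// Simplified sketch of incremental hashing (not the exact cyclic-polynomial
// function from the SPIN'08 paper). The hash of a state vector is a polynomial
// in a constant K; when one slot changes, only that slot's contribution is
// replaced, giving an O(1) update instead of an O(n) rehash.
using System;

class IncrementalHash
{
    const ulong K = 31;              // hypothetical multiplier, not the paper's constant
    readonly ulong[] powers;         // K^i for each slot, precomputed once
    readonly ulong[] state;          // the current state vector
    ulong hash;                      // running hash of the whole vector

    public IncrementalHash(ulong[] initialState)
    {
        state = (ulong[])initialState.Clone();
        powers = new ulong[state.Length];
        ulong p = 1;
        for (int i = 0; i < state.Length; i++)
        {
            powers[i] = p;
            unchecked
            {
                hash += state[i] * p;
                p *= K;
            }
        }
    }

    // O(1) update: swap one slot's old contribution for its new one,
    // instead of rehashing the entire state vector.
    public void Update(int slot, ulong newValue)
    {
        unchecked
        {
            hash = hash - state[slot] * powers[slot] + newValue * powers[slot];
        }
        state[slot] = newValue;
    }

    public ulong Value
    {
        get { return hash; }
    }
}

The update cost no longer depends on the size of the state vector, which is where the theoretical win comes from when successive states differ in only a few slots.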

The results of the first verification runs were not what we expected. We used small .NET models, and simply used the Cygwin time command to measure the execution times. There was no decrease, and we initially thought this was because of the simplicity of our models. We crafted a few more complex ones and reran the verifications - again with no decrease in execution time. This was puzzling and disappointing because, in theory, our incremental hashing should have been much faster.

Figure 2 - On the left, a simple datarace programmed in C#. The MoonWalker model checker (shown in the command prompt) successfully detects the datarace.

We started wondering whether the .NET virtual machine might be optimising the old algorithm to the point where it was as efficient as the new one. So we blew up the old hashing algorithm by adding some unoptimisable loops, but this made no difference to the execution time either.

Completely stumped, we started to google for answers, specifically trying to understand how the compiler and the CPU optimised the instruction code. We even stepped through the optimised CIL bytecode using the debugger built into Visual Studio.

In a flash of insight, we realised we should measure the cumulative time spent in the old hash function and the incremental hash function, and look for an absolute difference.
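
A crude version of that measurement can be done by hand, before reaching for a profiler. The sketch below is illustrative only - the names are invented and ComputeHash merely stands in for whichever hash function is being measured - but it shows the idea: wrap the function, accumulate the elapsed time and the call count, and print the totals at the end of the run:

// Rough-and-ready cumulative timing of a hash function by hand. Everything
// here is a placeholder, not MoonWalker's actual code.
using System;
using System.Diagnostics;

static class HashTimer
{
    static readonly Stopwatch total = new Stopwatch();
    static long calls;

    // Wrap the function under test so every call adds to a running total.
    public static ulong TimedHash(byte[] state)
    {
        total.Start();
        ulong h = ComputeHash(state);
        total.Stop();
        calls++;
        return h;
    }

    // Placeholder so the sketch compiles; substitute the hash being measured.
    static ulong ComputeHash(byte[] state)
    {
        ulong h = 0;
        foreach (byte b in state)
        {
            unchecked { h = h * 31 + b; }
        }
        return h;
    }

    public static void Report()
    {
        Console.WriteLine("{0} calls, {1} ms in total", calls, total.ElapsedMilliseconds);
    }
}

It is crude, though: the Stopwatch start/stop calls cost something on every invocation, which is noticeable for a function this small, and every experiment means editing and recompiling. A profiler does the same job far more conveniently.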

We immediately began googling for .NET profilers and found several. We finally settled on ANTS Profiler because it presents the profiling results in a way that gives a quick overview and is fast to comprehend. Using the profiler in detailed mode, we found that the absolute amount of time spent in the old hash function was much higher than in the incremental hash function. Several times higher, in fact. So the big question was:

Why wasn’t this difference reflected in the total running time?

When in detailed mode, ANTS can show a piece of the source code with indications of the amount of time spent on each line. We analysed both the old hashing function and the incremental hashing function in this incredibly fine-grained way… and still couldn’t explain why there wasn’t a difference!

Then old optimisation wisdom came to mind - stuff that we’d read years back in an old ‘optimising Java code’ article:

We should only optimise the bottlenecks, which are measured by their relative stake in the total execution time.

A quick look at the table showed that the old hashing algorithm had a relative stake of near zero, so plugging in a better hashing algorithm would have no effect on the total execution time. We also realised that the absolute time difference, while still big, was in milliseconds... So what we had spent all that time and energy doing was optimising a part of MoonWalker that didn’t need it.
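
The arithmetic behind that lesson is essentially Amdahl's law: if a routine accounts for a fraction p of the total running time and is made s times faster, the whole program only speeds up by a factor of 1 / ((1 - p) + p / s). The snippet below plugs in two illustrative figures - a near-zero stake of 0.5% (an assumed number, standing in for our hash function) and the 55% garbage-collection stake discussed below:

// Amdahl's law, applied to optimisation targets. The 0.5% figure is assumed
// for illustration; the 55% figure is the garbage collector's measured stake.
using System;

class Amdahl
{
    // Overall speedup when a component taking fraction p of the run time
    // is made s times faster: 1 / ((1 - p) + p / s).
    static double OverallSpeedup(double p, double s)
    {
        return 1.0 / ((1.0 - p) + p / s);
    }

    static void Main()
    {
        // A hash with a ~0.5% stake, sped up ten-fold, barely registers...
        Console.WriteLine(OverallSpeedup(0.005, 10.0));   // ~1.005
        // ...whereas halving a 55% stake pays off handsomely.
        Console.WriteLine(OverallSpeedup(0.55, 2.0));     // ~1.38
    }
}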

Fortunately, the merits of our incremental hashing did not have to be limited to a neat theoretical result. We profiled another well-known model checker - SPIN - and observed that its hashing was a bottleneck. We implemented our incremental algorithm there and immediately saw up to a two-fold performance increase, depending on the model. Our algorithm and extensive experimental results have also recently been published at SPIN’08, a well-known model checking conference.

Also, while profiling and measuring the effects of hashing in MoonWalker, we did detect that its garbage collector algorithm (not to be confused with the garbage collector in the virtual machine) had a far more significant stake - on average, 55%. We saw this as a great optimisation opportunity and also worked out a new algorithm to reduce this stake.

Figure 3 - Profiling data from MoonWalker. As can be seen, most of the time is spent in the method MarkAndSweepGC.Run().
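
For readers wondering why a model checker carries a garbage collector of its own: broadly speaking, MoonWalker tracks its own representation of the program's heap as part of every explored state, and collecting unreachable objects keeps those states compact and comparable. The sketch below is a textbook mark-and-sweep over a toy object graph - a generic illustration of the scheme named in Figure 3, not MoonWalker's actual implementation:

// Generic mark-and-sweep sketch over a toy object graph. Illustrative only;
// MoonWalker's MarkAndSweepGC works on its internal state representation.
using System;
using System.Collections.Generic;

class HeapObject
{
    public List<HeapObject> References = new List<HeapObject>();
    public bool Marked;
}

class MarkAndSweep
{
    readonly List<HeapObject> heap = new List<HeapObject>();

    public HeapObject Allocate()
    {
        HeapObject obj = new HeapObject();
        heap.Add(obj);
        return obj;
    }

    // Phase 1: mark every object reachable from the roots.
    static void Mark(HeapObject obj)
    {
        if (obj.Marked) return;
        obj.Marked = true;
        foreach (HeapObject child in obj.References)
        {
            Mark(child);
        }
    }

    // Phase 2: sweep away whatever stayed unmarked, then clear the marks.
    public void Collect(IEnumerable<HeapObject> roots)
    {
        foreach (HeapObject root in roots)
        {
            Mark(root);
        }
        heap.RemoveAll(o => !o.Marked);
        foreach (HeapObject o in heap)
        {
            o.Marked = false;
        }
    }

    static void Main()
    {
        MarkAndSweep gc = new MarkAndSweep();
        HeapObject root = gc.Allocate();
        HeapObject live = gc.Allocate();
        root.References.Add(live);
        gc.Allocate();                          // garbage: reachable from nothing
        gc.Collect(new HeapObject[] { root });
        Console.WriteLine(gc.heap.Count);       // prints 2 - the garbage object is gone
    }
}

MoonWalker's real collector runs over its internal state representation rather than toy objects, but it is this kind of work that accounts for the 55% stake reported above, which is what made it the obvious next target.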

Measure, Interpret and Optimise

There are a lot of opportunities to optimise a model checker. But what's important is that one optimises the things that matter: the algorithms that take the biggest relative stake during execution. We invented a slick algorithm that worked well on paper but didn't make a difference in MoonWalker - we had optimised prematurely.

Along with measuring bottlenecks, interpreting the results is just as important: when we initially used ANTS Profiler, we looked at the absolute times when we should have looked at the percentages. Proper use of a profiler matters. The real lesson to be learned is:

Measure first, interpret carefully and optimise sensibly.

MoonWalker is open source (written in C#) and available from the MoonWalker site.

Viet Yen Nguyen

Author profile:

Viet Yen Nguyen is a research engineer at RWTH Aachen University, where he works on the European Space Agency funded COMPASS project. He conducts research in the area of stochastic model checking and supervises a team of scientific programmers developing a comprehensive model checking toolset for the aerospace industry.


Have Your Say


Subject: Questions
Posted by: Viet Yen Nguyen (not signed in)
Posted on: Thursday, October 23, 2008 at 3:13 AM
Message: Hi! I hope you like the article - feel free to post your questions here. I'll look in now and then and try to answer them all!


Subject: My performance tuning secret
Posted by: MikeDunlavey (view profile)
Posted on: Friday, June 19, 2009 at 4:52 PM
Message: ... just between you, me, and the world ...

I've been doing what is now called "profiling" since before such things existed. My method? This link gives a blow-by-blow: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773.
In a nutshell, a profiler is only useful if it
1) samples, and
2) samples the whole call stack, not just the PC, and
3) filters the samples to the time interval you care about, and
4) shows you, for each instruction (especially call instructions), what % of stack samples contained that instruction. This last directly measures the cost of that instruction.

How did I do it without a profiler? Well, I just took the samples by hand. Some fibbers tell people that you need a lot of samples. You don't - think about it. In addition, I would look at the stack samples. The % of samples tells me what an instruction costs, but looking at the stack tells me how _necessary_ it is, and that is the key to finding slowness bugs.

So in your story, you wander around a bit between timing, guessing, and sampling. In my experience, this method goes straight to the heart of the problem with no detours.

Thanks for your good story.

 
