Holger Schmeling’s work in SQL Server Distribution Statistics was first published as a pair of long articles for Simple-Talk. The booklet takes a practical, and well-informed approach to a subject that is not always easy to explain or understand.
A Distribution Statistics object is a sampled map of the distribution of data in a table. Such statistics help the Query Optimizer of SQL Server’s Database Engine to translate the ‘logical’ SQL Query into the best strategy for doing all the physical operations of joining, sorting, and selecting the data. Most of the time, this work happens effectively without the database administrator having to be aware of what is going on, but occasionally, if the statistics no longer reflect what is actually in the table, then queries can suddenly run grindingly slow. At this point, the DBA must intervene to correct the problem.
Holger divides his work into two parts. In the first part, he explains what ‘statistics’ are, why they are there, how they are created, updated and removed. He shows how to inspect them and to maintain them. In the second part, he lists all the problems that are related to these statistics objects, and how to solve them.
DBAs and Database developers need a reasonable understanding of distribution statistics so as to be prepared for those occasions when things go wrong. SQL Server Distribution Statistics are used by the Query Optimizer to translate what the user has declared to be his wishes for a result, the SQL, into a ‘physical’ strategy that is quick and economical for the database. The relative performance of the various ‘Physical’ strategies for executing the query will vary according to the quantity and distribution of the data within the tables. The Query optimizer uses the statistics to work out the relative merits of the possible strategies. When the statistics are outdated, or missing then queries will miss out on the best strategy for execution and can, at the worst, will get assigned a disastrously poor strategy.
Reading Holger’s booklet is a great way of brushing up on the subject of Statistics and query-optimization. It is reasonably short, and geared to the practitioner rather than the theoretician. It is careful to stick with the things you need to know, and resists the temptation to launch into discourses on the esoteric science of query optimization. For this we are all grateful.