11 January 2011

Showplan Operator of the Week – Merge Interval

When Fabiano agreed to undertake the epic task of describing each showplan operator, none of us quite predicted how much the series would help us to understand how the query optimiser works. With the Merge Interval, Fabiano comes up with some insights about the way that the query optimiser handles overlapping ranges efficiently.


Hello dear readers. Once again here I am to talk with you about ShowPlan operators. Over the past few weeks and months, we’ve featured the ShowPlan operators that are used by SQL Server to build the query plan. If you’re just getting started with my Showplan series, you can find a list of all my articles here.

Introduction

In my last article I wrote about the Merge Join operator. I would like to continue with the subject of “Merge”, but now it is time to feature another kind of merge: the Merge Interval operator.

Last week I was working with a customer in Finland to optimize some queries when I saw this operator in the execution plan. Because it is not very well documented, I’ll try to cover all the aspects and bugs related to this operator (operators are also known as iterators).

In short, this operator is used to remove duplicated predicates in a query, and to find overlapping intervals so that these filters can be optimized to avoid scanning the same data more than once.

As always, I completely understand that this is not as simple as I’ve just stated. You may have to read what I wrote more than three times to understand what I mean. Don’t worry about that, because I’ll go deep into this subject step by step to make it easier for you to understand.

Creating sample data

To illustrate the Merge Interval behaviour, I’ll start by creating one table called “Pedidos” (which means ‘Orders’ in Portuguese). The following script will create the table and populate it with some garbage data:

Now that we have the table, we have to create two non-clustered indexes. The first uses the column ID_Cliente as the key and includes the column Valor, which makes it a covering index for our query. The second uses the column Data as the key and also includes the column Valor.

Merge Interval

Now that we have the data, we can write a query to see the Merge Interval. The following query sums the sales value for four customers:

The query above produces the following execution plan:

1214-image002-630x303.jpg

Figure 1 – Execution Plan

In the execution plan above we can see that the Query Optimizer (QO) chose to use the index ix_ID_Cliente to seek the data for each ID_Cliente specified in the IN clause, and then used the Stream Aggregate to perform the sum.

This is a classic Index Seek task: for each value, SQL Server reads the data through the balanced index tree, searching for the ID_Cliente. So far, it doesn’t require the Merge Interval.

Now let’s look at a similar query:

The query above produces the following execution plan:

1214-image005.jpg

Figure 2 – Execution Plan

As you can see, the only difference between the queries is that now we are using variables instead of constant values, but the Query Optimizer creates a very different execution plan for this query. So the question is: what do you think? Should SQL Server have used the same execution plan for this query?

The right answer is no. Why not? Because at compile time SQL Server doesn’t know the values of the variables, and if some of the values turn out to be duplicates, the first plan would read the same data twice. Suppose that the value of @v2 is also “1”: SQL Server would read ID 1 twice, once for variable @v1 and once for variable @v2. Since we expect performance, reading the same data twice is not good. So it has to use the Merge Interval to remove the duplicate occurrences.

Wait a minute, Fabiano! Are you saying that for the first query, the QO automatically removes the duplicated occurrences in the IN clause?

Yes. Do you want to see it?

The query above produces the following execution plan:

1214-image006-630x230.jpg

Figure 3 – Execution Plan

You will see that now we only have three Seek Predicates. Perfect.

Let’s go back to the Merge Interval plan.

The plan uses the operators Compute Scalar, Concatenation, Sort and Merge Interval to eliminate the duplicated values at execution time.
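To make this pipeline a bit more concrete, here is a minimal Python sketch of the idea. It is only a mental model, not how SQL Server implements these operators: the function name `merge_intervals` and the sample values for the four variables are invented for illustration. Each variable is treated as a tiny interval that starts and ends at the same value, the intervals are put together in one list, sorted, and then any interval that touches the previous one is folded into it.

```python
def merge_intervals(intervals):
    """Sort (start, end) pairs and fold overlapping ones together."""
    merged = []
    for start, end in sorted(intervals):  # the "Sort" step
        if merged and start <= merged[-1][1]:
            # Overlaps (or duplicates) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical runtime values for @v1..@v4; here @v2 duplicates @v1.
variables = [1, 1, 3, 4]

# Roughly "Compute Scalar" + "Concatenation": one point interval per variable.
intervals = [(v, v) for v in variables]

seeks = merge_intervals(intervals)
print(seeks)  # [(1, 1), (3, 3), (4, 4)]
```

With these assumed values, four variables end up as three seeks, because the duplicated value 1 collapses into a single interval.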

At this point, maybe some questions are arising in your mind. First: why doesn’t SQL Server just use a DISTINCT on the IN variables to remove the duplicates? Second: why is this called a “Merge”? I didn’t see anything related to a merge here.

The answer is that the Query Optimizer (QO) uses this operator to perform the DISTINCT because, with this operator, the QO also recognizes overlapping intervals and can merge them into non-overlapping intervals that are then used to seek the values. To understand this better, let’s suppose that we have the following query that doesn’t use variables.

Now, let’s look at the execution plan:

1214-image008-630x239.jpg

Figure 4 – Execution Plan

Notice how smart the Query Optimizer was. (That is why I love it!) It recognizes the overlap between the predicates, and instead of doing two seeks in the index (one for each BETWEEN filter), it creates a plan that performs just one seek.

Now let’s change the query to use the variables.

This query produces the following execution plan:

1214-ExecutionPlan.jpg

Figure 5 – Execution Plan

Let’s check what the plan is doing from a different perspective. First, let’s understand the overlap.

1214-image013.jpg

Figure 6 – Overlap between 20 and 25

In Figure 6 we can see that if SQL Server reads the ranges separately, it will read the range from 20 to 25 twice. I’ve used a small range to test with, but think in terms of a very large range scan of the kind we’d see in a production database; if we can avoid reading the overlap twice, we’ll see a great performance improvement.

1214-image015.jpg

Figure 7 – After Merge Interval

After the Merge Interval runs, SQL Server can seek only the final range. It knows that it is possible to go from @v_a1 to @vb_2 directly.
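The saving can be put into numbers with a small Python sketch. Everything here is an assumption made for illustration: the two ranges (10 to 25 and 20 to 30) are hypothetical values chosen only so that they overlap between 20 and 25, the keys are assumed to be dense integers, and the helper names `merge_two` and `keys_read` are invented. It is a mental model of the effect, not SQL Server's implementation.

```python
def merge_two(a, b):
    """Return one range if a and b overlap, otherwise both, in order."""
    first, second = sorted([a, b])
    if second[0] <= first[1]:
        # The ranges overlap: keep the lowest start and the highest end.
        return [(first[0], max(first[1], second[1]))]
    return [first, second]


def keys_read(ranges):
    """Count the integer keys touched when each inclusive range is read."""
    return sum(high - low + 1 for low, high in ranges)


# Hypothetical runtime values for the two BETWEEN filters.
ranges = [(10, 25), (20, 30)]
merged = merge_two(*ranges)

print(merged)             # [(10, 30)]
print(keys_read(ranges))  # 27 keys when the ranges are read separately
print(keys_read(merged))  # 21 keys after merging
```

The difference of six keys is exactly the overlap from 20 to 25, which is read only once after the merge. If the two ranges did not overlap, `merge_two` would return both of them unchanged and there would be nothing to save.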

Finally

To finish this subject, I recommend that you read about a bug in SQL Server 2005 caused by a mistake in this process. You can take a closer look at it on the blog of Mladen Prajdic, a SQL Server MVP from Slovenia.

I can’t miss the opportunity to congratulate the Microsoft people who build the icons in SQL Server and Windows. I once read a book called “The Icon Book”, and it is amazing how beautiful and meaningful the icons in the graphical query plan are. The Merge Interval icon is perfect: if you look at it, you will see exactly what the operator is doing. It’s incredible how much they can express in a small picture. Well done!

That’s all folks. I hope you’ve enjoyed learning about the Merge Interval operator, and I’ll see you soon with more “Showplan Operators”.


Fabiano Amorim

  • JohnA

    Good Article
    Comprehensive explanation of a rather obscure operator.
    Nice.

  • mcflyamorim

    Tks
    Thanks John, I’m glad you like it 🙂

  • Chen Noam

    I think you your blog will be more popular in english 🙂
    Hi Fabiano,

    I like your articles.

    I would read your blog as well, but I can’t since it’s not in English.

    Thanks,
    Chen

  • Seth

    Excellent Article
    Iam a newbie into this DBA World , your articles

    Wakeup my thinking brain, and lead me to new
    directions.

    Thanks Fabiano

  • salvagedog

    Wow!
    Very nice.
