Fabiano Amorim

Showplan Operator of the Week - Merge Interval

11 January 2011

When Fabiano agreed to undertake the epic task of describing each showplan operator, none of us quite predicted how much the series would help us understand the way the query optimizer works. With the Merge Interval, Fabiano offers some insights into how the query optimizer handles overlapping ranges efficiently.


[Image: Merge Interval operator icon]

Hello, dear readers. Here I am once again to talk with you about Showplan operators. Over the past few weeks and months, we’ve featured the Showplan operators that SQL Server uses to build query plans. If you’re just getting started with my Showplan series, you can find a list of all my articles here.

Introduction

In my last article I wrote about the Merge Join operator. Now I would like to stay on the subject of “Merge” and feature another kind of merge: the Merge Interval operator.

Last week, while working with a customer in Finland to optimize some queries, I came across this operator in an execution plan. Because it is not very well documented, I’ll try to cover all the aspects and bugs related to this operator (operators are also known as iterators).

In short, this operator is used to remove duplicate predicates in a query and to find overlapping intervals, so that these filters can be optimized to avoid scanning the same data more than once.

As always, I understand that this is not as simple as I’ve just made it sound. You may have to read that more than three times to understand what I mean. Don’t worry: I’ll go deep into this subject step by step to make it easier to follow.

Creating sample data

To illustrate the Merge Interval behaviour, I’ll start by creating a table called “Pedidos” (which means ‘Orders’ in Portuguese). The following script creates the table and populates it with some garbage data:

USE tempdb
GO

IF OBJECT_ID('Pedidos') IS NOT NULL
  DROP TABLE Pedidos
GO

CREATE TABLE Pedidos (ID INT IDENTITY(1,1) PRIMARY KEY,
        ID_Cliente INT NOT NULL,
        Quantidade SmallInt NOT NULL,
        Valor Numeric(18,2) NOT NULL,
        Data DATETIME NOT NULL)
GO

-- Insert 10,000 rows of random data, one row per loop iteration
DECLARE @I SmallInt
SET @I = 0

WHILE @I < 10000
BEGIN
  INSERT INTO Pedidos(ID_Cliente, Quantidade, Valor, Data)
    SELECT ABS(CheckSUM(NEWID()) / 100000000),                       -- customer ID from 0 to 21
           ABS(CheckSUM(NEWID()) / 10000000),                        -- quantity from 0 to 214
           ABS(CONVERT(Numeric(18,2), (CheckSUM(NEWID()) / 1000000.5))), -- order value
           GETDATE() - (CheckSUM(NEWID()) / 1000000)                 -- up to 2,147 days before or after today
  SET @I = @I + 1
END
GO

Now that we have the table, we need to create two nonclustered indexes. The first uses ID_Cliente as the key and includes the Valor column, so that it covers our queries. The second uses Data as the key and also includes Valor.

CREATE NONCLUSTERED INDEX ix_ID_Cliente ON Pedidos(ID_Cliente) INCLUDE (Valor)
GO
CREATE NONCLUSTERED INDEX ix_Data ON Pedidos(Data) INCLUDE (Valor)
GO

Merge Interval

Now that we have the data, we can write some queries to see the Merge Interval in action. Let’s start with a query that returns the total sales value for four customers:

SELECT SUM(Valor) AS Val
  FROM Pedidos
 WHERE ID_Cliente IN (1,2,3,4)
GO

For the query above, we get the following execution plan:

Figure 1 – Execution Plan

In the execution plan above, we can see that the Query Optimizer (QO) chose to use the index ix_ID_Cliente to seek the data for each ID_Cliente specified in the IN clause, and then used a Stream Aggregate to compute the sum.

This is a classic Index Seek: for each value, SQL Server navigates through the balanced index tree (B-tree) to find the rows for that ID_Cliente. So far, there is no need for the Merge Interval.

Now let’s look at a similar query:

DECLARE @v1 Int = 1,
        @v2 Int = 2,
        @v3 Int = 3,
        @v4 Int = 4

SELECT SUM(Valor) AS Val
  FROM Pedidos
 WHERE ID_Cliente IN (@v1, @v2, @v3, @v4)
GO

For the query above, we get the following execution plan:

Figure 2 – Execution Plan

As you can see, the only difference between the two queries is that we are now using variables instead of constant values, yet the Query Optimizer creates a very different execution plan. So here is the question: do you think SQL Server should have used the same execution plan for this query?

The right answer is no. Why not? Because at compile time SQL Server doesn’t know the values of the variables, and if some of them turn out to be duplicates, it would read the same data twice. Suppose that @v2 is also “1”: SQL Server would read the rows for ID_Cliente 1 twice, once for @v1 and again for @v2. Reading the same data twice is exactly what we don’t want when we care about performance, so the plan uses the Merge Interval to remove the duplicate values at execution time.
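The cost of a duplicate variable can be illustrated with a minimal Python model (not SQL Server code). It assumes a hypothetical, already-sorted list of ID_Cliente keys standing in for the leaf level of ix_ID_Cliente, and models each equality seek as a binary search with the standard bisect module. The helper names are made up for this sketch.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ID_Cliente keys, one per row, sorted as in the index leaf level.
INDEX_KEYS = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 7]


def seek_equal(keys, value):
    """Rows an equality seek on `value` would read (binary search on sorted keys)."""
    lo = bisect_left(keys, value)
    hi = bisect_right(keys, value)
    return keys[lo:hi]


def rows_read(keys, seek_values):
    """Total rows touched when one seek is issued per value in `seek_values`."""
    return sum(len(seek_equal(keys, v)) for v in seek_values)


# @v1 = 1, @v2 = 1 (duplicate), @v3 = 3, @v4 = 4
variables = [1, 1, 3, 4]

per_variable = rows_read(INDEX_KEYS, variables)               # ID_Cliente = 1 read twice
per_distinct = rows_read(INDEX_KEYS, sorted(set(variables)))  # each value sought once

print(per_variable, per_distinct)
```

In this model the duplicate seek also returns the rows for ID_Cliente = 1 a second time, so a SUM over them would count those rows twice; removing the duplicate values therefore matters for correct results as well as for speed.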

Wait a minute, Fabiano! Are you saying that, for the first query, the QO automatically removes duplicate values from the IN clause?

Yes. Do you want to see it?

SELECT SUM(Valor) AS Val
  FROM Pedidos
 WHERE ID_Cliente IN (1,1,3,4)
GO

For the query above, we get the following execution plan:

Figure 3 – Execution Plan

Notice that there are now only three seek predicates, even though the IN list has four values. Perfect.

Let’s get back to the Merge Interval plan.

That plan uses the Compute Scalar, Concatenation, Sort and Merge Interval operators to eliminate duplicate values at execution time.
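To make the roles of these operators concrete, here is a simplified Python model of that pipeline. It is a sketch, not SQL Server’s implementation: the function names are hypothetical, and representing each variable as a closed range (value, value) is an assumption of the model rather than a description of the operator’s internal format. Each stage is named after the plan operator it stands in for.

```python
def compute_scalar(value):
    """Turn one IN-list variable into a closed range [value, value]."""
    return (value, value)


def concatenation(*ranges):
    """Combine the per-variable ranges into a single stream."""
    return list(ranges)


def sort_ranges(ranges):
    """Order ranges by start (then end) so that overlapping ones become neighbours."""
    return sorted(ranges)


def merge_interval(sorted_ranges):
    """Merge overlapping ranges; identical ranges collapse into one."""
    merged = []
    for start, end in sorted_ranges:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding another seek.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# @v1 = 1, @v2 = 1, @v3 = 3, @v4 = 4: four variables, but only three ranges remain.
stream = concatenation(*(compute_scalar(v) for v in (1, 1, 3, 4)))
seek_ranges = merge_interval(sort_ranges(stream))
print(seek_ranges)
```

With @v2 = 1, the four variables produce only three ranges, and each surviving range becomes one seek against ix_ID_Cliente: the same result the optimizer reached at compile time for IN (1,1,3,4).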

At this point, some questions may be forming in your mind. First: why doesn’t SQL Server just apply a DISTINCT to the IN variables to remove the duplicates? Second: why is this called a “Merge”? I didn’t see anything related to a merge here.

The answer is that the Query Optimizer uses this operator to perform the DISTINCT because it also recognizes overlapping intervals and can merge them into non-overlapping intervals, which are then used to seek the values. To understand this better, let’s suppose we have the following query, which doesn’t use variables.

SELECT SUM(Valor) AS Val
  FROM Pedidos
 WHERE ID_Cliente BETWEEN 10 AND 25
    OR ID_Cliente BETWEEN 20 AND 30
GO

Now, let’s look at the execution plan:

Figure 4 – Execution Plan

Notice how smart the Query Optimizer was. (That’s why I love it!) It recognizes the overlap between the predicates and, instead of doing two seeks in the index (one for each BETWEEN filter), it creates a plan that performs just one seek.
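Going back to the first question above, a DISTINCT would not be enough here. The Python sketch below is an illustration, not the optimizer’s code, and its helper names are hypothetical. It treats each BETWEEN as a closed (low, high) pair. Removing duplicate pairs, which is all a DISTINCT could do, leaves both ranges in place, while merging overlapping ranges produces the single 10 to 30 range that allows one seek. A brute-force check over integer keys confirms that the merged range accepts exactly the same values as the original OR predicate.

```python
def distinct_ranges(ranges):
    """What a DISTINCT over (low, high) pairs gives: only exact repeats removed."""
    return sorted(set(ranges))


def merge_ranges(ranges):
    """Merge closed ranges that overlap or share an endpoint."""
    merged = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1]:
            # Overlaps the previous range: widen it rather than add a second seek.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def matches_any(x, ranges):
    """True if x falls inside at least one closed range (an OR of BETWEENs)."""
    return any(low <= x <= high for low, high in ranges)


# WHERE ID_Cliente BETWEEN 10 AND 25 OR ID_Cliente BETWEEN 20 AND 30
predicate = [(10, 25), (20, 30)]

print(distinct_ranges(predicate))  # both ranges survive, so 20..25 would be read twice
print(merge_ranges(predicate))     # a single range, i.e. a single seek

# Sanity check over integer keys: the merged range accepts exactly the same values.
same = all(matches_any(x, predicate) == matches_any(x, merge_ranges(predicate))
           for x in range(-100, 101))
print(same)
```

The merge condition `low <= previous high` treats closed ranges that overlap or touch at a shared endpoint as one; merely adjacent integer ranges such as 10..19 and 20..30 stay separate in this sketch.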

Now let’s change the query to use variables.

DECLARE @v_a1 Int = 10,
        @v_b1 Int = 20,
        @v_a2 Int = 25,
        @v_b2 Int = 30

SELECT SUM(Valor) AS Val
  FROM Pedidos
 WHERE ID_Cliente BETWEEN @v_a1 AND @v_a2
    OR ID_Cliente BETWEEN @v_b1 AND @v_b2
GO

For this query, we get the following execution plan:

Figure 5 – Execution Plan

Let’s look at what the plan is doing from a different perspective. First, let’s understand the overlap.

Figure 6 – Overlap between 20 and 25

In Figure 6 we can see that if SQL Server reads the ranges separately, it will read the range from 20 to 25 twice. I’ve used a small range for this test, but think in terms of the very large scans we’d see in a production database: if we can avoid reading the overlap twice, we’ll see a great performance improvement.
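The cost of that double read can be measured with a small Python model. The sorted list is a hypothetical stand-in for the ix_ID_Cliente leaf level (one made-up key per row), and a range seek is modeled as a binary search with the standard bisect module; none of this is SQL Server code.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ID_Cliente keys, one per row, in index order.
INDEX_KEYS = sorted([8, 10, 12, 15, 19, 20, 21, 21, 23, 25, 27, 30, 31, 40])


def range_scan(keys, low, high):
    """Rows returned by a seek on low <= key <= high."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


# Two separate seeks, one per BETWEEN, versus one seek on the merged range.
separate = len(range_scan(INDEX_KEYS, 10, 25)) + len(range_scan(INDEX_KEYS, 20, 30))
merged = len(range_scan(INDEX_KEYS, 10, 30))
overlap = len(range_scan(INDEX_KEYS, 20, 25))

print(separate, merged, overlap)
```

Here the extra reads caused by scanning the ranges separately (16 minus 11) are exactly the rows whose key falls in the 20 to 25 overlap.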

Figure 7 – After Merge Interval

After the Merge Interval runs, SQL Server can seek only the final range. It knows that it is possible to go directly from @v_a1 to @v_b2.
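Since the variable values are only known at run time, the same plan has to cope with different outcomes: overlapping ranges that merge into one seek, and disjoint ranges that stay as two. The Python sketch below uses a hypothetical function name and is not SQL Server’s code. It builds the two ranges from @v_a1/@v_a2 and @v_b1/@v_b2 and merges them. It also drops a range whose lower bound exceeds its upper bound, because BETWEEN a AND b matches nothing when a > b; that filtering step belongs to this sketch and is not a documented detail of the operator.

```python
def seek_ranges(v_a1, v_a2, v_b1, v_b2):
    """Ranges to seek for: BETWEEN v_a1 AND v_a2 OR BETWEEN v_b1 AND v_b2."""
    # BETWEEN low AND high is empty when low > high, so such a range needs no seek.
    ranges = sorted(r for r in [(v_a1, v_a2), (v_b1, v_b2)] if r[0] <= r[1])
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Overlapping ranges become one wider seek range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(seek_ranges(10, 25, 20, 30))  # the values used above: one seek from 10 to 30
print(seek_ranges(10, 15, 20, 30))  # no overlap: two separate seeks
print(seek_ranges(25, 10, 20, 30))  # first BETWEEN is empty: only 20..30
```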

Finally

To finish this subject, I recommend reading about a bug in SQL Server 2005 caused by a mistake in this process; you can find a detailed description on the blog of Mladen Prajdic, a SQL Server MVP from Slovenia.

I can’t miss this opportunity to congratulate the Microsoft people who design the icons in SQL Server and Windows. I once read a book called “The Icon Book”, and it is amazing how beautiful and meaningful the icons in the graphical query plan are. The Merge Interval icon is perfect: if you look at it, you can see exactly what the operator is doing. It’s incredible how much they can express in such a small picture. Well done!

That’s all, folks. I hope you’ve enjoyed learning about the Merge Interval operator, and I’ll see you soon with more “Showplan Operators”.

Fabiano Amorim

Author profile:

Fabiano is fascinated by the SQL Server Query Processor and the way it works to optimize queries, procedures and functions. He graduated as a Technical Processor from Colégio Bezerra de Menezes, SP, Brazil, and has worked for several years with SQL Server, focusing on creating Data Warehouses and optimizing T-SQL code for many companies in Brazil and Argentina. Fabiano is a SQL Server MVP, MCP for SQL Server 2000, MCTS and MCITP Database Developer for SQL Server 2005 and 2008. He is also actively involved in the SQL Server community through forums such as MSDN and TechNet Brazil, writes articles for Simple-Talk and SQL Server Magazine Brazil, and presents online Webcasts and In-Person events for Microsoft Brazil. His blog is on http://blogfabiano.com


 
