This blog post tries to explain the results of the first series of tests. To accomplish this goal I will do a new series of tests using (almost) the same methodology. The only difference is how I select the rows (164 rows or 1640 rows) used for testing. Please remember that the clustered index of Sales.Customer table (PK_Customer_CustomerID) has 121 leaf pages and 19820 rows. This is the script used for this:
INSERT INTO dbo.[Rows](CustomerID)
SELECT x.CustomerID
FROM(
SELECT c.CustomerID,(ROW_NUMBER() OVER(ORDER BY c.CustomerID)-1) % 121 AS RowNum
FROM Sales.Customer c
) x
WHERE x.RowNum = 0; /* x.RowNum BETWEEN 0 AND 9; */
This time I got these results:
To compare both results in a simple manner I added the previous results:
As you can see, the current test series (#2) has some interesting facts:
-
Merge Join and Scan Merge Join operators give the best performance only from the point of view of logical reads.
-
However the best overall performance (Duration, CPU) is achieved by Nested Loops and Index Seek (which isn’t all surprising).
-
And, again, these series of tests show that [Logical] Reads shouldn’t be the only key performance indicator used when we evaluate different execution plans for the same query.
Why ?
Visually, the main difference between these two series of tests is as follows:
Test series #1 has to find 164 records which are physical stored in the “first” leaf page of PK_Customer_CustomerID clustered index (or it has to find 1640 records stored in the “first” 10 leaf pages of the same index). On the other hand test series #2 reads reads the same number of records (164 records, 1640 records) which are equally distributed in all those 121 leaf pages of PK_Customer_CustomerID clustered index. As you can see, this small detail (the physically distribution of records in the clustered index: PK_Customer_CustomerID) has a huge impact on the query performance.
The last question
The last question and the main question is (I think) why Merge Join + Index Scan (Query 3; all 164 records are stored in the “first” leaf page) give the best results for the Test series #1 ?
We would expect that Clustered Index Scan operator to read all records from all leaf pages:
But, the output of SET STATISTICS IO ON shows something unexpected:
Table ‘Rows’. Scan count 1, logical reads 2
As you can see, instead of showing 1 (root page) + 121 leaf pages + 1 page IAM = the output of STATISTICS IO shows only 3 logical reads representing 1 root page + 1 leaf page + 1 IAM = 123 logical reads. This output shows that after reading 164 records from the “first” leaf page the Clustered Index Scan stops and doesn’t tries to read following leaf pages:
From this point of view, the behavior of Scan operator in this context is partially similar to behavior of Index Seek – range scan :
FROM Sales.Customer c
WHERE c.CustomerID <= 164
ORDER BY c.CustomerID;
The behavior of this Scan operator is also partially similar to behavior of Clustered Index Scan in the following query:
FROM Sales.Customer c
ORDER BY c.CustomerID;
Table ‘Customer’. Scan count 1, logical reads 2
Remarks
-
A Scan operator is not always something “bad”. In some cases, Scan will give better results than Index Seek (at least than Index Seek – singleton seek).
-
Not always Merge Join gives the best results even if there are proper indexes.
-
Logical reads shouldn’t be the only query performance metric used to measure the query performance. You should look at CPU and, of course, Duration.
Note: To understand why I used all these table/index hints
– index_id 1 is the clustered index (PK_Customer_CustomerID)
MyTable WITH(INDEX(1),FORCESCAN)
please read this blog post.
Leave a Reply