Overview
There is an old debate in SQL Server community: is always Index Seek better than Index Scan ?
First, I would like to explain what represents each of them. Visually, difference between them can be described thus:
As you can see, Seek has two subtypes: singleton lookup and range scan. A singleton lookup is specific to an unique index and it searches a single key value starting from root page towards leaf pages. This means that a singleton lookup will touch at most one record from leaf pages.
Usually, a Seek of range scan subtype supposes searching key values from root page towards first leaf page (which includes the first or last key value in range) and then from that point it scans forward or backward the leaf pages. This means that a range scan could touch more records from leaf pages.
An Index Scan starts from root page and navigates towards first or last record (leaf page) and from that point it reads forward or backward all (usually) records within leaf pages.
What you have seen above is a simplified description for what is happens for a single execution. But these operators could have many executions. For example, an Index Seek – singleton lookup could search 100 key values:
Above picture shows two properties regarding to number of executions: the estimated value (1; which is generated at compile time before the query is executed) and the actual number of executions (100; which is generated after the query is executed). Because of that, it’s important the performance of the operator as a all. In this example, because the index (PK_Customer_CustomerID) has two levels
Code Snippet
USE AdventureWorks2008R2;
GO
SELECT f.page_count,f.index_level
FROM sys.dm_db_index_physical_stats
(
DB_ID(),
OBJECT_ID(N'Sales.Customer'),
1, – Clustered index
NULL,
'DETAILED'
) f;
/*
page_count index_level
———- ———–
121 0
1 1
*/
every singleton lookup generates two logical reads and the overall performance is 100 key values x 2 logical reads = 200 logical reads. But this value is greater than 1 + 121 logical reads + 1 logical read (IAM) = 123 logical reads which is the I/O performance of a simple Index Scan on the same index.
It’s just about logical reads ? No. What it matters is the overall performance (logical reads, CPU and duration) of current query and of every operator. To better understand I have done the following test.
Test
For this test I used AdventureWorks2008R2 database. I created a table named dbo.Rows
Code Snippet
CREATE TABLE dbo.[Rows](CustomerID INT NOT NULL PRIMARY KEY);
which is used to store 164 rows (for the first test) or 1640 rows (for the second test). Every row from dbo.Rows table is sought in Sales.Customer table (19820 rows) using these queries:
Code Snippet
– Query 1 SELECT @cnt
=COUNT(*) FROM dbo
.[Rows] r
INNER JOIN Sales
.Customer c
WITH(FORCESEEK
(PK_Customer_CustomerID
(CustomerID
))) ON c
.CustomerID
=r
.CustomerID
;
– Query 2
SELECT @cnt=COUNT(*)
FROM dbo.[Rows] r
INNER HASH JOIN Sales.Customer c WITH(INDEX(1),FORCESCAN)
ON c.CustomerID=r.CustomerID;
– Query 3
SELECT @cnt=COUNT(*)
FROM dbo.[Rows] r
INNER MERGE JOIN Sales.Customer c WITH(INDEX(1),FORCESCAN)
ON c.CustomerID=r.CustomerID;
Execution plans:
First execution plan (Query 1) includes a Seek operator on PK_Customer_CustomerID clustered index. Second execution plan (Query 2) includes an Index Scan on PK_Customer_CustomerID and a Hash Join and third execution plan (Query 3) includes an Index Scan on the same index and a Merge Join.
Test methodology
1. Insert into dbo.Rows 164 rows (the first 164 rows from Sales.Customer consumes almost one 8K page without B-Tree overhead). Second test inserts into dbo.Rows first 1640 rows.
2. Execute 100 interations for Query 1. Execute 100 interations for Query 2. Execute 100 interations for Query 3
3. Execute 1000 interations for Query 1. Execute 1000 interations for Query 2. Execute 1000 interations for Query 3.
4. Execute 10000 interations for Query 1. Execute 10000 interations for Query 2. Execute 10000 interations for Query 3.
Download script
Note: 164 is the average number of rows in every leaf page of PK_Customer_CustomerID clustered index.
Results
Below are the results (SQL Profiler, SQL:BatchCompleted) :
Remarks
Query 2 and Query 3 (Scan) have a better [Logical] Reads performance than Query 1 (Seek) but Query 2 (Scan + HASH JOIN) has a bigger CPU time and the Duration is also bigger than Query 1. So, when we have to optimize a query, it isn’t enough to optimize Logical Reads but we have to pay attention also at CPU time and Duration. On the other hand, Query 3 (Scan + Merge Join) seems to offer the best performance in a consistently manner. If I would say that for this test Scan is a better than Seek it would be misleading.
Instead, my remark is that for this test Scan + Merge Join is a better option than Seek + Nested Loops.
In following post I will try to explain these results.
Revision History
2013-08-18: Updated [Download script] link and the Results table (column Interations#)