Yet another SQL Server enthusiast

2014-04-08

SQL Server: SARG-able predicates and Scan operators

Filed under: index,optimization,SARG,scan,seek,sql server — ---- @ 12:15 AM

I will give a presentation at .Net Bucharest software developers meeting about SARG-able predicates and those reasons that can cause scans even there are proper indexes. For every example I will present the problem and the solutions.

Files: Net_User_Group_Bucharest_2014-04-08.zip

 

2014-01-05

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ?

Filed under: concurrency,EXISTS,index,MERGE,safe,sql server,UPSERT — ---- @ 10:47 PM

Is it safe the following approach for UPSERT ?

Code Snippet
IF EXISTS (SELECT FROM MyTable WHERE ID = @param1)
    UPDATE    MyTable
    SET        Col1 = NewValue
    WHERE    ID = @param1
ELSE
    INSERT    MyTable(ID, Col1)
    VALUES    (@param1, @param2)

The short answer is no!

Example:

  1. The transaction isolation level is READ COMMITTED which is the default isolation level.
  2. For this demo I’ve used a customer table to store the customer’s email and the number of sales orders (for every customer).
  3. I’ve inserted two customers (1 – [email protected]; 2 – [email protected]).
  4. I’ve created a stored procedure for UPSERT (dbo.InsertOrder). This stored procedure receive the customer’s email, it checks if exists this email and if not then it inserts the new customer (sales orders 1). Otherwise it increments the number of sales orders by 1.

T-SQL script:

Code Snippet
IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
IF OBJECT_ID(N'dbo.InsertOrder') IS NOT NULL
    DROP PROCEDURE dbo.InsertOrder;
GO

CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

INSERT    dbo.Customer (Email)
VALUES    (N'[email protected]'), – CustomerID 1
        (N'[email protected]'); – CustomerID 2
GO

CREATE PROCEDURE dbo.InsertOrder (
    @Email NVARCHAR(100)
)
AS
BEGIN
    IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email)
    BEGIN
        UPDATE    dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE    Email = @Email;
    END
    ELSE
    BEGIN
        WAITFOR DELAY '00:00:03';
        INSERT    dbo.Customer (Email)
        VALUES    (@Email) – OrdersCount = Default 1
    END
END;
GO

Note: WAITFOR DELAY is used to delay the INSERTion of the new customers with just 3 seconds.

Now we can start this test executing following statements in separate windows:

Is.NOT.Safe.If.Exists.Update.Insert.UPSERT

Query window #1 (Step 1):

Code Snippet
BEGIN TRANSACTION
    UPDATE    dbo.Customer
    SET        Note = N'Note#2'
    WHERE    Email = N'[email protected]';
– ROLLBACK — or COMMIT

At this moment the CustomerID = 2 is locked X (exclusive) and no other transaction can read (excepting NOLOCK and READ UNCOMMITTED), update or delete this record. Because this transaction remains open (no COMMIT or ROLLBACK) this record will be locked till the end of transaction.

Query window #2 & #3 (Step 2 & Step 3):

Code Snippet
EXEC dbo.InsertOrder N'[email protected]';
SELECT * FROM dbo.Customer;

At this moment the IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email) statement will try to read the record with CustomerID = 1 and then the record with CustomerID = 2 requesting a S lock (shared). Because X and S locks aren’t compatible (see Lock Compatibility) both (Step 2 & 3) SELECT * FROM dbo.Customer WHERE Email = @Email statements will be blocked. Note: you can see the “Executing ….” message within tab title.

Query window #1 (Step 4):

I execute the ROLLBACK or COMMIT statement which ends the initial transaction. The X lock for CustomerID = 2 record is released and the SELECT statements (SELECT * FROM dbo.Customer WHERE Email = @Email; Step 2 & 3)  continue to be executed. Every EXISTS(SELECT …. WHERE Email = [email protected]) statement will return FALSE and both stored procedures will INSERT the customer `[email protected]`.

This is the reason why I get duplicated emails at the end (Step 5) of my example:

Results.Step5Solutions ?

Simplest solution is to create an unique index/constraint on Email column.

Code Snippet
CREATE UNIQUE INDEX IUN_Customer_Email
ON dbo.Customer(Email);

This UNIQUE index/constraint guarantees that we can’t have duplicated emails. But this doesn’t means this UPSERT approach is 100% safe because dbo.InsertOrder stored procedure still tries to INSERT duplicated emails (at Step 5):

Results.Step5.With.UNIQUE.index

Next blog On Jan 27 I will discuss what solutions do we have for safe UPSERTs.

See also:

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #2

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #3

Revision History:

2014-01-12: I changed the last paragraph.

2014-01-27: I changed the last paragraph.

2014-04-06: I added the [See also] section

2013-12-09

Non SARG-able predicates #1

Sometimes isn’t enough to create an index. to optimize a query. Even we create the proper index we may discover by examining the execution plan that SQL Server doesn’t use that an Index Seek to find the rows  using instead an Index Scan.

For example, if we want to count the number of sales order for December 2005 a first attempt can use following query and an index on OrderDate:

Code Snippet
CREATE INDEX IX_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader (OrderDate);
GO
SET STATISTICS IO ON;
DBCC FREEPROCCACHE;
GO
SELECT COUNT(*)
FROM Sales.SalesOrderHeader h
WHERE YEAR(h.OrderDate) = 2005 AND MONTH(h.OrderDate) = 12
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 73
*/

Execution.Plan.Index.Scan

We can see that for this query the execution plan includes an Index Scan instead of Index Seek. The reason is these two predicates aren’t SARG-able. A SARG-able predicates allows an Index Seek if an index is available.

The second attempt uses a BETWEEN predicate (which is SARG-able) thus:

Code Snippet
DBCC FREEPROCCACHE;
GO
SELECT    COUNT(*)
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate BETWEEN '20051201' AND '20051231 23:59:59.997'
– or WHERE    h.OrderDate >= '20051201' AND h.OrderDate < '20060101'
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 3
*/

Execution.Plan.Index.Seek

This time the execution plan includes an Index Seek and the performance (Logical Reads) is better: just 3 pages / logical reads (Seek) instead of 73 pages logical reads (Index Scan).

More information on SARG-able predicates can be found here.

Note: For this test I used Adventure Works for SQL Server 2012 (data file) database

2013-10-30

SQL Server 2014: inline non-unique indexes

Filed under: index,sql server,sql server 2014,Uncategorized — ---- @ 7:49 AM

SQL Server 2014 (download CTP 2) has new T-SQL features. For example you can create inline non-unique indexes. Till SQL Server 2014 the only option to create inline indexes was to define PRIMARY KEY [NON]CLUSTERED or UNIQUE [NON]CLUSTERED constraints because SQL Server maintains these primary key/unique constraints by creating unique indexes (see section “PRIMARY KEY or UNIQUE constraint” from Create Unique Indexes).

Example:

Code Snippet
CREATE TABLE dbo.SalesOrder
(
    SalesOrderID INT IDENTITY(1,1),
        – Old method to create an CLUSTERED unique index
        CONSTRAINT PK_SalesOrder_SalesOrderID PRIMARY KEY CLUSTERED (SalesOrderID),
    OrderNumber CHAR(8) NOT NULL,
        – Old method to create an NON-CLUSTERED unique index
        CONSTRAINT UN_SalesOrder_OrderNumber UNIQUE(OrderNumber),
    – [New] This will create a non-unique index on (OrderDate)
    OrderDate DATE NOT NULL INDEX IX_SalesOrder_1 NONCLUSTERED ,
    CustomerID INT NOT NULL,
        – [New] This will create a non-unique index on (CustomerID, OrderDate)
        INDEX IX_SalesOrder_2 NONCLUSTERED (CustomerID, OrderDate)
);

2013-10-04

Finding Scans

Filed under: index,scan,sql server 2008 r2 — ---- @ 8:19 AM

Starting with SQL Server 2005, the execution plans are stored as XML. Combining this fact with powerful XQueries the result is endless (well, almost) possibilities. For example, somebody could write a simple SELECT statement to find out all execution plans including [ [ Clustered] Index | Table ] Scan operators.

Step #1 Run this script to create a stored procedure:

Code Snippet
USE AdventureWorks2008R2;
GO
CREATE PROCEDURE Sales.spTest
AS
SELECT  soh.SalesOrderID
FROM    Sales.SalesOrderHeader soh
WHERE   soh.ShipDate>='20070101';
GO

Step #2 Execute the newly created stored procedure (EXEC Sales.spTest) using Actual Execution Plan option activated (see here how). The result will be the following execution plan:

XmlExecutionPlan

Step #3: Now, run the following SELECT statement to find out all execution plans which include an [ [ Clustered] Index | Table ] Scan operator:

Code Snippet
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(
    DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan&#039;
)
SELECT    TOP(10)
        crt_db.[dbid],
        DB_NAME(crt_db.[dbid]) AS database_name,
        pl.objectid,
        QUOTENAME(OBJECT_SCHEMA_NAME(pl.objectid, crt_db.[dbid]))+'.'+QUOTENAME(OBJECT_NAME(pl.objectid, crt_db.[dbid])) AS object_name,
        pl.query_plan,
        src.text AS source_code,
        pl.query_plan.exist('//RelOp[@PhysicalOp="Clustered Index Scan"]') AS [Has Clustered Index Scan],
        pl.query_plan.exist('//RelOp[@PhysicalOp="Index Scan"]') AS [Has Index Scan],
        pl.query_plan.exist('//RelOp[@PhysicalOp="Table Scan"]') AS [Has Table Scan],
        cp.size_in_bytes,
        cp.usecounts
FROM    sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) pl
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) src
CROSS APPLY (
    SELECT    CONVERT(INT, plat.value) AS [dbid]
    FROM    sys.dm_exec_plan_attributes(cp.plan_handle) plat
    WHERE    plat.attribute = 'dbid'
) crt_db
WHERE   pl.query_plan.exist('//RelOp[
    @PhysicalOp = "Clustered Index Scan"
    or @PhysicalOp = "Index Scan"
    or @PhysicalOp = "Table Scan"
]')=1
ORDER BY cp.usecounts DESC;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

Results:

QueryCache

Revision History

2013-10-07 Update #1 (Changes: step 2 and 3)

2013-09-05

Misinterpretation of the missing indexes

When we analyze the execution plans (estimated: Ctrl+L; actual:Ctrl + M, F5) we could see sometimes information regarding the missing indexes thus:

ActualExecutionPlanMissingIndexes And when we open the contextual menu and then select the “Missing Index Details …” submenu

ActualExecutionPlanMissingIndexes#2 we get the script to create the index:

Code Snippet
/*
Missing Index Details from SQLQuery33.sql – (local)\SQL2008R2.AdventureWorks2008R2 (N4FG80D9FGDFJGO\sqlservr.exe (53))
The Query Processor estimates that implementing the following index could improve the query cost by 94.1312%.
*/

/*
USE [AdventureWorks2008R2]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [Sales].[SalesOrderHeader] ([OrderDate])
INCLUDE ([SalesOrderID],[DueDate])
GO
*/

Now, when we see the CREATE INDEX statement some of us would think that this is an imperative command: “you should create this index”. Wrong. In my opinion, this is the way to say what index could improve the performance of current query. For example, the performance of this query could be improved if you have an index with the following key (OrderDate) and having the following covering columns (SalesOrderID and DueDate). What we could do is to check if you have a similar index. In my example, we have the following index having the same key (OrderDate) with the missing index:

Code Snippet
CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_OrderDate]
ON [Sales].[SalesOrderHeader]
(
    [OrderDate] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
GO

In this case, what we should do is to simply change the existing index by adding SalesOrderID and DueDate columns thus:

Code Snippet
CREATE NONCLUSTERED INDEX [IX_SalesOrderHeader_OrderDate]
ON [Sales].[SalesOrderHeader]
(
    [OrderDate] ASC
)
INCLUDE ([SalesOrderID],[DueDate])
WITH (DROP_EXISTING = ON);
GO
EXEC sp_rename  
    @objname = 'Sales.SalesOrderHeader.IX_SalesOrderHeader_OrderDate',
    @newname = 'IX_SalesOrderHeader_OrderDate#DueDate_SalesOrderID',
    @objtype = 'INDEX';
GO    

Finally, if we rerun the query the execution plan is changed and it includes the Index Seek operator instead of Clustered Index Scan:

ActualExecutionPlanMissingIndexes#3

2013-08-20

Seek vs. Scan (II)

Filed under: index,optimization,scan,seek,sql server,sql server 2008 r2 — ---- @ 7:45 PM

This blog post tries to explain the results of the first series of tests. To accomplish this goal I will do a new series of tests using (almost) the same methodology. The only difference is how I select the rows (164 rows or 1640 rows) used  for testing. Please remember that the clustered index of Sales.Customer table (PK_Customer_CustomerID) has 121 leaf pages and 19820 rows. This is the script used for this:

Code Snippet
– Test Series #2: Seek vs. Scan (II)
INSERT INTO dbo.[Rows](CustomerID)
SELECT x.CustomerID
FROM(
SELECT    c.CustomerID,(ROW_NUMBER() OVER(ORDER BY c.CustomerID)-1) % 121 AS RowNum
FROM    Sales.Customer c
) x
WHERE x.RowNum = 0; /* x.RowNum BETWEEN 0 AND 9; */

This time I got these results:

TestSeries#2_current

To compare both results in a simple manner I added the previous results:

TestSeries#1_previous

As you can see, the current  test series (#2) has some interesting facts:

  • Merge Join and Scan Merge Join operators give the best performance only from the point of view of logical reads.
  • However the best overall performance (Duration, CPU) is achieved by Nested Loops and Index Seek (which isn’t all surprising).
  • And, again, these series of tests show that [Logical] Reads shouldn’t be the only key performance indicator used when we evaluate different execution plans for the same query.

Why ?

Visually, the main difference between these two series of tests is as follows:

TestSeries#1vsTestSeries#2Test series #1 has to find 164 records which are physical stored in the “first” leaf page of PK_Customer_CustomerID clustered index (or it has to find 1640 records stored in the “first” 10 leaf pages of the same index). On the other hand test series #2 reads reads the same number of records (164 records, 1640 records) which are equally distributed in all those 121 leaf pages of PK_Customer_CustomerID clustered index. As you can see, this small detail (the physically distribution of records in the clustered index: PK_Customer_CustomerID) has a huge impact on the query performance.

The last question

The last question and the main question is (I think) why Merge Join + Index Scan (Query 3; all 164 records are stored in the “first” leaf page) give the best results for the Test series #1 ?

ActualExecutionPlansTestSeries#1Query3We would expect that Clustered Index Scan operator to read all records from all leaf pages:

ClusteredindexScan1

But, the output of SET STATISTICS IO ON shows something unexpected:

Code Snippet
Table ‘Customer’. Scan count 1, logical reads 3
Table ‘Rows’. Scan count 1, logical reads 2

As you can see, instead of showing 1 (root page) + 121 leaf pages + 1 page IAM =  the output of STATISTICS IO shows only 3 logical reads representing 1 root page + 1 leaf page + 1 IAM = 123 logical reads. This output shows that after reading 164 records from the “first” leaf page the Clustered Index Scan stops and doesn’t tries to read following leaf pages:

ClusteredindexScan2 From this point of view, the behavior of Scan operator in this context is partially similar to behavior of Index Seek – range scan :

Code Snippet
SELECT *
FROM Sales.Customer c
WHERE c.CustomerID <= 164
ORDER BY c.CustomerID;

The behavior of this Scan operator is also partially similar to behavior of Clustered Index Scan in the following query:

Code Snippet
SELECT TOP(164) *
FROM Sales.Customer c
ORDER BY c.CustomerID;

ActualExecutionPlansTopClusteredIndexScan

Code Snippet
– SET STATISTICS IO ON
Table ‘Customer’. Scan count 1, logical reads 2

Remarks

  1. A Scan operator is not always something “bad”. In some cases, Scan will give better results than Index Seek (at least than Index Seek – singleton seek).
  2. Not always Merge Join gives the best results even if there are proper indexes.
  3. Logical reads shouldn’t be the only query performance metric used to measure the query performance. You should look at CPU and, of course, Duration.

Note: To understand why I used all these table/index hints

Code Snippet
MyTable WITH(FORCESEEK(PK_Customer_CustomerID(CustomerID)))
– index_id 1 is the clustered index (PK_Customer_CustomerID)
MyTable WITH(INDEX(1),FORCESCAN)

please read this blog post.

2013-08-09

Seek vs. Scan (I)

Filed under: index,optimization,scan,seek,sql server — ---- @ 8:55 PM

Overview

There is an old debate in SQL Server community: is always Index Seek better than Index Scan ?

First, I would like to  explain what represents each of them. Visually, difference between them can be described thus:

SeekScan1 As you can see, Seek has two subtypes: singleton lookup and range scan. A singleton lookup is specific to an unique index and it searches a single key value starting from root page towards leaf pages. This means that a singleton lookup will touch at most one record from leaf pages.

Usually, a Seek of range scan subtype supposes searching key values from root page towards first leaf page (which includes the first or last key value in range) and then from that point it scans forward or backward the leaf pages. This means that a range scan could touch more records from leaf pages.

An Index Scan starts from root page and navigates towards first or last record (leaf page) and from that point it reads forward or backward all (usually) records within leaf pages.

What you have seen above is a simplified description for what is happens for a single execution. But these operators could have many executions. For example, an Index Seek – singleton lookup could search 100 key values:SeekCIProperties

Above picture shows two properties regarding to number of executions: the estimated value (1; which is generated at compile time before the query is executed) and the actual number of executions (100; which is generated after the query is executed). Because of that, it’s important the performance of the operator as a all. In this example,  because the index (PK_Customer_CustomerID) has two levels

Code Snippet
USE AdventureWorks2008R2;
GO
SELECT    f.page_count,f.index_level
FROM    sys.dm_db_index_physical_stats
(
    DB_ID(),
    OBJECT_ID(N'Sales.Customer'),
    1, – Clustered index
    NULL,
    'DETAILED'
) f;
/*
page_count index_level
———- ———–
121        0
1          1
*/

every singleton lookup generates two logical reads and the overall performance is 100 key values x 2 logical reads = 200 logical reads. But this value is greater than 1 + 121 logical reads + 1 logical read (IAM) = 123 logical reads which is the I/O performance of a simple Index Scan on the same index.

It’s just about logical reads ? No. What it matters is the overall performance (logical reads, CPU and duration) of current query and of every operator. To better understand I have done the following test.

Test

For this test I used AdventureWorks2008R2 database. I created a table named dbo.Rows

Code Snippet
CREATE TABLE dbo.[Rows](CustomerID INT NOT NULL PRIMARY KEY);

which is used to store 164 rows (for the first test) or 1640 rows (for the second test). Every row from dbo.Rows table is sought in Sales.Customer table (19820 rows) using these queries:

Code Snippet
– Query 1
SELECT @cnt=COUNT(*)
FROM dbo.[Rows] r
INNER JOIN Sales.Customer c WITH(FORCESEEK (PK_Customer_CustomerID(CustomerID)))
ON c.CustomerID=r.CustomerID;

– Query 2
SELECT @cnt=COUNT(*)
FROM dbo.[Rows] r
INNER HASH JOIN Sales.Customer c WITH(INDEX(1),FORCESCAN)
ON c.CustomerID=r.CustomerID;

– Query 3
SELECT @cnt=COUNT(*)
FROM dbo.[Rows] r
INNER MERGE JOIN Sales.Customer c WITH(INDEX(1),FORCESCAN)
ON c.CustomerID=r.CustomerID;

Execution plans:

ActualExecutionPlans

First execution plan (Query 1) includes a Seek operator on PK_Customer_CustomerID clustered index. Second execution plan (Query 2) includes an Index Scan on PK_Customer_CustomerID and a Hash Join and third execution plan (Query 3) includes an Index Scan on the same index and a Merge Join.

Test methodology

1. Insert into dbo.Rows 164 rows (the first 164 rows from Sales.Customer consumes almost one 8K page without B-Tree overhead). Second test inserts into dbo.Rows first 1640 rows.

2. Execute 100 interations for Query 1. Execute 100 interations for Query 2. Execute 100 interations for Query 3

3.  Execute 1000 interations for Query 1. Execute 1000 interations for Query 2. Execute 1000 interations for Query 3.

4. Execute 10000 interations for Query 1. Execute 10000 interations for Query 2. Execute 10000 interations for Query 3.

Download script

Note: 164 is the average number of rows in every leaf page of PK_Customer_CustomerID clustered index.

Results

Below are the results (SQL Profiler, SQL:BatchCompleted) :

Results

Remarks

Query 2 and Query 3 (Scan) have a better [Logical] Reads performance than Query 1 (Seek) but Query 2 (Scan + HASH JOIN) has a bigger CPU time and the Duration is also bigger than Query 1. So, when we have to optimize a query, it isn’t enough to optimize Logical Reads but we have to pay attention also at CPU time and Duration. On the other hand, Query 3 (Scan + Merge Join) seems to offer the best performance in a consistently manner. If I would say that for this test Scan is a better than Seek it would be misleading.

Instead, my remark is that for this test Scan + Merge Join is a better option than Seek + Nested Loops.

In following post I will try to explain these results.

Revision History

2013-08-18: Updated [Download script] link and the Results table (column Interations#)

2013-08-02

Choosing the right data type

Filed under: data type,index,indexed view,sql server — ---- @ 5:18 PM

Overview

Column’s data type influences the storage on disk (less important because of low price) and in memory (buffer pool) of table (clustered index or heap) but it affects, also, the storage of others objects, an aspect which is sometimes less obvious. Why is so important this aspect ? More storage space on disk means, also, many more 8K pages on disk and in memory (buffer pool), which means more LIO (logical reads) and more PIO (physical reads). Bellow diagram shows column’s dependencies on three levels: current object (table or indexed view), child tables (tables which have foreign key pointing towards current table.column) and indexed views. The main purpose of this diagram is to show how column’s type could influence the storage size of others objects (clustered and non-clustered indexes, child tables, indexed views).

Diagram01

Note #1: The expression “column is duplicated” means that the column’s values as a whole or as part (see full text indexes) are duplicated within another column (usually this means, also, another object) using the same data type or using a compatible data type (see full text indexes defined on varbinary(max) columns).

Note #2: Term “object” refers to a table or to an indexed view.

Note #3: All examples use AdventureWorks2008R2  database.

If an object (a table or an indexed view) has a clustered index then all non-clustered indexes will include the clustered index’s key.  An example is the clustered primary key SalesOrderID defined on Sales.SalesOrderHeader and a non-clustered index which is defined on SalesPersonID column. This means that following indexes are equivalent because SalesOrderID is clustered index’s key:

Code Snippet
CREATE INDEX ix1_v1
ON Sales.SalesOrderHeader(SalesPersonID);

CREATE INDEX ix1_v2
ON Sales.SalesOrderHeader(SalesPersonID)INCLUDE(SalesOrderID);

An UNIQUE or a PRIMARY KEY constraint will be enforced using an unique index but an unique index could exists without one of these constraints. Data type (including precision, scale, max. length) and collation for every column (child table) which is included in a foreign key constraint  and should be identical with those columns which are included in referenced unique index’s key (parent table).  An example:

Code Snippet
ALTER TABLE Purchasing.PurchaseOrderHeader
ADD CONSTRAINT FK_PurchaseOrderHeader_ShipMethod_ShipMethodID
FOREIGN KEY(ShipMethodID)
REFERENCES Purchasing.ShipMethod (ShipMethodID)

In this case ShipMethodID columns from [Purchasing].[PurchaseOrderHeader] (child table) and [Purchasing].[ShipMethod] (parent table) must have the same data type (which is INT).

Note #4: A foreign key constraint itself does not requires storage space except for metadata.

If the clustered index is table’s primary key then the clustered key will be duplicated in every primary and secondary XML (FOR PROPERTY, FOR VALUE, FOR PATH) index and, also, in every spatial index. This happens because these type of indexes requires an clustered primary key.

Full-text indexes requires a single column unique index:

Code Snippet
CREATE FULLTEXT INDEX ON table_name [({column_name…}[,...n])]
KEY INDEX index_name

If a regular column is part of index’s key (ex. MyColumn1) or is included as covered column (ex. MyColumn2)

Code Snippet
CREATE [UNIQUE] INDEX ix_name ON MySchema.MyTable(…MyColum1…)
INCLUDE [(...MyColumn2...)]

then the values of that column are “duplicated” within that index. 

A indexed view must have a clustered unique index but, also, it could have non-clustered indexes. These are other opportunities for SQL Server to “duplicate” the content of those columns.

Test

The following test shows how changing the data type for Purchasing.ShipMethod.ShipMethodID column (which is a clustered primary key) from INT to TINYINT, SMALLINT and DECIMAL(18,0) influences the size of others objects. This table has only five rows. The following objects are referenced by this column:

No. Object Rows
1 PK_ShipMethod_ShipMethodID 5
2 PK_PurchaseOrderHeader_PurchaseOrderID 4012
3 PK_SalesOrderHeader_SalesOrderID 31465
4 IX_SalesOrderHeader_ShipMethodID__OnlineOrderFlag_OrderDate* 31465
5 PK_vOrdersWithShipMethod_SalesOrderID* 31465
6 IX_vOrdersWithShipMethod_ShipMethodID_ShipDate* 31465

Note #5: * New objects.

Note #6: All PK_ objects are clustered unique indexes. All IX_ objects are non-clustered indexes.

Note #7: PurchaseOrderHeader and SalesOrderHeader tables include a ShipMethodID column (FK).

Test Methodology

For every data type (TINYINT, SMALLINT, INT, BIGINT and DECIMAL(18,0)) :

1. I deleted (DROP) some objects that reference ShipMethodID columns.

2. I changed the data type for the following columns: Purchasing.ShipMethod.ShipMethodID, Purchasing.PurchaseOrderHeader.ShipMethodID and Sales.SalesOrderHeader.ShipMethodID

3. I recreated objects.

4. Rebuilding indexes.

5. Page count.

Download script

Results

Results

Remark

Simply changing the data type of Purchasing.ShipMethod.ShipMethodID column from INT to TINYINT reduces the number of pages for all referenced objects in this test with 53 pages. This means that for 3 bytes (INT – TINYINT) x 5 rows = 15 bytes difference just for PK_ShipMethod_ShipMethodID clustered primary key the real impact of this change is 53 pages x 8K = 424 KB = 434176 bytes.

15 bytes vs. 434176 bytes

Revision History:

2013-08-18: Updated [Download script] link

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.