Yet another SQL Server enthusiast

2014-09-28

Simple solution to optimize bulk insert operations

Filed under: optimization,sql server,sql server 2012 — ---- @ 11:44 PM

What can be done to optimize (minimize the execution time of) bulk insert operations of large *.csv files into a SQL Server database? Here you may find an excellent article on this subject.

But there is also a simple thing you should consider: autogrowth events. These events can slow down many operations executed on a SQL Server database, including bulk inserts. This means that if we eliminate these autogrowth events, the execution time should decrease.

To test this hypothesis I ran two sets of tests using a *.csv file (4,366,563 rows) with an uncompressed size of 462 MB:

  1. One test using the default database options (for size and autogrowth) and
  2. The other test using a database with a data file presized to 2 GB and a log file presized to 100 MB (matching the script below). Because the source file uses UTF-8 encoding, the text columns are defined using the NVARCHAR data type; SQL Server stores every character on two bytes (without data compression).

I’ve used following scripts:

  1. First test:

    Code Snippet
    CREATE DATABASE TestBulkInsert
    ON PRIMARY  (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf')
    LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf')
    GO
    ALTER DATABASE TestBulkInsert
    SET RECOVERY SIMPLE
    GO
    USE TestBulkInsert;
    GO
    SELECT    file_id,  name, size * 8 AS [size KB],
            CASE
                WHEN growth = 0 THEN 'fixed size'
                WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
                ELSE STR(growth * 8, 10, 0) + ' KB'
            END  AS growth_description
    FROM    sys.database_files
    GO
    /*
    file_id name                   size KB growth_description
    ------- ---------------------- ------- ------------------
    1       TestBulkInsert_01_Data 3136    1024 KB
    2       TestBulkInsert_01_Log  1024    10 %
    */

  2. Second test:
    Code Snippet
    CREATE DATABASE TestBulkInsert
    ON PRIMARY  (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf', SIZE = 2GB, FILEGROWTH = 500MB)
    LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf', SIZE = 100MB, FILEGROWTH = 100MB)
    GO
    ALTER DATABASE TestBulkInsert
    SET RECOVERY SIMPLE
    GO
    USE TestBulkInsert;
    GO
    SELECT    file_id,  name, size * 8 AS [size KB],
            CASE
                WHEN growth = 0 THEN 'fixed size'
                WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
                ELSE STR(growth * 8, 10, 0) + ' KB'
            END  AS growth_description
    FROM    sys.database_files
    GO
    /*
    file_id name                   size KB growth_description
    ------- ---------------------- ------- ------------------
    1       TestBulkInsert_01_Data 2097152 512000 KB
    2       TestBulkInsert_01_Log  102400  102400 KB
    */

The target table was created using the following script:

Code Snippet
CREATE TABLE [dbo].[currenttmp](
    [ColA] [bigint] NOT NULL PRIMARY KEY,
    [ColB] [nvarchar](20) NULL,
    [ColC] [nvarchar](4000) NULL
) ON [PRIMARY]
GO

To import data I’ve used an SSIS package:

ssis.dataflowtask
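For reference, a plain T-SQL alternative to the SSIS data flow is BULK INSERT; the sketch below is an assumption of mine (hypothetical file path and delimiters), not what I used in the tests. Note that BULK INSERT in SQL Server 2012 cannot load UTF-8 files (CODEPAGE = 65001 is not supported), which is one reason to use SSIS here:

Code Snippet
BULK INSERT dbo.currenttmp
FROM 'E:\BD\source.csv'       -- hypothetical path
WITH (
    FIELDTERMINATOR = ',',    -- assumed column delimiter
    ROWTERMINATOR   = '\n',   -- assumed row delimiter
    TABLOCK                   -- helps qualify for minimal logging under SIMPLE recovery
);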

Below are my results:

                                 Test 1   Test 2   Difference
Average execution time (sec.)    326      222      104
Average execution time (mm:ss)   5:26     3:42     1:44
Autogrowth events                946      0        946

As you can see, simply eliminating autogrowth events can significantly reduce the execution time when importing large files. This is possible because autogrowth events are expensive.
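If you want to count the autogrowth events yourself, the default trace records them; a minimal sketch (assuming the default trace is enabled, which it is by default):

Code Snippet
-- Count the autogrowth events recorded by the default trace for the test database
DECLARE @path NVARCHAR(260);
SELECT  @path = REVERSE(SUBSTRING(REVERSE(t.[path]), CHARINDEX(N'\', REVERSE(t.[path])), 260)) + N'log.trc'
FROM    sys.traces t
WHERE   t.is_default = 1;

SELECT  te.name, COUNT(*) AS autogrowth_events
FROM    sys.fn_trace_gettable(@path, DEFAULT) tr
JOIN    sys.trace_events te ON tr.EventClass = te.trace_event_id
WHERE   te.name IN (N'Data File Auto Grow', N'Log File Auto Grow')
AND     tr.DatabaseName = N'TestBulkInsert'
GROUP BY te.name;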

Note: It’s worth reading about Instant File Initialization (not used in these tests).
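A quick way to check whether Instant File Initialization is in effect (a sketch of mine, not part of the tests above): trace flags 3004 and 3605 write file zero-initialization messages to the error log, and a data file that shows up there was not instant-initialized:

Code Snippet
DBCC TRACEON (3004, 3605, -1);
GO
CREATE DATABASE TestIFI
ON PRIMARY (NAME = 'TestIFI_Data', FILENAME = 'E:\BD\TestIFI.mdf', SIZE = 512MB)
LOG ON (NAME = 'TestIFI_Log', FILENAME = 'E:\BD\TestIFI.ldf', SIZE = 100MB);
GO
-- Log files are always zeroed; if the data file is listed too, IFI is off
EXEC sys.xp_readerrorlog 0, 1, N'Zeroing';
GO
DBCC TRACEOFF (3004, 3605, -1);
DROP DATABASE TestIFI;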

2014-07-08

sp_DeadlockAnalyzer v01

Filed under: concurrency,deadlock,sp_DeadlockAnalyzer,sql server — ---- @ 7:09 AM

 

sp_DeadlockAnalyzer is a stored procedure created with one goal: to help DBAs debug deadlocks in SQL Server. It has one parameter, @xdl, which is the XML content of a [deadlock graph event] intercepted with SQL Trace (SQL Profiler). It shows information about isolation levels, blocked resources, connections, and T-SQL statements with their execution plans. The stored procedure should be executed in the context of the original database (e.g. SimpleDeadlock in my example).
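A minimal usage sketch (the *.xdl path is hypothetical; any method that loads the deadlock graph XML into an XML variable works):

Code Snippet
USE SimpleDeadlock; -- execute in the context of the original database
GO
DECLARE @xdl XML;
-- Load a deadlock graph saved from SQL Profiler (hypothetical path)
SELECT  @xdl = CONVERT(XML, x.BulkColumn)
FROM    OPENROWSET(BULK N'E:\BD\deadlock.xdl', SINGLE_BLOB) x;
EXEC dbo.sp_DeadlockAnalyzer @xdl = @xdl;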

Future plans:

  1. Information about the execution plan operators (and corresponding T-SQL statements) which cause every lock (see the [blocked resources] section).
  2. To correlate information about statement type (INSERT, UPDATE, DELETE, MERGE, SELECT), table hints (e.g. XLOCK), isolation levels, and foreign keys.
  3. To show information about how the deadlock can be solved.
  4. To test using deadlock graph events generated by SQL Trace/SQL Profiler on versions other than SQL Server 2012 (the only version I used for tests) and by Extended Events sessions (including system_health; SQL Server 2008 and later).

Output:

sp_DeadlockAnalyzer.Example

Source code:

Code Snippet
USE master;
GO

SET ANSI_NULLS ON;
GO
SET QUOTED_IDENTIFIER ON;
GO

IF NOT EXISTS (
    SELECT  *
    FROM    sys.procedures p
    JOIN    sys.schemas s ON p.schema_id = s.schema_id
    WHERE   s.name = 'dbo'
    AND     p.name = 'sp_DeadlockAnalyzer'
)
BEGIN
    EXEC ('CREATE PROCEDURE dbo.sp_DeadlockAnalyzer AS SELECT 1');
END
GO

ALTER PROCEDURE dbo.sp_DeadlockAnalyzer
(
    @xdl XML -- Deadlock graph event
)
AS
BEGIN
    DECLARE @Resources TABLE (
        Id                       INT IDENTITY PRIMARY KEY,
        Resource_LockGranularity SYSNAME NOT NULL,
        Resource_DatabaseId      INT NOT NULL,
        Resource_HoBtID          BIGINT NOT NULL,
        Resource_ObjectName      SYSNAME NULL,
        Resource_IndexName       SYSNAME NULL,

        ProcessOwner_Id          SYSNAME NOT NULL,
        ProcessOwner_LockMode    SYSNAME NOT NULL,

        ProcessWaiter_Id         SYSNAME NOT NULL,
        ProcessWaiter_LockMode   SYSNAME NOT NULL
    );

    INSERT @Resources (
        Resource_LockGranularity,
        Resource_DatabaseId,
        Resource_HoBtID,
        Resource_ObjectName,
        Resource_IndexName,

        ProcessOwner_Id,
        ProcessOwner_LockMode,

        ProcessWaiter_Id,
        ProcessWaiter_LockMode
    )
    SELECT  Resource_LockGranularity = x.XmlCol.value('local-name(.)', 'SYSNAME'),
            Resource_DatabaseId      = x.XmlCol.value('@dbid', 'INT'),
            Resource_HoBtID          = x.XmlCol.value('@hobtid', 'BIGINT'),
            Resource_ObjectName      = QUOTENAME(PARSENAME(x.XmlCol.value('@objectname', 'SYSNAME'), 3)) + '.' + QUOTENAME(PARSENAME(x.XmlCol.value('@objectname', 'SYSNAME'), 2)) + '.' + QUOTENAME(PARSENAME(x.XmlCol.value('@objectname', 'SYSNAME'), 1)),
            Resource_IndexName       = QUOTENAME(x.XmlCol.value('@indexname', 'SYSNAME')),

            ProcessOwner_Id          = own.XmlCol.value('@id', 'SYSNAME'),
            ProcessOwner_LockMode    = own.XmlCol.value('@mode', 'SYSNAME'),

            ProcessWaiter_Id         = wtr.XmlCol.value('@id', 'SYSNAME'),
            ProcessWaiter_LockMode   = wtr.XmlCol.value('@mode', 'SYSNAME')
    FROM    @xdl.nodes('deadlock-list/deadlock/resource-list/*') x(XmlCol)
    OUTER APPLY x.XmlCol.nodes('owner-list/owner') own(XmlCol)
    OUTER APPLY x.XmlCol.nodes('waiter-list/waiter') wtr(XmlCol);

    -- Information about databases
    SELECT  [Object]       = db.name + ' (' + CONVERT(NVARCHAR(11), db.database_id) + ')',
            [Object_Type]  = 'database',
            IsolationLevel = '*snapshot_isolation_state ' + db.snapshot_isolation_state_desc
    FROM    sys.databases db
    WHERE   db.database_id IN (SELECT r.Resource_DatabaseId FROM @Resources r)
    UNION ALL
    SELECT  [Object]       = x.XmlCol.value('(@id)[1]', 'SYSNAME'),
            [Object_Type]  = 'process',
            IsolationLevel = x.XmlCol.value('(@isolationlevel)[1]', 'SYSNAME')
    FROM    @xdl.nodes('deadlock-list/deadlock/process-list/process') x(XmlCol);

    -- Information about resources
    SELECT  *
    FROM    @Resources;

    DECLARE @Processes TABLE (Id SYSNAME PRIMARY KEY);
    INSERT  @Processes (Id)
    SELECT  DISTINCT p.Id
    FROM    @Resources r
    CROSS APPLY (
        SELECT  r.ProcessOwner_Id
        UNION ALL
        SELECT  r.ProcessWaiter_Id
    ) p (Id);

    -- Information about server processes / connections
    SELECT  Process_Id       = x.XmlCol.value('(@id)[1]', 'SYSNAME'),
            SPID             = x.XmlCol.value('(@spid)[1]', 'INT'),
            IsDeadlockVictim = x.XmlCol.exist('(.)[(@id)[1] = (../../@victim)[1]]'),

            LogUsed          = x.XmlCol.value('(@logused)[1]', 'INT'),
            TransactionName  = x.XmlCol.value('(@transactionname)[1]', 'SYSNAME'),
            IsolationLevel   = x.XmlCol.value('(@isolationlevel)[1]', 'SYSNAME'),
            DeadlockPriority = x.XmlCol.value('(@priority)[1]', 'SMALLINT'),

            CurrentDatabase  = DB_NAME(x.XmlCol.value('(@currentdb)[1]', 'SMALLINT')) + N' (' + x.XmlCol.value('(@currentdb)[1]', 'NVARCHAR(5)') + N')',

            ClientApp        = x.XmlCol.value('(@clientapp)[1]', 'SYSNAME'),
            HostName         = x.XmlCol.value('(@hostname)[1]', 'SYSNAME'),
            LoginName        = x.XmlCol.value('(@loginname)[1]', 'SYSNAME'),
            InputBuffer      = x.XmlCol.query('inputbuf')
    FROM    @xdl.nodes('deadlock-list/deadlock/process-list/process') x(XmlCol);

    -- Statements & execution plans
    DECLARE @Statements TABLE (
        Process_Id  SYSNAME NOT NULL,
        SPID        INT NOT NULL,
        [Statement] SYSNAME NOT NULL,
        [SqlHandle] VARBINARY(64)
    );
    INSERT  @Statements
    SELECT  Process_Id  = x.XmlCol.value('(@id)[1]', 'SYSNAME'),
            SPID        = x.XmlCol.value('(@spid)[1]', 'INT'),
            [Statement] = y.XmlCol.value('(@procname)[1]', 'SYSNAME'),
            [SqlHandle] = CONVERT(VARBINARY(64), y.XmlCol.value('(@sqlhandle)[1]', 'VARCHAR(128)'), 1)
    FROM    @xdl.nodes('deadlock-list/deadlock/process-list/process') x(XmlCol)
    CROSS APPLY x.XmlCol.nodes('executionStack/frame') y(XmlCol);

    SELECT  stm.Process_Id, stm.SPID,
            Batch           = stm.[Statement],
            StatementText   = SUBSTRING(src.[text], ISNULL(NULLIF(sts.statement_start_offset, -1) / 2, 1), ISNULL((NULLIF(sts.statement_end_offset, -1) - NULLIF(sts.statement_start_offset, -1)) / 2, 200)),
            DatabaseId      = pln.dbid,
            DatabaseName    = plnattr.query_db_name,
            ObjectId        = pln.objectid,
            ObjectName      = plnattr.query_object_name,
            QueryPlan       = pln.query_plan,
            SqlHandle       = stm.SqlHandle,
            QueryPlanHandle = sts.plan_handle
    FROM    @Statements stm
    LEFT JOIN sys.dm_exec_query_stats sts ON stm.SqlHandle = sts.sql_handle
    OUTER APPLY sys.dm_exec_query_plan(sts.plan_handle) pln
    OUTER APPLY (
        SELECT  DB_NAME(pvt.dbid) AS query_db_name, OBJECT_NAME(pvt.objectid, pvt.dbid) AS query_object_name
        FROM (
            SELECT  fplanatt.attribute, CONVERT(INT, fplanatt.value) AS id
            FROM    sys.dm_exec_plan_attributes(sts.plan_handle) fplanatt -- ref: http://msdn.microsoft.com/en-us/library/ms189472.aspx
            WHERE   fplanatt.attribute IN ('dbid', 'objectid')
        ) srcpvt
        PIVOT( MAX(srcpvt.id) FOR srcpvt.attribute IN ([dbid], [objectid]) ) pvt
    ) plnattr
    OUTER APPLY sys.dm_exec_sql_text(stm.SqlHandle) src
END;
GO

History:

2014-06-10 Small revision of source code (SUBSTRING: ISNULL, NULLIF)

2014-06-29

.Net / ASP.NET / MVC / C# / SQL Server Courses

Filed under: .Net,ASP.NET,C#,MVC,sql server — ---- @ 11:34 PM

ADCES - the Association for Development, Creativity and Excellence in Software (Asociatia pentru dezvoltare, creativitate și excelență în software) - organizes .Net / ASP.NET / MVC / C# and SQL Server courses.

2014-06-22

Common table expressions and the number of executions

Filed under: common table expression,CTE,optimization,sql server — ---- @ 10:53 PM

A common usage pattern for common table expressions prior to SQL Server 2012 is to get the previous row's values using a combination of ROW_NUMBER, a CTE, and a self-join:

Code Snippet
CREATE TABLE dbo.MyTable (
    MyID INT  PRIMARY KEY,
    Filler NCHAR(3500) NOT NULL
);
GO
INSERT  MyTable (MyID, Filler)
VALUES  (11, N'A'), (22, N'B'), (33, N'C'), (44, N'D'), (55, N'E');
GO

-- Test #1
WITH MyCTE
AS (
    SELECT  *, ROW_NUMBER()OVER(ORDER BY MyID) AS RowNum
    FROM    dbo.MyTable
)
SELECT  *
FROM    MyCTE crt
LEFT JOIN MyCTE prev ON crt.RowNum=prev.RowNum+1;
GO

In this case, the previous row's values are displayed for every row:

Code Snippet
-- current row --- -- previous row ---
MyID Filler RowNum MyID Filler RowNum
---- ------ ------ ---- ------ ------
11   A      1      NULL NULL   NULL
22   B      2      11   A      1
33   C      3      22   B      2
44   D      4      33   C      3
55   E      5      44   D      4

Because a common table expression behaves like a local view, its source code is expanded within the source of the caller (e.g. an INSERT, UPDATE, DELETE, or SELECT statement), and the execution plan for the above SELECT statement looks like this:

CTE.Outer.Table.Inner.Table

As you can see from the above execution plan, the common table expression (MyCTE) is expanded two times: crt and prev. If we want to see how many times every "instance" of MyCTE is executed, we can inspect the value of the Number of Executions property for every operator. Below, you can see that the Number of Executions of the Clustered Index Scan is 1 for crt and 5 for prev. Basically, in this example, the prev part is executed five times: one time for every row from the crt part. Overall, MyCTE is executed six times:

CTE.Execution.Plans.LOOP.HASH.MERGE.Join.#1 

Is it possible to change this behavior? Yes, and the answer (at least one answer) comes from changing the physical join type, which in this case is LOOP / NESTED LOOPS.

I changed the physical join from LOOP to HASH JOIN and to MERGE JOIN, and the result was that the prev part is executed just one time in my example; overall, MyCTE is executed two times:

Code Snippet
-- Test #2
WITH MyCTE
AS (
    SELECT  *, ROW_NUMBER()OVER(ORDER BY MyID) AS RowNum
    FROM    dbo.MyTable
)
SELECT  *
FROM    MyCTE crt
LEFT HASH JOIN MyCTE prev ON crt.RowNum=prev.RowNum+1;
GO

-- Test #3
WITH MyCTE
AS (
    SELECT  *, ROW_NUMBER()OVER(ORDER BY MyID) AS RowNum
    FROM    dbo.MyTable
)
SELECT  *
FROM    MyCTE crt
LEFT MERGE JOIN MyCTE prev ON crt.RowNum=prev.RowNum+1;
GO

CTE.Execution.Plans.LOOP.HASH.MERGE.Join.#2&3

Another aspect is also interesting: in this small test, the number of logical reads is smaller for HASH and MERGE JOIN than for LOOP JOIN:

SQLTrace.SQLProfiler.Logical.Reads 

Note #1: because of the join hints (LOOP, MERGE, HASH), the join order is enforced. Don't use these hints without properly testing your queries. Using these hints without carefully analyzing the execution plans can lead to suboptimal plans.

Note #2: Starting with SQL Server 2012 you can use the LAG function to get the previous row's values. See also the LEAD function.
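For completeness, here is a sketch of the LAG equivalent of Test #1 (SQL Server 2012 and later); the self-join disappears, so dbo.MyTable is read only once:

Code Snippet
SELECT  MyID, Filler,
        LAG(MyID)   OVER(ORDER BY MyID) AS Prev_MyID,
        LAG(Filler) OVER(ORDER BY MyID) AS Prev_Filler
FROM    dbo.MyTable;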

2014-06-12

SQL Server: deadlocks

Filed under: concurrency,conference,sql server — ---- @ 7:01 AM

I gave a presentation at the .Net Bucharest software developers meeting about deadlocks (resources, locks, blocking, deadlocks, how to simulate a simple deadlock, deadlock detection, *.xdl XML deadlock graphs, and how to minimize deadlocks).

Files: Net_User_Group_Bucharest_2014-06-10.zip

2014-04-08

SQL Server: SARG-able predicates and Scan operators

Filed under: index,optimization,SARG,scan,seek,sql server — ---- @ 12:15 AM

I will give a presentation at the .Net Bucharest software developers meeting about SARG-able predicates and the reasons that can cause scans even when proper indexes exist. For every example I will present the problem and the solutions; a small illustration appears after the files link below.

Files: Net_User_Group_Bucharest_2014-04-08.zip
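As a teaser, here is a tiny hypothetical example of the kind of problem covered (assuming a dbo.Orders table with an index on OrderDate): wrapping the indexed column in a function makes the predicate non-SARG-able and forces a scan, while the equivalent range predicate allows a seek:

Code Snippet
-- Non-SARG-able: the function on the column forces an index scan
SELECT * FROM dbo.Orders WHERE YEAR(OrderDate) = 2014;
-- SARG-able rewrite: an index seek becomes possible
SELECT * FROM dbo.Orders
WHERE OrderDate >= '20140101' AND OrderDate < '20150101';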

 

2014-01-22

UPSERT – safe solutions for SQL Server [DRAFT]

Filed under: concurrency,EXISTS,MERGE,sql server,UPSERT — ---- @ 9:39 PM

This article tackles how to implement safe UPSERTs from a concurrency point of view. A previous blog post presented a common solution for UPSERT and demonstrated that this solution isn't safe. The current post presents common solutions for UPSERT operations, for a single row and for multiple rows.

To discuss these solutions I will use the same table dbo.Customer:

Code Snippet
IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

When we want to insert a new customer, the identifier used for customers is the email address. If the email address is not already stored within dbo.Customer, then we need to insert a new customer with 1 order. Otherwise (the customer's email already exists), we need to increment the OrdersCount column by 1.

From this point of view, we also need a unique index on the Email column:

Code Snippet
CREATE UNIQUE INDEX IUN_Customer_Email
ON dbo.Customer(Email);

This index has two important roles:

  1. It prevents duplicate emails and
  2. It helps those queries which try to find customers using the email address as the identifier (WHERE Email = constant / @variable).

Solutions for a single row

Code Snippet
-- Solution #1: IF EXISTS
-- Parameters
DECLARE @Email NVARCHAR(100)
SET @Email = '[email protected]';
-- End of Parameters
BEGIN TRANSACTION;
    IF EXISTS(SELECT * FROM dbo.Customer WITH(XLOCK, HOLDLOCK) WHERE Email = @Email)
    BEGIN
        UPDATE  dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE   Email = @Email;
    END
    ELSE
    BEGIN
        INSERT    dbo.Customer(Email)
VALUES    (@Email) -- OrdersCount = Default 1
    END    
COMMIT; -- Use ROLLBACK for testing
GO

-- Solution #2: MERGE
-- Parameters
DECLARE @Email NVARCHAR(100)
SET @Email = '[email protected]';
-- End of Parameters
BEGIN TRANSACTION;
    MERGE    dbo.Customer WITH(HOLDLOCK) trg
    USING    (VALUES(@Email)) src(Email) ON trg.Email = src.Email
    WHEN MATCHED
        THEN    
        UPDATE
        SET        OrdersCount = trg.OrdersCount + 1
    WHEN NOT MATCHED BY TARGET
        THEN    
        INSERT (Email)
        VALUES    (@Email);
COMMIT;
GO

-- Solution #3: UPDATE + INSERT
-- Parameters
DECLARE @Email NVARCHAR(100)
-- End of Parameters
SET @Email = '[email protected]';
BEGIN TRANSACTION;
    UPDATE    trg
    SET        OrdersCount = OrdersCount + 1
    FROM    dbo.Customer trg WITH(HOLDLOCK)
    WHERE    trg.Email = @Email;

    IF @@ROWCOUNT = 0
        INSERT    dbo.Customer(Email)
        VALUES    (@Email);        
COMMIT;
GO

Solutions for multiple rows

Code Snippet
-- Solution #1: EXISTS
-- Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
-- End of Parameters
DECLARE @CustomerEmailForUPDATE TABLE(
    Email NVARCHAR(100) NOT NULL PRIMARY KEY
);

BEGIN TRANSACTION;
    INSERT    @CustomerEmailForUPDATE (Email)
    SELECT    ce.Email
    FROM    @CustomerEmail ce
    WHERE    EXISTS(SELECT * FROM dbo.Customer c WITH(XLOCK, HOLDLOCK) WHERE c.Email = ce.Email);

    UPDATE  dbo.Customer
    SET        OrdersCount = OrdersCount + 1
    FROM    dbo.Customer c
    INNER JOIN @CustomerEmailForUPDATE cefu ON c.Email = cefu.Email
    
    INSERT  dbo.Customer(Email)
SELECT    ce.Email -- OrdersCount = Default 1
    FROM    @CustomerEmail ce
    WHERE    NOT EXISTS(SELECT * FROM @CustomerEmailForUPDATE cefu WHERE cefu.Email = ce.Email)
COMMIT; -- Use ROLLBACK for testing
GO

-- Solution #2: MERGE
-- Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
-- End of Parameters
BEGIN TRANSACTION;
    MERGE    dbo.Customer WITH(HOLDLOCK) trg
    USING    @CustomerEmail src ON trg.Email = src.Email
    WHEN MATCHED
        THEN    
        UPDATE
        SET        OrdersCount = trg.OrdersCount + 1
    WHEN NOT MATCHED BY TARGET
        THEN    
        INSERT (Email)
        VALUES    (src.Email);
COMMIT; -- Use ROLLBACK for testing
GO

-- Solution #3: UPDATE + INSERT
-- Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
-- End of Parameters

DECLARE @CustomerEmailUPDATED TABLE(
    Email NVARCHAR(100) NOT NULL PRIMARY KEY
);
BEGIN TRANSACTION;
    UPDATE    trg
    SET        OrdersCount = OrdersCount + 1
    OUTPUT    inserted.Email INTO @CustomerEmailUPDATED
    FROM    dbo.Customer trg WITH(HOLDLOCK)
    INNER JOIN @CustomerEmail src ON trg.Email = src.Email;

    INSERT    dbo.Customer(Email)
    SELECT    ce.Email
    FROM    @CustomerEmail ce
    WHERE    NOT EXISTS(SELECT * FROM @CustomerEmailUPDATED upd WHERE upd.Email = ce.Email);
COMMIT;
GO

The next blog

  • will present solutions for scenarios where we need to insert or update multiple rows (2014-02-18) and
  • will discuss pros and cons of these solutions.

Revision History:

2014-02-08: First update. I have added a new section for multiple rows. Solutions for a single row: some minor changes.

2014-01-13

Conditional joins

Let's start with the following example (based on the Supertypes and Subtypes modeling approach), which creates three tables:

  1. dbo.BankAccountOwner (BankAccountOwnerID-PK, OwnerType, CreateDate)
  2. dbo.Person (BankAccountOwnerID – PK, FK, FirstName, LastName)
  3. dbo.Company (BankAccountOwnerID – PK, FK, CompanyName)

T-SQL Script:

Code Snippet
CREATE TABLE dbo.BankAccountOwner (
    BankAccountOwnerID INT PRIMARY KEY,
OwnerType CHAR(1) NOT NULL CHECK( OwnerType IN ('P', 'C') ), -- P=Person, C=Company
    CreateDate DATETIME NOT NULL DEFAULT (GETDATE())
);
CREATE TABLE dbo.Person (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL
);
CREATE TABLE dbo.Company (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    CompanyName NVARCHAR(100) NOT NULL
);

INSERT    dbo.BankAccountOwner (BankAccountOwnerID, OwnerType)
VALUES    (1, 'P'), (2, 'P'), (3, 'C'), (4, 'C'), (5, 'C');
INSERT    dbo.Person (BankAccountOwnerID, FirstName, LastName)
VALUES    (1, N'John', N'Doe'), (2, N'Mary', N'Doe');
INSERT    dbo.Company (BankAccountOwnerID, CompanyName)
VALUES (3, N'MyComputer'), (4, N'Control Panel'), (5, N'Device Manager');
GO

Problem: how can we get the FirstName, LastName, and CompanyName values for the following owners: 1, 2, 3, 4, 5?

First solution:

Code Snippet
SET NOCOUNT ON;
SET STATISTICS IO ON;
PRINT 'Test #1';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

My solution:

Because

  1. the dbo.Person table contains only rows with OwnerType = 'P' and the dbo.Company table contains only rows with OwnerType = 'C', and
  2. SQL Server doesn't know this,

I added these predicates to every left join:

Code Snippet
PRINT 'Test #2';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.OwnerType = 'P' AND bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.OwnerType = 'C' AND bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

These are the execution plans:

Conditional.Joins.Execution.Plans

First solution: as you can see from the properties of the Index Seek operators, SQL Server will try to find every bank account owner (1, 2, …, 5) within the dbo.Person table and within the dbo.Company table (Number of Executions = 5).

Second solution: this time, because SQL Server knows that the dbo.Person table contains only OwnerType = 'P' rows, it will read from the dbo.Person table using an Index Seek (on Person.PK_Person_…) just two times (Number of Executions = 2), because there are only two persons among those five owners. This is possible because of the Filter operator, which verifies whether the current owner is 'P' before the Index Seek is executed:

Filter.Operator

Also, SQL Server will read from the dbo.Company table using an Index Seek (on Company.PK_Company_…) three times (Number of Executions = 3) because there are three companies.

Because we have reduced the number of executions for the Index Seek operators, this means we have also reduced the number of logical reads, as we can see from the output of STATISTICS IO ON:

Code Snippet
Test #1
Table 'Company'. Scan count 0, logical reads 10
Table 'Person'. Scan count 0, logical reads 10
Table 'BankAccountOwner'. Scan count 5, logical reads 10
Test #2
Table 'Company'. Scan count 0, logical reads 6
Table 'Person'. Scan count 0, logical reads 4
Table 'BankAccountOwner'. Scan count 5, logical reads 10

2014-01-05

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ?

Filed under: concurrency,EXISTS,index,MERGE,safe,sql server,UPSERT — ---- @ 10:47 PM

Is the following approach for UPSERT safe?

Code Snippet
IF EXISTS (SELECT * FROM MyTable WHERE ID = @param1)
    UPDATE    MyTable
    SET        Col1 = NewValue
    WHERE    ID = @param1
ELSE
    INSERT    MyTable(ID, Col1)
    VALUES    (@param1, @param2)

The short answer is no!

Example:

  1. The transaction isolation level is READ COMMITTED which is the default isolation level.
  2. For this demo I’ve used a customer table to store the customer’s email and the number of sales orders (for every customer).
  3. I’ve inserted two customers (1 – [email protected]; 2 – [email protected]).
  4. I've created a stored procedure for UPSERT (dbo.InsertOrder). This stored procedure receives the customer's email, checks whether this email exists, and if not, inserts the new customer (with 1 sales order). Otherwise it increments the number of sales orders by 1.

T-SQL script:

Code Snippet
IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
IF OBJECT_ID(N'dbo.InsertOrder') IS NOT NULL
    DROP PROCEDURE dbo.InsertOrder;
GO

CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

INSERT    dbo.Customer (Email)
VALUES    (N'[email protected]'), -- CustomerID 1
        (N'[email protected]'); -- CustomerID 2
GO

CREATE PROCEDURE dbo.InsertOrder (
    @Email NVARCHAR(100)
)
AS
BEGIN
    IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email)
    BEGIN
        UPDATE    dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE    Email = @Email;
    END
    ELSE
    BEGIN
        WAITFOR DELAY '00:00:03';
        INSERT    dbo.Customer (Email)
VALUES    (@Email) -- OrdersCount = Default 1
    END
END;
GO

Note: WAITFOR DELAY is used to delay the INSERT of the new customer by just 3 seconds.

Now we can start this test by executing the following statements in separate windows:

Is.NOT.Safe.If.Exists.Update.Insert.UPSERT

Query window #1 (Step 1):

Code Snippet
BEGIN TRANSACTION
    UPDATE    dbo.Customer
    SET        Note = N'Note#2'
    WHERE    Email = N'[email protected]';
-- ROLLBACK or COMMIT

At this moment the record with CustomerID = 2 is locked X (exclusive), and no other transaction can read (except with NOLOCK or READ UNCOMMITTED), update, or delete this record. Because this transaction remains open (no COMMIT or ROLLBACK), the record will stay locked until the end of the transaction.
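While the transaction in query window #1 is still open, you can inspect the locks it holds; a sketch of mine (not part of the original example) using sys.dm_tran_locks:

Code Snippet
-- Run in another window while the transaction from window #1 is open
SELECT  l.request_session_id, l.resource_type, l.resource_description,
        l.request_mode, l.request_status
FROM    sys.dm_tran_locks l
WHERE   l.resource_database_id = DB_ID()
AND     l.resource_type <> 'DATABASE'; -- skip the session-level database locks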

Query window #2 & #3 (Step 2 & Step 3):

Code Snippet
EXEC dbo.InsertOrder N'[email protected]';
SELECT * FROM dbo.Customer;

At this moment the IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email) statement will try to read the record with CustomerID = 1 and then the record with CustomerID = 2, requesting an S (shared) lock. Because X and S locks aren't compatible (see Lock Compatibility), both SELECT * FROM dbo.Customer WHERE Email = @Email statements (Step 2 & 3) will be blocked. Note: you can see the "Executing…" message in the tab title.

Query window #1 (Step 4):

I execute the ROLLBACK or COMMIT statement, which ends the initial transaction. The X lock on the CustomerID = 2 record is released, and the SELECT statements (SELECT * FROM dbo.Customer WHERE Email = @Email; Step 2 & 3) continue their execution. Every EXISTS(SELECT … WHERE Email = [email protected]) check returns FALSE, and both stored procedures INSERT the customer `[email protected]`.

This is the reason why I get duplicated emails at the end (Step 5) of my example:

Results.Step5

Solutions?

The simplest solution is to create a unique index/constraint on the Email column:

Code Snippet
CREATE UNIQUE INDEX IUN_Customer_Email
ON dbo.Customer(Email);

This UNIQUE index/constraint guarantees that we can't have duplicate emails. But this doesn't mean this UPSERT approach is 100% safe, because the dbo.InsertOrder stored procedure still tries to INSERT duplicate emails (at Step 5):

Results.Step5.With.UNIQUE.index
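One way to keep the IF EXISTS pattern from raising errors once the unique index exists is to catch the duplicate key error and fall back to an UPDATE. This is a sketch of mine, not part of the original procedure (2601 and 2627 are the duplicate key error numbers; THROW requires SQL Server 2012):

Code Snippet
DECLARE @Email NVARCHAR(100) = N'[email protected]';
BEGIN TRY
    INSERT  dbo.Customer (Email)
    VALUES  (@Email); -- OrdersCount = Default 1
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627) -- another session inserted this email first
    BEGIN
        UPDATE  dbo.Customer
        SET     OrdersCount = OrdersCount + 1
        WHERE   Email = @Email;
    END
    ELSE
    BEGIN
        THROW; -- re-raise any other error
    END
END CATCH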

Next blog: on Jan 27 I will discuss what solutions we have for safe UPSERTs.

See also:

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #2

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #3

Revision History:

2014-01-12: I changed the last paragraph.

2014-01-27: I changed the last paragraph.

2014-04-06: I added the [See also] section

2014-01-02

Happy New Year!

Filed under: sql server — ---- @ 5:48 PM
Code Snippet
SELECT (
    SELECT CHAR(h+i)
    FROM (
        SELECT    e.f.value('(@value)', 'TINYINT'), ROW_NUMBER() OVER(ORDER BY e.f DESC)
        FROM (VALUES (N'<row value="57"/><row value="83"/><row value="99"/><row value="100"/><row value="110"/><row value="22"/><row value="69"/><row value="93"/><row value="112"/><row value="26"/><row value="84"/><row value="97"/><row value="94"/><row value="112"/><row value="32"/>')) a(b)
        CROSS APPLY (VALUES (CONVERT(XML, a.b))) c(d)
        CROSS APPLY c.d.nodes('/row') e(f)
    ) g(h,i)
    ORDER BY i DESC
    FOR XML PATH(''), TYPE
).value('.', 'NVARCHAR(MAX)')

SQL Fiddle demo
