What can be done to optimize (minimize the execution time of) bulk insert operations that load large *.csv files into a SQL Server database? Here you may find an excellent article on this subject.
But there is also a simple thing you should consider: autogrowth events. Autogrowth events can slow down many operations executed on a SQL Server database, including bulk inserts, so if we eliminate them the execution time should decrease.
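If you want to verify whether autogrowth events actually occur during a load, the default trace records them. Below is a minimal sketch, assuming the default trace is enabled (it is by default):

-- Read Data File Auto Grow (92) and Log File Auto Grow (93) events from the default trace
DECLARE @trace_path NVARCHAR(260);
SELECT @trace_path = [path] FROM sys.traces WHERE is_default = 1;

SELECT t.StartTime, t.DatabaseName, t.FileName,
    t.Duration / 1000 AS duration_ms,   -- Duration is reported in microseconds
    t.IntegerData * 8 AS growth_KB      -- number of 8 KB pages added, converted to KB
FROM sys.fn_trace_gettable(@trace_path, DEFAULT) AS t
WHERE t.EventClass IN (92, 93)
ORDER BY t.StartTime;
GO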
To test this hypothesis I ran two sets of tests using a *.csv file (4,366,563 rows) with an uncompressed size of 462 MB:
- One test using the default database options (for size and autogrowth), and
- The other test using a database pre-sized as in the second script below: a data file with an initial size of 2 GB and a log file with an initial size of 100 MB. Because the source file uses UTF-8 encoding, the text columns are defined with the NVARCHAR data type, so SQL Server stores two bytes per character (without data compression); a rough size estimate is sketched after this list.
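As a back-of-the-envelope estimate (a rough assumption, not a measurement from these tests), a mostly single-byte 462 MB source file stored in NVARCHAR columns needs roughly twice that space, plus some allowance for row headers, the clustered index and free space:

-- Rough sizing sketch; the 20% overhead factor is an assumption
DECLARE @source_mb DECIMAL(10, 2) = 462;  -- size of the uncompressed *.csv file
DECLARE @overhead  DECIMAL(10, 2) = 1.2;  -- allowance for row/page overhead and free space
SELECT CEILING(@source_mb * 2 * @overhead) AS estimated_data_MB;  -- ~1109 MB
GO

This is why the second test pre-sizes the data file well above the few-megabyte default seen in the first test.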
I used the following scripts:
- First test:

CREATE DATABASE TestBulkInsert
ON PRIMARY (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf')
LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf')
GO
ALTER DATABASE TestBulkInsert
SET RECOVERY SIMPLE
GO
USE TestBulkInsert;
GO
SELECT file_id, name, size * 8 AS [size KB],
CASE
WHEN growth = 0 THEN 'fixed size'
WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
ELSE STR(growth * 8, 10, 0) + ' KB'
END AS growth_description
FROM sys.database_files
GO
/*
file_id  name                    size KB  growth_description
-------  ----------------------  -------  ------------------
1        TestBulkInsert_01_Data  3136     1024 KB
2        TestBulkInsert_01_Log   1024     10 %
*/

- Second test:

CREATE DATABASE TestBulkInsert
ON PRIMARY (NAME = 'TestBulkInsert_01_Data', FILENAME = 'E:\BD\TestBulkInsert.mdf', SIZE = 2GB, FILEGROWTH = 500MB)
LOG ON (NAME = 'TestBulkInsert_01_Log', FILENAME = 'E:\BD\TestBulkInsert.ldf', SIZE = 100MB, FILEGROWTH = 100MB)
GO
ALTER DATABASE TestBulkInsert
SET RECOVERY SIMPLE
GO
USE TestBulkInsert;
GO
SELECT file_id, name, size * 8 AS [size KB],
CASE
WHEN growth = 0 THEN 'fixed size'
WHEN is_percent_growth = 1 THEN STR(growth, 10, 0) + ' %'
ELSE STR(growth * 8, 10, 0) + ' KB'
END AS growth_description
FROM sys.database_files
GO
/*
file_id  name                    size KB  growth_description
-------  ----------------------  -------  ------------------
1        TestBulkInsert_01_Data  2097152  512000 KB
2        TestBulkInsert_01_Log   102400   102400 KB
*/
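If the database already exists, it does not have to be recreated; the same pre-sizing can be applied with ALTER DATABASE ... MODIFY FILE. A short sketch using the logical file names from the scripts above:

-- Pre-size the existing data and log files instead of recreating the database
ALTER DATABASE TestBulkInsert
MODIFY FILE (NAME = 'TestBulkInsert_01_Data', SIZE = 2GB, FILEGROWTH = 500MB);
GO
ALTER DATABASE TestBulkInsert
MODIFY FILE (NAME = 'TestBulkInsert_01_Log', SIZE = 100MB, FILEGROWTH = 100MB);
GO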
The target table was created using the following script (shown here with an assumed table name):

CREATE TABLE [dbo].[BulkInsertTarget] ( -- assumed table name
[ColA] [bigint] NOT NULL PRIMARY KEY,
[ColB] [nvarchar](20) NULL,
[ColC] [nvarchar](4000) NULL
) ON [PRIMARY]
GO
To import the data I used an SSIS package.
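For comparison, here is a plain T-SQL BULK INSERT sketch; the file path, field/row terminators, header-row setting and code page are assumptions, not settings taken from the actual SSIS package:

-- Minimal BULK INSERT sketch (hypothetical file path and format options)
BULK INSERT [dbo].[BulkInsertTarget]   -- assumed table name, see the script above
FROM 'E:\BD\source_file.csv'           -- hypothetical path to the *.csv file
WITH (
    CODEPAGE = '65001',        -- UTF-8 (supported only by newer SQL Server versions)
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,              -- assuming the file has a header row
    TABLOCK                    -- table lock; can enable minimally logged inserts when the prerequisites are met
);
GO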
Below are my results:
|                                 | Test 1 | Test 2 | Difference |
|---------------------------------|--------|--------|------------|
| Average execution time (sec.)   | 326    | 222    | 104        |
| Average execution time (mm:ss)  | 5:26   | 3:42   | 1:44       |
| Autogrowth events               | 946    | 0      | 946        |
As you can see, simply eliminating autogrowth events can significantly reduce the execution time when importing large files. This is because each autogrowth event is expensive: the load has to wait while SQL Server allocates and zero-initializes the new space (data-file zeroing can be skipped with Instant File Initialization; log files are always zero-initialized).
Note: It’s worth reading about Instant File Initialization (not used in these tests).
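On SQL Server 2016 SP1 and later you can check whether Instant File Initialization is enabled for the service account directly from T-SQL:

-- 'Y' means new data file space is not zero-initialized on growth (log files are always zero-initialized)
SELECT servicename, instant_file_initialization_enabled
FROM sys.dm_server_services;
GO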