Yet another SQL Server enthusiast

2014-03-04

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #2

Filed under: concurrency,EXISTS,safe,UPSERT — ---- @ 8:45 AM

One of previous articles showed how easily we might get duplicate rows if we use following approach for UPSERT:

Code Snippet
IF EXISTS(SELECT  FROM MyTable WHERE ID = @param1)
    UPDATE    MyTable
    SET     Col1 = NewValue
    WHERE   ID = @param1
ELSE
    INSERT  MyTable(ID, Col1)
    VALUES  (@param1, @param2)    

This time I decided to do other tests but without WAITFOR. I’ve used following table (the same table) without any non-clustered indexes (like IUN_Customer_Email) and with just two rows:

Code Snippet
CREATE DATABASE TestUPSERT
GO

DBCC FREEPROCCACHE;
GO

USE TestUPSERT
GO

IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

INSERT dbo.Customer (Email) VALUES (N'[email protected]');
INSERT dbo.Customer (Email) VALUES (N'[email protected]');
GO

For my test I’ve used three scripts which try to insert or updated multiple times three emails: [email protected] (SQLQuery1.sql), [email protected] (SQLQuery2.sql) and [email protected] (SQLQuery3.sql). These files are available in D:\SqlOstressInput folder.

Content of SQLQuery{1|2|3}.sql script:

Code Snippet
– Parameters
DECLARE @Email NVARCHAR(100);
SET @Email = '[email protected]'; – or '[email protected]' or '[email protected]'
– End of Parameters
BEGIN TRANSACTION;
    IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email)
    BEGIN
        UPDATE  dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE   Email = @Email;
    END
    ELSE
    BEGIN
        INSERT    dbo.Customer(Email)
        VALUES    (@Email) – OrdersCount = Default 1
    END    
COMMIT;

To run these scripts multiple times (100 times) using 10 concurrent connections I’ve used ostress.exe which is included into RML utilities for SQL Server. Before test I’ve used DBCC DROPCLEANBUFFERS and for ostress.exe I’ve used following arguments:

Code Snippet
[RML] C:\Program Files\Microsoft Corporation\RMLUtils>
ostress -S(local)\SQL2012 -E -dTestUPSERT -o"d:\SqlOstressOutput" -i"d:\SqlOstressInput\SQLQuery*.sql" -n10 -r100 -q

sqlserver.ostress.upsert.test

After few tries I succeeded to get duplicate rows (sample output, you might get different results) and for some emails ([email protected] and [email protected]) the OrderCount is wrong:

Code Snippet
CustomerID  Email Note OrdersCount
———– —– —- ———–
1           [email protected] NULL 1
2           [email protected] NULL 1001
3           [email protected] NULL 999
4           [email protected] NULL 999
5           [email protected] NULL 996
6           [email protected] NULL 996
7           [email protected] NULL 996
8           [email protected] NULL 996
9           [email protected] NULL 996

Conclusion: This approach for UPSERT (without any unique indexes and table hints or without serializable transaction isolation level) isn’t safe.

Note: for this test I used

Code Snippet
Microsoft SQL Server 2012 (SP1) – 11.0.3000.0 (X64)
    Oct 19 2012 13:38:57
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

Transaction isolation level was READ COMMITTED (default).

See also:

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ?

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #3

2014-02-12

Diviziunea relațională

Filed under: Uncategorized — ---- @ 8:52 AM

1) PPTX + scripturi SQL + funcție SQL CLR download 

Funcția SQL CLR DivideBy este prezentată doar cu titlu de exemplu. Nu este optimizată și nu este pe deplin testată.

2) Exemplu de utilizare a funcției DivideBy:

Code Snippet
SELECT    *
FROM    dbo.DivideBy('dbo.Rezultat', 'dbo.Disciplina', 'IdStudent', 'IdDisciplina') dr;
/*
Id

1
2
3
*/

SELECT    s.IdStudent, s.Nume, s.Prenume
FROM    dbo.DivideBy('dbo.Rezultat', 'dbo.Disciplina', 'IdStudent', 'IdDisciplina') dr
JOIN    dbo.Student s ON dr.Id = s.IdStudent;
/*
IdStudent Nume  Prenume
——— —– ———-
1         A     Alexandru
2         B     Bogdan
3         C     Constantin
*/

3) Resurse:

Divided We Stand: The SQL of Relational Division

Proper Relational Division With Sets

 

Mulțumesc

2014-02-10

How to avoid SQL injections: one simple solution

Filed under: Uncategorized — ---- @ 8:30 AM

SQL injections are one of the top threats for web applications. This kind of attack is possible if application concatenates and executes strings received from the end-users. This article shows a simple technique to avoid this attacks: [at database level] instead of receiving (and concatenating) strings (for example column’s or table’s name) from the client app it receives only integers. Every column will have an INT identifier associated.

Example:

If the column Name included into Person.PhoneNumberType could have the ID 4

Code Snippet
CREATE TABLE dbo.AllowedTable (
    IDAllowedTable INT PRIMARY KEY,
    TableSchema SYSNAME NOT NULL,
    TableName SYSNAME NOT NULL,
        UNIQUE (TableSchema,TableName),
    Caption NVARCHAR(50) NOT NULL – This is what end-user will see
);
INSERT    INTO dbo.AllowedTable
        (IDAllowedTable, TableSchema, TableName, Caption)
VALUES    (1, N'Person', N'AddressType', 'Type of address'),
        (2, N'Person', N'PhoneNumberType', 'Type of phone number');

CREATE TABLE dbo.AllowedColumn (
    IDAllowedColumn INT PRIMARY KEY, – This is what client-all will send to SQL Server
    IDAllowedTable INT NOT NULL REFERENCES dbo.AllowedTable(IDAllowedTable),
    ColumnName SYSNAME NOT NULL,
        UNIQUE(IDAllowedTable, ColumnName),
    IsPKColumn BIT NOT NULL,
    Caption NVARCHAR(50) NOT NULL – This is what end-user will see
);
– Every table has a simple PK (single column)
CREATE UNIQUE INDEX IUF_AllowedColumn_IDAllowedTable_WhereIsPKColumnIs1
ON dbo.AllowedColumn(IDAllowedTable)
WHERE IsPKColumn = 1;
GO
INSERT    INTO dbo.AllowedColumn
        (IDAllowedColumn, IDAllowedTable, ColumnName, IsPKColumn, Caption)
VALUES    (1, 1, N'AddressTypeID', 1, 'ID'),
        (2, 1, N'Name', 0, 'Name'),
        (3, 2, N'PhoneNumberTypeID', 1, 'ID'),
        (4, 2, N'Name', 0, 'Name'),
        (5, 2, N'ModifiedDate', 0, 'The last modification date');
GO

then if the client app needs a list of all values from this column it could execute a stored procedure having an INT parameter instead of [N]VARCHAR parameter:

Code Snippet
EXEC dbo.ExecuteDynamicQuery @IDColumn = 4;

Code Snippet
CREATE PROCEDURE dbo.ExecuteDynamicQuery
(
    @IDColumn INT – Client app will send only the ID of object (the id of column in this example)
)
AS
BEGIN
    DECLARE    @IDTable INT, @SecondColumnName SYSNAME;
    
    SELECT    @IDTable    = ac.IDAllowedTable,
            @SecondColumnName = ac.ColumnName
    FROM    dbo.AllowedColumn ac
    WHERE    ac.IDAllowedColumn = @IDColumn;

    IF @IDTable IS NULL – This will check if the source table exists
        RAISERROR('Error', 16, 1); – Object not found
    ELSE
    BEGIN
        DECLARE  @PkColumnName SYSNAME;
        SELECT    @PkColumnName = ac.ColumnName
        FROM    dbo.AllowedColumn ac
        WHERE    ac.IDAllowedTable = @IDTable
        AND        ac.IsPKColumn = 1;

        DECLARE @SqlStatement NVARCHAR(MAX);
        SELECT    @SqlStatement =
                N'SELECT ' + QUOTENAME(@PkColumnName)
                + N' , ' + QUOTENAME(@SecondColumnName)
                + N' FROM '  + QUOTENAME(at.tableSchema) + N'.' + QUOTENAME(at.TableName)
        FROM    dbo.AllowedTable at
        WHERE    at.IDAllowedTable = @IDTable;

        PRINT @SqlStatement;

        EXEC sp_executesql @SqlStatement;
    END
END;
GO

First, this stored procedure will validate the value of parameter (the column’s ID: @IDColumn INT) and then if the requested ID exists will generate and execute a SQL statement.

Usage:

Code Snippet
EXEC dbo.ExecuteDynamicQuery @IDColumn = 100;
/*
Msg 50000, Level 16, State 1, Procedure ExecuteDynamicQuery, Line 16
Error
*/
EXEC dbo.ExecuteDynamicQuery @IDColumn = NULL;
/*
Msg 50000, Level 16, State 1, Procedure ExecuteDynamicQuery, Line 16
Error
*/
EXEC dbo.ExecuteDynamicQuery @IDColumn = 2;
/*
Executed @SqlStatement:
SELECT [AddressTypeID] , [Name] FROM [Person].[AddressType]
*/
EXEC dbo.ExecuteDynamicQuery @IDColumn = 4;
/*
Executed @SqlStatement:
SELECT [PhoneNumberTypeID] , [Name] FROM [Person].[PhoneNumberType]
*/

Note: Microsoft has published a comprehensive article on this subject here .

2014-01-22

UPSERT – safe solutions for SQL Server [DRAFT]

Filed under: concurrency,EXISTS,MERGE,sql server,UPSERT — ---- @ 9:39 PM

This article tackles how to implement safe UPSERTs from concurrency point of view. A previous blog has presented a common solution for UPSERT and it has demonstrated that this solution isn’t safe. Current blog specifies common solutions for UPSERT operations with a single row.

To discuss these solutions I will use the same table dbo.Customer:

Code Snippet
IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

If we want to insert a new customer then the identifier used for customers is the email address. If there is no email address stored within dbo.Customer then we need to insert a new customer with 1 order. Otherwise (the customer’s email already exists) we need to increment by 1 the OrdersCount column.

From this point of view we need also an unique index on Email column:

Code Snippet
CREATE UNIQUE INDEX IUN_Customer_Email
ON dbo.Customer(Email);

This index has two important roles:

  1. It prevents duplicated emails and
  2. It helps those queries which tries to find customers using the email address as identifier (WHERE Email = constant / @variable).

Solutions for a single row

Code Snippet
– Solution #1: IF EXISTS
– Parameters
DECLARE @Email NVARCHAR(100)
SET @Email = '[email protected]';
– End of Parameters
BEGIN TRANSACTION;
    IF EXISTS(SELECT * FROM dbo.Customer WITH(XLOCK, HOLDLOCK) WHERE Email = @Email)
    BEGIN
        UPDATE  dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE   Email = @Email;
    END
    ELSE
    BEGIN
        INSERT    dbo.Customer(Email)
        VALUES    (@Email) – OrdersCount = Default 1
    END    
COMMIT; – Use ROLLBACK for testing
GO

– Solution #2: MERGE
– Parameters
DECLARE @Email NVARCHAR(100)
SET @Email = '[email protected]';
– End of Parameters
BEGIN TRANSACTION;
    MERGE    dbo.Customer WITH(HOLDLOCK) trg
    USING    (VALUES(@Email)) src(Email) ON trg.Email = src.Email
    WHEN MATCHED
        THEN    
        UPDATE
        SET        OrdersCount = trg.OrdersCount + 1
    WHEN NOT MATCHED BY TARGET
        THEN    
        INSERT (Email)
        VALUES    (@Email);
COMMIT;
GO

– Solution #3: UPDATE + INSERT
– Parameters
DECLARE @Email NVARCHAR(100)
– End of Parameters
SET @Email = '[email protected]';
BEGIN TRANSACTION;
    UPDATE    trg
    SET        OrdersCount = OrdersCount + 1
    FROM    dbo.Customer trg WITH(HOLDLOCK)
    WHERE    trg.Email = @Email;

    IF @@ROWCOUNT = 0
        INSERT    dbo.Customer(Email)
        VALUES    (@Email);        
COMMIT;
GO

Solutions for multiple rows

Code Snippet
– Solution #1: EXISTS
– Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
– End of Parameters
DECLARE @CustomerEmailForUPDATE TABLE(
    Email NVARCHAR(100) NOT NULL PRIMARY KEY
);

BEGIN TRANSACTION;
    INSERT    @CustomerEmailForUPDATE (Email)
    SELECT    ce.Email
    FROM    @CustomerEmail ce
    WHERE    EXISTS(SELECT * FROM dbo.Customer c WITH(XLOCK, HOLDLOCK) WHERE c.Email = ce.Email);

    UPDATE  dbo.Customer
    SET        OrdersCount = OrdersCount + 1
    FROM    dbo.Customer c
    INNER JOIN @CustomerEmailForUPDATE cefu ON c.Email = cefu.Email
    
    INSERT  dbo.Customer(Email)
    SELECT    ce.Email – OrdersCount = Default 1
    FROM    @CustomerEmail ce
    WHERE    NOT EXISTS(SELECT * FROM @CustomerEmailForUPDATE cefu WHERE cefu.Email = ce.Email)
COMMIT; – Use ROLLBACK for testing
GO

– Solution #2: MERGE
– Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
– End of Parameters
BEGIN TRANSACTION;
    MERGE    dbo.Customer WITH(HOLDLOCK) trg
    USING    @CustomerEmail src ON trg.Email = src.Email
    WHEN MATCHED
        THEN    
        UPDATE
        SET        OrdersCount = trg.OrdersCount + 1
    WHEN NOT MATCHED BY TARGET
        THEN    
        INSERT (Email)
        VALUES    (src.Email);
COMMIT; – Use ROLLBACK for testing
GO

– Solution #3: UPDATE + INSERT
– Parameters
DECLARE @CustomerEmail TABLE( Email NVARCHAR(100) NOT NULL PRIMARY KEY );
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
INSERT    @CustomerEmail (Email) VALUES (N'[email protected]');
– End of Parameters

DECLARE @CustomerEmailUPDATED TABLE(
    Email NVARCHAR(100) NOT NULL PRIMARY KEY
);
BEGIN TRANSACTION;
    UPDATE    trg
    SET        OrdersCount = OrdersCount + 1
    OUTPUT    inserted.Email INTO @CustomerEmailUPDATED
    FROM    dbo.Customer trg WITH(HOLDLOCK)
    INNER JOIN @CustomerEmail src ON trg.Email = src.Email;

    INSERT    dbo.Customer(Email)
    SELECT    ce.Email
    FROM    @CustomerEmail ce
    WHERE    NOT EXISTS(SELECT * FROM @CustomerEmailUPDATED upd WHERE upd.Email = ce.Email);
COMMIT;
GO

The next blog

  • will present solutions for scenarios where we need to insert or update multiple rows (2014-02-18) and
  • will discuss pros and cons of these solutions.

Revision History:

2014-02-08: First update. I have added a new section for multiple rows. Solutions for a single row: some minor changes.

2014-01-13

Conditional joins

Let’s start with following example (based on Supertypes and Subtypes modeling approach) which creates three tables:

  1. dbo.BankAccountOwner (BankAccountOwnerID-PK, OwnerType, CreateDate)
  2. dbo.Person (BankAccountOwnerID – PK, FK, FirstName, LastName)
  3. dbo.Company (BankAccountOwnerID – PK, FK, CompanyName)

T-SQL Script:

Code Snippet
CREATE TABLE dbo.BankAccountOwner (
    BankAccountOwnerID INT PRIMARY KEY,
    OwnerType CHAR(1) NOT NULL CHECK( OwnerType IN ('P', 'C') ), – P=Person, C=Company
    CreateDate DATETIME NOT NULL DEFAULT (GETDATE())
);
CREATE TABLE dbo.Person (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    FirstName NVARCHAR(50) NOT NULL,
    LastName NVARCHAR(50) NOT NULL
);
CREATE TABLE dbo.Company (
    BankAccountOwnerID INT PRIMARY KEY
        REFERENCES dbo.BankAccountOwner(BankAccountOwnerID),
    CompanyName NVARCHAR(100) NOT NULL
);

INSERT    dbo.BankAccountOwner (BankAccountOwnerID, OwnerType)
VALUES    (1, 'P'), (2, 'P'), (3, 'C'), (4, 'C'), (5, 'C');
INSERT    dbo.Person (BankAccountOwnerID, FirstName, LastName)
VALUES    (1, N'John', N'Doe'), (2, N'Mary', N'Doe');
INSERT    dbo.Company (BankAccountOwnerID, CompanyName)
VALUES (3, N'MyComputer'), (4, N'Control Panel'), (5, N'Device Manager');
GO

Problem: how can we get FirstName, LastName and CompanyName values for the following owners: 1, 2, 3, 4, 5 ?

First solution:

Code Snippet
SET NOCOUNT ON;
SET STATISTICS IO ON;
PRINT 'Test #1';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

My solution:

Because

  1. dbo.Person table contains only rows with OwnerType = ‘P’ and dbo.Company table contains only rows with OwnerType = ‘C’ and
  2. SQL Server doesn’t knows this

I added these predicates to every left join thus:

Code Snippet
PRINT 'Test #2';
SELECT    bao.*, p.FirstName, p.LastName, c.CompanyName
FROM    dbo.BankAccountOwner bao
LEFT JOIN dbo.Person p ON bao.OwnerType = 'P' AND bao.BankAccountOwnerID = p.BankAccountOwnerID
LEFT JOIN dbo.Company c ON bao.OwnerType = 'C' AND bao.BankAccountOwnerID = c.BankAccountOwnerID
WHERE    bao.BankAccountOwnerID IN (1,2,3,4,5);

These are the execution plans:

Conditional.Joins.Execution.Plans

First solution: as you can see from the properties of Index Seek operators SQL Server will try to find every bank account owner (1, 2, …, 5) within dbo.Person table and within dbo.Company table (Number of Execution  = 5).

Second solution: this time because SQL Server knows that dbo.Person table contains only OwnerType = ‘P’ rows it will read from dbo.Person table using an Index Seek (on Person.PK_Person_…) just two times (Number of Executions = 2) because there are only two persons among those five owners. This was possible because of Filter operator which will verify before executing Index Seek if the current owner is ‘P':

Filter.Operator

Also, SQL Server will read from dbo.Company table using an Index Seek (on Company.PK_Company_…) three times (Number of Executions = 3) because there are three companies.

Because we have reduced the number of execution for Index Seek operators, this is means that we have reduced also the number of logical reads as we can see from the output of STATISTICS IO ON:

Code Snippet
Test #1
Table 'Company'. Scan count 0, logical reads 10
Table 'Person'. Scan count 0, logical reads 10
Table 'BankAccountOwner'. Scan count 5, logical reads 10
Test #2
Table 'Company'. Scan count 0, logical reads 6
Table 'Person'. Scan count 0, logical reads 4
Table 'BankAccountOwner'. Scan count 5, logical reads 10

2014-01-05

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ?

Filed under: concurrency,EXISTS,index,MERGE,safe,sql server,UPSERT — ---- @ 10:47 PM

Is it safe the following approach for UPSERT ?

Code Snippet
IF EXISTS (SELECT FROM MyTable WHERE ID = @param1)
    UPDATE    MyTable
    SET        Col1 = NewValue
    WHERE    ID = @param1
ELSE
    INSERT    MyTable(ID, Col1)
    VALUES    (@param1, @param2)

The short answer is no!

Example:

  1. The transaction isolation level is READ COMMITTED which is the default isolation level.
  2. For this demo I’ve used a customer table to store the customer’s email and the number of sales orders (for every customer).
  3. I’ve inserted two customers (1 – [email protected]; 2 – [email protected]).
  4. I’ve created a stored procedure for UPSERT (dbo.InsertOrder). This stored procedure receive the customer’s email, it checks if exists this email and if not then it inserts the new customer (sales orders 1). Otherwise it increments the number of sales orders by 1.

T-SQL script:

Code Snippet
IF OBJECT_ID(N'dbo.Customer') IS NOT NULL
    DROP TABLE dbo.Customer;
IF OBJECT_ID(N'dbo.InsertOrder') IS NOT NULL
    DROP PROCEDURE dbo.InsertOrder;
GO

CREATE TABLE dbo.Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Email NVARCHAR(100) NOT NULL,
    Note NVARCHAR(50) NULL,    
    OrdersCount INT NOT NULL DEFAULT (1)
);
GO

INSERT    dbo.Customer (Email)
VALUES    (N'[email protected]'), – CustomerID 1
        (N'[email protected]'); – CustomerID 2
GO

CREATE PROCEDURE dbo.InsertOrder (
    @Email NVARCHAR(100)
)
AS
BEGIN
    IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email)
    BEGIN
        UPDATE    dbo.Customer
        SET        OrdersCount = OrdersCount + 1
        WHERE    Email = @Email;
    END
    ELSE
    BEGIN
        WAITFOR DELAY '00:00:03';
        INSERT    dbo.Customer (Email)
        VALUES    (@Email) – OrdersCount = Default 1
    END
END;
GO

Note: WAITFOR DELAY is used to delay the INSERTion of the new customers with just 3 seconds.

Now we can start this test executing following statements in separate windows:

Is.NOT.Safe.If.Exists.Update.Insert.UPSERT

Query window #1 (Step 1):

Code Snippet
BEGIN TRANSACTION
    UPDATE    dbo.Customer
    SET        Note = N'Note#2'
    WHERE    Email = N'[email protected]';
– ROLLBACK — or COMMIT

At this moment the CustomerID = 2 is locked X (exclusive) and no other transaction can read (excepting NOLOCK and READ UNCOMMITTED), update or delete this record. Because this transaction remains open (no COMMIT or ROLLBACK) this record will be locked till the end of transaction.

Query window #2 & #3 (Step 2 & Step 3):

Code Snippet
EXEC dbo.InsertOrder N'[email protected]';
SELECT * FROM dbo.Customer;

At this moment the IF EXISTS(SELECT * FROM dbo.Customer WHERE Email = @Email) statement will try to read the record with CustomerID = 1 and then the record with CustomerID = 2 requesting a S lock (shared). Because X and S locks aren’t compatible (see Lock Compatibility) both (Step 2 & 3) SELECT * FROM dbo.Customer WHERE Email = @Email statements will be blocked. Note: you can see the “Executing ….” message within tab title.

Query window #1 (Step 4):

I execute the ROLLBACK or COMMIT statement which ends the initial transaction. The X lock for CustomerID = 2 record is released and the SELECT statements (SELECT * FROM dbo.Customer WHERE Email = @Email; Step 2 & 3)  continue to be executed. Every EXISTS(SELECT …. WHERE Email = [email protected]) statement will return FALSE and both stored procedures will INSERT the customer `[email protected]`.

This is the reason why I get duplicated emails at the end (Step 5) of my example:

Results.Step5Solutions ?

Simplest solution is to create an unique index/constraint on Email column.

Code Snippet
CREATE UNIQUE INDEX IUN_Customer_Email
ON dbo.Customer(Email);

This UNIQUE index/constraint guarantees that we can’t have duplicated emails. But this doesn’t means this UPSERT approach is 100% safe because dbo.InsertOrder stored procedure still tries to INSERT duplicated emails (at Step 5):

Results.Step5.With.UNIQUE.index

Next blog On Jan 27 I will discuss what solutions do we have for safe UPSERTs.

See also:

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #2

Is it safe IF EXISTS – UPDATE ELSE INSERT [UPSERT] ? #3

Revision History:

2014-01-12: I changed the last paragraph.

2014-01-27: I changed the last paragraph.

2014-04-06: I added the [See also] section

2014-01-02

Happy New Year!

Filed under: sql server — ---- @ 5:48 PM
Code Snippet
  1. SELECT (
  2. SELECT CHAR(h+i)
  3. FROM (
  4. SELECT    e.f.value('(@value)', 'TINYINT'), ROW_NUMBER() OVER(ORDER BY e.f DESC)
  5. FROM (VALUES (N'<row value="57"/><row value="83"/><row value="99"/><row value="100"/><row value="110"/><row value="22"/><row value="69"/><row value="93"/><row value="112"/><row value="26"/><row value="84"/><row value="97"/><row value="94"/><row value="112"/><row value="32"/>')) a(b)
  6. CROSS APPLY (VALUES (CONVERT(XML, a.b))) c(d)
  7. CROSS APPLY c.d.nodes('/row') e(f)
  8. ) g(h,i)
  9. ORDER BY i DESC
  10. FOR XML PATH(''), TYPE
  11. ).value('.', 'NVARCHAR(MAX)')

SQL Fiddle demo

2013-12-25

Notes regarding SQL Server and BETWEEN operator

Filed under: between,optimization,SARG,sql server — ---- @ 12:49 AM

Note #1

According to ANSI SQL ’92 "X BETWEEN Y AND Z" is equivalent to "X>=Y AND X<=Z". This is the reason why "WHERE h.OrderDate BETWEEN @StartDate AND @EndDate" is translated into "WHERE h.OrderDate >= @StartDate AND h.OrderDate <= @EndDate":Sql.Serevr.Between.Execution.Plan.1

This means also that Y must be less or equal than Z (Y <= Z) and @StartDate must be less or equal than @EndDate (@StartDate <= @EndDate). Now, if we run following query having @StartDate = Dec 31, 2005 and @EndDate = Jan 1, 2005,

Code Snippet
  1. SET STATISTICS IO ON;
  2.  
  3. DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
  4.         @EndDate DATETIME = '20050101 00:00:00.000';
  5.         
  6. SELECT    COUNT(*) AS OrderCount1
  7. FROM    Sales.SalesOrderHeader h
  8. WHERE    h.OrderDate BETWEEN @StartDate AND @EndDate
  9. /*
  10. STATISTICS IO ON output:
  11. Table 'SalesOrderHeader'. Scan count 1, logical reads 686
  12. */

the result of COUNT(*) is 0 sales orders and the output of STATISTICS IO ON (the number of logical reads = the number of 8K data pages read from buffer pool) is 686 [logical reads]. This means that SQL Server reads data from Sales.SalesOrderHeader table though @StartDate <= @EndDate predicate is False for current values ("WHERE h.OrderDate BETWEEN ‘20051231 23:59:59.997′ AND ‘20050101 00:00:00.000’" or "WHERE h.OrderDate >= ‘20051231 23:59:59.997′ AND h.OrderDate <= ‘20050101 00:00:00.000’").

What options do we have in such cases ?

  1. We validate the values of parameters
  2. We tell SQL Server that Y must be less or equal than Z:
Code Snippet
  1. SET STATISTICS IO ON;
  2. DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
  3.         @EndDate DATETIME = '20050101 00:00:00.000';
  4.         
  5. SELECT    COUNT(*) AS OrderCount1
  6. FROM    Sales.SalesOrderHeader h
  7. WHERE    h.OrderDate BETWEEN @StartDate AND @EndDate
  8. AND        @StartDate <= @EndDate
  9. /*
  10. STATISTICS IO ON output:
  11. Table 'Worktable'. Scan count 0, logical reads 0
  12. */

This time, because SQL Server knows that @StartDate must be less or equal than @EndDate the output of STATISTICS IO ON shows that it never reads data from Sales.SalesOrderHeader table (0 logical reads). The execution plan of this query includes a Filter operator which checks  @StartDate <= @EndDate predicate (‘20051231 23:59:59.997′ <= ‘20050101 00:00:00.000′) preventing (in this case) the execution of Index Scan / Index Seek operator and thus preventing reading data from Sales.SalesOrderHeader table:

Sql.Serevr.Between.Execution.Plan.3

Note #2

What if we want to count sales orders when @StartDate <= @EndDate but also when @StartDate > @EndDate thus avoiding the limitation of BETWEEN operator from ANSI SQL ? One solution is to rewrite the BETWEEN predicate in such way that BETWEEN always will bet BETWEEN @MinimumValue AND @MaximumValue.

Example:

Code Snippet
  1. SET STATISTICS IO ON;
  2. DECLARE @StartDate DATETIME = '20051231 23:59:59.997',
  3.         @EndDate DATETIME = '20050101 00:00:00.000';
  4.         
  5. SELECT    COUNT(*) AS OrderCount2
  6. FROM    Sales.SalesOrderHeader h
  7. WHERE    h.OrderDate
  8.         BETWEEN
  9.         – Minimum value
  10.         (CASE WHEN @StartDate <= @EndDate THEN @StartDate ELSE @EndDate END)
  11.         AND
  12.         – Maximum value
  13.         (CASE WHEN @StartDate <= @EndDate THEN @EndDate ELSE @StartDate END)

This solution has an advantage: is SARG-able and if there is an index on OrderDate column

Code Snippet
  1. CREATE INDEX IX_SalesOrderHeader_OrderDate
  2. ON Sales.SalesOrderHeader (OrderDate);

then the execution plan will include an Index Seek operator:

Sql.Serevr.Between.Execution.Plan.2

2013-12-16

SQL Server conferences with recorded sessions

Filed under: conference,sql server — ---- @ 7:52 AM
  1. SQLRally Nordic 2013
  2. SQLBits X (also see Previous Events)

2013-12-09

Non SARG-able predicates #1

Sometimes isn’t enough to create an index. to optimize a query. Even we create the proper index we may discover by examining the execution plan that SQL Server doesn’t use that an Index Seek to find the rows  using instead an Index Scan.

For example, if we want to count the number of sales order for December 2005 a first attempt can use following query and an index on OrderDate:

Code Snippet
CREATE INDEX IX_SalesOrderHeader_OrderDate
ON Sales.SalesOrderHeader (OrderDate);
GO
SET STATISTICS IO ON;
DBCC FREEPROCCACHE;
GO
SELECT COUNT(*)
FROM Sales.SalesOrderHeader h
WHERE YEAR(h.OrderDate) = 2005 AND MONTH(h.OrderDate) = 12
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 73
*/

Execution.Plan.Index.Scan

We can see that for this query the execution plan includes an Index Scan instead of Index Seek. The reason is these two predicates aren’t SARG-able. A SARG-able predicates allows an Index Seek if an index is available.

The second attempt uses a BETWEEN predicate (which is SARG-able) thus:

Code Snippet
DBCC FREEPROCCACHE;
GO
SELECT    COUNT(*)
FROM    Sales.SalesOrderHeader h
WHERE    h.OrderDate BETWEEN '20051201' AND '20051231 23:59:59.997'
– or WHERE    h.OrderDate >= '20051201' AND h.OrderDate < '20060101'
GO
/*
SET STATISTICS IO ON output:
Table 'SalesOrderHeader'. Scan count 1, logical reads 3
*/

Execution.Plan.Index.Seek

This time the execution plan includes an Index Seek and the performance (Logical Reads) is better: just 3 pages / logical reads (Seek) instead of 73 pages logical reads (Index Scan).

More information on SARG-able predicates can be found here.

Note: For this test I used Adventure Works for SQL Server 2012 (data file) database

« Previous PageNext Page »

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.