Scenario
-
There is a query that uses big tables,
-
It’s executed rarely (ex. it’s executed once or twice per month),
-
This query has many conditions, some of them uses UDF scalar functions. The combined selectivity of non-UDF conditions is high (ex. 3.295 rows from 31.263.601).
Problem
The query is slow. How can be optimized quickly this query without rewriting scalar function ?
Scalar UDF used for this test:
RETURNS NVARCHAR(250)
AS
BEGIN
DECLARE @Result NVARCHAR(250), @Index TINYINT = 1, @Len TINYINT = LEN(@Name + '$') - 1;
SET @Result = CASE WHEN @Name IS NOT NULL THEN '' END
WHILE @Index <= @Len
BEGIN
SET @Result += UPPER(SUBSTRING(@Name, @Index, 1))
SET @Index += 1
END
RETURN @Result
END;
Note: this scalar function was wrote intentionally to be slow.
I’ve used following query for this test:
SELECT dbo.SlowScalarFunction(h.ProductName) AS UpperName, h.Quantity
FROM [dbo].[BigProduct] p JOIN BigTransactionHistory h ON p.ProductID = h.ProductID
WHERE p.ListPrice > 3550 AND h.Quantity >= 100 AND dbo.SlowScalarFunction(h.ProductName) LIKE N'ROAD%'
GO
Execution plan
As you can see,
-
The WHERE clause contains one condition that calls a scalar UDF ( dbo.SlowScalarFunction(h.ProductName) LIKE N’ROAD%’ ).
-
The same scalar UDF is used also by SELECT clause.
-
The execution plan is serial.
-
There is a “missing index” warning that tells we should create a non-clustered index:
Code SnippetCREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[BigTransactionHistory] ([ProductID],[Quantity])
INCLUDE ([ProductName])
[One] Solution
1) Because this query is executed rarely, the suggested “missing” index will be / could be used rarely (for read operations). This means that after creation of this index will be an overhead due to maintenance of this index after every INSER, UPDATE or DELETE operation.
2) The usage of scalar function forces SQL Server to choose a serial plan.
Because of these two reasons I decided to not create the suggested index and to decompose the initial query thus:
-
One query without scalar functions calls. This query inserts data that into a temporary table (#Results). This table will contain, usually, a small amount of data. Because of lack of scalar functions, the execution plan will use parallelism.
-
For that small amount of rows inserted into temporary table I call the scalar UDF.
INTO #Results
FROM [dbo].[BigProduct] p JOIN BigTransactionHistory h ON p.ProductID = h.ProductID
WHERE ListPrice > 3550 AND Quantity >= 100
ALTER TABLE #Results
ADD UpperName NVARCHAR(250)
UPDATE #Results
SET UpperName = dbo.SlowScalarFunction(ProductName)
SELECT UpperName, Quantity
FROM #Results
WHERE UpperName LIKE N'ROAD%'
DROP TABLE #Results
Execution plan:
Using a warm buffer, after these changes, the execution time dropped from 10 seconds to 1 second .