Minimum and maximum on bit fields: SQL Server at EXPLAIN EXTENDED

I have this query in SQL Server 2005:
SELECT  *
FROM    tblJobs AS j
-- joining other tables here
WHERE   Closed = 0
AND Invoiced = 0
AND Active = 1
AND DepartmentId &lt;&gt; 2
AND dbo.fncIsAllPointsDelivered(JobID) = 1
This query is taking too long to run, and I know the problem is the UDF: dbo.fncIsAllPointsDelivered(J.JobID) = 1

The SQL for the UDF is here:
DECLARE @DetailCount int
DECLARE @TrackingCount int

SELECT  @DetailCount = COUNT(*)
FROM    tblLoadDetails
WHERE   JobId = @JobId

SELECT  @TrackingCount = COUNT(*)
FROM    tblLoadDetails
WHERE   JobId = @JobId
AND Delivered = 1

IF (@DetailCount = @TrackingCount AND @DetailCount &gt; 0)
RETURN 1

RETURN 0
All of this runs blazingly fast unless the job has a large number of load details in it.

I am trying to think of a way to either make the UDF faster or get rid of the need for the UDF, but I am at a loss.

UDF is definitely a problem here, since the conditions based on the UDF's are not sargable, that is an index cannot be used to search for them.

Let's look closed into this UDF.

Given a @JobID, the function searches the load details for this job. For the function to succeed, all load details should have the state of Delivered, and there must be at least one load detail for this job.

The UDF can be rewritten using IN and NOT IN conditions. The first condition makes sure that there is at least one load detail for the given job, the second one ensures that all load details have state Delivered.

Let's create a sample table and see how it works:

Table creation details

CREATE SCHEMA [20091012_bit]
CREATE TABLE t_job (
id INT NOT NULL,
name VARCHAR(30) NOT NULL
)
CREATE TABLE t_loaddetail (
id INT NOT NULL,
job INT NOT NULL,
delivered BIT NOT NULL,
name VARCHAR(30) NOT NULL
)
GO
CREATE INDEX ix_loaddetail_job_delivered
ON [20091012_bit].t_loaddetail (job, delivered)
GO
BEGIN TRANSACTION
SELECT  RAND(20091012)
DECLARE @cnt INT
DECLARE @ld INT
SET @cnt = 1
WHILE @cnt &lt;= 20000
BEGIN
        INSERT
        INTO    [20091012_bit].t_job
        VALUES  (@cnt, 'Job ' + CAST(@cnt AS VARCHAR))
        SET @ld = 1
        WHILE @cnt &gt; 1000 AND @ld &lt;= 100
        BEGIN
                INSERT
                INTO    [20091012_bit].t_loaddetail
                VALUES  (
                        (@cnt - 1) * 10 + @ld,
                        @cnt,
                        CASE WHEN RAND() &lt; 0.93 THEN 1 ELSE 0 END,
                        'Load detail ' + CAST((@cnt - 1) * 10 + @ld AS VARCHAR)
                        )
                SET @ld = @ld + 1
        END
        SET @cnt = @cnt + 1
END
COMMIT
GO

There are 20,000 records in t_job and 1,900,000 records in t_loaddetails (with random value of delivery).

The first 1,000 jobs have no corresponding load details.

Now, here's the query:

View the query

SELECT  *
FROM    [20091012_bit].t_job
WHERE   id IN
(
SELECT  job
FROM    [20091012_bit].t_loaddetail
)
AND id NOT IN
(
SELECT  job
FROM    [20091012_bit].t_loaddetail
WHERE   delivered = 0
)

id	name
6151	Job 6151
6167	Job 6167
6619	Job 6619
6430	Job 6430
12869	Job 12869
18484	Job 18484
18436	Job 18436
18536	Job 18536
18994	Job 18994
19654	Job 19654
3867	Job 3867
4835	Job 4835
13025	Job 13025
19051	Job 19051
18005	Job 18005
15 rows fetched in 0.0005s (0.3032s)

Table 't_loaddetail'. Scan count 19003, logical reads 73131, physical reads 0, read-ahead reads 7, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_job'. Scan count 3, logical reads 295, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 593 ms,  elapsed time = 303 ms.

  |--Parallelism(Gather Streams)
       |--Nested Loops(Left Anti Semi Join, OUTER REFERENCES:([test].[20091012_bit].[t_job].[id], [Expr1009]) WITH UNORDERED PREFETCH)
            |--Hash Match(Right Semi Join, HASH:([test].[20091012_bit].[t_loaddetail].[job])=([test].[20091012_bit].[t_job].[id]))
            |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([test].[20091012_bit].[t_loaddetail].[job]))
            |    |    |--Stream Aggregate(GROUP BY:([test].[20091012_bit].[t_loaddetail].[job]))
            |    |         |--Index Scan(OBJECT:([test].[20091012_bit].[t_loaddetail].[IX_loaddetail]), ORDERED FORWARD)
            |    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([test].[20091012_bit].[t_job].[id]))
            |         |--Clustered Index Scan(OBJECT:([test].[20091012_bit].[t_job].[PK__t_job__48C5B0DD]))
            |--Top(TOP EXPRESSION:((1)))
                 |--Index Seek(OBJECT:([test].[20091012_bit].[t_loaddetail].[IX_loaddetail]), SEEK:([test].[20091012_bit].[t_loaddetail].[job]=[test].[20091012_bit].[t_job].[id] AND [test].[20091012_bit].[t_loaddetail].[delivered]=(0)) ORDERED FORWARD)

This is quite efficient, but still there are two checks: one to check that the load details are present, another one to check there are not undelivered load details.

Can it be rewritten using only one check?

If we look into this condition, we will see that the query should fail if there are no load details or there are undelivered load details.

This can be reformulated: for any given job, the minimal value of delivery should be equal to 1.

Indeed: if we have no load details for the given job, there will be no minimum (the minimum is NULL). If we have but one undelivered load detail, it will have the value of 0 which is less than 1, and therefore will constitute the minumum. Only if the values are present and are delivered, the value of the minimum will be equal to 1.

Unfortunately, SQL Server does not let to calcuate MIN or MAX for the bit fields, and casting them into an integer would kill all performance benefit, since no index can be used to find the extremal values.

However, SQL Server still maintains the order of bit fields, and this limitation can be easilty worked around using CROSS APPLY / TOP.

Here's the query:

SELECT  j.*
FROM    [20091012_bit].t_job j
CROSS APPLY
(
SELECT  TOP 1 delivered
FROM    [20091012_bit].t_loaddetail
WHERE   job = j.id
) ld
WHERE   ld.delivered = 1

View query details

id	name
3867	Job 3867
4835	Job 4835
6151	Job 6151
6167	Job 6167
6430	Job 6430
6619	Job 6619
12869	Job 12869
13025	Job 13025
18005	Job 18005
18436	Job 18436
18484	Job 18484
18536	Job 18536
18994	Job 18994
19051	Job 19051
19654	Job 19654
15 rows fetched in 0.0005s (0.1148s)

Table 't_loaddetail'. Scan count 20000, logical reads 63844, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_job'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 93 ms,  elapsed time = 115 ms.

  |--Filter(WHERE:([test].[20091012_bit].[t_loaddetail].[delivered]=(1)))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([j].[id], [Expr1005]) OPTIMIZED WITH UNORDERED PREFETCH)
            |--Clustered Index Scan(OBJECT:([test].[20091012_bit].[t_job].[PK__t_job__48C5B0DD] AS [j]))
            |--Top(TOP EXPRESSION:((1)))
                 |--Index Seek(OBJECT:([test].[20091012_bit].[t_loaddetail].[IX_loaddetail]), SEEK:([test].[20091012_bit].[t_loaddetail].[job]=[test].[20091012_bit].[t_job].[id] as [j].[id]) ORDERED FORWARD)

Since there is one check instead of two, the execution plan is much more simple (a single TOP in a loop), and the query completes thrice as fast.

Written by Quassnoi

October 12th, 2009 at 11:00 pm

Posted in SQL Server

EXPLAIN EXTENDED

Minimum and maximum on bit fields: SQL Server

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow