EXPLAIN EXTENDED

How to create fast database queries

Minimum and maximum on bit fields: SQL Server

Comments enabled. I *really* need your comment

From Stack Overflow:

I have this query in SQL Server 2005:

SELECT  *
FROM    tblJobs AS j
-- joining other tables here
WHERE   Closed = 0
AND Invoiced = 0
AND Active = 1
AND DepartmentId <> 2
AND dbo.fncIsAllPointsDelivered(JobID) = 1

This query is taking too long to run, and I know the problem is the UDF: dbo.fncIsAllPointsDelivered(J.JobID) = 1

The SQL for the UDF is here:

DECLARE @DetailCount int
DECLARE @TrackingCount int

SELECT  @DetailCount = COUNT(*)
FROM    tblLoadDetails
WHERE   JobId = @JobId

SELECT  @TrackingCount = COUNT(*)
FROM    tblLoadDetails
WHERE   JobId = @JobId
AND Delivered = 1

IF (@DetailCount = @TrackingCount AND @DetailCount > 0)
RETURN 1

RETURN 0

All of this runs blazingly fast unless the job has a large number of load details in it.

I am trying to think of a way to either make the UDF faster or get rid of the need for the UDF, but I am at a loss.

UDF is definitely a problem here, since the conditions based on the UDF's are not sargable, that is an index cannot be used to search for them.

Let's look closed into this UDF.

Given a @JobID, the function searches the load details for this job. For the function to succeed, all load details should have the state of Delivered, and there must be at least one load detail for this job.

The UDF can be rewritten using IN and NOT IN conditions. The first condition makes sure that there is at least one load detail for the given job, the second one ensures that all load details have state Delivered.

Let's create a sample table and see how it works:

Table creation details

There are 20,000 records in t_job and 1,900,000 records in t_loaddetails (with random value of delivery).

The first 1,000 jobs have no corresponding load details.

Now, here's the query:

View the query

This is quite efficient, but still there are two checks: one to check that the load details are present, another one to check there are not undelivered load details.

Can it be rewritten using only one check?

If we look into this condition, we will see that the query should fail if there are no load details or there are undelivered load details.

This can be reformulated: for any given job, the minimal value of delivery should be equal to 1.

Indeed: if we have no load details for the given job, there will be no minimum (the minimum is NULL). If we have but one undelivered load detail, it will have the value of 0 which is less than 1, and therefore will constitute the minumum. Only if the values are present and are delivered, the value of the minimum will be equal to 1.

Unfortunately, SQL Server does not let to calcuate MIN or MAX for the bit fields, and casting them into an integer would kill all performance benefit, since no index can be used to find the extremal values.

However, SQL Server still maintains the order of bit fields, and this limitation can be easilty worked around using CROSS APPLY / TOP.

Here's the query:

SELECT  j.*
FROM    [20091012_bit].t_job j
CROSS APPLY
(
SELECT  TOP 1 delivered
FROM    [20091012_bit].t_loaddetail
WHERE   job = j.id
) ld
WHERE   ld.delivered = 1

View query details

Since there is one check instead of two, the execution plan is much more simple (a single TOP in a loop), and the query completes thrice as fast.

Written by Quassnoi

October 12th, 2009 at 11:00 pm

Posted in SQL Server

Leave a Reply