EXPLAIN EXTENDED

How to create fast database queries

SQL Server: leading wildcard match using an index

Comments enabled. I *really* need your comment

From Stack Overflow:

I'm joining to a table dozens of different times, and every time, I join (or filter) based on the results of a SUBSTRING of one of the columns (it's a string, but left-padded with zeros, and I don't care about the last four digits).

As a result, even though this column is indexed and my query would use the index, it does a table scan because the SUBSTRING itself isn't indexed, so SQL Server has to calculate it for every row before as it's joining.

I'm looking for any ideas on how to speed up this process.

This problem is called leading wildcard match. Though it's possible to implement it, it usually requires indexing all possible combinations of substrings which usually bloats the index.

However, in this very case it's possible to create quite an efficient index. It is possible because the number of characters we need to trim from the right is fixed.

The main idea is to create a computed column which would store trimmed and reversed value of the string, index that column and use this index in search.

Let's create a sample table:


Table creation details

This table contains 100,000 randomly generated four character strings, prepended with a random number of zeros (from 0 to 9) and appended with 4 randomly chosen decimal digits.

We created a computed column on the following condition: RIGHT(REVERSE(string), LEN(string) - 4)

It strips last four characters from the string and reverses the remaining string.

Now, let's try to find the string which would match the search condition BCDE:

SELECT  id, string
FROM    [20090824_prefix].t_string
WHERE   LEFT(string, LEN(string) - 4) LIKE '%BCDE'

See the query details

This is quite long, 340 ms.

Now, let's try the same query using an index. We should just take one thing into account: since we have reversed the string to build the index, we should reverse the search condition just as well.

Here's the query:

SELECT  id, string
FROM    [20090824_prefix].t_string
WHERE   rstring LIKE 'EDCB%'

See the query details

This is instant: 1 ms, just a single index lookup.

Written by Quassnoi

August 24th, 2009 at 11:00 pm

Posted in SQL Server

Leave a Reply