SQL Server: leading wildcard match using an index at EXPLAIN EXTENDED

I'm joining to a table dozens of different times, and each time I join (or filter) on the result of a SUBSTRING of one of its columns (the column is a string left-padded with zeros, and I don't care about its last four digits).

As a result, even though this column is indexed and the query could otherwise use the index, it does a table scan: the SUBSTRING expression itself isn't indexed, so SQL Server has to compute it for every row as it's joining.

I'm looking for any ideas on how to speed up this process.

This problem is called a leading wildcard match. It can be supported by an index, but in general that requires indexing every suffix of every value, which bloats the index considerably.
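To see why the general approach is expensive, here is a minimal Python sketch of one common general-purpose technique (named here for illustration; it is not what this article does): index every suffix of every value, so that a LIKE '%term%' search becomes a range seek over sorted suffixes. The sample rows and function names are hypothetical, and a sorted Python list stands in for a B-tree.

```python
from bisect import bisect_left

# Hypothetical sample rows: (id, string)
ROWS = [(1, "000BCDE5598"), (2, "0ABCD1234")]


def build_suffix_index(rows):
    """Sorted list with one (suffix, id) entry for every character position of every string."""
    return sorted(
        (s[start:], row_id) for row_id, s in rows for start in range(len(s))
    )


def contains_search(index, term):
    """Ids of strings containing `term` (like LIKE '%term%'): all suffixes starting with `term`."""
    ids = set()
    # Suffixes that start with `term` form one contiguous run beginning at this position
    start = bisect_left(index, (term,))
    for suffix, row_id in index[start:]:
        if not suffix.startswith(term):
            break
        ids.add(row_id)
    return sorted(ids)


index = build_suffix_index(ROWS)
print(len(index))                     # 20 entries for 2 rows
print(contains_search(index, "BCD"))  # [1, 2]
print(contains_search(index, "CDE"))  # [1]
```

Each row contributes one entry per character, so the index grows with the total length of the data rather than with the number of rows.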

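In this particular case, however, it is possible to create quite an efficient index, because the number of characters we need to trim from the right is fixed. Once those four characters are trimmed, the condition becomes a plain suffix match, and a suffix match on a string is a prefix match on the reversed string.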

The main idea is to create a computed column that stores the trimmed and reversed value of the string, index that column, and use this index in searches.
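A rough Python model of this idea, assuming a sorted list of (key, id) pairs as a stand-in for the index on the computed column. The helpers rkey, build_index, seek and scan and the sample rows are hypothetical; seek returns the same ids as the full scan but only walks the contiguous range of keys that start with the reversed search term.

```python
from bisect import bisect_left

# Hypothetical sample rows: (id, string)
ROWS = [
    (1, "000BCDE5598"),
    (2, "BCDE1234"),
    (3, "00CDEB4321"),
    (4, "0000XYZW9999"),
]


def rkey(s):
    """Value of the computed column: drop the last 4 characters, then reverse."""
    return s[:-4][::-1]


def build_index(rows):
    """Sorted (key, id) pairs, standing in for the index on the computed column."""
    return sorted((rkey(s), row_id) for row_id, s in rows)


def seek(index, term):
    """Suffix match on the trimmed string, done as a prefix range seek on the reversed key."""
    prefix = term[::-1]
    ids = []
    for key, row_id in index[bisect_left(index, (prefix,)):]:
        if not key.startswith(prefix):
            break  # past the end of the contiguous range of matching keys
        ids.append(row_id)
    return sorted(ids)


def scan(rows, term):
    """Original predicate, evaluated on every row: LEFT(string, LEN(string) - 4) LIKE '%term'."""
    return sorted(row_id for row_id, s in rows if s[:-4].endswith(term))


index = build_index(ROWS)
print(seek(index, "BCDE"), scan(ROWS, "BCDE"))  # [1, 2] [1, 2]
```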

Let's create a sample table:

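CREATE SCHEMA [20090824_prefix]
CREATE TABLE [20090824_prefix].t_string (
id INT NOT NULL PRIMARY KEY,
string VARCHAR(100) NOT NULL,
-- Computed column: the string without its last four characters, reversed
rstring AS RIGHT(REVERSE(string), LEN(string) - 4)
)
GO
CREATE INDEX IX_string_rstring ON [20090824_prefix].t_string(rstring)
GO
BEGIN TRANSACTION
-- Seed the random number generator so the data set is reproducible
SELECT  RAND(20090824)
DECLARE @cnt INT
DECLARE @plen INT
DECLARE @tval CHAR(4)
DECLARE @nval CHAR(4)
SET @cnt = 1
WHILE @cnt <= 100000
BEGIN
        -- Number of leading zeros: 0 to 9
        SET @plen = CAST(RAND() * 10 AS INTEGER)
        -- Four decimal digits, zero-padded; the number is cast to VARCHAR first,
        -- otherwise '0000' + INT would be evaluated as an integer addition
        SET @tval = RIGHT('0000' + CAST(CAST(RAND() * 10000 AS INTEGER) AS VARCHAR(4)), 4)
        -- Four random uppercase letters (A to Z)
        SET @nval = ''
        WHILE LEN(@nval) < 4
        BEGIN
                SET @nval = CHAR(65 + CAST(RAND() * 26 AS INTEGER)) + @nval
        END
        INSERT
        INTO    [20090824_prefix].t_string
        VALUES  (@cnt, REPLICATE('0', @plen) + @nval + @tval)
        SET @cnt = @cnt + 1
END
COMMIT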

This table contains 100,000 rows. Each string consists of four random uppercase letters, prefixed with a random number of zeros (from 0 to 9) and followed by four random decimal digits.

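We created a computed column with the following expression: RIGHT(REVERSE(string), LEN(string) - 4)

REVERSE moves the last four characters of the string to the front, and RIGHT(..., LEN(string) - 4) then drops them. The result is the string with its last four characters stripped, reversed: for 000BCDE5598, it is EDCB000.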
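A small Python mirror of the expression, assuming simplified versions of the T-SQL functions (the helpers sql_len and sql_right are hypothetical; Python slicing plays the role of REVERSE). It checks that the computed value equals the reversed string with its last four characters removed, which holds as long as the string has no trailing spaces, since LEN ignores them.

```python
def sql_len(s):
    """Simplified T-SQL LEN: number of characters, not counting trailing spaces."""
    return len(s.rstrip(" "))


def sql_right(s, n):
    """Simplified T-SQL RIGHT: the last n characters of s (the whole string if n exceeds its length)."""
    if n < 0:
        # SQL Server rejects a negative length here, so strings shorter than 4 characters fail
        raise ValueError("negative length passed to RIGHT")
    return s[len(s) - n:] if n < len(s) else s


def rstring(s):
    """Mirror of the computed column RIGHT(REVERSE(string), LEN(string) - 4)."""
    return sql_right(s[::-1], sql_len(s) - 4)


for s in ["000BCDE5598", "0ABCD1234"]:
    # Same value as reversing the string with its last four characters removed
    assert rstring(s) == s[:-4][::-1]

print(rstring("000BCDE5598"))  # EDCB000
```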

Now, let's find the strings whose part before the last four characters ends with BCDE, using the original condition:

SELECT  id, string
FROM    [20090824_prefix].t_string
WHERE   LEFT(string, LEN(string) - 4) LIKE '%BCDE'

id	string
97612	000BCDE5598
1 row fetched in 0.0002s (0.3421s)

Table 't_string'. Scan count 1, logical reads 541, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 342 ms.

SELECT
  Clustered Index Scan ([test].[20090824_prefix].[t_string])

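This is quite slow: 340 ms. Since the condition applies an expression to the column, the index cannot be used, and the whole clustered index is scanned (541 logical reads).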

Now, let's run the same query using the index. There is just one thing to take into account: since we reversed the string to build the index, we should reverse the search condition as well, turning the suffix match %BCDE into the prefix match EDCB%.
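If the search term comes from user input, the reversed pattern can be built on the client. A minimal Python sketch, assuming the term should be matched literally: it reverses the term, escapes the T-SQL LIKE metacharacters %, _ and [ by wrapping them in brackets, and appends the trailing %. The function name is hypothetical, and the resulting pattern is meant to be passed as a query parameter rather than concatenated into the SQL text.

```python
LIKE_SPECIAL = "%_["  # characters with a special meaning in a T-SQL LIKE pattern outside brackets


def to_reversed_prefix_pattern(term):
    """Turn the suffix search  ... LIKE '%' + term  into the prefix pattern used on rstring."""
    reversed_term = term[::-1]  # reverse first, so the escaping brackets are not reversed too
    escaped = "".join("[" + ch + "]" if ch in LIKE_SPECIAL else ch for ch in reversed_term)
    return escaped + "%"


print(to_reversed_prefix_pattern("BCDE"))  # EDCB%
print(to_reversed_prefix_pattern("A_B"))   # B[_]A%
```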

Here's the query:

SELECT  id, string
FROM    [20090824_prefix].t_string
WHERE   rstring LIKE 'EDCB%'

id	string
97612	000BCDE5598
1 row fetched in 0.0002s (0.0007s)

Table 't_string'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

SELECT
  Nested Loops (Inner Join)
    Index Seek ([test].[20090824_prefix].[t_string].[IX_string_rstring])
    Key Lookup ([test].[20090824_prefix].[t_string])

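This is instant: 1 ms and 5 logical reads. The plan is a single index seek on IX_string_rstring, plus a key lookup to fetch the matching row from the clustered index.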

Written by Quassnoi

August 24th, 2009 at 11:00 pm

Posted in SQL Server
