SQL Server: leading wildcard match using an index
Comments enabled. I *really* need your comment
From Stack Overflow:
I'm joining to a table dozens of different times, and every time, I join (or filter) based on the results of a
SUBSTRING
of one of the columns (it's a string, but left-padded with zeros, and I don't care about the last four digits).As a result, even though this column is indexed and my query would use the index, it does a table scan because the
SUBSTRING
itself isn't indexed, so SQL Server has to calculate it for every row before as it's joining.I'm looking for any ideas on how to speed up this process.
This problem is called leading wildcard match. Though it's possible to implement it, it usually requires indexing all possible combinations of substrings which usually bloats the index.
However, in this very case it's possible to create quite an efficient index. It is possible because the number of characters we need to trim from the right is fixed.
The main idea is to create a computed column which would store trimmed and reversed value of the string, index that column and use this index in search.
Let's create a sample table:
This table contains 100,000 randomly generated four character strings, prepended with a random number of zeros (from 0 to 9) and appended with 4 randomly chosen decimal digits.
We created a computed column on the following condition: RIGHT(REVERSE(string), LEN(string) - 4)
It strips last four characters from the string
and reverses the remaining string.
Now, let's try to find the string which would match the search condition BCDE
:
SELECT id, string FROM [20090824_prefix].t_string WHERE LEFT(string, LEN(string) - 4) LIKE '%BCDE'
This is quite long, 340 ms.
Now, let's try the same query using an index. We should just take one thing into account: since we have reversed the string to build the index, we should reverse the search condition just as well.
Here's the query:
SELECT id, string FROM [20090824_prefix].t_string WHERE rstring LIKE 'EDCB%'
This is instant: 1 ms, just a single index lookup.