From Stack Overflow:
I'm joining to a table dozens of different times, and every time, I join (or filter) based on the results of a
SUBSTRINGof one of the columns (it's a string, but left-padded with zeros, and I don't care about the last four digits).
As a result, even though this column is indexed and my query would use the index, it does a table scan because the
SUBSTRINGitself isn't indexed, so SQL Server has to calculate it for every row before as it's joining.
I'm looking for any ideas on how to speed up this process.
This problem is called leading wildcard match. Though it's possible to implement it, it usually requires indexing all possible combinations of substrings which usually bloats the index.
However, in this very case it's possible to create quite an efficient index. It is possible because the number of characters we need to trim from the right is fixed.
The main idea is to create a computed column which would store trimmed and reversed value of the string, index that column and use this index in search.
Let's create a sample table:
CREATE SCHEMA [20090824_prefix] CREATE TABLE [20090824_prefix].t_string ( id INT NOT NULL PRIMARY KEY, string VARCHAR(100) NOT NULL, rstring AS RIGHT(REVERSE(string), LEN(string) - 4) ) GO CREATE INDEX IX_string_rstring ON [20090824_prefix].t_string(rstring) GO BEGIN TRANSACTION SELECT RAND(20090824) DECLARE @cnt INT DECLARE @plen INT DECLARE @tval CHAR(4) DECLARE @nval CHAR(4) SET @cnt = 1 WHILE @cnt <= 100000 BEGIN SET @plen = CAST(RAND() * 10 AS INTEGER) SET @tval = RIGHT('0000' + CAST(RAND() * 10000 AS INTEGER), 4) SET @nval = '' WHILE LEN(@nval) < 4 BEGIN SET @nval = CHAR(65 + RAND() * 26) + @nval END INSERT INTO [20090824_prefix].t_string VALUES (@cnt, REPLICATE('0', @plen) + @nval + @tval) SET @cnt = @cnt + 1 END COMMIT
This table contains 100,000 randomly generated four character strings, prepended with a random number of zeros (from 0 to 9) and appended with 4 randomly chosen decimal digits.
We created a computed column on the following condition:
RIGHT(REVERSE(string), LEN(string) - 4)
It strips last four characters from the
string and reverses the remaining string.
Now, let's try to find the string which would match the search condition
SELECT id, string FROM [20090824_prefix].t_string WHERE LEFT(string, LEN(string) - 4) LIKE '%BCDE'
|1 row fetched in 0.0002s (0.3421s)|
Table 't_string'. Scan count 1, logical reads 541, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 344 ms, elapsed time = 342 ms.
SELECT Clustered Index Scan ([test].[20090824_prefix].[t_string])
This is quite long, 340 ms.
Now, let's try the same query using an index. We should just take one thing into account: since we have reversed the string to build the index, we should reverse the search condition just as well.
Here's the query:
SELECT id, string FROM [20090824_prefix].t_string WHERE rstring LIKE 'EDCB%'
|1 row fetched in 0.0002s (0.0007s)|
Table 't_string'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.
SELECT Nested Loops (Inner Join) Index Seek ([test].[20090824_prefix].[t_string].[IX_string_rstring]) Key Lookup ([test].[20090824_prefix].[t_string])
This is instant: 1 ms, just a single index lookup.