EXPLAIN EXTENDED

How to create fast database queries

Indexing VARCHAR(MAX)

with 6 comments

SQL Server 2005 introduced new datatype: VARCHAR(MAX).

This is a replacement of old TEXT that can do anything that TEXT could, and something that TEXT could not.

Something includes possibility of being queried and compared like a plain VARCHAR. A really nice feature.

SELECT  *
FROM    [20090501_max].t_bigdata
WHERE   value LIKE 'lorem ipsum dolor sit%'

Unfortunately, VARCHAR(MAX) is still far from a plain VARCHAR. For instance, it cannot serve as an index key. It can be a part of an index (the column is included to the index leaves to avoid table lookups when using the index), but not the key of an index (the index leaves are not sorted on this column).

This means that the queries using LIKE predicate on this column will be slow.

Let's create a sample table:

Table creation details

and see how it performs:

SELECT  id, SUBSTRING(value, 1, 50) AS value
FROM    [20090501_max].t_bigdata
WHERE   value LIKE 'lorem ipsum dolor sit%'

id value
10355 lorem ipsum dolor sit elit elit lorem do ipsum ame
33549 lorem ipsum dolor sit ipsum sed dolor elit dolor a
38052 lorem ipsum dolor sit sed do amet adipisicing adip
55940 lorem ipsum dolor sit adipisicing sed elit dolor d
37941 lorem ipsum dolor sit amet sed ipsum sit elit dolo
60576 lorem ipsum dolor sit elit dolor lorem lorem sit a
60626 lorem ipsum dolor sit elit sit ipsum adipisicing d
7 rows fetched in 0.0004s (6.8593s)
Table 't_bigdata'. Scan count 3, logical reads 55077, physical reads 95, read-ahead reads 31457, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 531 ms,  elapsed time = 6854 ms. 

Since VARCHAR(MAX) cannot be indexed, and full table scan is very expensive on such a large table, the query runs for almost 7 seconds.

Fortunately, we can create a coarse filter for VARCHAR(MAX) by using a computed column.

Along the lines of the table creation script you can see:

CREATE SCHEMA [20090501_max]
CREATE TABLE t_bigdata (
id INT NOT NULL PRIMARY KEY,
value NVARCHAR(MAX),
value_index AS CAST(value AS NVARCHAR(450))
)
GO
CREATE INDEX IX_bigdata_value ON [20090501_max].t_bigdata(value_index)
GO

Here, value_index is a helper column. Its value is just a value truncated to NVARCHAR(450) (each NVARCHAR symbol takes 2 bytes, and 900 bytes is the maximum length of an index key in SQL Server).

This columpted column can be indexed as well, and an index is in fact created on this column, as you can see in the script above.

Now, let's try to use the index. To do this, we'll need to add one more predicate into the query:

SELECT  id, SUBSTRING(value, 1, 50) AS value
FROM    [20090501_max].t_bigdata
WHERE   value LIKE 'lorem ipsum dolor sit%'
AND value_index LIKE 'lorem ipsum dolor sit%'

What did we do here?

We added one more filtering condition into the query. This condition checks if the computed column is also LIKE the expression we are looking for.

The computed column is indexable and the index is created on it. That means that this extra condition, though logically redundant, can be used in an INDEX SEEK, efficiently filtering out those records that do not start with lorem ipsum dolor sit.

Let's see how it works:

id value
55940 lorem ipsum dolor sit adipisicing sed elit dolor d
37941 lorem ipsum dolor sit amet sed ipsum sit elit dolo
10355 lorem ipsum dolor sit elit elit lorem do ipsum ame
33549 lorem ipsum dolor sit ipsum sed dolor elit dolor a
60576 lorem ipsum dolor sit elit dolor lorem lorem sit a
60626 lorem ipsum dolor sit elit sit ipsum adipisicing d
38052 lorem ipsum dolor sit sed do amet adipisicing adip
7 rows fetched in 0.0003s (0.0015s)
Table 't_bigdata'. Scan count 1, logical reads 31, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms. 

Same resultset, but so much faster.

Written by Quassnoi

May 1st, 2009 at 11:00 pm

Posted in SQL Server

6 Responses to 'Indexing VARCHAR(MAX)'

Subscribe to comments with RSS

  1. Hi Quassnoi

    Will this only impact “Like” queries or would it also speed up “=” queries?

    Thanks

    Carl Bartlett

    13 May 13 at 19:50

  2. @Carl: yes it would, but you should compare value_index against the truncated string as well if the latter exceeds 450 characters.

    Quassnoi

    13 May 13 at 19:52

  3. This post carries some merit for LIKE queries on the (MAX) datatype column, but one must be very careful when implementing solutions like this coarse filter, as you will undeniably omit some rows from your result set if you need a wildcard at the beginning of your search string as well as the end.

    WHERE value LIKE ‘lorem ipsum dolor sit%’
    AND value_index LIKE ‘lorem ipsum dolor sit%’

    This works because the truncation of the LOB field occurs at the end of the string, and we are searching from the beginning moving forward only(wildcard at the trailing end only). If we were to do the following:

    WHERE value LIKE ‘%lorem ipsum dolor sit%’
    AND value_index LIKE ‘%lorem ipsum dolor sit%’

    …we would almost assuredly have different rowcounts between the value column and the value_index and have to compare back to the LOB field to get the full result set. Otherwise, using both predicates will omit some rows, as there is a very good chance that the pattern ‘lorem ipsum dolor sit’ may occur somewhere in the truncated data omitted from the 450 length value_index column.

    Still, great article and 100% accurate for the scenario depicted above.

    Crist Zimmerman

    4 Dec 14 at 19:04

  4. @Crist: indexes are pretty useless for infix searching anyway. At best, a covering index would help for such a scenario, but you can include a VARCHAR(MAX) column into a covering index as a non-key right as it is in all its entirety.

    Quassnoi

    5 Dec 14 at 00:40

  5. Very Insightful. And, worthy of extra credit.

    Daniel Adeniji

    10 Apr 15 at 07:33

  6. this helped me a ton. thanks!

    kevin

    2 Sep 20 at 23:21

Leave a Reply