Continuing the series:
What is better to store hierarchical data: nested sets model or adjacency list (parent-child) model?
For detailed explanations of the terms, see the first article in the series:
Now, let’s see what’s better for SQL Server.
We will create a single table that holds both adjacency list data and nested sets data, with 8 levels of nesting, 5 children of each parent node and 2,441,405 records:
Table creation details
CREATE SCHEMA [20090925_nested]
CREATE TABLE [20090925_nested].t_hierarchy (
id INT NOT NULL PRIMARY KEY,
parent INT NOT NULL,
lft INT NOT NULL,
rgt INT NOT NULL,
data VARCHAR(100) NOT NULL,
stuffing VARCHAR(100) NOT NULL
)
GO
WITH nums AS
(
SELECT 1 AS num
UNION ALL
SELECT num + 1
FROM nums
WHERE num < 5
),
q AS
(
SELECT num AS id, 0 AS parent,
1 + 585937 * (num - 1) AS lft,
1 + 585937 * num - 1 AS rgt,
FLOOR(585937 / 5) AS width
FROM nums
UNION ALL
SELECT id * 5 + num, id,
1 + lft + width * (num - 1),
1 + lft + width * num - 1,
FLOOR(width / 5)
FROM q
CROSS JOIN
nums
WHERE width > 1
)
INSERT
INTO [20090925_nested].t_hierarchy
SELECT id, parent, lft, rgt, 'Value ' + CAST(id AS VARCHAR), REPLICATE('*', 100)
FROM q
CREATE INDEX IX_hierarchy_parent ON [20090925_nested].t_hierarchy (parent)
CREATE INDEX IX_hierarchy_lft ON [20090925_nested].t_hierarchy (lft)
CREATE INDEX IX_hierarchy_rgt ON [20090925_nested].t_hierarchy (rgt)
We will test the performance of 3 most common operations:
- Find all descendants of a given node
- Find all ancestors of a given node
- Find all descendants of a given node up to a certain depth
All descendants
We will select all descendants of a given node (which is close to the root node and has lots of descendants).
Since there are more than 100,000 descendants, we will select the aggregated length of the stuffing columns rather than a huge resultset of values. All rows are required to be visited anyway, so this is quite good performance test.
Nested sets
SELECT SUM(LEN(hc.stuffing))
FROM [20090925_nested].t_hierarchy hp
JOIN [20090925_nested].t_hierarchy hc
ON hc.lft BETWEEN hp.lft AND hp.rgt
WHERE hp.id = 42
View query details
|
| 1953100 |
| 1 row fetched in 0.0002s (0.0816s) |
Table 't_hierarchy'. Scan count 1, logical reads 59856, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 81 ms.
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [globalagg1007]=(0) THEN NULL ELSE [globalagg1009] END))
|--Stream Aggregate(DEFINE:([globalagg1007]=SUM([partialagg1006]), [globalagg1009]=SUM([partialagg1008])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1006]=COUNT_BIG([Expr1005]), [partialagg1008]=SUM([Expr1005])))
|--Compute Scalar(DEFINE:([Expr1005]=len([test].[20090925_nested].[t_hierarchy].[stuffing] as [hc].[stuffing])))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hc].[id], [Expr1011]) WITH UNORDERED PREFETCH)
|--Sort(ORDER BY:([hc].[id] ASC))
| |--Nested Loops(Inner Join, OUTER REFERENCES:([hp].[lft], [hp].[rgt]))
| |--Parallelism(Distribute Streams, RoundRobin Partitioning)
| | |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hp]), SEEK:([hp].[id]=(42)) ORDERED FORWARD)
| |--Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[IX_hierarchy_lft] AS [hc]), SEEK:([hc].[lft] >= [test].[20090925_nested].[t_hierarchy].[lft] as [hp].[lft] AND [hc].[lft] <= [test].[20090925_nested].[t_hierarchy].[rgt] as [hp].[rgt]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id]) LOOKUP ORDERED FORWARD)
Nested sets model is good for this query, since it required only a single range scan on lft to get all descendants.
This query completes in 81 ms.
Adjacency list
WITH q AS
(
SELECT id, stuffing
FROM [20090925_nested].t_hierarchy h
WHERE id = 42
UNION ALL
SELECT hc.id, hc.stuffing
FROM q
JOIN [20090925_nested].t_hierarchy hc
ON hc.parent = q.id
)
SELECT SUM(LEN(stuffing))
FROM q
View query details
|
| 1953100 |
| 1 row fetched in 0.0002s (0.5312s) |
Table 'Worktable'. Scan count 2, logical reads 117188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_hierarchy'. Scan count 19531, logical reads 117218, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 500 ms, elapsed time = 531 ms.
|--Compute Scalar(DEFINE:([Expr1008]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019] END))
|--Stream Aggregate(DEFINE:([Expr1018]=COUNT_BIG(len([Recr1007])), [Expr1019]=SUM(len([Recr1007]))))
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1014]=(0)))
| |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [h]), SEEK:([h].[id]=(42)) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1016]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1016], [Recr1002], [Recr1003]))
|--Compute Scalar(DEFINE:([Expr1016]=[Expr1015]+(1)))
| |--Table Spool(WITH STACK)
|--Nested Loops(Inner Join, OUTER REFERENCES:([hc].[id]) OPTIMIZED)
|--Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[IX_hierarchy_parent] AS [hc]), SEEK:([hc].[parent]=[Recr1002]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id]) LOOKUP ORDERED FORWARD)
This query makes several range scans (one per node) using recursion.
These ranges finally sum up to the same set of values (of course), however, the parents of descendants are not contiguous and no single range can select them all. That's why this query is less efficient than nested sets and runs for 531 ms, or about 6 times as long.
All ancestors
Nested sets
SELECT hp.id, hp.parent, hp.lft, hp.rgt, hp.data
FROM [20090925_nested].t_hierarchy hc
JOIN [20090925_nested].t_hierarchy hp
ON hc.lft BETWEEN hp.lft AND hp.rgt
WHERE hc.id = 1000000
ORDER BY
hp.lft
View query details
| id |
parent |
lft |
rgt |
data |
| 2 |
0 |
585938 |
1171874 |
Value 2 |
| 12 |
2 |
703126 |
820312 |
Value 12 |
| 63 |
12 |
750001 |
773437 |
Value 63 |
| 319 |
63 |
764063 |
768749 |
Value 319 |
| 1599 |
319 |
766875 |
767811 |
Value 1599 |
| 7999 |
1599 |
767437 |
767623 |
Value 7999 |
| 39999 |
7999 |
767549 |
767585 |
Value 39999 |
| 199999 |
39999 |
767571 |
767577 |
Value 199999 |
| 1000000 |
199999 |
767576 |
767576 |
Value 1000000 |
| 9 rows fetched in 0.0006s (16.8588s) |
Table 't_hierarchy'. Scan count 1, logical reads 2036352, physical reads 10716, read-ahead reads 1991, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4390 ms, elapsed time = 16856 ms.
|--Parallelism(Gather Streams, ORDER BY:([hp].[lft] ASC))
|--Sort(ORDER BY:([hp].[lft] ASC))
|--Filter(WHERE:([test].[20090925_nested].[t_hierarchy].[lft] as [hc].[lft]<=[test].[20090925_nested].[t_hierarchy].[rgt] as [hp].[rgt]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hp].[id], [Expr1004]) WITH UNORDERED PREFETCH)
|--Sort(ORDER BY:([hp].[id] ASC))
| |--Nested Loops(Inner Join, OUTER REFERENCES:([hc].[lft]))
| |--Parallelism(Distribute Streams, RoundRobin Partitioning)
| | |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=(1000000)) ORDERED FORWARD)
| |--Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[IX_hierarchy_lft] AS [hp]), SEEK:([hp].[lft] <= [test].[20090925_nested].[t_hierarchy].[lft] as [hc].[lft]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hp]), SEEK:([hp].[id]=[test].[20090925_nested].[t_hierarchy].[id] as [hp].[id]) LOOKUP ORDERED FORWARD)
This is extremely inefficient: 16 seconds.
Unlike PostgreSQL, SQL Server is not capable of building in-memory bitmaps from the index results. It just has to seek the index and join. And the join condition used here is not an equality, thus making all join methods except NESTED LOOPS impossible to use.
All this results in a very poor performance which is far beyond that of even a marginally usable query.
Adjacency list
WITH q AS
(
SELECT h.*, 1 AS level
FROM [20090925_nested].t_hierarchy h
WHERE id = 1000000
UNION ALL
SELECT hp.*, level + 1
FROM q
JOIN [20090925_nested].t_hierarchy hp
ON hp.id = q.parent
)
SELECT id, parent, lft, rgt, data
FROM q
ORDER BY
level DESC
View query details
| id |
parent |
lft |
rgt |
data |
| 2 |
0 |
585938 |
1171874 |
Value 2 |
| 12 |
2 |
703126 |
820312 |
Value 12 |
| 63 |
12 |
750001 |
773437 |
Value 63 |
| 319 |
63 |
764063 |
768749 |
Value 319 |
| 1599 |
319 |
766875 |
767811 |
Value 1599 |
| 7999 |
1599 |
767437 |
767623 |
Value 7999 |
| 39999 |
7999 |
767549 |
767585 |
Value 39999 |
| 199999 |
39999 |
767571 |
767577 |
Value 199999 |
| 1000000 |
199999 |
767576 |
767576 |
Value 1000000 |
| 9 rows fetched in 0.0006s (0.0012s) |
Table 'Worktable'. Scan count 2, logical reads 56, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_hierarchy'. Scan count 0, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
|--Sort(ORDER BY:([Recr1019] DESC))
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1020]=(0)))
| |--Compute Scalar(DEFINE:([Expr1002]=(1)))
| |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [h]), SEEK:([h].[id]=(1000000)) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1022]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1022], [Recr1003], [Recr1004], [Recr1005], [Recr1006], [Recr1007], [Recr1008], [Recr1009]))
|--Compute Scalar(DEFINE:([Expr1022]=[Expr1021]+(1)))
| |--Table Spool(WITH STACK)
|--Compute Scalar(DEFINE:([Expr1012]=[Recr1009]+(1)))
|--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hp]), SEEK:([hp].[id]=[Recr1004]) ORDERED FORWARD)
Now, this query is a boon for adjacency list model. There are 9 ancestors that can be retrieved in 9 clustered index seeks and these 9 values then need to be sorted. SQL Server eats such queries for breakfast.
This query completes in 1 ms which is the range of time measurement error in SQL Server statistics.
In other words, this query is very very fast.
Descendants up to a given level
Since the performance on this task may depend on the number of the descendants, we will run 2 queries for each model.
This first query selects descendants of a node that is close to the root (node 42), the second one selects those of node 31,415 which is further from the root.
Nested sets
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM [20090925_nested].t_hierarchy hp
JOIN [20090925_nested].t_hierarchy hc
ON hc.lft BETWEEN hp.lft AND hp.rgt
WHERE hp.id = ?
AND
(
SELECT COUNT(*)
FROM [20090925_nested].t_hierarchy hn
WHERE hc.lft BETWEEN hn.lft AND hn.rgt
AND hn.lft BETWEEN hp.lft AND hp.rgt
) <= 3
View results for node 42
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM [20090925_nested].t_hierarchy hp
JOIN [20090925_nested].t_hierarchy hc
ON hc.lft BETWEEN hp.lft AND hp.rgt
WHERE hp.id = 42
AND
(
SELECT COUNT(*)
FROM [20090925_nested].t_hierarchy hn
WHERE hc.lft BETWEEN hn.lft AND hn.rgt
AND hn.lft BETWEEN hp.lft AND hp.rgt
) <= 3
| id |
parent |
lft |
rgt |
data |
| 42 |
8 |
257814 |
281250 |
Value 42 |
| 211 |
42 |
257815 |
262501 |
Value 211 |
| 212 |
42 |
262502 |
267188 |
Value 212 |
| 213 |
42 |
267189 |
271875 |
Value 213 |
| 214 |
42 |
271876 |
276562 |
Value 214 |
| 215 |
42 |
276563 |
281249 |
Value 215 |
| 1056 |
211 |
257816 |
258752 |
Value 1056 |
| 1057 |
211 |
258753 |
259689 |
Value 1057 |
| 1058 |
211 |
259690 |
260626 |
Value 1058 |
| 1059 |
211 |
260627 |
261563 |
Value 1059 |
| 1060 |
211 |
261564 |
262500 |
Value 1060 |
| 1061 |
212 |
262503 |
263439 |
Value 1061 |
| 1062 |
212 |
263440 |
264376 |
Value 1062 |
| 1063 |
212 |
264377 |
265313 |
Value 1063 |
| 1064 |
212 |
265314 |
266250 |
Value 1064 |
| 1065 |
212 |
266251 |
267187 |
Value 1065 |
| 1066 |
213 |
267190 |
268126 |
Value 1066 |
| 1067 |
213 |
268127 |
269063 |
Value 1067 |
| 1068 |
213 |
269064 |
270000 |
Value 1068 |
| 1069 |
213 |
270001 |
270937 |
Value 1069 |
| 1070 |
213 |
270938 |
271874 |
Value 1070 |
| 1071 |
214 |
271877 |
272813 |
Value 1071 |
| 1072 |
214 |
272814 |
273750 |
Value 1072 |
| 1073 |
214 |
273751 |
274687 |
Value 1073 |
| 1074 |
214 |
274688 |
275624 |
Value 1074 |
| 1075 |
214 |
275625 |
276561 |
Value 1075 |
| 1076 |
215 |
276564 |
277500 |
Value 1076 |
| 1077 |
215 |
277501 |
278437 |
Value 1077 |
| 1078 |
215 |
278438 |
279374 |
Value 1078 |
| 1079 |
215 |
279375 |
280311 |
Value 1079 |
| 1080 |
215 |
280312 |
281248 |
Value 1080 |
| 31 rows fetched in 0.0000s (103.1061s) |
Table 't_hierarchy'. Scan count 2, logical reads 119875, physical reads 0, read-ahead reads 56945, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 19531, logical reads 7970942, physical reads 1017, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 71750 ms, elapsed time = 103096 ms.
|--Parallelism(Gather Streams)
|--Filter(WHERE:([Expr1006]<=(3)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hp].[lft], [hp].[rgt], [hc].[lft]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hc].[id], [Expr1010]) WITH UNORDERED PREFETCH)
| |--Sort(ORDER BY:([hc].[id] ASC))
| | |--Nested Loops(Inner Join, OUTER REFERENCES:([hp].[lft], [hp].[rgt]))
| | |--Parallelism(Distribute Streams, RoundRobin Partitioning)
| | | |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hp]), SEEK:([hp].[id]=(42)) ORDERED FORWARD)
| | |--Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[IX_hierarchy_lft] AS [hc]), SEEK:([hc].[lft] >= [test].[20090925_nested].[t_hierarchy].[lft] as [hp].[lft] AND [hc].[lft] <= [test].[20090925_nested].[t_hierarchy].[rgt] as [hp].[rgt]) ORDERED FORWARD)
| |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id]) LOOKUP ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1011],0)))
|--Stream Aggregate(DEFINE:([Expr1011]=Count(*)))
|--Filter(WHERE:([test].[20090925_nested].[t_hierarchy].[lft] as [hc].[lft]<=[test].[20090925_nested].[t_hierarchy].[rgt] as [hn].[rgt] AND [test].[20090925_nested].[t_hierarchy].[lft] as [hn].[lft]<=[test].[20090925_nested].[t_hierarchy].[rgt] as [hp].[rgt]))
|--Index Spool(SEEK:([hn].[lft] >= [test].[20090925_nested].[t_hierarchy].[lft] as [hp].[lft] AND [hn].[lft] <= [test].[20090925_nested].[t_hierarchy].[lft] as [hc].[lft]))
|--Clustered Index Scan(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hn]))
View results for node 31,415
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.data
FROM [20090925_nested].t_hierarchy hp
JOIN [20090925_nested].t_hierarchy hc
ON hc.lft BETWEEN hp.lft AND hp.rgt
WHERE hp.id = 31415
AND
(
SELECT COUNT(*)
FROM [20090925_nested].t_hierarchy hn
WHERE hc.lft BETWEEN hn.lft AND hn.rgt
AND hn.lft BETWEEN hp.lft AND hp.rgt
) <= 3
| id |
parent |
lft |
rgt |
data |
| 31415 |
6282 |
445651 |
445687 |
Value 31415 |
| 157076 |
31415 |
445652 |
445658 |
Value 157076 |
| 157077 |
31415 |
445659 |
445665 |
Value 157077 |
| 157078 |
31415 |
445666 |
445672 |
Value 157078 |
| 157079 |
31415 |
445673 |
445679 |
Value 157079 |
| 157080 |
31415 |
445680 |
445686 |
Value 157080 |
| 785381 |
157076 |
445653 |
445653 |
Value 785381 |
| 785382 |
157076 |
445654 |
445654 |
Value 785382 |
| 785383 |
157076 |
445655 |
445655 |
Value 785383 |
| 785384 |
157076 |
445656 |
445656 |
Value 785384 |
| 785385 |
157076 |
445657 |
445657 |
Value 785385 |
| 785386 |
157077 |
445660 |
445660 |
Value 785386 |
| 785387 |
157077 |
445661 |
445661 |
Value 785387 |
| 785388 |
157077 |
445662 |
445662 |
Value 785388 |
| 785389 |
157077 |
445663 |
445663 |
Value 785389 |
| 785390 |
157077 |
445664 |
445664 |
Value 785390 |
| 785391 |
157078 |
445667 |
445667 |
Value 785391 |
| 785392 |
157078 |
445668 |
445668 |
Value 785392 |
| 785393 |
157078 |
445669 |
445669 |
Value 785393 |
| 785394 |
157078 |
445670 |
445670 |
Value 785394 |
| 785395 |
157078 |
445671 |
445671 |
Value 785395 |
| 785396 |
157079 |
445674 |
445674 |
Value 785396 |
| 785397 |
157079 |
445675 |
445675 |
Value 785397 |
| 785398 |
157079 |
445676 |
445676 |
Value 785398 |
| 785399 |
157079 |
445677 |
445677 |
Value 785399 |
| 785400 |
157079 |
445678 |
445678 |
Value 785400 |
| 785401 |
157080 |
445681 |
445681 |
Value 785401 |
| 785402 |
157080 |
445682 |
445682 |
Value 785402 |
| 785403 |
157080 |
445683 |
445683 |
Value 785403 |
| 785404 |
157080 |
445684 |
445684 |
Value 785404 |
| 785405 |
157080 |
445685 |
445685 |
Value 785405 |
| 31 rows fetched in 0.0000s (38.1238s) |
Table 't_hierarchy'. Scan count 2, logical reads 60129, physical reads 4, read-ahead reads 56653, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 31, logical reads 7268474, physical reads 1274, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 11235 ms, elapsed time = 38121 ms.
|--Parallelism(Gather Streams)
|--Filter(WHERE:([Expr1006]<=(3)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hp].[lft], [hp].[rgt], [hc].[lft]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hc].[id], [Expr1010]) WITH UNORDERED PREFETCH)
| |--Sort(ORDER BY:([hc].[id] ASC))
| | |--Nested Loops(Inner Join, OUTER REFERENCES:([hp].[lft], [hp].[rgt]))
| | |--Parallelism(Distribute Streams, RoundRobin Partitioning)
| | | |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hp]), SEEK:([hp].[id]=(31415)) ORDERED FORWARD)
| | |--Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[IX_hierarchy_lft] AS [hc]), SEEK:([hc].[lft] >= [test].[20090925_nested].[t_hierarchy].[lft] as [hp].[lft] AND [hc].[lft] <= [test].[20090925_nested].[t_hierarchy].[rgt] as [hp].[rgt]) ORDERED FORWARD)
| |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id]) LOOKUP ORDERED FORWARD)
|--Compute Scalar(DEFINE:([Expr1006]=CONVERT_IMPLICIT(int,[Expr1011],0)))
|--Stream Aggregate(DEFINE:([Expr1011]=Count(*)))
|--Filter(WHERE:([test].[20090925_nested].[t_hierarchy].[lft] as [hc].[lft]<=[test].[20090925_nested].[t_hierarchy].[rgt] as [hn].[rgt] AND [test].[20090925_nested].[t_hierarchy].[lft] as [hn].[lft]<=[test].[20090925_nested].[t_hierarchy].[rgt] as [hp].[rgt]))
|--Index Spool(SEEK:([hn].[lft] >= [test].[20090925_nested].[t_hierarchy].[lft] as [hp].[lft] AND [hn].[lft] <= [test].[20090925_nested].[t_hierarchy].[lft] as [hc].[lft]))
|--Clustered Index Scan(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hn]))
These queries complete in 108 seconds and 38 seconds, respectively. Both these queries use nested loops with a full table scan which doesn't add to performance. This makes the queries unusable.
Adjacency list
WITH q AS
(
SELECT id, parent, lft, rgt, data, 1 AS level,
CAST(LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR) AS bc
FROM [20090925_nested].t_hierarchy hc
WHERE id = ?
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.data, level + 1,
CAST(bc + '.' + LEFT(CAST(hc.id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR)
FROM q
JOIN [20090925_nested].t_hierarchy hc
ON hc.parent = q.id
WHERE level < 3
)
SELECT id, parent, lft, rgt, data
FROM q
ORDER BY
bc
The field bc defines breadcrumbs
which can be used to return the nodes in tree order.
In PostgreSQL 8.4, we used an array for that.
SQL Server lacks array type, that's why we need to use a hack: converting the id's into zero-padded strings of fixed length (10 characters) and concatenating them.
Here are the query results:
View results for node 42
WITH q AS
(
SELECT id, parent, lft, rgt, data, 1 AS level,
CAST(LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR) AS bc
FROM [20090925_nested].t_hierarchy hc
WHERE id = 42
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.data, level + 1,
CAST(bc + '.' + LEFT(CAST(hc.id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR)
FROM q
JOIN [20090925_nested].t_hierarchy hc
ON hc.parent = q.id
WHERE level < 3
)
SELECT id, parent, lft, rgt, data
FROM q
ORDER BY
bc
| id |
parent |
lft |
rgt |
data |
| 42 |
8 |
257814 |
281250 |
Value 42 |
| 211 |
42 |
257815 |
262501 |
Value 211 |
| 1056 |
211 |
257816 |
258752 |
Value 1056 |
| 1057 |
211 |
258753 |
259689 |
Value 1057 |
| 1058 |
211 |
259690 |
260626 |
Value 1058 |
| 1059 |
211 |
260627 |
261563 |
Value 1059 |
| 1060 |
211 |
261564 |
262500 |
Value 1060 |
| 212 |
42 |
262502 |
267188 |
Value 212 |
| 1061 |
212 |
262503 |
263439 |
Value 1061 |
| 1062 |
212 |
263440 |
264376 |
Value 1062 |
| 1063 |
212 |
264377 |
265313 |
Value 1063 |
| 1064 |
212 |
265314 |
266250 |
Value 1064 |
| 1065 |
212 |
266251 |
267187 |
Value 1065 |
| 213 |
42 |
267189 |
271875 |
Value 213 |
| 1066 |
213 |
267190 |
268126 |
Value 1066 |
| 1067 |
213 |
268127 |
269063 |
Value 1067 |
| 1068 |
213 |
269064 |
270000 |
Value 1068 |
| 1069 |
213 |
270001 |
270937 |
Value 1069 |
| 1070 |
213 |
270938 |
271874 |
Value 1070 |
| 214 |
42 |
271876 |
276562 |
Value 214 |
| 1071 |
214 |
271877 |
272813 |
Value 1071 |
| 1072 |
214 |
272814 |
273750 |
Value 1072 |
| 1073 |
214 |
273751 |
274687 |
Value 1073 |
| 1074 |
214 |
274688 |
275624 |
Value 1074 |
| 1075 |
214 |
275625 |
276561 |
Value 1075 |
| 215 |
42 |
276563 |
281249 |
Value 215 |
| 1076 |
215 |
276564 |
277500 |
Value 1076 |
| 1077 |
215 |
277501 |
278437 |
Value 1077 |
| 1078 |
215 |
278438 |
279374 |
Value 1078 |
| 1079 |
215 |
279375 |
280311 |
Value 1079 |
| 1080 |
215 |
280312 |
281248 |
Value 1080 |
| 31 rows fetched in 0.0018s (0.0023s) |
Table 'Worktable'. Scan count 2, logical reads 188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_hierarchy'. Scan count 6, logical reads 111, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
|--Sort(ORDER BY:([Recr1021] ASC))
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1023]=(0)))
| |--Compute Scalar(DEFINE:([Expr1002]=(1)))
| |--Compute Scalar(DEFINE:([Expr1003]=CONVERT(varchar(30),substring(CONVERT(varchar(30),[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id],0)+'0000000000',(1),(10)),0)))
| |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=(42)) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1025]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1025], [Recr1004], [Recr1005], [Recr1006], [Recr1007], [Recr1008], [Recr1009], [Recr1010]))
|--Compute Scalar(DEFINE:([Expr1025]=[Expr1024]+(1)))
| |--Table Spool(WITH STACK)
|--Compute Scalar(DEFINE:([Expr1013]=[Recr1009]+(1), [Expr1014]=CONVERT(varchar(30),([Recr1010]+'.')+[Expr1022],0)))
|--Compute Scalar(DEFINE:([Expr1022]=substring(CONVERT(varchar(30),[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id],0)+'0000000000',(1),(10))))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hc].[id]) OPTIMIZED)
|--Filter(WHERE:(STARTUP EXPR([Recr1009]<(3))))
| |--Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[IX_hierarchy_parent] AS [hc]), SEEK:([hc].[parent]=[Recr1004]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id]) LOOKUP ORDERED FORWARD)
View results for node 31,415
WITH q AS
(
SELECT id, parent, lft, rgt, data, 1 AS level,
CAST(LEFT(CAST(id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR) AS bc
FROM [20090925_nested].t_hierarchy hc
WHERE id = 42
UNION ALL
SELECT hc.id, hc.parent, hc.lft, hc.rgt, hc.data, level + 1,
CAST(bc + '.' + LEFT(CAST(hc.id AS VARCHAR) + REPLICATE('0', 10), 10) AS VARCHAR)
FROM q
JOIN [20090925_nested].t_hierarchy hc
ON hc.parent = q.id
WHERE level < 3
)
SELECT id, parent, lft, rgt, data
FROM q
ORDER BY
bc
| id |
parent |
lft |
rgt |
data |
| 42 |
8 |
257814 |
281250 |
Value 42 |
| 211 |
42 |
257815 |
262501 |
Value 211 |
| 1056 |
211 |
257816 |
258752 |
Value 1056 |
| 1057 |
211 |
258753 |
259689 |
Value 1057 |
| 1058 |
211 |
259690 |
260626 |
Value 1058 |
| 1059 |
211 |
260627 |
261563 |
Value 1059 |
| 1060 |
211 |
261564 |
262500 |
Value 1060 |
| 212 |
42 |
262502 |
267188 |
Value 212 |
| 1061 |
212 |
262503 |
263439 |
Value 1061 |
| 1062 |
212 |
263440 |
264376 |
Value 1062 |
| 1063 |
212 |
264377 |
265313 |
Value 1063 |
| 1064 |
212 |
265314 |
266250 |
Value 1064 |
| 1065 |
212 |
266251 |
267187 |
Value 1065 |
| 213 |
42 |
267189 |
271875 |
Value 213 |
| 1066 |
213 |
267190 |
268126 |
Value 1066 |
| 1067 |
213 |
268127 |
269063 |
Value 1067 |
| 1068 |
213 |
269064 |
270000 |
Value 1068 |
| 1069 |
213 |
270001 |
270937 |
Value 1069 |
| 1070 |
213 |
270938 |
271874 |
Value 1070 |
| 214 |
42 |
271876 |
276562 |
Value 214 |
| 1071 |
214 |
271877 |
272813 |
Value 1071 |
| 1072 |
214 |
272814 |
273750 |
Value 1072 |
| 1073 |
214 |
273751 |
274687 |
Value 1073 |
| 1074 |
214 |
274688 |
275624 |
Value 1074 |
| 1075 |
214 |
275625 |
276561 |
Value 1075 |
| 215 |
42 |
276563 |
281249 |
Value 215 |
| 1076 |
215 |
276564 |
277500 |
Value 1076 |
| 1077 |
215 |
277501 |
278437 |
Value 1077 |
| 1078 |
215 |
278438 |
279374 |
Value 1078 |
| 1079 |
215 |
279375 |
280311 |
Value 1079 |
| 1080 |
215 |
280312 |
281248 |
Value 1080 |
| 31 rows fetched in 0.0018s (0.0023s) |
Table 'Worktable'. Scan count 2, logical reads 188, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 't_hierarchy'. Scan count 6, logical reads 111, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
|--Sort(ORDER BY:([Recr1021] ASC))
|--Index Spool(WITH STACK)
|--Concatenation
|--Compute Scalar(DEFINE:([Expr1023]=(0)))
| |--Compute Scalar(DEFINE:([Expr1002]=(1)))
| |--Compute Scalar(DEFINE:([Expr1003]=CONVERT(varchar(30),substring(CONVERT(varchar(30),[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id],0)+'0000000000',(1),(10)),0)))
| |--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=(42)) ORDERED FORWARD)
|--Assert(WHERE:(CASE WHEN [Expr1025]>(100) THEN (0) ELSE NULL END))
|--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1025], [Recr1004], [Recr1005], [Recr1006], [Recr1007], [Recr1008], [Recr1009], [Recr1010]))
|--Compute Scalar(DEFINE:([Expr1025]=[Expr1024]+(1)))
| |--Table Spool(WITH STACK)
|--Compute Scalar(DEFINE:([Expr1013]=[Recr1009]+(1), [Expr1014]=CONVERT(varchar(30),([Recr1010]+'.')+[Expr1022],0)))
|--Compute Scalar(DEFINE:([Expr1022]=substring(CONVERT(varchar(30),[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id],0)+'0000000000',(1),(10))))
|--Nested Loops(Inner Join, OUTER REFERENCES:([hc].[id]) OPTIMIZED)
|--Filter(WHERE:(STARTUP EXPR([Recr1009]<(3))))
| |--Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[IX_hierarchy_parent] AS [hc]), SEEK:([hc].[parent]=[Recr1004]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([test].[20090925_nested].[t_hierarchy].[PK__t_hierarchy__49EEDF40] AS [hc]), SEEK:([hc].[id]=[test].[20090925_nested].[t_hierarchy].[id] as [hc].[id]) LOOKUP ORDERED FORWARD)
Both queries use the same efficient plans (which basically include 31 index seeks and ordering of the values fetched) and complete in 2 ms, which may be considered instant.
Summary
We made an SQL Server table combining two popular models for hierarchical data: adjacency list and nested sets and tested three most used queries against these data:
- Find all descendants of a given node
- Find all ancestors of a given node
- Find all descendants of a given node up to a certain depth
As in the similar test for PostgreSQL 8.4 (conducted in the previous article), nested sets model has shown a little performance benefit for the first query (finding all descendants).
However, SQL Server, unlike PostgreSQL, is not able to build in-memory bitmaps which makes nested loops (required for the queries 2 and 3 against the nested sets model) very inefficient.
This makes the nested sets model very slow for queries 2 and 3 in SQL Server. More or less significant number of records in the table makes these queries too slow to be of practical use.
Adjacency list, on the contrary, shows decent performance for the query 1 and amazing performance for the queries 2 and 3.
This fact, along with the simplicity of adjacency list implementation and management, make adjacency list a preferred way to store hierarchical data in SQL Server versions that support recursive CTEs, that is versions from 2005 onwards.