EXPLAIN EXTENDED

How to create fast database queries

First common ancestor

with 2 comments

From Stack Overflow:

Let's say that we have we have a table with the classic manager_id recursive relationship:

Users (user_id int, manager_id int) (refers to user_id)

If you randomly select 2 rows in the table, or 2 nodes, how do you find the lowest level common ancestor? My platform is SQL Server 2005 (Transact-SQL), but any ANSI compliant SQL will also work...

Very nice question.

This may be useful to check against any kind of common ancestry in a tree structure: classes, folders, etc.

In Transact SQL we for sure will use CTE's for hierarchical queries. This has some possibility of hitting the recursion limit, but even Microsoft has only 12 levels of management (only, huh).

We'll just have two CTE's for each ancestry chain and join them together with good old HASH JOIN. No need to worry for performance here, as it's 100 by 100 rows in the worst case, or it will just fail.

The table itself of course may be as big as needed. As long as there is an index on manager_id, the query will just use nested loops to build ancestry chains (which will most probably consist of a dozen records).

We'll create table of 9 management levels and see how it works:

Table creation scripts

SELECT  COUNT(*) AS cnt
FROM    t_user
cnt
2441406

So there are more than 2 000 000 rows in our table, more than enough for any real world organization.

Let's try:

WITH    hier1(id, parent) AS
        (
        SELECT  id, manager_id
        FROM    t_user u 
        WHERE   id = CAST(RAND(20090303) * 2441406 AS INTEGER)
        UNION ALL
        SELECT  u.id, manager_id
        FROM    t_user u, hier1 h
        WHERE   u.id = parent
        ),
        hier2(id, parent) AS
        (
        SELECT  id, manager_id
        FROM    t_user u 
        WHERE   id = CAST(RAND() * 2441406 AS INTEGER)
        UNION ALL
        SELECT  u.id, manager_id
        FROM    t_user u, hier2 h
        WHERE   u.id = parent
        )
SELECT  TOP 1 hier1.id
FROM    hier1, hier2
WHERE   hier1.id = hier2.id
id
2
SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.

Works like a charm.

This query has Θ(log(N)) execution time, so it should work nicely even for much larger structures like folder hierarchies.

Written by Quassnoi

March 3rd, 2009 at 11:00 pm

Posted in SQL Server

2 Responses to 'First common ancestor'

Subscribe to comments with RSS

  1. Should line 18 use

    hier2

    instead of

    hier1

    ?

    D T

    9 Jan 19 at 15:02

  2. @DT you’re right, thanks!

    Quassnoi

    9 Jan 19 at 15:58

Leave a Reply