Genealogy: finding a descendant
Comments enabled. I *really* need your comment
In the previous article I described an efficient way to build a genealogy with a hierarchical query in Oracle:
Now, I'd like to describe a query to check if one entity in the table is a descendant on another one.
Let's create a sample table:
CREATE TABLE animal (
animal_id INTEGER NOT NULL,
father INTEGER NOT NULL,
mother INTEGER NOT NULL,
name VARCHAR2(50) NOT NULL
)
/
ALTER TABLE animal
ADD CONSTRAINT pk_animal_id
PRIMARY KEY (animal_id)
/
CREATE INDEX ix_animal_father
ON animal (father)
/
CREATE INDEX ix_animal_mother
ON animal (mother)
/
CALL DBMS_RANDOM.SEED(20090525)
/
INSERT
INTO animal (animal_id, father, mother, name)
SELECT lvl,
CASE WHEN father <= 0 OR father >= lvl THEN 0 ELSE father END,
CASE WHEN mother <= 0 OR mother >= lvl THEN 0 ELSE mother END,
'Animal ' || lvl
FROM (
SELECT level AS lvl,
level - MOD(level, 2) - FLOOR(DBMS_RANDOM.VALUE() * 50) * 2 - 2 AS father,
level - MOD(level, 2) - FLOOR(DBMS_RANDOM.VALUE() * 50) * 2 - 1 AS mother
FROM dual
CONNECT BY
level <= 10000
)
/
COMMIT
/
BEGIN
DBMS_STATS.gather_schema_stats('"20090525_genealogy"');
END;
/
[/sourcecode]
As in previous article, this table has <strong>10,000</strong> rows, and for each animal both parents are randomly chosen from <strong>100</strong> previous ones.
Male animals have even <code>id</code>'s, female ones have odd ones.
We had a query to select all descendants for a given animal. To find out if an animal with a given <code>animal_id</code> is a descendant, we can just add a simple <code>WHERE</code> filter:
SELECT DISTINCT(animal_id)
FROM (
SELECT 0 AS gender, animal_id, father AS parent
FROM animal
UNION ALL
SELECT 1, animal_id, mother
FROM animal
)
WHERE animal_id = 9999
START WITH
animal_id = 9000
CONNECT BY
parent = PRIOR animal_id
| ANIMAL_ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9999 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0001s (9.7809s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
HASH UNIQUE
FILTER
CONNECT BY WITH FILTERING
FILTER
COUNT
VIEW
UNION-ALL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
HASH JOIN
CONNECT BY PUMP
COUNT
VIEW
UNION-ALL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
COUNT
VIEW
UNION-ALL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
This query works fine for younger animals, but its time grows exponentially with the animal's age.
In the example above it takes 12 seconds, and for a animal of the first 100 rows it would take like forever.
This task in fact is path finding problem. There are efficient algorithms to do that (like Dijkstra's algorithm), but they are hardly feasible to implement with SQL.
For instance, the query above just goes over all descendancy paths, selects all animals it meets on its way and then uses HASH UNIQUE to get rid of the duplicates, which, of course, takes exponential time.
But it's possible to improve this query.
Here are four steps to do this:
- Constructing a
UNION ALLeach time by browsing the original table is quite expensive. It's better to materialize the table so that only required fields are stored. - To find out if a given animal is a descendant of another one, it's not needed to go over all descendants. It's enough to find the first path leading to this descendant and stop. This will reduce the query time greatly.
- We need to cut the paths that lead to the animals younger that the one we are searching for: they cannot be its parents.
- We need to select longer paths first, as they are more likely to lead to the descendant sooner.
Fortunately, all these steps are possible to implement in Oracle:
WITH parents AS
(
SELECT 0 AS gender, animal_id, father AS parent
FROM animal
UNION ALL
SELECT 1, animal_id, mother
FROM animal
)
SELECT *
FROM (
SELECT p.*, level
FROM parents p
WHERE p.animal_id = 9500
START WITH
p.animal_id = 20
CONNECT BY
p.parent = PRIOR p.animal_id
AND p.parent < 9500
ORDER SIBLINGS BY
animal_id - parent DESC
)
WHERE rownum = 1
[/sourcecode]
| GENDER | ANIMAL_ID | PARENT | LEVEL | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 9500 | 9463 | 152 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1 row fetched in 0.0002s (0.6094s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
TEMP TABLE TRANSFORMATION
LOAD AS SELECT
UNION-ALL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
COUNT STOPKEY
VIEW
FILTER
CONNECT BY WITH FILTERING
FILTER
COUNT
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6668_7C7986
HASH JOIN
CONNECT BY PUMP
COUNT
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6668_7C7986
COUNT
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6668_7C7986
This query runs for only 0.6 seconds, and finds a very long ancestry chain between a very old and very young animal.
This needs a little explanation:
- To materialize the query, we used a
WITHclause.Usually, materializing requires an undocumented hint
MATERIALIZE, but for hierarchical queries the CBO is smart enough to materialize the query without this hint. - To stop on the first path, we used the subquery along with ROWNUM = 1 which gave use a
COUNT STOPKEYin the plan and stopped scanning as soon as it found the first path. - To cut the wrong paths, we added the condition
AND p.parent < 9500to theCONNECT BYclause. This conditions stops traversing the hierarchy as soon as it reaches the maximal possible age for the parent (a parent cannot be younger than a child). - Finally, we added
ORDER SIBLINGS BY animal_id - parent DESC. This condition chooses order in which the siblings are chosen for traversal (and therefore returned). Of all possible sibings, the engine selects the ones born latest, thus making the engine to evaluate the shorter paths first, which reduces the number of iterations.
We can even improve this query to find out if several animals are descendants of a given one:
WITH parents AS
(
SELECT 0 AS gender, animal_id, father AS parent
FROM animal
UNION ALL
SELECT 1, animal_id, mother
FROM animal
)
SELECT *
FROM animal a
WHERE animal_id BETWEEN 9500 AND 9510
AND EXISTS
(
SELECT 1
FROM parents p
WHERE p.animal_id = a.animal_id
START WITH
p.animal_id = 20
CONNECT BY
p.parent = PRIOR p.animal_id
AND p.parent < a.animal_id
)
[/sourcecode]
| ANIMAL_ID | FATHER | MOTHER | NAME | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9500 | 9400 | 9463 | Animal 9500 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9501 | 9472 | 9415 | Animal 9501 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9502 | 9472 | 9481 | Animal 9502 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9503 | 9476 | 9481 | Animal 9503 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9504 | 9422 | 9451 | Animal 9504 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9505 | 9472 | 9499 | Animal 9505 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9506 | 9432 | 9447 | Animal 9506 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9507 | 9468 | 9477 | Animal 9507 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9508 | 9448 | 9501 | Animal 9508 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9509 | 9410 | 9487 | Animal 9509 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9510 | 9442 | 9473 | Animal 9510 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 11 rows fetched in 0.0005s (6.0466s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
TEMP TABLE TRANSFORMATION
LOAD AS SELECT , 20090525_genealogy.ANIMAL
UNION-ALL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
TABLE ACCESS FULL, 20090525_genealogy.ANIMAL
TABLE ACCESS BY INDEX ROWID, 20090525_genealogy.ANIMAL
INDEX RANGE SCAN, 20090525_genealogy.PK_ANIMAL_ID
FILTER
CONNECT BY WITH FILTERING
FILTER
COUNT
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6677_7C7986
HASH JOIN
CONNECT BY PUMP
COUNT
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6677_7C7986
COUNT
VIEW
TABLE ACCESS FULL, SYS.SYS_TEMP_0FD9D6677_7C7986
, which also works quite fast (6 seconds).
In this query, however, we had to omit last step (ORDER SIBLINGS BY), as Oracle does not allow ORDER BY clause in correlated subqueries.
The query is still quite efficient.
Subscribe in a reader