Genealogy query on both parents
Comments enabled. I *really* need your comment
From Stack Overflow:
I'm trying to fetch a genealogy tree of animals from my Oracle database.
Here's the table:
Animal_ID Parent_Male_ID Parent_Female_ID If I specify an animal, I can get all of its descendants (on the male side) using something like this:
SELECT * FROM animal START WITH animal_id = 123 CONNECT BY PRIOR animal_id = parent_male_idI'm trying to find a way to extend this in such a way that if I specify an animal, it will fetch both parents and then will fetch all of their descendants.
Any thoughts?
This is possible to do with a same CONNECT BY
query by providing a little more comlex condition for START WITH
and CONNECT BY
:
SELECT DISTINCT(animal_id) AS animal_id FROM animal START WITH animal_id IN ( SELECT father FROM animal WHERE animal_id = 9500 UNION ALL SELECT mother FROM animal WHERE animal_id = 9500 ) CONNECT BY PRIOR animal_id IN (father, mother) ORDER BY animal_id
However, my gut feeling says this query need performance checking.
Let's create a sample table:
CREATE TABLE animal ( animal_id INTEGER NOT NULL, father INTEGER NOT NULL, mother INTEGER NOT NULL, name VARCHAR2(50) NOT NULL ) / ALTER TABLE animal ADD CONSTRAINT pk_animal_id PRIMARY KEY (animal_id) / CREATE INDEX ix_animal_father ON animal (father) / CREATE INDEX ix_animal_mother ON animal (mother) / CALL DBMS_RANDOM.SEED(20090524) / INSERT INTO animal (animal_id, father, mother, name) SELECT lvl, CASE WHEN father <= 0 OR father >= lvl THEN 0 ELSE father END, CASE WHEN mother <= 0 OR mother >= lvl THEN 0 ELSE mother END, 'Animal ' || lvl FROM ( SELECT level AS lvl, level - MOD(level, 2) - FLOOR(DBMS_RANDOM.VALUE() * 50) * 2 - 2 AS father, level - MOD(level, 2) - FLOOR(DBMS_RANDOM.VALUE() * 50) * 2 - 1 AS mother FROM dual CONNECT BY level <= 10000 ) / COMMIT / BEGIN DBMS_STATS.gather_schema_stats('"20090524_genealogy"'); END; /
This table is filled with 10,000 rows.
Males have even id
's, females have odd ones.
For each animal, both parents are randomly chosen from 100 preceding species (taking gender into account, of course).
Both father
and mother
fields are NOT NULL
, a fake id
of zero is used to mark unknown parent (so that the indexes can be always used).
Such an algorithm, of course, makes incest and promiscuity common for these animals, so that multiple ancestry chains can lead from one animal to another.
Now, let's run the query:
SELECT DISTINCT(animal_id) AS animal_id FROM animal START WITH animal_id IN ( SELECT father FROM animal WHERE animal_id = 9500 UNION ALL SELECT mother FROM animal WHERE animal_id = 9500 ) CONNECT BY PRIOR animal_id IN (father, mother) ORDER BY animal_id
ANIMAL_ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9409 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9432 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9435 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9439 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9453 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9462 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9465 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9473 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9493 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9500 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9998 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9999 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10000 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
424 rows fetched in 0.0156s (98.0431s) |
SELECT STATEMENT SORT UNIQUE CONNECT BY WITHOUT FILTERING TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL NESTED LOOPS VIEW , SYS.VW_NSO_1 UNION-ALL TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX UNIQUE SCAN, 20090524_genealogy.PK_ANIMAL_ID TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX UNIQUE SCAN, 20090524_genealogy.PK_ANIMAL_ID INDEX UNIQUE SCAN, 20090524_genealogy.PK_ANIMAL_ID TABLE ACCESS FULL, 20090524_genealogy.ANIMAL UNION-ALL TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX RANGE SCAN, 20090524_genealogy.IX_ANIMAL_FATHER TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX RANGE SCAN, 20090524_genealogy.IX_ANIMAL_MOTHER
This query works for 98 seconds, and this is for animal with id of 9500, that is one of the youngest ones.
For elder animals, this query will take literally forever.
But why so long?
This query does not look very complex. It seems that on each iteration there should be an index lookup which finds a appropriate set of animals that have PRIOR animal_id
either as a father
or as a mother
and further traverses the tree.
However, Oracle is not so smart.
When building the query plan, it need to choose an index it will then use for joining.
As any given animal can be either male or female, the different index will be used in each case. In case of a male species, the engine needs to use index on father
to select all of its breed, and the index on mother
should be used to select all children of a female species.
More than that: the index should be chosen again on each iteration.
Oracle can use a single index to build a hierarchial query, but such a complex task is beyond CBO
's abilities.
That's why the CBO
choses a good old TABLE ACCESS FULL
to traverse the hierarchy tree, though it manages to pick up the indexes to select the animal_id
for the START WITH
condition.
Now, what can we do with this?
Since Oracle can only use a single index, we need to provide it somehow with this single index.
This is impossible to do on the table layout like this, since there can be at most one ROWID
per index. Oracle does not allow creation of indexes with multiple key values for one row.
Oracle also does not allow creating indexes on non-materialized views.
However, we can trick Oracle a little and force it to create the index on the fly.
If we look into the query plan above, we will see that NESTED LOOPS
along with TABLE ACCESS FULL
are used to iterate the hierarchy, though a hash lookup would be more efficient here.
But hash lookups are possible only on equi-joins, and we have an IN
condition. Can we turn it to an equi-join somehow?
Since we need an equality condition here, we need two separate rows to be returned for each condition in the IN
clause. This can be achieved by selecting father
's and mother
's apart and UNION
'ing them.
For each animal we will get two rows: first one with its father as a parent, second one with its mother. Then we can just use the common condition parent = PRIOR id
.
On each iteration step, two rows will satisfy it now instead of one, which will lead to more iterations.
But since a HASH JOIN
will be used now, this will perform much better.
Let's check it out:
SELECT DISTINCT(animal_id) AS animal_id FROM ( SELECT 0 AS gender, animal_id, father AS parent FROM animal UNION ALL SELECT 1, animal_id, mother FROM animal ) START WITH animal_id IN ( SELECT father FROM animal WHERE animal_id = 9500 UNION ALL SELECT mother FROM animal WHERE animal_id = 9500 ) CONNECT BY parent = PRIOR animal_id ORDER BY animal_id
ANIMAL_ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9409 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9432 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9435 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9439 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9453 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9462 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9465 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9473 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9493 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9500 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9998 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9999 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10000 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
424 rows fetched in 0.0077s (0.4217s) |
SELECT STATEMENT SORT UNIQUE CONNECT BY WITH FILTERING FILTER COUNT VIEW UNION-ALL TABLE ACCESS FULL, 20090524_genealogy.ANIMAL TABLE ACCESS FULL, 20090524_genealogy.ANIMAL UNION-ALL TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX RANGE SCAN, 20090524_genealogy.IX_ANIMAL_FATHER TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX RANGE SCAN, 20090524_genealogy.IX_ANIMAL_MOTHER HASH JOIN CONNECT BY PUMP COUNT VIEW UNION-ALL TABLE ACCESS FULL, 20090524_genealogy.ANIMAL TABLE ACCESS FULL, 20090524_genealogy.ANIMAL COUNT VIEW UNION-ALL TABLE ACCESS FULL, 20090524_genealogy.ANIMAL TABLE ACCESS FULL, 20090524_genealogy.ANIMAL UNION-ALL TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX RANGE SCAN, 20090524_genealogy.IX_ANIMAL_FATHER TABLE ACCESS BY INDEX ROWID, 20090524_genealogy.ANIMAL INDEX RANGE SCAN, 20090524_genealogy.IX_ANIMAL_MOTHER
Now the query takes only 0.42 seconds.