From Stack Overflow:
I have a view which matches up the needs and the wants, and discards the requests that cannot be fulfilled by any user, that looks like this:
Item Need Have 1 Bob George 2 George Herbie 3 Herbie Bob 4 Larry Wally 5 Wally Larry 6 John George
I am now trying to write a query where I can calculate the number of permutations of each user having their requested needs fulfilled.
For instance, in the example above, Bob, George, and Herbie can together fulfill each other’s needs, as can Larry and Wally, but John cannot, so the total count would be 2.
This task is actually about finding loops.
Whenever there is a set of users that fulfill each other’s needs, there is a loop in the tree.
We can assume that either of the columns (
have) is unique so there there is always a single path in the corresponding direction which ends with at most one loop.
It’s not stated in the task, but otherwise it would be an equivalent of Clique problem which is
NP complete and hardly solvable with
Now, let’s create a sample table:
CREATE TABLE t_needs ( id NUMBER(10) NOT NULL, name VARCHAR2(50) NOT NULL, needs NUMBER(10) NOT NULL ) / ALTER TABLE t_needs ADD CONSTRAINT pk_needs_id PRIMARY KEY (id) / CREATE INDEX ix_needs_needs ON t_needs (needs) / BEGIN DBMS_RANDOM.seed(20090518); END; / INSERT INTO t_needs (id, name, needs) SELECT level, 'User ' || level, DBMS_RANDOM.value() * 999 + 1 FROM dual CONNECT BY level <= 1000 / COMMIT / BEGIN DBMS_STATS.gather_schema_stats('"20090518_loops"'); END; /
We have a tree table with 1,000 records and randomly filled
needs corresponds to an existing user).
Now, we want to find all the loops in the table and count their lengths.
First, we should filter out the users that are not a part of any loop.
To do this, we will use Oracle‘s ability to detect loops in a path. We will select all paths possible (starting with each of given users) and select all records which are a part of a loop.
These records are very easy to detect: on the next step they would yield a records that had already been evaluated in the branch and therefore resides in the current recursion stack.
Oracle just examines the recursion stack on each step and sets a special condition if this step would return something that already had been returned.
CONNECT BY query would be a little help to us, since any loop in this query would result in an immediate error.
But Oracle has a nice clause called
CONNECT BY NOCYCLE, which does not throws an exception in case of a loop, but instead returns it in a special pseudocolumn called
If we use it to filter the results of our query, we will get:
SELECT id FROM "20090518_loops".t_needs n WHERE CONNECT_BY_ISCYCLE = 1 CONNECT BY NOCYCLE needs = PRIOR id
|65 rows fetched in 0.0012s (0.3101s)|
Now, we have a list 65 users that are self-sufficient within their loops, i. e. form a closed cycle of what they need and what they provide to each other.
But we still don’t know which groups do they belong to, nor how many users are there in each group.
We can’t use a simple filtering, since loops have neither beginning nor an end, and from any given row we cannot say which loop is it a part of, or how long the loop is: 2 records or 100.
But each loop, of course, has its minimal value. And we can consider this value to be a
beginning of a loop. Since it’s a loop we are talking about, any value is just as good as all other values.
To figure out the minimal
id of each loop, we need to do the following:
- For each value that is a part of a loop, we should unwind the loop (traverse it in the opposite direction) and find all values that constitute the loop.
- For each loop, find the minimal
id, and consider this minimal
idthe start of the loop.
This needs a little explanation. Imagine that values 2, 3, 4 and 5 form a loop (in this order).
If we start unwinding the loop from value 2, we will get the following results: 2, 5, 4, 3.
If we start unwinding it from value 3, we will get 3, 2, 5, 4, etc.
So whatever record of a loop we take as a
START WITH condition, we get the same results in different order.
Since any record can be found in a loop at most once, the minimal value of the chain uniquely identifies the loop.
So this query:
SELECT DISTINCT MIN(id) FROM ( SELECT id, CONNECT_BY_ROOT(id) AS root FROM t_needs START WITH id IN ( SELECT id FROM t_needs n WHERE CONNECT_BY_ISCYCLE = 1 CONNECT BY NOCYCLE needs = PRIOR id ) CONNECT BY NOCYCLE id = PRIOR needs ) GROUP BY root
|3 rows fetched in 0.0002s (0.3280s)|
gives us the minimal id‘s in each loop.
Now all that is left to do is to repeat the query but now use the minimal
id‘s from each group in a
START WITH condition:
SELECT CONNECT_BY_ROOT(id), level FROM t_needs n WHERE CONNECT_BY_ISCYCLE = 1 START WITH id IN ( SELECT MIN(id) FROM ( SELECT id, CONNECT_BY_ROOT(id) AS root FROM t_needs START WITH id IN ( SELECT id FROM t_needs n WHERE CONNECT_BY_ISCYCLE = 1 CONNECT BY NOCYCLE needs = PRIOR id ) CONNECT BY NOCYCLE id = PRIOR needs ) GROUP BY root ) CONNECT BY NOCYCLE id = PRIOR needs
|3 rows fetched in 0.0002s (0.3296s)|
That will give us the starting
id and the number of records in each loop.
If we just want to browse the records (but this time knowing the loop
id and the position in the loop for each record), we just get rid of
SELECT n.*, CONNECT_BY_ROOT(id), level FROM t_needs n START WITH id IN ( SELECT MIN(id) FROM ( SELECT id, CONNECT_BY_ROOT(id) AS root FROM t_needs START WITH id IN ( SELECT id FROM t_needs n WHERE CONNECT_BY_ISCYCLE = 1 CONNECT BY NOCYCLE needs = PRIOR id ) CONNECT BY NOCYCLE id = PRIOR needs ) GROUP BY root ) CONNECT BY NOCYCLE id = PRIOR needs
|65 rows fetched in 0.0031s (0.3330s)|
Now we have the loops completely identified.