Overlapping numbers
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Kat asks:
I want to determine if I have any overlapping ranges of numbers in a table for specific
id's.For example, the table has the following:
ID START_NUMBER END_NUMBER 1 3035112000 3035112010 2 3035112011 3035112020 3 3035112003 3035112005 4 3032925000 3032925010 1 3035113000 3035113050 2 3035113000 3035113050 3 3035113051 3035113052 4 3032925500 3032925550 I only want to pull those that have overlapping number ranges for
id's 1, 2, and 3 (but not 4).This is in Oracle.
We need to write a query to pull out all id's of rows that overlap at least one other row in the table.
Let's create a pair of sample tables:
CREATE TABLE t_number
(
id NUMBER(10) NOT NULL,
start_number NUMBER(10) NOT NULL,
end_number NUMBER(10) NOT NULL
)
/
CREATE TABLE t_numbers
(
id NUMBER(10) NOT NULL,
start_number NUMBER(10) NOT NULL,
end_number NUMBER(10) NOT NULL
)
/
CREATE INDEX ix_number2_start ON t_number2 (start_number)
/
BEGIN
DBMS_RANDOM.seed(20090702);
END;
/
INSERT
INTO t_number
SELECT 1 AS id, 3035112000 AS start_number, 3035112010 AS end_number
FROM dual
UNION ALL
SELECT 2, 3035112011, 3035112020
FROM dual
UNION ALL
SELECT 3, 3035112003, 3035112005
FROM dual
UNION ALL
SELECT 4, 3032925000, 3032925010
FROM dual
UNION ALL
SELECT 1, 3035113000, 3035113050
FROM dual
UNION ALL
SELECT 2, 3035113000, 3035113050
FROM dual
UNION ALL
SELECT 3, 3035113051, 3035113052
FROM dual
UNION ALL
SELECT 4, 3032925500, 3032925550
FROM dual
/
INSERT
INTO t_number2
SELECT TRUNC(lvl / 2),
TRUNC(init * 50000000),
TRUNC(init * 50000000) + TRUNC(offset * 5)
FROM (
SELECT /*+ NO_MERGE */
level AS lvl,
DBMS_RANDOM.value() AS init,
DBMS_RANDOM.value() AS offset
FROM dual
CONNECT BY
level <= 10000
)
/
COMMIT
/
The first table, called t_number, contains the sample data provided by the author of the question.
The second table, called t_number2, contains 10,000 random rows to test the queries for efficiency.
Now, how will we query?
We can go straightforward: just make a self-join, filter the rows that overlap some other row in the table and get rid of duplicate id's using DISTINCT.
To find out if two rows overlap each other, we can use a simple condition we discussed earlier:
SELECT DISTINCT q1.id
FROM t_number q1
JOIN t_number q2
ON q1.start_number < q2.end_number
AND q1.end_number > q2.start_number
AND q1.rowid <> q2.rowid
ORDER BY
q1.id
| ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3 rows fetched in 0.0001s (0.0007s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
This works fine for a small table, but when we'll try to query a larger table, it will take way too much time:
SELECT DISTINCT q1.id
FROM t_number2 q1
JOIN t_number2 q2
ON q1.start_number <= q2.end_number
AND q1.end_number >= q2.start_number
AND q1.rowid <> q2.rowid
ORDER BY
q1.id
| ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 443 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 596 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1286 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1296 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1677 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2806 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3068 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3416 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3472 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3605 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3828 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 4607 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 4639 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 14 rows fetched in 0.0003s (17.7964s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
SORT UNIQUE
MERGE JOIN
SORT JOIN
TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2
FILTER
SORT JOIN
TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2
Almost 18 seconds, not good.
Instead of this simple condition, we can use the same approach we used to build contiguous ranges in this article:
We need to sort all values by start_number and calculate the running maximum of end_number.
Then we will just have to find all rows that overlap their preceding row (a row does if the current value of start_number is less than the previous value of the running maximum of end_number).
Overlapping, though, is symmetrical operation (if row 1 overlaps row 2, this means that row 2 overlaps row 1 too), so we will have to return id of the previous row too as a separate record, i. e. make an unpivot. In Oracle, it's best to do this by joining the resultset with a constructed helper table.
This is more efficient but less readable and requires a little explanation, so I'll construct the query in steps.
First, we need to find the running maximum of end_number for each row. In Oracle, it's super easy to do this by using analytic functions, namely MAX:
SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number
| ID | START_NUMBER | MAX_END_NUMBER |
|---|---|---|
| 4 | 3032925000 | 3032925010 |
| 4 | 3032925500 | 3032925550 |
| 1 | 3035112000 | 3035112010 |
| 3 | 3035112003 | 3035112010 |
| 2 | 3035112011 | 3035112020 |
| 1 | 3035113000 | 3035113050 |
| 2 | 3035113000 | 3035113050 |
| 3 | 3035113051 | 3035113052 |
The column max_end_number in the query above is the running maximum: it's the maximal value of end_number over all preceding values.
Now we need to find all rows whose start_number does not exceed the running maximum for the previous row.
SQL does not provide an easily way to make vertical (inter-row) calculations in a resultset, but Oracle's extensions will help us in this case. We will just need the function LAG, which returns the value of any column from the previous row.
For the sake of readability, I will use CTE's (aliased queries) instead of the nested queries:
WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ) SELECT id, start_number, max_end_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges
| ID | START_NUMBER | MAX_END_NUMBER | PREV_MAX_END_NUMBER | PREV_ID |
|---|---|---|---|---|
| 4 | 3032925000 | 3032925010 | ||
| 4 | 3032925500 | 3032925550 | 3032925010 | 4 |
| 1 | 3035112000 | 3035112010 | 3032925550 | 4 |
| 3 | 3035112003 | 3035112010 | 3035112010 | 1 |
| 2 | 3035112011 | 3035112020 | 3035112010 | 3 |
| 1 | 3035113000 | 3035113050 | 3035112020 | 2 |
| 2 | 3035113000 | 3035113050 | 3035113050 | 1 |
| 3 | 3035113051 | 3035113052 | 3035113050 | 2 |
Now we need to select id's of rows that overlap their preceding row. This is done by adding a simple condition to the query above:
WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ), lags AS ( SELECT id, start_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges ) SELECT id, prev_id FROM lags WHERE start_number <= prev_max_end_number
| ID | PREV_ID |
|---|---|
| 3 | 1 |
| 2 | 1 |
We see that one of the rows with id = 3 overlaps a row with id = 1 and a row with id = 2 overlaps another row with id = 1. That means that rows with id = 1 overlap their corresponding rows toom and 1 be included into the final resultset just as well.
To do this, we need to unpivot that table by joining it with an inline helper rowset and uniquify the returned id's using DISTINCT:
WITH ranges AS
(
SELECT id,
start_number,
MAX(end_number) OVER (ORDER BY start_number) AS max_end_number
FROM t_number
),
lags AS
(
SELECT id, start_number,
LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number,
LAG(id) OVER (ORDER BY start_number) AS prev_id
FROM ranges
),
overlaps AS
(
SELECT id, prev_id
FROM lags
WHERE start_number <= prev_max_end_number
),
pivot AS (
SELECT 1 AS which
FROM dual
UNION ALL
SELECT 2 AS which
FROM dual
)
SELECT DISTINCT DECODE(which, 1, id, 2, prev_id) AS id
FROM overlaps
CROSS JOIN
pivot
| ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3 rows fetched in 0.0002s (0.0014s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Here's the result we're after.
Now, let's check performance on the large table:
WITH ranges AS
(
SELECT id,
start_number,
MAX(end_number) OVER (ORDER BY start_number) AS max_end_number
FROM t_number2
),
lags AS
(
SELECT id, start_number,
LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number,
LAG(id) OVER (ORDER BY start_number) AS prev_id
FROM ranges
),
overlaps AS
(
SELECT id, prev_id
FROM lags
WHERE start_number <= prev_max_end_number
),
pivot AS (
SELECT 1 AS which
FROM dual
UNION ALL
SELECT 2 AS which
FROM dual
)
SELECT DISTINCT DECODE(which, 1, id, 2, prev_id) AS id
FROM overlaps
CROSS JOIN
pivot
ORDER BY
id
| ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 443 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 596 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1286 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1296 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 1677 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 2806 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3068 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3416 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3472 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3605 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 3828 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 4607 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 4639 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 14 rows fetched in 0.0003s (0.0499s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SELECT STATEMENT
SORT UNIQUE
MERGE JOIN CARTESIAN
VIEW
UNION-ALL
FAST DUAL
FAST DUAL
BUFFER SORT
VIEW
WINDOW BUFFER
VIEW
WINDOW SORT
TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2
Instead of costly self-join, this query uses but a single table scan and a cheap CARTESIAN JOIN with an in-memory table to unpivot the results.
That's why for 10,000 rows this query takes only 40 ms, which is almost instant.
Hope that helps.
I'm always glad to answer the questions regarding database queries.
Subscribe in a reader