Overlapping numbers
Comments enabled. I *really* need your comment
Answering questions asked on the site.
Kat asks:
I want to determine if I have any overlapping ranges of numbers in a table for specific
id
's.For example, the table has the following:
ID START_NUMBER END_NUMBER 1 3035112000 3035112010 2 3035112011 3035112020 3 3035112003 3035112005 4 3032925000 3032925010 1 3035113000 3035113050 2 3035113000 3035113050 3 3035113051 3035113052 4 3032925500 3032925550 I only want to pull those that have overlapping number ranges for
id
's 1, 2, and 3 (but not 4).This is in Oracle.
We need to write a query to pull out all id
's of rows that overlap at least one other row in the table.
Let's create a pair of sample tables:
CREATE TABLE t_number ( id NUMBER(10) NOT NULL, start_number NUMBER(10) NOT NULL, end_number NUMBER(10) NOT NULL ) / CREATE TABLE t_numbers ( id NUMBER(10) NOT NULL, start_number NUMBER(10) NOT NULL, end_number NUMBER(10) NOT NULL ) / CREATE INDEX ix_number2_start ON t_number2 (start_number) / BEGIN DBMS_RANDOM.seed(20090702); END; / INSERT INTO t_number SELECT 1 AS id, 3035112000 AS start_number, 3035112010 AS end_number FROM dual UNION ALL SELECT 2, 3035112011, 3035112020 FROM dual UNION ALL SELECT 3, 3035112003, 3035112005 FROM dual UNION ALL SELECT 4, 3032925000, 3032925010 FROM dual UNION ALL SELECT 1, 3035113000, 3035113050 FROM dual UNION ALL SELECT 2, 3035113000, 3035113050 FROM dual UNION ALL SELECT 3, 3035113051, 3035113052 FROM dual UNION ALL SELECT 4, 3032925500, 3032925550 FROM dual / INSERT INTO t_number2 SELECT TRUNC(lvl / 2), TRUNC(init * 50000000), TRUNC(init * 50000000) + TRUNC(offset * 5) FROM ( SELECT /*+ NO_MERGE */ level AS lvl, DBMS_RANDOM.value() AS init, DBMS_RANDOM.value() AS offset FROM dual CONNECT BY level <= 10000 ) / COMMIT /
The first table, called t_number
, contains the sample data provided by the author of the question.
The second table, called t_number2
, contains 10,000 random rows to test the queries for efficiency.
Now, how will we query?
We can go straightforward: just make a self-join, filter the rows that overlap some other row in the table and get rid of duplicate id
's using DISTINCT
.
To find out if two rows overlap each other, we can use a simple condition we discussed earlier:
SELECT DISTINCT q1.id FROM t_number q1 JOIN t_number q2 ON q1.start_number < q2.end_number AND q1.end_number > q2.start_number AND q1.rowid <> q2.rowid ORDER BY q1.id
ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 rows fetched in 0.0001s (0.0007s) |
This works fine for a small table, but when we'll try to query a larger table, it will take way too much time:
SELECT DISTINCT q1.id FROM t_number2 q1 JOIN t_number2 q2 ON q1.start_number <= q2.end_number AND q1.end_number >= q2.start_number AND q1.rowid <> q2.rowid ORDER BY q1.id
ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
59 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
443 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
596 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1286 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1296 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1677 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2806 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3068 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3416 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3472 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3605 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3828 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4607 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4639 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
14 rows fetched in 0.0003s (17.7964s) |
SELECT STATEMENT SORT UNIQUE MERGE JOIN SORT JOIN TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2 FILTER SORT JOIN TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2
Almost 18 seconds, not good.
Instead of this simple condition, we can use the same approach we used to build contiguous ranges in this article:
We need to sort all values by start_number
and calculate the running maximum of end_number
.
Then we will just have to find all rows that overlap their preceding row (a row does if the current value of start_number
is less than the previous value of the running maximum of end_number
).
Overlapping, though, is symmetrical operation (if row 1 overlaps row 2, this means that row 2 overlaps row 1 too), so we will have to return id of the previous row too as a separate record, i. e. make an unpivot. In Oracle, it's best to do this by joining the resultset with a constructed helper table.
This is more efficient but less readable and requires a little explanation, so I'll construct the query in steps.
First, we need to find the running maximum of end_number
for each row. In Oracle, it's super easy to do this by using analytic functions, namely MAX
:
SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number
ID | START_NUMBER | MAX_END_NUMBER |
---|---|---|
4 | 3032925000 | 3032925010 |
4 | 3032925500 | 3032925550 |
1 | 3035112000 | 3035112010 |
3 | 3035112003 | 3035112010 |
2 | 3035112011 | 3035112020 |
1 | 3035113000 | 3035113050 |
2 | 3035113000 | 3035113050 |
3 | 3035113051 | 3035113052 |
The column max_end_number
in the query above is the running maximum: it's the maximal value of end_number
over all preceding values.
Now we need to find all rows whose start_number
does not exceed the running maximum for the previous row.
SQL does not provide an easily way to make vertical (inter-row) calculations in a resultset, but Oracle's extensions will help us in this case. We will just need the function LAG
, which returns the value of any column from the previous row.
For the sake of readability, I will use CTE's (aliased queries) instead of the nested queries:
WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ) SELECT id, start_number, max_end_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges
ID | START_NUMBER | MAX_END_NUMBER | PREV_MAX_END_NUMBER | PREV_ID |
---|---|---|---|---|
4 | 3032925000 | 3032925010 | ||
4 | 3032925500 | 3032925550 | 3032925010 | 4 |
1 | 3035112000 | 3035112010 | 3032925550 | 4 |
3 | 3035112003 | 3035112010 | 3035112010 | 1 |
2 | 3035112011 | 3035112020 | 3035112010 | 3 |
1 | 3035113000 | 3035113050 | 3035112020 | 2 |
2 | 3035113000 | 3035113050 | 3035113050 | 1 |
3 | 3035113051 | 3035113052 | 3035113050 | 2 |
Now we need to select id
's of rows that overlap their preceding row. This is done by adding a simple condition to the query above:
WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ), lags AS ( SELECT id, start_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges ) SELECT id, prev_id FROM lags WHERE start_number <= prev_max_end_number
ID | PREV_ID |
---|---|
3 | 1 |
2 | 1 |
We see that one of the rows with id = 3
overlaps a row with id = 1
and a row with id = 2
overlaps another row with id = 1
. That means that rows with id = 1
overlap their corresponding rows toom and 1 be included into the final resultset just as well.
To do this, we need to unpivot that table by joining it with an inline helper rowset and uniquify the returned id
's using DISTINCT
:
WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ), lags AS ( SELECT id, start_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges ), overlaps AS ( SELECT id, prev_id FROM lags WHERE start_number <= prev_max_end_number ), pivot AS ( SELECT 1 AS which FROM dual UNION ALL SELECT 2 AS which FROM dual ) SELECT DISTINCT DECODE(which, 1, id, 2, prev_id) AS id FROM overlaps CROSS JOIN pivot
ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 rows fetched in 0.0002s (0.0014s) |
Here's the result we're after.
Now, let's check performance on the large table:
WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number2 ), lags AS ( SELECT id, start_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges ), overlaps AS ( SELECT id, prev_id FROM lags WHERE start_number <= prev_max_end_number ), pivot AS ( SELECT 1 AS which FROM dual UNION ALL SELECT 2 AS which FROM dual ) SELECT DISTINCT DECODE(which, 1, id, 2, prev_id) AS id FROM overlaps CROSS JOIN pivot ORDER BY id
ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
59 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
443 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
596 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1286 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1296 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1677 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2806 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3068 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3416 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3472 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3605 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3828 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4607 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4639 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
14 rows fetched in 0.0003s (0.0499s) |
SELECT STATEMENT SORT UNIQUE MERGE JOIN CARTESIAN VIEW UNION-ALL FAST DUAL FAST DUAL BUFFER SORT VIEW WINDOW BUFFER VIEW WINDOW SORT TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2
Instead of costly self-join, this query uses but a single table scan and a cheap CARTESIAN JOIN
with an in-memory table to unpivot the results.
That's why for 10,000 rows this query takes only 40 ms, which is almost instant.
Hope that helps.
I'm always glad to answer the questions regarding database queries.