## Overlapping numbers

Comments enabled. I *really* need your comment

Answering questions asked on the site.

**Kat** asks:

I want to determine if I have any overlapping ranges of numbers in a table for specific

`id`

's.For example, the table has the following:

ID START_NUMBER END_NUMBER 1 3035112000 3035112010 2 3035112011 3035112020 3 3035112003 3035112005 4 3032925000 3032925010 1 3035113000 3035113050 2 3035113000 3035113050 3 3035113051 3035113052 4 3032925500 3032925550 I only want to pull those that have overlapping number ranges for

`id`

's1,2, and3(but not4).This is in

Oracle.

We need to write a query to pull out all `id`

's of rows that overlap at least one other row in the table.

Let's create a pair of sample tables:

CREATE TABLE t_number ( id NUMBER(10) NOT NULL, start_number NUMBER(10) NOT NULL, end_number NUMBER(10) NOT NULL ) / CREATE TABLE t_numbers ( id NUMBER(10) NOT NULL, start_number NUMBER(10) NOT NULL, end_number NUMBER(10) NOT NULL ) / CREATE INDEX ix_number2_start ON t_number2 (start_number) / BEGIN DBMS_RANDOM.seed(20090702); END; / INSERT INTO t_number SELECT 1 AS id, 3035112000 AS start_number, 3035112010 AS end_number FROM dual UNION ALL SELECT 2, 3035112011, 3035112020 FROM dual UNION ALL SELECT 3, 3035112003, 3035112005 FROM dual UNION ALL SELECT 4, 3032925000, 3032925010 FROM dual UNION ALL SELECT 1, 3035113000, 3035113050 FROM dual UNION ALL SELECT 2, 3035113000, 3035113050 FROM dual UNION ALL SELECT 3, 3035113051, 3035113052 FROM dual UNION ALL SELECT 4, 3032925500, 3032925550 FROM dual / INSERT INTO t_number2 SELECT TRUNC(lvl / 2), TRUNC(init * 50000000), TRUNC(init * 50000000) + TRUNC(offset * 5) FROM ( SELECT /*+ NO_MERGE */ level AS lvl, DBMS_RANDOM.value() AS init, DBMS_RANDOM.value() AS offset FROM dual CONNECT BY level <= 10000 ) / COMMIT /

The first table, called `t_number`

, contains the sample data provided by the author of the question.

The second table, called `t_number2`

, contains **10,000** random rows to test the queries for efficiency.

Now, how will we query?

We can go straightforward: just make a self-join, filter the rows that overlap some other row in the table and get rid of duplicate `id`

's using `DISTINCT`

.

To find out if two rows overlap each other, we can use a simple condition we discussed earlier:

SELECT DISTINCT q1.id FROM t_number q1 JOIN t_number q2 ON q1.start_number < q2.end_number AND q1.end_number > q2.start_number AND q1.rowid <> q2.rowid ORDER BY q1.id

ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3 rows fetched in 0.0001s (0.0007s) |

This works fine for a small table, but when we'll try to query a larger table, it will take way too much time:

SELECT DISTINCT q1.id FROM t_number2 q1 JOIN t_number2 q2 ON q1.start_number <= q2.end_number AND q1.end_number >= q2.start_number AND q1.rowid <> q2.rowid ORDER BY q1.id

ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

59 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

443 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

596 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

1286 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

1296 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

1677 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

2806 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3068 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3416 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3472 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3605 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3828 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

4607 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

4639 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

14 rows fetched in 0.0003s (17.7964s) |

SELECT STATEMENT SORT UNIQUE MERGE JOIN SORT JOIN TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2 FILTER SORT JOIN TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2

Almost **18** seconds, not good.

Instead of this simple condition, we can use the same approach we used to build contiguous ranges in this article:

We need to sort all values by `start_number`

and calculate the running maximum of `end_number`

.

Then we will just have to find all rows that overlap their preceding row (a row does if the current value of `start_number`

is less than the previous value of the running maximum of `end_number`

).

Overlapping, though, is symmetrical operation (if row **1** overlaps row **2**, this means that row **2** overlaps row **1** too), so we will have to return id of the previous row too as a separate record, i. e. make an unpivot. In **Oracle**, it's best to do this by joining the resultset with a constructed helper table.

This is more efficient but less readable and requires a little explanation, so I'll construct the query in steps.

First, we need to find the running maximum of `end_number`

for each row. In **Oracle**, it's super easy to do this by using analytic functions, namely `MAX`

:

SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number

ID | START_NUMBER | MAX_END_NUMBER |
---|---|---|

4 | 3032925000 | 3032925010 |

4 | 3032925500 | 3032925550 |

1 | 3035112000 | 3035112010 |

3 | 3035112003 | 3035112010 |

2 | 3035112011 | 3035112020 |

1 | 3035113000 | 3035113050 |

2 | 3035113000 | 3035113050 |

3 | 3035113051 | 3035113052 |

The column `max_end_number`

in the query above is the running maximum: it's the maximal value of `end_number`

over all preceding values.

Now we need to find all rows whose `start_number`

does not exceed the running maximum for the previous row.

**SQL** does not provide an easily way to make vertical (inter-row) calculations in a resultset, but **Oracle**'s extensions will help us in this case. We will just need the function `LAG`

, which returns the value of any column from the previous row.

For the sake of readability, I will use **CTE**'s (aliased queries) instead of the nested queries:

WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ) SELECT id, start_number, max_end_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges

ID | START_NUMBER | MAX_END_NUMBER | PREV_MAX_END_NUMBER | PREV_ID |
---|---|---|---|---|

4 | 3032925000 | 3032925010 | ||

4 | 3032925500 | 3032925550 | 3032925010 | 4 |

1 | 3035112000 | 3035112010 | 3032925550 | 4 |

3 | 3035112003 | 3035112010 | 3035112010 | 1 |

2 | 3035112011 | 3035112020 | 3035112010 | 3 |

1 | 3035113000 | 3035113050 | 3035112020 | 2 |

2 | 3035113000 | 3035113050 | 3035113050 | 1 |

3 | 3035113051 | 3035113052 | 3035113050 | 2 |

Now we need to select `id`

's of rows that overlap their preceding row. This is done by adding a simple condition to the query above:

WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ), lags AS ( SELECT id, start_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges ) SELECT id, prev_id FROM lags WHERE start_number <= prev_max_end_number

ID | PREV_ID |
---|---|

3 | 1 |

2 | 1 |

We see that one of the rows with `id = 3`

overlaps a row with `id = 1`

and a row with `id = 2`

overlaps another row with `id = 1`

. That means that rows with `id = 1`

overlap their corresponding rows toom and **1** be included into the final resultset just as well.

To do this, we need to unpivot that table by joining it with an inline helper rowset and uniquify the returned `id`

's using `DISTINCT`

:

WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number ), lags AS ( SELECT id, start_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges ), overlaps AS ( SELECT id, prev_id FROM lags WHERE start_number <= prev_max_end_number ), pivot AS ( SELECT 1 AS which FROM dual UNION ALL SELECT 2 AS which FROM dual ) SELECT DISTINCT DECODE(which, 1, id, 2, prev_id) AS id FROM overlaps CROSS JOIN pivot

ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

1 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

2 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3 rows fetched in 0.0002s (0.0014s) |

Here's the result we're after.

Now, let's check performance on the large table:

WITH ranges AS ( SELECT id, start_number, MAX(end_number) OVER (ORDER BY start_number) AS max_end_number FROM t_number2 ), lags AS ( SELECT id, start_number, LAG(max_end_number) OVER (ORDER BY start_number) AS prev_max_end_number, LAG(id) OVER (ORDER BY start_number) AS prev_id FROM ranges ), overlaps AS ( SELECT id, prev_id FROM lags WHERE start_number <= prev_max_end_number ), pivot AS ( SELECT 1 AS which FROM dual UNION ALL SELECT 2 AS which FROM dual ) SELECT DISTINCT DECODE(which, 1, id, 2, prev_id) AS id FROM overlaps CROSS JOIN pivot ORDER BY id

ID | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

59 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

443 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

596 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

1286 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

1296 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

1677 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

2806 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3068 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3416 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3472 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3605 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

3828 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

4607 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

4639 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

14 rows fetched in 0.0003s (0.0499s) |

SELECT STATEMENT SORT UNIQUE MERGE JOIN CARTESIAN VIEW UNION-ALL FAST DUAL FAST DUAL BUFFER SORT VIEW WINDOW BUFFER VIEW WINDOW SORT TABLE ACCESS FULL, 20090706_overlaps.T_NUMBER2

Instead of costly self-join, this query uses but a single table scan and a cheap `CARTESIAN JOIN`

with an in-memory table to unpivot the results.

That's why for **10,000** rows this query takes only **40 ms**, which is almost instant.

Hope that helps.

I'm always glad to answer the questions regarding database queries.