Archive for 2009
Flattening timespans: SQL Server
Comments enabled. I *really* need your comment
From Stack Overflow:
I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.
To make things a bit clearer, take a look at the sample data for 03.06.2009:
The following timespans are overlapping or contiunous and need to merge into one timespan:
date start stop 2009.06.03 05:54:48:000 10:00:13:000 2009.06.03 09:26:45:000 09:59:40:000 The resulting timespan would be from 05:54:48 to 10:00:13.
Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:
date start stop 2009.06.03 10:12:50:000 10:27:25:000 2009.06.03 10:13:12:000 11:14:56:000 2009.06.03 10:27:25:000 10:27:31:000 2009.06.03 10:27:39:000 13:53:38:000 2009.06.03 11:14:56:000 11:15:03:000 2009.06.03 11:15:30:000 14:02:14:000 2009.06.03 13:53:38:000 13:53:43:000 2009.06.03 14:02:14:000 14:02:31:000 which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.
Any solution, be it SQL or not, is appreciated.
This is also quite a common task.
The algorithm here is simple:
- Order all timespans by
start
- Initialize a variable which will hold a running maximum of the timespans'
stop
- Take the timespans one by one, and for each next timespan, check its
start
against a running maximum of the previousstop
's. If the currentstart
is greater than the maximum of previousstop
's, it means that all timespans that started earlier have been ended, and we have a gap - Update the running maximum
Unfortunately, SQL Server
does not provide a simple way to calculate running maximums. We could use a correlated subquery of course, but it would take forever.
This is one of quite rare cases when cursor in SQL Server
work faster than set-based operations.
Let's create a sample table and see how it works:
Read the rest of this entry »
Joining unjoinable
Comments enabled. I *really* need your comment
From Stack Overflow:
I want to combine two tables into one.
Let say I have:
Table1:
ID Name 1 A 2 B 3 C Table2:
ID Name 4 D 5 E 6 F I want to make Table3:
Name1 Name2 A D B E C F How can I do this in SQL Server?
This a certainly a task for a JOIN
, since we have two tables here and want to match any row from the first table with another row from the second table.
But what condition do we use for a JOIN
?
The ID
's of the rows seem to be arbitrary and have no special meaning. They could as well be sparse and randomly distributed, so it's impossible to build a formula which corresponds one ID
to another. Same is true for the names.
However, there actually is a condition which we could JOIN
on.
Since the author mentioned there is same number of rows in both tables, we could use the ROW_NUMBER()
as a JOIN
condition.
We just need to wrap both these queries into CTE's which select a ROW_NUMBER()
along with the table data:
WITH table1 AS ( SELECT 1 AS id, 'A' AS name UNION ALL SELECT 2 AS id, 'B' AS name UNION ALL SELECT 3 AS id, 'C' AS name ), table2 AS ( SELECT 4 AS id, 'D' AS name UNION ALL SELECT 5 AS id, 'E' AS name UNION ALL SELECT 6 AS id, 'F' AS name ), q1 AS ( SELECT table1.*, ROW_NUMBER() OVER (ORDER BY name) AS rn FROM table1 ), q2 AS ( SELECT table2.*, ROW_NUMBER() OVER (ORDER BY name) AS rn FROM table2 ) SELECT q1.name, q2.name FROM q1 JOIN q2 ON q2.rn = q1.rn
name | name | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | D | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
B | E | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
C | F | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 rows fetched in 0.0002s (0.0264s) |
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 4 ms.
Latest DISTINCT records: efficient subquery
Comments enabled. I *really* need your comment
In my yesterday's article:
I described an efficient query to select 10 latest distinct IP
's from a pagehit table in MySQL.
Here's the query:
SELECT INET_NTOA(ip), ts FROM t_latest2 lo WHERE NOT EXISTS ( SELECT 1 FROM t_latest2 li WHERE li.ip = lo.ip AND li.ts > lo.ts ) ORDER BY ts DESC LIMIT 10
, which works all right if there are lots of distinct IP
's (as it is in a real table).
However, due to the bug in MySQL, this query is not as efficient as it should be.
Let's create the sample table and see what the problem is:
Read the rest of this entry »
Latest DISTINCT records
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table containing pagehit data in MySQL.
How do I grab the 10 latest unique
IP
's from that table?
This is quite a common task and there are two approaches to solve it.
Let's create two sample tables that differ in IP
distribution:
Read the rest of this entry »
Longest common prefix: MySQL
Comments enabled. I *really* need your comment
This is the fourth and final article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
, and today we will solve this task for MySQL.
A quick reminder of the problem (taken from Stack Overflow):
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
MySQL solution differs from the previous ones a little.
On the one hand, MySQL is the only RDBMS of all listed above that supplies a built-in GROUP_CONCAT
(aggregate function to concatenate strings).
On the other hand, MySQL lacks a way to generate an arbitrary resultset which we relied upon in all solutions above.
This makes it easier to concatenate the results, but a lot harder to find the longest common prefix.
Fortunately, there is a way to do it.
Since 5.1, MySQL offers an XPath
manipulation function, namely ExtractValue
, that will help us to solve this task.
Unfortunately, ExtractValue
is not set generating, i. e. it cannot create an SQL
resultset out of an XML
nodeset. But XPath
itself is quite a powerful tool, and we can implement the longest common prefix algorithm completely in XPath
.
Read the rest of this entry »
Longest common prefix: Oracle
Comments enabled. I *really* need your comment
Today, the third article of the series on how to strip all strings in a set of their longest common prefix and concatenate the results:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
This article will describe how to do it in Oracle.
A quick reminder of the problem (taken from Stack Overflow):
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
As I already wrote earlier, this solution needs two aggregates: first one to calculate the length of the longest common prefix and the second one to concatenate the strings.
Oracle allows creating custom aggregates, but it's usually a pain to implement them. That's why we will use Oracle's MODEL
clause to solve this task.
Read the rest of this entry »
Longest common prefix: PostgreSQL
Comments enabled. I *really* need your comment
This is a series of articles on how to strip all strings in a set of their longest common prefix and concatenate the results:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
Today, I'll show how to do it in PostgreSQL.
A quick reminder of the problem (taken from Stack Overflow):
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
Like with SQL Server, it is possible to do this in PostgreSQL using a single SQL query.
But since PostgreSQL offers a nice ability to create custom aggregates, I'll better demonstrate how to solve this task using ones.
In my opinion, custom aggregates fit here just perfectly.
Since PostgreSQL lacks native support for aggregate concatenation, we will need to create two custom aggregates here:
Read the rest of this entry »
Longest common prefix: SQL Server
Comments enabled. I *really* need your comment
From Stack Overflow:
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
I've already encoutered this problem several times, so I'll try to cover solutions for all RDBMS's my blog is about:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
I hope this will be interesting as approaches will differ significantly for all four systems.
Today is SQL Server time.
I won't create sample tables here, since I'm demonstrating the principle. Instead, I'll just use dynamically generated data.
What do we need to do here is:
- Find least common prefix for each group and its length
- Cut off the prefix of each but the first
ref
, usingSUBSTRING
- Concatenate the strings using
FOR XML
Steps 2 and 3 are quite simple, but the first one needs some effort.
SQL Server doesn't provide a function to find the longest common prefix (LCP), so we'll have to implement it.
With some limitations, it may be done using pure SQL
, no UDF
's.
Read the rest of this entry »
Selecting records unique on each column
Comments enabled. I *really* need your comment
From Stack Overflow:
I need to select 3 columns from a table, but I need each value from any column to be unique in the resultset for this column.
This query:
SELECT DISTINCT TOP 10 a, b, c FROM xwill return 10 distinct sets.
How do I do it?
Here we will need to make sure that for each column in a record we are going to return, no previous record returned so far does not contain the same value of the column.
Let's create a sample table and construct the query:
Read the rest of this entry »
Distinct pairs
Comments enabled. I *really* need your comment
From Stack Overflow:
I have this query:
SELECT DISTINCT id, name FROM table1For a given
id
, thename
will always be the same.Both fields are indexed. There's no separate table that maps the id to the name.
The table is very large (10,000,000 rows), so the query could take some time.
This query is very fast, since it's indexed:
SELECT DISTINCT id FROM table1Likewise for this query:
SELECT DISTINCT id FROM table1Assuming I can't get the database structure changed (a very safe assumption), what's a better way to structure the first query for performance?
The fastest way would of course be building an index on (id, name)
.
But since we (for whatever reason) cannot change the database structure, it's not an option.
Author, nevertheless, mentioned that for any given id
, the name
will always be the same. This can help us.
Let's create a sample table:
Read the rest of this entry »