Author Archive
Longest common prefix: SQL Server
Comments enabled. I *really* need your comment
From Stack Overflow:
I have some data:
id ref 1 3536757616 1 3536757617 1 3536757618 2 3536757628 2 3536757629 2 3536757630 and want to get the result like this:
id result 1 3536757616/7/8 2 3536757629/28/30 Essentially, the data should be aggregated on
id
, and theref
's should be concatenated together and separated by a/
(slash), but with longest common prefix removed.
I've already encoutered this problem several times, so I'll try to cover solutions for all RDBMS's my blog is about:
- Longest common prefix: SQL Server
- Longest common prefix: PostgreSQL
- Longest common prefix: Oracle
- Longest common prefix: MySQL
I hope this will be interesting as approaches will differ significantly for all four systems.
Today is SQL Server time.
I won't create sample tables here, since I'm demonstrating the principle. Instead, I'll just use dynamically generated data.
What do we need to do here is:
- Find least common prefix for each group and its length
- Cut off the prefix of each but the first
ref
, usingSUBSTRING
- Concatenate the strings using
FOR XML
Steps 2 and 3 are quite simple, but the first one needs some effort.
SQL Server doesn't provide a function to find the longest common prefix (LCP), so we'll have to implement it.
With some limitations, it may be done using pure SQL
, no UDF
's.
Read the rest of this entry »
Selecting records unique on each column
Comments enabled. I *really* need your comment
From Stack Overflow:
I need to select 3 columns from a table, but I need each value from any column to be unique in the resultset for this column.
This query:
SELECT DISTINCT TOP 10 a, b, c FROM xwill return 10 distinct sets.
How do I do it?
Here we will need to make sure that for each column in a record we are going to return, no previous record returned so far does not contain the same value of the column.
Let's create a sample table and construct the query:
Read the rest of this entry »
Distinct pairs
Comments enabled. I *really* need your comment
From Stack Overflow:
I have this query:
SELECT DISTINCT id, name FROM table1For a given
id
, thename
will always be the same.Both fields are indexed. There's no separate table that maps the id to the name.
The table is very large (10,000,000 rows), so the query could take some time.
This query is very fast, since it's indexed:
SELECT DISTINCT id FROM table1Likewise for this query:
SELECT DISTINCT id FROM table1Assuming I can't get the database structure changed (a very safe assumption), what's a better way to structure the first query for performance?
The fastest way would of course be building an index on (id, name)
.
But since we (for whatever reason) cannot change the database structure, it's not an option.
Author, nevertheless, mentioned that for any given id
, the name
will always be the same. This can help us.
Let's create a sample table:
Read the rest of this entry »
Numbering groups
Comments enabled. I *really* need your comment
From Stack Overflow:
I have the following table:
Id Value 1 3 1 12 1 67 2 7 2 99 5 30 5 33 5 4 5 87 5 12 5 1 I'd like to update it:
Id UniqueIdBySubSet Value 1 1 3 1 2 12 1 3 67 2 1 7 2 2 99 5 1 30 5 2 33 5 3 4 5 4 87 5 5 12 5 6 1 Does anyone know how can I do that in MySQL?
These kinds of things are easily done with ROW_NUMBER()
in systems that support it.
This function can be emulated in MySQL too. Some time ago I wrote an article on how to do that:
This task, however, is a little more complex. Since we update a single field, we need to change two variables at once in one SET
clause.
We cannot use JOIN
's here too since we don't yet have a PRIMARY KEY
to join on.
However, session variable support in MySQL
is flexible enough to do that.
Let's create the table and see how do we do it:
Read the rest of this entry »
Aggregate AND
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:
CREATE TABLE myTable ( someKey integer, someBool boolean ); INSERT INTO myTable VALUES (1, 't'), (1, 't'), (2, 'f'), (2, 't');Each
someKey
could have 0 or more entries.For any given
someKey
, I need to know if
- All the entries are true, or
- Any of the entries are false
Basically, it's an
AND
.
This solution is often used to represent polls that should be unanimous for the decision to be made (i. e. anyone can put a veto on the decision).
PostgreSQL offers a special aggregate BOOL_AND
to do this.
However, an aggregate may be less efficient here.
The return value of an AND
function is constrained by finding certain values:
- Whenever a
FALSE
is found, the return value cannot beTRUE
anymore. It's eitherFALSE
orNULL
. - Whenever a
NULL
is found, the return value isNULL
Aggregates in PostgreSQL, however, won't take this into account.
What we need here is a method to stop and return whenever first NULL
or FALSE
value is found.
Let's create a sample table and see how can it may be done:
Read the rest of this entry »
Hierarchical queries in PostgreSQL
Comments enabled. I *really* need your comment
Note: this article concerns PostgreSQL 8.3 and earlier.
For hierarchical queries in PostgreSQL 8.4 and higher, see this article:
In one of the previous articles I wrote about using hierarchical queries in MySQL:
PostgreSQL has a contrib module to implement the same functionality.
However, it's not always possible to install and use contribs. Same is true for procedural languages.
Fortunately, this functionality can be implemented using a plain SQL function.
Let's create a sample table and see how it works:
Read the rest of this entry »
Generating XML in subqueries
From Stack Overflow:
I have a query that looks like
SELECT P.Column1, P.Column2, P.Column3, ( SELECT A.ColumnX, A.ColumnY FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A WHERE A.Key = P.Key FOR XML AUTO, TYPE ), ( SELECT B.ColumnX, B.ColumnY, FROM dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B WHERE B.Key = P.Key FOR XML AUTO, TYPE ) FROM ( /* joined tables here */ ) AS P FOR XML AUTO, ROOT('ROOT')
P
has about 5,000 rows.
A
andB
, about 4,000 rows each.This query has a runtime performance of more than 10 minutes.
Changing it to this, however:
SELECT P.Column1, P.Column2, P.Column3 INTO #P FROM ( /* joined tables here ) p SELECT A.ColumnX, A.ColumnY INTO #A FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A SELECT B.ColumnX, B.ColumnY INTO #B FROM dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B SELECT P.Column1, P.Column2, P.Column3, ( SELECT A.ColumnX, A.ColumnY FROM #A AS A WHERE A.Key = P.Key FOR XML AUTO, TYPE ), ( SELECT B.ColumnX, B.ColumnY, FROM #B AS B WHERE B.Key = P.Key FOR XML AUTO, TYPE ) FROM #P AS P FOR XML AUTO, ROOT('ROOT'), has a performance of about 4 seconds.
This makes not a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default.
My inclination is that SQL is doing the wrong type of "join" with the subquery, but maybe I've missed it, there's no way to specify the join type to use with correlated subqueries.
Is there a way to achieve this without using
#temptable
's or@table_variable
's via indexes and/or hints?
The reason for this is obvious: SQL Server needs to reevaluate the function each time the subquery is called, which is long.
When the subquery result is cached in the temp table, SQL Server will just build an index over this table in an Eager Spool
, and use this index in the joins.
But is it possible to increase performance without using temporary tables?
Let's create sample functions, tables and see:
Read the rest of this entry »
Converting currencies
Comments enabled. I *really* need your comment
When developing a financial application, one needs to deal with currencies and convert from one currency to another.
There may be different scenarios, but for accounting purposes the currency rates set and published by appropriate official regulator once a day are used most often.
A typical task looks like this: given a set of transactions in one currency, get an appropriate amount in another currency.
Let's create sample tables and see how it may be done:
Read the rest of this entry »
Selecting options
Comments enabled. I *really* need your comment
From Stack Overflow:
I have three checkboxes in my application.
If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.
Can I do this with a single
SQL
query?
Assuming MySQL.
The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.
This is best achieved by using two mutually exclusive SELECT
statements with a UNION ALL
.
Let's create a sample table and try to construct the query:
Read the rest of this entry »
Genealogy: finding a descendant
Comments enabled. I *really* need your comment
In the previous article I described an efficient way to build a genealogy with a hierarchical query in Oracle:
Now, I'd like to describe a query to check if one entity in the table is a descendant on another one.
Let's create a sample table:
Read the rest of this entry »