Quassnoi, Author at EXPLAIN EXTENDED - Page 20 of 28 at EXPLAIN EXTENDED

Author Archive

I have some data:

id ref

1 3536757616

1 3536757617

1 3536757618

2 3536757628

2 3536757629

2 3536757630

and want to get the result like this:

id result

1 3536757616/7/8

2 3536757629/28/30

Essentially, the data should be aggregated on id, and the ref's should be concatenated together and separated by a / (slash), but with longest common prefix removed.

id	ref
1	3536757616
1	3536757617
1	3536757618
2	3536757628
2	3536757629
2	3536757630

id	result
1	3536757616/7/8
2	3536757629/28/30

I've already encoutered this problem several times, so I'll try to cover solutions for all RDBMS's my blog is about:

I hope this will be interesting as approaches will differ significantly for all four systems.

Today is SQL Server time.

I won't create sample tables here, since I'm demonstrating the principle. Instead, I'll just use dynamically generated data.

What do we need to do here is:

Find least common prefix for each group and its length
Cut off the prefix of each but the first ref, using SUBSTRING
Concatenate the strings using FOR XML

Steps 2 and 3 are quite simple, but the first one needs some effort.

SQL Server doesn't provide a function to find the longest common prefix (LCP), so we'll have to implement it.

With some limitations, it may be done using pure SQL, no UDF's.
Read the rest of this entry »

Written by Quassnoi

June 4th, 2009 at 11:00 pm

Posted in SQL Server

Selecting records unique on each column

Comments enabled. I *really* need your comment

From Stack Overflow:

I need to select 3 columns from a table, but I need each value from any column to be unique in the resultset for this column.

This query:
SELECT DISTINCT TOP 10 a, b, c
FROM   x
will return 10 distinct sets.

How do I do it?

Here we will need to make sure that for each column in a record we are going to return, no previous record returned so far does not contain the same value of the column.

Let's create a sample table and construct the query:
Read the rest of this entry »

Written by Quassnoi

June 3rd, 2009 at 11:00 pm

Posted in SQL Server

Distinct pairs

Comments enabled. I *really* need your comment

From Stack Overflow:

I have this query:
SELECT  DISTINCT id, name
FROM    table1
For a given id, the name will always be the same.

Both fields are indexed. There's no separate table that maps the id to the name.

The table is very large (10,000,000 rows), so the query could take some time.

This query is very fast, since it's indexed:
SELECT  DISTINCT id
FROM    table1
Likewise for this query:
SELECT  DISTINCT id
FROM    table1
Assuming I can't get the database structure changed (a very safe assumption), what's a better way to structure the first query for performance?

The fastest way would of course be building an index on (id, name).

But since we (for whatever reason) cannot change the database structure, it's not an option.

Author, nevertheless, mentioned that for any given id, the name will always be the same. This can help us.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

June 2nd, 2009 at 11:00 pm

Posted in Oracle

Numbering groups

Comments enabled. I *really* need your comment

From Stack Overflow:

I have the following table:

Id Value

1 3

1 12

1 67

2 7

2 99

5 30

5 33

5 4

5 87

5 12

5 1

I'd like to update it:

Id UniqueIdBySubSet Value

1 1 3

1 2 12

1 3 67

2 1 7

2 2 99

5 1 30

5 2 33

5 3 4

5 4 87

5 5 12

5 6 1

Does anyone know how can I do that in MySQL?

Id	Value
1	3
1	12
1	67
2	7
2	99
5	30
5	33
5	4
5	87
5	12
5	1

Id	UniqueIdBySubSet	Value
1	1	3
1	2	12
1	3	67
2	1	7
2	2	99
5	1	30
5	2	33
5	3	4
5	4	87
5	5	12
5	6	1

These kinds of things are easily done with ROW_NUMBER() in systems that support it.

This function can be emulated in MySQL too. Some time ago I wrote an article on how to do that:

Analytic functions: SUM, AVG, ROW_NUMBER

This task, however, is a little more complex. Since we update a single field, we need to change two variables at once in one SET clause.

We cannot use JOIN's here too since we don't yet have a PRIMARY KEY to join on.

However, session variable support in MySQL is flexible enough to do that.

Let's create the table and see how do we do it:
Read the rest of this entry »

Written by Quassnoi

June 1st, 2009 at 11:00 pm

Posted in MySQL

Aggregate AND

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:
CREATE TABLE myTable
(        someKey integer,
        someBool boolean
);
INSERT
INTO    myTable
VALUES
        (1, 't'),
        (1, 't'),
        (2, 'f'),
        (2, 't');
Each someKey could have 0 or more entries.

For any given someKey, I need to know if

All the entries are true, or

Any of the entries are false

Basically, it's an AND.

This solution is often used to represent polls that should be unanimous for the decision to be made (i. e. anyone can put a veto on the decision).

PostgreSQL offers a special aggregate BOOL_AND to do this.

However, an aggregate may be less efficient here.

The return value of an AND function is constrained by finding certain values:

Whenever a FALSE is found, the return value cannot be TRUE anymore. It's either FALSE or NULL.
Whenever a NULL is found, the return value is NULL

Aggregates in PostgreSQL, however, won't take this into account.

What we need here is a method to stop and return whenever first NULL or FALSE value is found.

Let's create a sample table and see how can it may be done:
Read the rest of this entry »

Written by Quassnoi

May 30th, 2009 at 11:00 pm

Posted in PostgreSQL

Hierarchical queries in PostgreSQL

Comments enabled. I *really* need your comment

Note: this article concerns PostgreSQL 8.3 and earlier.

For hierarchical queries in PostgreSQL 8.4 and higher, see this article:

PostgreSQL 8.4: preserving order for hierarchical query

In one of the previous articles I wrote about using hierarchical queries in MySQL:

Hierarchical queries in MySQL

PostgreSQL has a contrib module to implement the same functionality.

However, it's not always possible to install and use contribs. Same is true for procedural languages.

Fortunately, this functionality can be implemented using a plain SQL function.

Let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

May 29th, 2009 at 11:00 pm

Posted in PostgreSQL

Generating XML in subqueries

with one comment

From Stack Overflow:

I have a query that looks like
SELECT  P.Column1, P.Column2, P.Column3,
        (
        SELECT  A.ColumnX, A.ColumnY
        FROM    dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
        WHERE   A.Key = P.Key
        FOR XML AUTO, TYPE  
        ),
        (
        SELECT  B.ColumnX, B.ColumnY,
        FROM    dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
        WHERE   B.Key = P.Key
        FOR XML AUTO, TYPE  
        )
FROM    (
        /* joined tables here */
        ) AS P
FOR XML AUTO, ROOT('ROOT')
P has about 5,000 rows.

A and B, about 4,000 rows each.

This query has a runtime performance of more than 10 minutes.

Changing it to this, however:
SELECT  P.Column1,
        P.Column2,
        P.Column3
INTO    #P
FROM    (
        /* joined tables here
        ) p

SELECT  A.ColumnX,
        A.ColumnY
INTO    #A     
FROM    dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A

SELECT  B.ColumnX, B.ColumnY
INTO    #B     
FROM    dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B

SELECT  P.Column1, P.Column2, P.Column3,
        (
        SELECT  A.ColumnX, A.ColumnY
        FROM    #A AS A
        WHERE   A.Key = P.Key
        FOR XML AUTO, TYPE  
        ),
        (
        SELECT  B.ColumnX, B.ColumnY,
        FROM    #B AS B
        WHERE   B.Key = P.Key
        FOR XML AUTO, TYPE  
        )
FROM    #P AS P
FOR XML AUTO, ROOT('ROOT')
, has a performance of about 4 seconds.

This makes not a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default.

My inclination is that SQL is doing the wrong type of "join" with the subquery, but maybe I've missed it, there's no way to specify the join type to use with correlated subqueries.

Is there a way to achieve this without using #temptable's or @table_variable's via indexes and/or hints?

The reason for this is obvious: SQL Server needs to reevaluate the function each time the subquery is called, which is long.

When the subquery result is cached in the temp table, SQL Server will just build an index over this table in an Eager Spool, and use this index in the joins.

But is it possible to increase performance without using temporary tables?

Let's create sample functions, tables and see:
Read the rest of this entry »

Written by Quassnoi

May 28th, 2009 at 11:00 pm

Posted in SQL Server

Converting currencies

Comments enabled. I *really* need your comment

When developing a financial application, one needs to deal with currencies and convert from one currency to another.

There may be different scenarios, but for accounting purposes the currency rates set and published by appropriate official regulator once a day are used most often.

A typical task looks like this: given a set of transactions in one currency, get an appropriate amount in another currency.

Let's create sample tables and see how it may be done:
Read the rest of this entry »

Written by Quassnoi

May 27th, 2009 at 11:00 pm

Posted in Oracle

Selecting options

Comments enabled. I *really* need your comment

From Stack Overflow:

I have three checkboxes in my application.

If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.

Can I do this with a single SQL query?

Assuming MySQL.

The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.

This is best achieved by using two mutually exclusive SELECT statements with a UNION ALL.

Let's create a sample table and try to construct the query:
Read the rest of this entry »

Written by Quassnoi

May 26th, 2009 at 11:00 pm

Posted in MySQL

Genealogy: finding a descendant

Comments enabled. I *really* need your comment

In the previous article I described an efficient way to build a genealogy with a hierarchical query in Oracle:

Genealogy query on both parents

Now, I'd like to describe a query to check if one entity in the table is a descendant on another one.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 25th, 2009 at 11:00 pm

Posted in Oracle

EXPLAIN EXTENDED

Author Archive

Longest common prefix: SQL Server

Selecting records unique on each column

Distinct pairs

Numbering groups

Aggregate AND

Hierarchical queries in PostgreSQL

Generating XML in subqueries

Converting currencies

Selecting options

Genealogy: finding a descendant

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow

Author Archive

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow