EXPLAIN EXTENDED

How to create fast database queries

Archive for May, 2009

Aggregate AND

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:

CREATE TABLE myTable
(        someKey integer,
        someBool boolean
);
INSERT
INTO    myTable
VALUES
        (1, 't'),
        (1, 't'),
        (2, 'f'),
        (2, 't');

Each someKey could have 0 or more entries.

For any given someKey, I need to know if

  1. All the entries are true, or
  2. Any of the entries are false

Basically, it's an AND.

This solution is often used to represent polls that should be unanimous for the decision to be made (i. e. anyone can put a veto on the decision).

PostgreSQL offers a special aggregate BOOL_AND to do this.

However, an aggregate may be less efficient here.

The return value of an AND function is constrained by finding certain values:

  1. Whenever a FALSE is found, the return value cannot be TRUE anymore. It's either FALSE or NULL.
  2. Whenever a NULL is found, the return value is NULL

Aggregates in PostgreSQL, however, won't take this into account.

What we need here is a method to stop and return whenever first NULL or FALSE value is found.

Let's create a sample table and see how can it may be done:
Read the rest of this entry »

Written by Quassnoi

May 30th, 2009 at 11:00 pm

Posted in PostgreSQL

Hierarchical queries in PostgreSQL

Comments enabled. I *really* need your comment

Note: this article concerns PostgreSQL 8.3 and earlier.

For hierarchical queries in PostgreSQL 8.4 and higher, see this article:

In one of the previous articles I wrote about using hierarchical queries in MySQL:

PostgreSQL has a contrib module to implement the same functionality.

However, it's not always possible to install and use contribs. Same is true for procedural languages.

Fortunately, this functionality can be implemented using a plain SQL function.

Let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

May 29th, 2009 at 11:00 pm

Posted in PostgreSQL

Generating XML in subqueries

with one comment

From Stack Overflow:

I have a query that looks like

SELECT  P.Column1, P.Column2, P.Column3,
        (
        SELECT  A.ColumnX, A.ColumnY
        FROM    dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
        WHERE   A.Key = P.Key
        FOR XML AUTO, TYPE  
        ),
        (
        SELECT  B.ColumnX, B.ColumnY,
        FROM    dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
        WHERE   B.Key = P.Key
        FOR XML AUTO, TYPE  
        )
FROM    (
        /* joined tables here */
        ) AS P
FOR XML AUTO, ROOT('ROOT')

P has about 5,000 rows.

A and B, about 4,000 rows each.

This query has a runtime performance of more than 10 minutes.

Changing it to this, however:

SELECT  P.Column1,
        P.Column2,
        P.Column3
INTO    #P
FROM    (
        /* joined tables here
        ) p

SELECT  A.ColumnX,
        A.ColumnY
INTO    #A     
FROM    dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A

SELECT  B.ColumnX, B.ColumnY
INTO    #B     
FROM    dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B

SELECT  P.Column1, P.Column2, P.Column3,
        (
        SELECT  A.ColumnX, A.ColumnY
        FROM    #A AS A
        WHERE   A.Key = P.Key
        FOR XML AUTO, TYPE  
        ),
        (
        SELECT  B.ColumnX, B.ColumnY,
        FROM    #B AS B
        WHERE   B.Key = P.Key
        FOR XML AUTO, TYPE  
        )
FROM    #P AS P
FOR XML AUTO, ROOT('ROOT')

, has a performance of about 4 seconds.

This makes not a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default.

My inclination is that SQL is doing the wrong type of "join" with the subquery, but maybe I've missed it, there's no way to specify the join type to use with correlated subqueries.

Is there a way to achieve this without using #temptable's or @table_variable's via indexes and/or hints?

The reason for this is obvious: SQL Server needs to reevaluate the function each time the subquery is called, which is long.

When the subquery result is cached in the temp table, SQL Server will just build an index over this table in an Eager Spool, and use this index in the joins.

But is it possible to increase performance without using temporary tables?

Let's create sample functions, tables and see:
Read the rest of this entry »

Written by Quassnoi

May 28th, 2009 at 11:00 pm

Posted in SQL Server

Converting currencies

Comments enabled. I *really* need your comment

When developing a financial application, one needs to deal with currencies and convert from one currency to another.

There may be different scenarios, but for accounting purposes the currency rates set and published by appropriate official regulator once a day are used most often.

A typical task looks like this: given a set of transactions in one currency, get an appropriate amount in another currency.

Let's create sample tables and see how it may be done:
Read the rest of this entry »

Written by Quassnoi

May 27th, 2009 at 11:00 pm

Posted in Oracle

Selecting options

Comments enabled. I *really* need your comment

From Stack Overflow:

I have three checkboxes in my application.

If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.

Can I do this with a single SQL query?

Assuming MySQL.

The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.

This is best achieved by using two mutually exclusive SELECT statements with a UNION ALL.

Let's create a sample table and try to construct the query:
Read the rest of this entry »

Written by Quassnoi

May 26th, 2009 at 11:00 pm

Posted in MySQL

Genealogy: finding a descendant

Comments enabled. I *really* need your comment

In the previous article I described an efficient way to build a genealogy with a hierarchical query in Oracle:

Now, I'd like to describe a query to check if one entity in the table is a descendant on another one.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 25th, 2009 at 11:00 pm

Posted in Oracle

Genealogy query on both parents

Comments enabled. I *really* need your comment

From Stack Overflow:

I'm trying to fetch a genealogy tree of animals from my Oracle database.

Here's the table:

Animal_ID Parent_Male_ID Parent_Female_ID

If I specify an animal, I can get all of its descendants (on the male side) using something like this:

SELECT  *
FROM    animal
START WITH
        animal_id = 123
CONNECT BY
        PRIOR animal_id = parent_male_id

I'm trying to find a way to extend this in such a way that if I specify an animal, it will fetch both parents and then will fetch all of their descendants.

Any thoughts?

This is possible to do with a same CONNECT BY query by providing a little more comlex condition for START WITH and CONNECT BY:

SELECT  DISTINCT(animal_id) AS animal_id
FROM    animal
START WITH
        animal_id IN
        (
        SELECT  father
        FROM    animal
        WHERE   animal_id = 9500
        UNION ALL 
        SELECT  mother
        FROM    animal
        WHERE   animal_id = 9500
        )
CONNECT BY
        PRIOR animal_id IN (father, mother)
ORDER BY
        animal_id

However, my gut feeling says this query need performance checking.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 24th, 2009 at 11:00 pm

Posted in Oracle

Ordering columns independently

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table with 5 columns in it.

What's the easiest way to select all rows, but where each column is individually randomized?

All I can think of is to select each column separately, along with

ROW_NUMBER() OVER (ORDER BY NEWID()) AS lookup

, and then join each column back together on lookup.

Is there an easier way?

Though it's possible to do this by means of JOIN'ing, there is an easier way.

Since all SQL operations are row-level, we will need to make 5 independent ordered sets, which we later can use to order each column set. This implies having 5 times as much rows as there are rows in the original table.

But instead of JOIN'ing, we can use SQL Server's clause which maps rows to columns, which is called PIVOT.

We need to do the following:

  1. Generate 5 sets of column values and UNION ALL them.
  2. For each set, select a ROW_NUMBER() OVER (ORDER BY NEWID()) along with each row. It will be unique for each row in any set and will give a row identifier for PIVOT'ing.
  3. For each set, select a set identifier: just a string with column name. This will give a column identifier for PIVOT'ing
  4. Finally, PIVOT the column values: they will get into appropriate row and column, based on ROW_NUMBER() and the set name

Since PIVOT requires aggregation (in case row and column identifiers are not unique), we need some aggregation function, but our query is designed so that there is exactly one row with a given ROW_NUMBER() and set name, so any aggregation function will do.

We will use MIN.

Now, let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

May 22nd, 2009 at 11:00 pm

Posted in SQL Server

Choosing column order

with one comment

From Stack Overflow:

I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother?

Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?

Well, there is reason.

MySQL stores some types (out of which VARCHAR is probably the most important) with variable length.

That means that each variable field stored in the row is prepended with its actual length, like in Pascal string type.

Since the rows have no TOC, it means that the engine should traverse all previous values to find out where the required value begins.

If we need the value of the first column (first non-PRIMARY KEY column, of course), the engine will do it once per row.

But if we need the value of, say, 20-th column, the engine should find out the lengths of previous 19 columns and add them together.

Let's create a sample table and see how big the impact is:
Read the rest of this entry »

Written by Quassnoi

May 21st, 2009 at 11:00 pm

Posted in MySQL

Oracle: matching SYS_GUID’s

with one comment

Answering questions asked on the site.

Habib asks:

I have a table with a PRIMARY KEY defined as ConsumerID RAW(16) NOT NULL.

However, when I issue a query like this:

SELECT  *
FROM     Consumers
WHERE   ConsumerID = '1DFFC45DEDEB42B8B79B060D544C2ACB'

, it uses a TABLE ACCESS FULL and is very slow.

How can I improve this query?

Defining a PRIMARY KEY column as RAW(16) almost always means using system generated GUID's returned by SYS_GUID as PRIMARY KEY's.

Let's create a sample table and see what's going on:
Read the rest of this entry »

Written by Quassnoi

May 20th, 2009 at 11:00 pm

Posted in Oracle