EXPLAIN EXTENDED

How to create fast database queries

Author Archive

Genealogy query on both parents

Comments enabled. I *really* need your comment

From Stack Overflow:

I'm trying to fetch a genealogy tree of animals from my Oracle database.

Here's the table:

Animal_ID Parent_Male_ID Parent_Female_ID

If I specify an animal, I can get all of its descendants (on the male side) using something like this:

SELECT  *
FROM    animal
START WITH
animal_id = 123
CONNECT BY
PRIOR animal_id = parent_male_id

I'm trying to find a way to extend this in such a way that if I specify an animal, it will fetch both parents and then will fetch all of their descendants.

Any thoughts?

This is possible to do with a same CONNECT BY query by providing a little more comlex condition for START WITH and CONNECT BY:

SELECT  DISTINCT(animal_id) AS animal_id
FROM    animal
START WITH
animal_id IN
(
SELECT  father
FROM    animal
WHERE   animal_id = 9500
UNION ALL
SELECT  mother
FROM    animal
WHERE   animal_id = 9500
)
CONNECT BY
PRIOR animal_id IN (father, mother)
ORDER BY
animal_id

However, my gut feeling says this query need performance checking.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 24th, 2009 at 11:00 pm

Posted in Oracle

Ordering columns independently

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a table with 5 columns in it.

What's the easiest way to select all rows, but where each column is individually randomized?

All I can think of is to select each column separately, along with

ROW_NUMBER() OVER (ORDER BY NEWID()) AS lookup

, and then join each column back together on lookup.

Is there an easier way?

Though it's possible to do this by means of JOIN'ing, there is an easier way.

Since all SQL operations are row-level, we will need to make 5 independent ordered sets, which we later can use to order each column set. This implies having 5 times as much rows as there are rows in the original table.

But instead of JOIN'ing, we can use SQL Server's clause which maps rows to columns, which is called PIVOT.

We need to do the following:

  1. Generate 5 sets of column values and UNION ALL them.
  2. For each set, select a ROW_NUMBER() OVER (ORDER BY NEWID()) along with each row. It will be unique for each row in any set and will give a row identifier for PIVOT'ing.
  3. For each set, select a set identifier: just a string with column name. This will give a column identifier for PIVOT'ing
  4. Finally, PIVOT the column values: they will get into appropriate row and column, based on ROW_NUMBER() and the set name

Since PIVOT requires aggregation (in case row and column identifiers are not unique), we need some aggregation function, but our query is designed so that there is exactly one row with a given ROW_NUMBER() and set name, so any aggregation function will do.

We will use MIN.

Now, let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

May 22nd, 2009 at 11:00 pm

Posted in SQL Server

Choosing column order

with one comment

From Stack Overflow:

I know you can ALTER the column order in MySQL with FIRST and AFTER, but why would you want to bother?

Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?

Well, there is reason.

MySQL stores some types (out of which VARCHAR is probably the most important) with variable length.

That means that each variable field stored in the row is prepended with its actual length, like in Pascal string type.

Since the rows have no TOC, it means that the engine should traverse all previous values to find out where the required value begins.

If we need the value of the first column (first non-PRIMARY KEY column, of course), the engine will do it once per row.

But if we need the value of, say, 20-th column, the engine should find out the lengths of previous 19 columns and add them together.

Let's create a sample table and see how big the impact is:
Read the rest of this entry »

Written by Quassnoi

May 21st, 2009 at 11:00 pm

Posted in MySQL

Oracle: matching SYS_GUID’s

with one comment

Answering questions asked on the site.

Habib asks:

I have a table with a PRIMARY KEY defined as ConsumerID RAW(16) NOT NULL.

However, when I issue a query like this:

SELECT  *
FROM     Consumers
WHERE   ConsumerID = '1DFFC45DEDEB42B8B79B060D544C2ACB'

, it uses a TABLE ACCESS FULL and is very slow.

How can I improve this query?

Defining a PRIMARY KEY column as RAW(16) almost always means using system generated GUID's returned by SYS_GUID as PRIMARY KEY's.

Let's create a sample table and see what's going on:
Read the rest of this entry »

Written by Quassnoi

May 20th, 2009 at 11:00 pm

Posted in Oracle

Finding loops in a tree

with 4 comments

From Stack Overflow:

I have a view which matches up the needs and the wants, and discards the requests that cannot be fulfilled by any user, that looks like this:

Item Need Have
1 Bob George
2 George Herbie
3 Herbie Bob
4 Larry Wally
5 Wally Larry
6 John George

I am now trying to write a query where I can calculate the number of permutations of each user having their requested needs fulfilled.

For instance, in the example above, Bob, George, and Herbie can together fulfill each other's needs, as can Larry and Wally, but John cannot, so the total count would be 2.

This task is actually about finding loops.

Whenever there is a set of users that fulfill each other's needs, there is a loop in the tree.

We can assume that either of the columns (need or have) is unique so there there is always a single path in the corresponding direction which ends with at most one loop.

It's not stated in the task, but otherwise it would be an equivalent of Clique problem which is NP complete and hardly solvable with SQL.

Now, let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 18th, 2009 at 11:00 pm

Posted in Oracle

Ranging random values

Comments enabled. I *really* need your comment

From Stack Overflow:

Given a set of random numeric values in an SQL Server database, how do I generate a limited list of ranges where each range contains at least one value?

The ranges should not overlap and ideally have a similar amount of values in them.

This is a good job for SQL Server's analytic function NTILE.

This function, given a parameter N, assigns a rank from 1 to N to each value, just as in quadrile (4), quintile (5), percentile (100) etc.

Lower 1/N of values get a rank of 1, from 1/N to 2/N get a rank of 2, etc.

The value, though, are integer and random, and we can get gaps between the adjacent ranges.

To avoid this, we need to extend the ranges a little, so each range gets a common bound with both upper and lower neighbor. This is best done by using the middle of the gap as a common range boundary.

Say, NTILE assigned rank 1 to values from 100 to 200, rank 2 to values from 210 to 300, and rank 3 to values from 320 to 400.

Then we will use 205 as a common bound for the first two ranges, and 310 as a commond bound for the next two.

To do this, for each value selected, we will get also the previous one and the next one.

It's obvious that the value that comes before the first value in any group will be the last value in the previous group, and vice versa: the value next to the last value in a group will be the first value in the next group.

Knowing the previous and the next value for each group, we can use them to calculate the median and use it as the range boundary.

Let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

May 17th, 2009 at 11:00 pm

Posted in SQL Server

UNION vs UNION ALL

Comments enabled. I *really* need your comment

In my previous article:

I described efficient ORDER BY along with LIMIT for UNION'ed resultsets.

One thing left unanswered: why UNION DISTINCT in some cases is faster than UNION ALL?

UNION DISTINCT (for which a mere UNION is an alias) implies selecting DISTINCT values from the resultsets, while UNION ALL just selects all values, including duplicates.

On one hand, UNION ALL does not perform duplicate checking, which should be faster (and almost always is). On the other hand, it returns more rows.

If some complex calculations are performed on these rows, this may kill all performance benefits.

Let's create the sample tables:
Read the rest of this entry »

Written by Quassnoi

May 16th, 2009 at 11:00 pm

Posted in MySQL

Ordering UNION’s

Comments enabled. I *really* need your comment

From Stack Overflow:

When I optimize my 2 single queries to run in less than 0.02 seconds and then UNION them the resulting query takes over 1 second to run.

Also, a UNION ALL takes longer than a UNION DISTINCT.

I would assume allowing duplicates would make the query run faster and not slower.

Am I really just better off running the 2 queries separately? I would prefer to use the UNION.

This kinds of things happen when the queries employ ORDER BY along with LIMIT.

Let's create the sample tables and see what is happening:
Read the rest of this entry »

Written by Quassnoi

May 15th, 2009 at 11:00 pm

Posted in MySQL

PostgreSQL: selecting a function

Comments enabled. I *really* need your comment

From Stack Overflow:

Hello,

I want to write a SELECT statement as follows:

SELECT  field_a
FROM    my_table
WHERE   field_b IN (my_function(field_c))

Is that possible?

Would my_function have to return an array?

It is of course possible in PostgreSQL using a set returning function.

Return type of such a function should be declared as SETOF, so the function will return a rowset of given type.

There are at least two ways to call this function, and in this article I will consider the benefits and drawbacks of each method.

Let's create sample tables:
Read the rest of this entry »

Written by Quassnoi

May 14th, 2009 at 11:00 pm

Posted in PostgreSQL

Oracle: returning deleted records

Comments enabled. I *really* need your comment

From Stack Overflow:

I need to return a rowset from an Oracle procedure, and then delete them in that same procedure.

Is there a neat way of doing this without temp tables?

Something like an in-memory cursor maybe?

Oracle has a nice feature of returning cursor variables from a stored procedure or an anonymous block.

As any cursor, the returned cursor is a handle to a stable rowset.

This means that it's not affected by any changes to database that were made after the cursor was open.

We can exploit this feature to achieve the required behavior.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

May 13th, 2009 at 11:00 pm

Posted in Oracle