2009 - Page 11 of 21 - EXPLAIN EXTENDED at EXPLAIN EXTENDED

Archive for 2009

Click

Back in the end of 90's I had been employed with a small startup which was founded to develop a restaurant management system.

The job listing they posted said PostgreSQL wanted (sic). And PostgreSQL for sure was something I knew about. You see, I already had experience with PostgreSQL by that time. I was able to install PostgreSQL, create a database, run a query against it and even create a trigger in C (I didn't yet know of pl/pgSQL).

So I applied for the job, got hired and started coding.

Coding what you may ask?

Well, I had no idea. And in fact, nobody did. There were only two requirements for the system.

First, it had to work somehow with a Casio Cassiopeia, which allowed a wireless network module to be installed and used. A waiter had to be able to take your order and it had to get to the kitchen instantly.

Second one was personal. The founders of the company (and me too) used to dine in a small cafe at the basement of the office building. They had a menu, but it didn't correlate with reality. You literally couldn't get anything that was in the menu. They were constantly out of meat, out of tomatoes, out of potatoes (potatoes!). Instead they put a small hand-written piece of paper into the menu which listed everything the had at the time (a soup and a something else), that's if they were in good mood. If they were not, you just placed an order, and a waiter came back in 10 minutes to tell you it was beyond their ability to get some tomatoes and cucumbers to chop you some salad, try ordering something else, thank you.

In our system, that was never ever going to happen. Never. If the kitchen was short of something they needed to cook you a dish you ordered, the system would warn the waiter instantly, and she would tell you to order something else right this minute, that's instead of going to the kitchen and back.

It had to be a killer feature.
Read the rest of this entry »

Written by Quassnoi

June 21st, 2009 at 11:00 pm

Posted in Miscellaneous

NVL vs. COALESCE

with 2 comments

From Stack Overflow:

Are there non obvious differences between NVL and COALESCE in Oracle?

The obvious differences are that COALESCE will return the first non-NULL item in its parameter list whereas NVL only takes two parameters and returns the first if it is not NULL, otherwise it returns the second.

It seems that NVL may just be a base case version of COALESCE.

Am I missing something?

In case of two agruments of same type, these functions are basically the same: they both return first non-NULL values.

However, they are implemented differently:

NVL always evaluates both arguments
COALESCE doesn't evaluate anything after it finds the first non-NULL argument

This behavior can be utilized.

Here's a real world example from a restaurant management system I developed about 10 years ago.

A bill that is presented to a customer consists of the items he or she ordered.

Majority of the dishes ordered are plain menu items: something that has its price printed in the menu.

However, there are some special things which are not included to the menu: a customer may want to order something exclusive and pay for it.

The manager should estimate the cost of ingredients, time and effort to make this dish, then calculate the price and name it: everything in 30 seconds.

These things are of course come overpriced, but the customers that make such orders don't seem to care much about money.

In the bill that appears as special order. When a manager enters it into the computer, she should also enter the price she had calculated.

These prices then get back to the server and are stored in a table (not the one where regular menu prices are stored).

At the end of month, a report to the month's receipts should be generated.

Here are the tables containing the data relevant to the reports. They of course are simplified, but the principle remains the same.
Read the rest of this entry »

Written by Quassnoi

June 20th, 2009 at 11:00 pm

Posted in Oracle

Ordering NULLs

Comments enabled. I *really* need your comment

From Stack Overflow:

In our product we have a generic search engine, and trying to optimize the search performance.

A lot of the tables used in the queries allow NULL values.

Should we redesign our table to disallow NULL values for optimization or not?

NULL values surely affect some optimizer decisions in Oracle.

It's widely known that Oracle won't include a column into index if all key values are NULL.

This means that if we don't apply a filter which would imply non-NULL condition on the column (like, any comparison operator in the WHERE clause, or a comparison operation in the ON condition of a JOIN clause), the index is not guaranteed to return all values, and a FULL SCAN will be required on the table.

It is less evident that ORDER BY queries can be affected too.

Let's create a sample table and see how:
Read the rest of this entry »

Written by Quassnoi

June 19th, 2009 at 11:00 pm

Posted in Oracle

Counting bans

Comments enabled. I *really* need your comment

From Stack Overflow:

I have a MyISAM table in MySQL 5.0 like this:

id login ip banned

I would like to find all users not banned (banned = 0) if at least 5 other users with the same ip have already been banned (banned = 1).

To do this, we should first find which IP's have more than 5 bans and then find all non-banned users for these IP's.

To find the IP's we can use two methods:

Find all banned IP's, group them and filter out those that have COUNT(*) < 5
For each IP possible, find if there are 5 banned records for this IP using a LIMIT clause

The first method doesn't require doing an extra JOIN on the table, but it will do count all values just to figure out if the COUNT(*) is more or less than 5.

The second method stops counting whenever it find the 5th value, but it requires an extra JOIN.

Counting is bad if there are many values to count, and the extra join is bad if there are many JOIN loops.

This means that the first method should be faster if there are lots of banned records, the, and the second method is faster for few banned records and few distinct IP's.

Let's create two sample tables and see:
Read the rest of this entry »

Written by Quassnoi

June 18th, 2009 at 11:00 pm

Posted in MySQL

Efficient EXISTS

Comments enabled. I *really* need your comment

In my previous article:

IN vs. JOIN vs. EXISTS

I compared efficiency of different methods to check for existence of a value in a subquery resultset.

These methods included IN and EXISTS predicates.

It was shown that in SQL Server, a poorly written EXISTS can be very slow, while an IN predicate is almost always efficiently optimized using one of the numerous SEMI JOIN methods.

However, a properly written EXISTS predicate can outperform IN.

Let's create a sample table and see how:
Read the rest of this entry »

Written by Quassnoi

June 17th, 2009 at 11:00 pm

Posted in SQL Server

IN vs. JOIN vs. EXISTS

with 16 comments

From Stack Overflow:

I'm wanting to select rows in a table where the primary key is in another table.

I'm not sure if I should use a JOIN or the IN operator in SQL Server 2005:
SELECT  *
FROM    a
WHERE   a.c IN
        (
        SELECT  d
        FROM    b
        )
SELECT  a.*
FROM    a
JOIN    b
ON      a.c = b.d
Is there any significant performance difference between these two SQL queries with a large dataset (i. e. millions of rows)?

Before we begin I should note that the queries provided by the author are not similar in fact and could produce different resultsets.

If the table used in the subquery returns a value twice, a JOIN will also return the matching rows twice, while an IN condition will return them only once.

That's why it's more correct to compare to the following query:

SELECT  a.*
FROM    a
JOIN    (
        SELECT  DISTINCT d
        FROM    b
        ) bo
ON      a.c = bo.d

Now, à nos moutons.

The folk wisdom advises against using IN constructs in favor of JOIN's because they say that the subquery will be executed many times in a loop.

Let's create some sample tables and see if it's really so:
Read the rest of this entry »

Written by Quassnoi

June 16th, 2009 at 11:00 pm

Posted in SQL Server

Group based identity

Comments enabled. I *really* need your comment

From Stack Overflow:

I'm working with an application that adds a new row to the database, based on the last row meeting a certain criteria.

Here is an over-simplified visualization:

A1

A2

A3

B1

B2

Using the visualization above, a web page loads up the highest B value, which is 2.

Then, after some time, it wants to insert B3, the next record in the series.

However, it has to check to make sure that someone else didn't do the same thing.

In actuality, each row corresponds to a place that a piece of equipment visited in a factory.

The page (not easy to rewrite), currently moves the equipment to the next step.

Is there a standard pattern for dealing with this type of problem, or do I simply need to lock the table?

This question combines two other frequently asked questions:

How do I make group-based identity (i. e. keep several identity series identified by the value of a certain column)
How do I ensure there will be no identity gaps?

While there are numerious solutions that involve some kind of selecting MAX(column) and updating the row inside a transaction, there is a more simple way using linked lists.

We need to create the table as following:
Read the rest of this entry »

Written by Quassnoi

June 15th, 2009 at 11:00 pm

Posted in SQL Server

Flattening timespans: PostgreSQL

Comments enabled. I *really* need your comment

In previous three articles:

, I described various approaches to merging a set of intersecting timespans into a smaller set of larger, non-intersecting ranges.

Today, I'll describe how to do it in PostgreSQL.

A quick reminder of the problem, taken from Stack Overflow:

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.

To make things a bit clearer, take a look at the sample data for 03.06.2009:

The following timespans are overlapping or contiunous and need to merge into one timespan:

date start stop
2009.06.03 05:54:48:000 10:00:13:000

2009.06.03 09:26:45:000 09:59:40:000

The resulting timespan would be from 05:54:48 to 10:00:13.

Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:

date start stop
2009.06.03 10:12:50:000 10:27:25:000

2009.06.03 10:13:12:000 11:14:56:000

2009.06.03 10:27:25:000 10:27:31:000

2009.06.03 10:27:39:000 13:53:38:000

2009.06.03 11:14:56:000 11:15:03:000

2009.06.03 11:15:30:000 14:02:14:000

2009.06.03 13:53:38:000 13:53:43:000

2009.06.03 14:02:14:000 14:02:31:000

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

Any solution, be it SQL or not, is appreciated.

date	start	stop
2009.06.03	05:54:48:000	10:00:13:000
2009.06.03	09:26:45:000	09:59:40:000

date	start	stop
2009.06.03	10:12:50:000	10:27:25:000
2009.06.03	10:13:12:000	11:14:56:000
2009.06.03	10:27:25:000	10:27:31:000
2009.06.03	10:27:39:000	13:53:38:000
2009.06.03	11:14:56:000	11:15:03:000
2009.06.03	11:15:30:000	14:02:14:000
2009.06.03	13:53:38:000	13:53:43:000
2009.06.03	14:02:14:000	14:02:31:000

PostgreSQL implements various procedural languages, like pl/PGSQL, pl/perl etc., which can be used to solve almost any task using procedural approaches.

Using, for instance, pl/PGSQL, we can implement an approach similar to one we used for SQL Server.

We should create a set-returning function, open a cursor inside it and select all timespans ordered by start, keeping track of a running maximum of stop.

Whenever we find a timespan that starts later than end all previously started timespans, we know we have a gap.

Here's the sample table:
Read the rest of this entry »

Written by Quassnoi

June 14th, 2009 at 11:00 pm

Posted in PostgreSQL

Flattening timespans: MySQL

Comments enabled. I *really* need your comment

In the previous articles:

, I described how to merge intersecting timespans into several contiguous ranges.

In SQL Server, this task required a table-valued function (TVF) and a cursor.

In Oracle, we had to employ window functions to do this.

Now let's see how to solve this task in MySQL.

A quick reminder of the problem, taken from Stack Overflow:

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.

To make things a bit clearer, take a look at the sample data for 03.06.2009:

The following timespans are overlapping or contiunous and need to merge into one timespan:

date start stop
2009.06.03 05:54:48:000 10:00:13:000

2009.06.03 09:26:45:000 09:59:40:000

The resulting timespan would be from 05:54:48 to 10:00:13.

Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:

date start stop
2009.06.03 10:12:50:000 10:27:25:000

2009.06.03 10:13:12:000 11:14:56:000

2009.06.03 10:27:25:000 10:27:31:000

2009.06.03 10:27:39:000 13:53:38:000

2009.06.03 11:14:56:000 11:15:03:000

2009.06.03 11:15:30:000 14:02:14:000

2009.06.03 13:53:38:000 13:53:43:000

2009.06.03 14:02:14:000 14:02:31:000

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

Any solution, be it SQL or not, is appreciated.

date	start	stop
2009.06.03	05:54:48:000	10:00:13:000
2009.06.03	09:26:45:000	09:59:40:000

date	start	stop
2009.06.03	10:12:50:000	10:27:25:000
2009.06.03	10:13:12:000	11:14:56:000
2009.06.03	10:27:25:000	10:27:31:000
2009.06.03	10:27:39:000	13:53:38:000
2009.06.03	11:14:56:000	11:15:03:000
2009.06.03	11:15:30:000	14:02:14:000
2009.06.03	13:53:38:000	13:53:43:000
2009.06.03	14:02:14:000	14:02:31:000

MySQL offers easy use of session variables. These are loosely typed variables defined within the session scope which you can assign and evaluate right inside the query.

Using these variables may come in handy for tasks like this.

In MySQL, just like in previous solutions, we will need to order the timespans by start anc calculate the running maximum of the stop.

This is easily done using session variable @edate which we will use to store running maximum. We will update in in the SELECT clause.

We will also declare another variable @r, initialize it with a zero and use it to keep a range number.

If we have a timespan whose start exceeds the previous value of the running maximum then all timespans that had started eariler had also been stopped by this time, and we have a gap.

The current timespan, therefore, belongs to another range, and we need to increment the value of @r.

Finally, we will just group the timespans on the range number and return MIN(start) and MAX(stop) for all timespans in a range. Easy to see that these will be the start and the stop for the whole range.

Here is the sample table:
Read the rest of this entry »

Written by Quassnoi

June 13th, 2009 at 11:00 pm

Posted in MySQL

Flattening timespans: Oracle

Comments enabled. I *really* need your comment

In my previous article:

Flattening timespans: SQL Server

I described a function that merges intersecting timespanss into one contiguous range in SQL Server.

This query turned out to be a good illustration for window functions, so now we'll try to repeat this query in Oracle which is superb in terms of window functions support.

A quick reminder of the problem, taken from Stack Overflow:

I have lots of data with start and stop times for a given ID and I need to flatten all intersecting and adjacent timespans into one combined timespan.

To make things a bit clearer, take a look at the sample data for 03.06.2009:

The following timespans are overlapping or contiunous and need to merge into one timespan:

date start stop
2009.06.03 05:54:48:000 10:00:13:000

2009.06.03 09:26:45:000 09:59:40:000

The resulting timespan would be from 05:54:48 to 10:00:13.

Since there's a gap between 10:00:13 and 10:12:50, we also have the following timespans:

date start stop
2009.06.03 10:12:50:000 10:27:25:000

2009.06.03 10:13:12:000 11:14:56:000

2009.06.03 10:27:25:000 10:27:31:000

2009.06.03 10:27:39:000 13:53:38:000

2009.06.03 11:14:56:000 11:15:03:000

2009.06.03 11:15:30:000 14:02:14:000

2009.06.03 13:53:38:000 13:53:43:000

2009.06.03 14:02:14:000 14:02:31:000

which result in one merged timespan from 10:12:50 to 14:02:31, since they're overlapping or adjacent.

Any solution, be it SQL or not, is appreciated.

date	start	stop
2009.06.03	05:54:48:000	10:00:13:000
2009.06.03	09:26:45:000	09:59:40:000

date	start	stop
2009.06.03	10:12:50:000	10:27:25:000
2009.06.03	10:13:12:000	11:14:56:000
2009.06.03	10:27:25:000	10:27:31:000
2009.06.03	10:27:39:000	13:53:38:000
2009.06.03	11:14:56:000	11:15:03:000
2009.06.03	11:15:30:000	14:02:14:000
2009.06.03	13:53:38:000	13:53:43:000
2009.06.03	14:02:14:000	14:02:31:000

As you may remember, we had to use the cursors in the SQL Server decision, since SQL Server lacks a way to calculate a running maximum.

But Oracle does have such a way, so we don't need cursors and functions here, and everything can be done in a single query.

The principle remains the same and is just changed a little to fit into one query:

Assign a ROW_NUMBER() to each row (since the rows lack a PRIMARY KEY, we will need it later to use as one).
Order the rows by start and calculate running maximum for the stop.
Pick all rows that have a start greater than the running maximum of the previous row. Having such a timespan means that all timespans evaluated before had started earlier than the given one (since they are ordered by start) and also had ended earlier than the given one have started (since its start exceeds the running maximum). Therefore, we have a gap.
Pick the ROW_NUMBER() of each gap row along with the ROW_NUMBER() of the record that comes right before the gap row next to the given one.
Use these ROW_NUMBER()'s to find out the start and the stop of the corresponding records. These will be the timespan bounds.

Let's create a sample table and compose the query:
Read the rest of this entry »

Written by Quassnoi

June 12th, 2009 at 11:00 pm

Posted in Oracle

EXPLAIN EXTENDED

Archive for 2009

Click

NVL vs. COALESCE

Ordering NULLs

Counting bans

Efficient EXISTS

IN vs. JOIN vs. EXISTS

Group based identity

Flattening timespans: PostgreSQL

Flattening timespans: MySQL

Flattening timespans: Oracle

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow

Archive for 2009

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow