Archive for May, 2009
Aggregate AND
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table with a foreign key and a boolean value (and a bunch of other columns that aren't relevant here), as such:
CREATE TABLE myTable ( someKey integer, someBool boolean ); INSERT INTO myTable VALUES (1, 't'), (1, 't'), (2, 'f'), (2, 't');Each
someKey
could have 0 or more entries.For any given
someKey
, I need to know if
- All the entries are true, or
- Any of the entries are false
Basically, it's an
AND
.
This solution is often used to represent polls that should be unanimous for the decision to be made (i. e. anyone can put a veto on the decision).
PostgreSQL offers a special aggregate BOOL_AND
to do this.
However, an aggregate may be less efficient here.
The return value of an AND
function is constrained by finding certain values:
- Whenever a
FALSE
is found, the return value cannot beTRUE
anymore. It's eitherFALSE
orNULL
. - Whenever a
NULL
is found, the return value isNULL
Aggregates in PostgreSQL, however, won't take this into account.
What we need here is a method to stop and return whenever first NULL
or FALSE
value is found.
Let's create a sample table and see how can it may be done:
Read the rest of this entry »
Hierarchical queries in PostgreSQL
Comments enabled. I *really* need your comment
Note: this article concerns PostgreSQL 8.3 and earlier.
For hierarchical queries in PostgreSQL 8.4 and higher, see this article:
In one of the previous articles I wrote about using hierarchical queries in MySQL:
PostgreSQL has a contrib module to implement the same functionality.
However, it's not always possible to install and use contribs. Same is true for procedural languages.
Fortunately, this functionality can be implemented using a plain SQL function.
Let's create a sample table and see how it works:
Read the rest of this entry »
Generating XML in subqueries
From Stack Overflow:
I have a query that looks like
SELECT P.Column1, P.Column2, P.Column3, ( SELECT A.ColumnX, A.ColumnY FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A WHERE A.Key = P.Key FOR XML AUTO, TYPE ), ( SELECT B.ColumnX, B.ColumnY, FROM dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B WHERE B.Key = P.Key FOR XML AUTO, TYPE ) FROM ( /* joined tables here */ ) AS P FOR XML AUTO, ROOT('ROOT')
P
has about 5,000 rows.
A
andB
, about 4,000 rows each.This query has a runtime performance of more than 10 minutes.
Changing it to this, however:
SELECT P.Column1, P.Column2, P.Column3 INTO #P FROM ( /* joined tables here ) p SELECT A.ColumnX, A.ColumnY INTO #A FROM dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A SELECT B.ColumnX, B.ColumnY INTO #B FROM dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B SELECT P.Column1, P.Column2, P.Column3, ( SELECT A.ColumnX, A.ColumnY FROM #A AS A WHERE A.Key = P.Key FOR XML AUTO, TYPE ), ( SELECT B.ColumnX, B.ColumnY, FROM #B AS B WHERE B.Key = P.Key FOR XML AUTO, TYPE ) FROM #P AS P FOR XML AUTO, ROOT('ROOT'), has a performance of about 4 seconds.
This makes not a lot of sense, as it would seem the cost to insert into a temp table and then do the join should be higher by default.
My inclination is that SQL is doing the wrong type of "join" with the subquery, but maybe I've missed it, there's no way to specify the join type to use with correlated subqueries.
Is there a way to achieve this without using
#temptable
's or@table_variable
's via indexes and/or hints?
The reason for this is obvious: SQL Server needs to reevaluate the function each time the subquery is called, which is long.
When the subquery result is cached in the temp table, SQL Server will just build an index over this table in an Eager Spool
, and use this index in the joins.
But is it possible to increase performance without using temporary tables?
Let's create sample functions, tables and see:
Read the rest of this entry »
Converting currencies
Comments enabled. I *really* need your comment
When developing a financial application, one needs to deal with currencies and convert from one currency to another.
There may be different scenarios, but for accounting purposes the currency rates set and published by appropriate official regulator once a day are used most often.
A typical task looks like this: given a set of transactions in one currency, get an appropriate amount in another currency.
Let's create sample tables and see how it may be done:
Read the rest of this entry »
Selecting options
Comments enabled. I *really* need your comment
From Stack Overflow:
I have three checkboxes in my application.
If the user ticks a combination of the boxes, I want to return matches for the boxes ticked, and in the case no box is checked I just want to return everything.
Can I do this with a single
SQL
query?
Assuming MySQL.
The principle is simple: we need to enable filtering only if there is some data to filter. If there are no data, we should return all values.
This is best achieved by using two mutually exclusive SELECT
statements with a UNION ALL
.
Let's create a sample table and try to construct the query:
Read the rest of this entry »
Genealogy: finding a descendant
Comments enabled. I *really* need your comment
In the previous article I described an efficient way to build a genealogy with a hierarchical query in Oracle:
Now, I'd like to describe a query to check if one entity in the table is a descendant on another one.
Let's create a sample table:
Read the rest of this entry »
Genealogy query on both parents
Comments enabled. I *really* need your comment
From Stack Overflow:
I'm trying to fetch a genealogy tree of animals from my Oracle database.
Here's the table:
Animal_ID Parent_Male_ID Parent_Female_ID If I specify an animal, I can get all of its descendants (on the male side) using something like this:
SELECT * FROM animal START WITH animal_id = 123 CONNECT BY PRIOR animal_id = parent_male_idI'm trying to find a way to extend this in such a way that if I specify an animal, it will fetch both parents and then will fetch all of their descendants.
Any thoughts?
This is possible to do with a same CONNECT BY
query by providing a little more comlex condition for START WITH
and CONNECT BY
:
SELECT DISTINCT(animal_id) AS animal_id FROM animal START WITH animal_id IN ( SELECT father FROM animal WHERE animal_id = 9500 UNION ALL SELECT mother FROM animal WHERE animal_id = 9500 ) CONNECT BY PRIOR animal_id IN (father, mother) ORDER BY animal_id
However, my gut feeling says this query need performance checking.
Let's create a sample table:
Read the rest of this entry »
Ordering columns independently
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table with 5 columns in it.
What's the easiest way to select all rows, but where each column is individually randomized?
All I can think of is to select each column separately, along with
ROW_NUMBER() OVER (ORDER BY NEWID()) AS lookup
, and then join each column back together on lookup.
Is there an easier way?
Though it's possible to do this by means of JOIN
'ing, there is an easier way.
Since all SQL
operations are row-level, we will need to make 5 independent ordered sets, which we later can use to order each column set. This implies having 5 times as much rows as there are rows in the original table.
But instead of JOIN
'ing, we can use SQL Server's clause which maps rows to columns, which is called PIVOT
.
We need to do the following:
- Generate 5 sets of column values and
UNION ALL
them. - For each set, select a
ROW_NUMBER() OVER (ORDER BY NEWID())
along with each row. It will be unique for each row in any set and will give a row identifier forPIVOT
'ing. - For each set, select a set identifier: just a string with column name. This will give a column identifier for
PIVOT
'ing - Finally,
PIVOT
the column values: they will get into appropriate row and column, based onROW_NUMBER()
and the set name
Since PIVOT
requires aggregation (in case row and column identifiers are not unique), we need some aggregation function, but our query is designed so that there is exactly one row with a given ROW_NUMBER()
and set name, so any aggregation function will do.
We will use MIN
.
Now, let's create a sample table and see how it works:
Read the rest of this entry »
Choosing column order
From Stack Overflow:
I know you can
ALTER
the column order in MySQL withFIRST
andAFTER
, but why would you want to bother?Since good queries explicitly name columns when inserting data, is there really any reason to care what order your columns are in in the table?
Well, there is reason.
MySQL stores some types (out of which VARCHAR
is probably the most important) with variable length.
That means that each variable field stored in the row is prepended with its actual length, like in Pascal string type.
Since the rows have no TOC, it means that the engine should traverse all previous values to find out where the required value begins.
If we need the value of the first column (first non-PRIMARY KEY
column, of course), the engine will do it once per row.
But if we need the value of, say, 20-th column, the engine should find out the lengths of previous 19 columns and add them together.
Let's create a sample table and see how big the impact is:
Read the rest of this entry »
Oracle: matching SYS_GUID’s
Answering questions asked on the site.
Habib asks:
I have a table with a
PRIMARY KEY
defined asConsumerID RAW(16) NOT NULL
.However, when I issue a query like this:
SELECT * FROM Consumers WHERE ConsumerID = '1DFFC45DEDEB42B8B79B060D544C2ACB', it uses a
TABLE ACCESS FULL
and is very slow.How can I improve this query?
Defining a PRIMARY KEY
column as RAW(16)
almost always means using system generated GUID
's returned by SYS_GUID
as PRIMARY KEY
's.
Let's create a sample table and see what's going on:
Read the rest of this entry »