EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: 5 Claims About SQL, Explained. You're welcome to read and comment on it.

Happy New Year!

with 3 comments

In one of my previous New Year's posts we drew snowflakes in PostgreSQL.

The algorithm we used to create the snowflakes is an implementation of an L-system, which is an example of a fractal. There are many more beautiful objects we can see in the winter: frozen trees, frost patterns on windows, cracks on ice etc., all of them being fractals.

Frost patterns

Today we will be constructing escape-time fractals. To build such a fractal, one would need to run a certain function for each point on the plane iteratively and see how many iterations does it take for the function to overflow: the more it takes, the brighter is the point's color.

I won't go deep into fractal theory now, just show that they can be constructed with SQL relatively easily. For instance, Mandelbrot set, one of the best known escape-time fractals, is almost a one-liner in PostgreSQL:

Read the rest of this entry »

Written by Quassnoi

December 31st, 2013 at 11:00 pm

Posted in PostgreSQL

5 Claims About SQL, Explained

Comments enabled. I *really* need your comment

1. Is SQL is a declarative language?

First of all, let's define the terms.

A declarative language describes the result, not the process to achieve it. In a declarative language you tell "what do I need".

A procedural language describes control flow: exact steps the machine needs to perform in order to achieve this result. In a procedural language you tell "what do you do".

SQL, as it was originally designed, is a declarative language.

2980020699_0dfbe889f7_o

For instance you need to know how many customers above 40 live in each city. You issue a query like:

Read the rest of this entry on tech.pro »

Written by Quassnoi

October 28th, 2013 at 11:00 pm

Posted in Miscellaneous

NULL in SQL: explaining its behavior

with one comment

On my blog feedback page I get lots of questions which essentially boil down to one thing: "Those NULL things in databases work in a way I don't freaking get!"

Let's have them explained a little.

The Wikipedia page defines NULL this way:

Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information".

For people new to the subject, a good way to remember what null means is to remember that in terms of information, "lack of a value" is not the same thing as "a value of zero"; similarly, "lack of an answer" is not the same thing as "yes" or "no".

Rather than trying to come up with intuitive rules for how NULL behaves, it's easier to expand on the definition above to demonstrate the motivation behind introducing the NULL at all, and why it ended up so complex.

NULL is like an "unknown variable" in algebraic equation

Let's step away from relations and tables for a moment, and remember some old school math. What is this?

Read the rest of this entry on tech.pro »

Written by Quassnoi

September 27th, 2013 at 11:00 pm

Posted in Miscellaneous

Did You Know You Could Index SUM() and COUNT(): SQL Server Indexed Views, Part 3

Comments enabled. I *really* need your comment

A friend of mine who owns a stock photo website once told me he had designed an awesome photo auto-tagging algorithm which was 80% correct: just tag all photos cats and have four hits out of five.

Cat

But seriously, how do we calculate most used tags on photos?

For those not familiar with different tagging models, I highly recommend this old but very useful article: Tags: Database schemas.

Now, in this tutorial, we'll use the toxi approach to tagging. Later in the article I'll explain why we would choose that approach over the others.

Let us create a simple schema which would allow us to store, tag and sell photos:

Read the rest of this entry on tech.pro »

Written by Quassnoi

August 26th, 2013 at 11:00 pm

Posted in SQL Server

Storing a Friendship: SQL Server Indexed Views, Part 2

Comments enabled. I *really* need your comment

You're building a social network and want to store who is friends with whom. OK. Let's assume you want Facebook-style friends: by agreeing to be friends with someone you allow them to be friends with you too. This is called a symmetric relationship and if your social model sticks to that kind of relationship, your application should enforce its symmetry.

How shall we store friends in a database table? There are several options.

Initiator first

A table with one record per friendship and two fields for each of the friends. The person who requested the friendship is stored in the first field:

Read the rest of this entry on tech.pro »

Written by Quassnoi

August 14th, 2013 at 11:00 pm

Posted in SQL Server

10 things in SQL Server (which don’t work as expected)

Comments enabled. I *really* need your comment

10. Unique NULL

Snowflake_300um_LTSEM[1]

When declaring UNIQUE constraints on columns, SQL Server treats all NULL values as unique. This means you can't have more than one NULL value in a unique column.

This behavior violates the standard and is specific to SQL Server.

To work around this, implement the constraint by declaring a UNIQUE index which would exclude NULL values:

Read the rest of this entry on tech.pro »

Written by Quassnoi

July 19th, 2013 at 11:00 pm

Posted in SQL Server

Score charts, friendship lists and unique multiple columns: easy with SQL Server’s indexed views

Comments enabled. I *really* need your comment

SQL Server supports a really useful feature called indexed views. This means it lets you materialize results of some queries and build indexes over them. It’s as if you haven’t actually run the query yet - but it’s already complete, stored conveniently on the disk for you, ordered and waiting for you to read its results. Not all queries are eligible but some quite often used by the database developers are.

In this series, I'll describe three scenarios which could really benefit from using indexed views.

Scenario One: Unique multiple columns

Assume you are building a database for multiplayer online tic-tac-toe tournaments. The first thing you’d want would be recording game results: who played with whom and who won. This can be easily achieved with a simple layout:

Table game
id round x o winner start duration

Now we have additional requirement: no person should be able to play in a tournament more than once. How do we do that?


Read the rest of this entry on tech.pro »

Written by Quassnoi

June 27th, 2013 at 11:00 pm

Posted in SQL Server

SQL Server: joining NULL values

with one comment

From Stack Overflow:

UPDATE  n
SET    AddressID = a.AddressID
FROM   #NewAddress n
JOIN   dbo.Address a
ON     (a.[LineA] = n.[LineA] OR (a.[LineA] is null AND n.[LineA] is null))
       AND (a.[LineB] = n.[LineB] OR (a.[LineB] is null AND n.[LineB] is null))
       AND (a.[LineC] = n.[LineC] OR (a.[LineC] is null AND n.[LineC] is null))
WHERE  n.Processed = 0

Any ideas on how to UNION or EXISTS this query? This seems to be a pretty common join condition I'm encountering.

Here's a JOIN condition: we should join two tables on a common key, also treating NULL values as equal. Normal equality conditions don't work here: in SQL, NULL values are not considered equal (strictly speaking, their equality is a boolean NULL rather than true or false).

ANSI standard defines a special predicate for that: IS NOT DISTINCT FROM, which is supported by PostgreSQL. This treats values the same value as DISTINCT does: two equal values, as well as two NULL values, are considered not distinct. MySQL also supports similar operator, < =>. Unfortunately, SQL Server supports neither.

In SQL Server, we can emulate this by using set operators which distinctify the queries. Set operators work on sets, so we would need some kind of a SELECT query for them. This would also help us to overcome SQL Server's inability to compare tuples directly. Basically, we need to tell if two tuples: (a.lineA, a.lineB, a.lineC) and (n.lineA, n.lineB, n.lineC) are distinct or not.

We can do that by selecting those tuples in dummy queries (those without a FROM clause), intersecting those queries (which would give us a one-record or zero-record resultset, depending on whether the two tuples are distinct or not) and applying EXISTS to the resultset. EXISTS is a boolean predicate and as such can serve as a join condition.

Let's check it. We'll create two tables, fill them with random values (including occasional NULLs here and there):

Read the rest of this entry »

Written by Quassnoi

May 8th, 2013 at 11:00 pm

Posted in SQL Server

INSERT … SELECT … OUTPUT …: returning source table data

with 3 comments

From Stack Overflow:

I am trying to insert into the target table values from the source table, but also insert the primary key of the source table into the tracking table.

INSERT
INTO    TRN_TEMP (TRN_TRAN_DATE, TRN_DESCRIPTION, TRN_AMT)
OUTPUT  D.DET_PRIMARY, GETDATE()
INTO    REGISTER (DET_PRIMARY_LINK, INS_DATE)
SELECT  D.DET_DATE, 'SOMETEXT', SUM(D.DET_NET)
FROM    DETAIL D
LEFT JOIN
        REGISTER R
ON      R.DET_PRIMARY_LINK = D.DET_PRIMARY
WHERE   R.LINE_ID IS NULL -- TO REMOVE LINES ALREADY PROCESSED
GROUP BY 
        D.DET_DATE

I can't see a problem with the text above but I get an error:

The multi part identifier 'D.DET_PRIMARY' could not be bound.

I've tried both D.DET_DETAIL and DETAIL.DET_DETAIL and the error is the same.

This is a limitation of SQL Server which can be easily worked around by substituting MERGE instead of INSERT.

Let's create some sample tables and see:

CREATE TABLE
        source
        (
        id INT NOT NULL PRIMARY KEY,
        grouper INT NOT NULL,
        value INT NOT NULL
        )

CREATE TABLE
        destination
        (
        id INT NOT NULL IDENTITY PRIMARY KEY,
        grouper INT NOT NULL,
        aggregated_value INT NOT NULL
        )

CREATE TABLE
        register
        (
        dest_id INT NOT NULL PRIMARY KEY,
        max_source_id INT NOT NULL UNIQUE,
        grouper INT NOT NULL UNIQUE,
        regdate DATE NOT NULL,
        )

INSERT
INTO    source
VALUES
        (1, 1, 5),
        (2, 1, 2),
        (3, 2, 6),
        (4, 2, 8)

, and run the query:

MERGE
INTO    destination d
USING   (
        SELECT  grouper, SUM(value), MAX(id)
        FROM    source
        WHERE   grouper NOT IN
                (
                SELECT  grouper
                FROM    register
                )
        GROUP BY
                grouper
        ) s (grouper, aggregated_value, max_source_id)
ON      (1 = 0)
WHEN NOT MATCHED THEN
INSERT  (grouper, aggregated_value)
VALUES  (grouper, aggregated_value)
OUTPUT  INSERTED.id, s.max_source_id, s.grouper, GETDATE()
INTO    register;

SELECT  *
FROM    register;

You can try it yourself on SQL Fiddle.

As you can see, we are using both destination table's INSERTED.id, source table's s.max_source_id, s.grouper and an arbitrary expression GETDATE() in the OUTPUT clause which works just alright.

An alert reader may notice that there is another problem with the original query, but this will be covered in the next post. Watch this space!

Written by Quassnoi

March 25th, 2013 at 11:00 pm

Empty row if condition does not match

with 4 comments

Just found that in a Google referer to the blog:

I want SQL to return blank row even if the condition does not match

This may be useful for certain ORMs which always expect a single row as a result of a query.

Say, we have a query like that:

SELECT  *
FROM    mytable
WHERE   id = 42

and want it to return a single row (possibly consisting of NULL values) no matter what.

If we had a join and the condition in the ON clause:

SELECT  m.*
FROM    values v
JOIN    mytable m
ON      m.id = v.value

, we could just rewrite an INNER JOIN to a LEFT JOIN.

SELECT  m.*
FROM    values v
LEFT JOIN
        mytable m
ON      m.id = v.value

This way, we would have at least one record returned for each entry in values.

In our original query we don't have a table to join with. But we can easily generate it:

SELECT  m.*
FROM    (
        SELECT  42 AS value
        ) v
LEFT JOIN
        mytable m
ON      m.id = v.value

If id is a PRIMARY KEY on mytable, this query would return exactly one record, regardless of whether it such an id exists in mytable or not.

Written by Quassnoi

January 3rd, 2013 at 11:00 pm

Posted in MySQL