EXPLAIN EXTENDED

How to create fast database queries

Archive for the ‘SQL Server’ Category

SQL Server: deleting with self-referential FOREIGN KEY, handling loops

with 2 comments

From comments to one of my previous posts: SQL Server: deleting with self-referential FOREIGN KEY

In my case the query goes to infinite loop.

If I use OPTION (MAXRECURSION 365) or OPTION (MAXRECURSION 3267), it fails when reaching highest recursion value.

If I use 0 (which is infinite) the query runs for an hour with no output.

I understand it is a data related issue which is causing infinite loop, but I cannot control the data being sent to the DB.

Any other way we can handle it?

Thanks,
Jay

That's a common situation, and we would sometimes need to clean the mess up (i. e. remove the whole loop).

To do this, we would just need to add a little check condition to the recursive CTE.

We just record the topmost entry (or better say the entry we first use in the loop, as there is obviously no topmost entry), and check if we ever come across it again as we do the recursion.

Once we have it on next iteration, we know it's a loop and it's time to stop.

Here's how we do this:

CREATE TABLE TestComposite (
        id INT NOT NULL, 
        siteUrl NVARCHAR(255) NOT NULL,
        name NVARCHAR(MAX) NOT NULL,
        parentId INT NULL,
        PRIMARY KEY (id, siteUrl),
        FOREIGN KEY (parentId, siteUrl) REFERENCES TestComposite (id, siteUrl)
)
GO

INSERT
INTO    TestComposite
VALUES  (1,  'site1', 'name1', NULL),
        (2,  'site2', 'name2', 4),
        (3,  'site2', 'name3', 2),
        (4,  'site2', 'name4', 3),
        (5,  'site5', 'name5', NULL),
        (6,  'site5', 'name6', 5)
GO

WITH    q (start, id, siteUrl) AS
        (
        SELECT  id, id, siteUrl
        FROM    TestComposite
        WHERE   id = 2
                AND siteUrl = 'site2'
        UNION ALL
        SELECT  start, tc.id, tc.siteUrl
        FROM    q
        JOIN    TestComposite tc
        ON      tc.parentID = q.id
                AND tc.siteUrl = q.siteUrl
        WHERE   tc.id <> q.start
        )
DELETE
FROM    TestComposite
OUTPUT  DELETED.*
WHERE   EXISTS
        (
        SELECT  id, siteUrl
        INTERSECT
        SELECT  id, siteUrl
        FROM    q
        )

We had a loop on 2 -> 3 -> 4 which is gone after the query.

See this query on SQL Fiddle.

Written by Quassnoi

July 10th, 2014 at 9:00 pm

Posted in SQL Server

Did You Know You Could Index SUM() and COUNT(): SQL Server Indexed Views, Part 3

Comments enabled. I *really* need your comment

A friend of mine who owns a stock photo website once told me he had designed an awesome photo auto-tagging algorithm which was 80% correct: just tag all photos cats and have four hits out of five.

Cat

But seriously, how do we calculate most used tags on photos?

For those not familiar with different tagging models, I highly recommend this old but very useful article: Tags: Database schemas.

Now, in this tutorial, we'll use the toxi approach to tagging. Later in the article I'll explain why we would choose that approach over the others.

Let us create a simple schema which would allow us to store, tag and sell photos:

Read the rest of this entry on tech.pro »

Written by Quassnoi

August 26th, 2013 at 11:00 pm

Posted in SQL Server

Storing a Friendship: SQL Server Indexed Views, Part 2

with one comment

You're building a social network and want to store who is friends with whom. OK. Let's assume you want Facebook-style friends: by agreeing to be friends with someone you allow them to be friends with you too. This is called a symmetric relationship and if your social model sticks to that kind of relationship, your application should enforce its symmetry.

How shall we store friends in a database table? There are several options.

Initiator first

A table with one record per friendship and two fields for each of the friends. The person who requested the friendship is stored in the first field:

Read the rest of this entry on tech.pro »

Written by Quassnoi

August 14th, 2013 at 11:00 pm

Posted in SQL Server

10 things in SQL Server (which don’t work as expected)

Comments enabled. I *really* need your comment

10. Unique NULL

Snowflake_300um_LTSEM[1]

When declaring UNIQUE constraints on columns, SQL Server treats all NULL values as unique. This means you can't have more than one NULL value in a unique column.

This behavior violates the standard and is specific to SQL Server.

To work around this, implement the constraint by declaring a UNIQUE index which would exclude NULL values:

Read the rest of this entry on tech.pro »

Written by Quassnoi

July 19th, 2013 at 11:00 pm

Posted in SQL Server

Score charts, friendship lists and unique multiple columns: easy with SQL Server’s indexed views

Comments enabled. I *really* need your comment

SQL Server supports a really useful feature called indexed views. This means it lets you materialize results of some queries and build indexes over them. It’s as if you haven’t actually run the query yet - but it’s already complete, stored conveniently on the disk for you, ordered and waiting for you to read its results. Not all queries are eligible but some quite often used by the database developers are.

In this series, I'll describe three scenarios which could really benefit from using indexed views.

Scenario One: Unique multiple columns

Assume you are building a database for multiplayer online tic-tac-toe tournaments. The first thing you’d want would be recording game results: who played with whom and who won. This can be easily achieved with a simple layout:

Table game
id round x o winner start duration

Now we have additional requirement: no person should be able to play in a tournament more than once. How do we do that?

Read the rest of this entry on tech.pro »

Written by Quassnoi

June 27th, 2013 at 11:00 pm

Posted in SQL Server

SQL Server: joining NULL values

with one comment

From Stack Overflow:

UPDATE  n
SET    AddressID = a.AddressID
FROM   #NewAddress n
JOIN   dbo.Address a
ON     (a.[LineA] = n.[LineA] OR (a.[LineA] is null AND n.[LineA] is null))
       AND (a.[LineB] = n.[LineB] OR (a.[LineB] is null AND n.[LineB] is null))
       AND (a.[LineC] = n.[LineC] OR (a.[LineC] is null AND n.[LineC] is null))
WHERE  n.Processed = 0

Any ideas on how to UNION or EXISTS this query? This seems to be a pretty common join condition I'm encountering.

Here's a JOIN condition: we should join two tables on a common key, also treating NULL values as equal. Normal equality conditions don't work here: in SQL, NULL values are not considered equal (strictly speaking, their equality is a boolean NULL rather than true or false).

ANSI standard defines a special predicate for that: IS NOT DISTINCT FROM, which is supported by PostgreSQL. This treats values the same value as DISTINCT does: two equal values, as well as two NULL values, are considered not distinct. MySQL also supports similar operator, < =>. Unfortunately, SQL Server supports neither.

In SQL Server, we can emulate this by using set operators which distinctify the queries. Set operators work on sets, so we would need some kind of a SELECT query for them. This would also help us to overcome SQL Server's inability to compare tuples directly. Basically, we need to tell if two tuples: (a.lineA, a.lineB, a.lineC) and (n.lineA, n.lineB, n.lineC) are distinct or not.

We can do that by selecting those tuples in dummy queries (those without a FROM clause), intersecting those queries (which would give us a one-record or zero-record resultset, depending on whether the two tuples are distinct or not) and applying EXISTS to the resultset. EXISTS is a boolean predicate and as such can serve as a join condition.

Let's check it. We'll create two tables, fill them with random values (including occasional NULLs here and there):

Read the rest of this entry »

Written by Quassnoi

May 8th, 2013 at 11:00 pm

Posted in SQL Server

INSERT … SELECT … OUTPUT …: returning source table data

with 3 comments

From Stack Overflow:

I am trying to insert into the target table values from the source table, but also insert the primary key of the source table into the tracking table.

INSERT
INTO    TRN_TEMP (TRN_TRAN_DATE, TRN_DESCRIPTION, TRN_AMT)
OUTPUT  D.DET_PRIMARY, GETDATE()
INTO    REGISTER (DET_PRIMARY_LINK, INS_DATE)
SELECT  D.DET_DATE, 'SOMETEXT', SUM(D.DET_NET)
FROM    DETAIL D
LEFT JOIN
        REGISTER R
ON      R.DET_PRIMARY_LINK = D.DET_PRIMARY
WHERE   R.LINE_ID IS NULL -- TO REMOVE LINES ALREADY PROCESSED
GROUP BY 
        D.DET_DATE

I can't see a problem with the text above but I get an error:

The multi part identifier 'D.DET_PRIMARY' could not be bound.

I've tried both D.DET_DETAIL and DETAIL.DET_DETAIL and the error is the same.

This is a limitation of SQL Server which can be easily worked around by substituting MERGE instead of INSERT.

Let's create some sample tables and see:

CREATE TABLE
        source
        (
        id INT NOT NULL PRIMARY KEY,
        grouper INT NOT NULL,
        value INT NOT NULL
        )

CREATE TABLE
        destination
        (
        id INT NOT NULL IDENTITY PRIMARY KEY,
        grouper INT NOT NULL,
        aggregated_value INT NOT NULL
        )

CREATE TABLE
        register
        (
        dest_id INT NOT NULL PRIMARY KEY,
        max_source_id INT NOT NULL UNIQUE,
        grouper INT NOT NULL UNIQUE,
        regdate DATE NOT NULL,
        )

INSERT
INTO    source
VALUES
        (1, 1, 5),
        (2, 1, 2),
        (3, 2, 6),
        (4, 2, 8)

, and run the query:

MERGE
INTO    destination d
USING   (
        SELECT  grouper, SUM(value), MAX(id)
        FROM    source
        WHERE   grouper NOT IN
                (
                SELECT  grouper
                FROM    register
                )
        GROUP BY
                grouper
        ) s (grouper, aggregated_value, max_source_id)
ON      (1 = 0)
WHEN NOT MATCHED THEN
INSERT  (grouper, aggregated_value)
VALUES  (grouper, aggregated_value)
OUTPUT  INSERTED.id, s.max_source_id, s.grouper, GETDATE()
INTO    register;

SELECT  *
FROM    register;

You can try it yourself on SQL Fiddle.

As you can see, we are using both destination table's INSERTED.id, source table's s.max_source_id, s.grouper and an arbitrary expression GETDATE() in the OUTPUT clause which works just alright.

An alert reader may notice that there is another problem with the original query, but this will be covered in the next post. Watch this space!

Written by Quassnoi

March 25th, 2013 at 11:00 pm

What’s UNPIVOT good for?

with 5 comments

Answering questions asked on the site.

Karen asks:

… I've always thought PIVOT and UNPIVOT are signs of a poorly designed database. I mean, is there a legitimate use for them if your model is OK?

I've made an actual use for them in a project I've been working on for the last several months (which is partly why there were no updates for so long!)

Part of the project is a task management system where each task has several persons related to it. There can be the creator of the task, the person the task is assigned to, the actual author of the task (on behalf of whom the task is created), and the task can be possible completed by a person or deleted by a person. A total of 5 fields related to persons.

Now, we need to take all tasks within a certain time range and list all people involved in them.

Let's create a sample table and see how would we do that.

Read the rest of this entry »

Written by Quassnoi

June 30th, 2011 at 11:00 pm

Posted in SQL Server

GROUP_CONCAT in SQL Server

with one comment

I'm finally back from my vacation. Tunisia's great: dates, Carthage, sea and stuff.

Now, to the questions.

Mahen asks:

Create a table called Group:

Group
id prodname
1 X
1 Y
1 Z
2 A
2 B
2 C

The resultset should look like this:

id prodname
1 X,Y,Z
2 A,B,C

Can you please help me to solve the above problem using a recursive CTE?

This is out good old friend, GROUP_CONCAT. It's an aggregate function that returns all strings within a group, concatenated. It's somewhat different from the other aggregate functions, because, first, dealing with the concatenated string can be quite a tedious task for the groups with lots of records (large strings tend to overflow), and, second, the result depends on the order of the arguments (which is normally not the case for the aggregate functions). It's not a part of a standard SQL and as for now is implemented only by MySQL with some extra vendor-specific keywords (like ORDER BY within the argument list).

This functionality, however, is often asked for and I have written some articles about implementing this in PostgreSQL and Oracle.

Now, let's see how to do it in SQL Server.

Usually, SQL Server's FOR XML clause is exploited to concatenate the strings. To do this, we obtain a list of group identifiers and for each group, retrieve all it's product names with a subquery appended with FOR XML PATH(''). This makes a single XML column out of the recordset:
Read the rest of this entry »

Written by Quassnoi

June 21st, 2010 at 11:00 pm

Posted in SQL Server

SQL Server: deleting with self-referential FOREIGN KEY

with 5 comments

From Stack Overflow:

I have an SQL Server table defined as below:

TestComposite
id (PK) siteUrl (PK) name parentId
1 site1 Item1 NULL
2 site1 Item2 NULL
3 site1 Folder1 NULL
4 site1 Folder1.Item1 3
5 site1 Folder1.Item2 3
6 site1 Folder1.Folder1 3
7 site1 Folder1.Folder1.Item1 6

Items and folders are stored inside the same table

If an item is inside a folder, the parentID column is the id of the folder.

I would like to be able to DELETE CASCADE items/folders when I delete a folder.

I tried to define a constraint similar to:

ALTER TABLE [TestComposite] 
ADD CONSTRAINT fk_parentid 
FOREIGN KEY (ParentID, SiteUrl) 
REFERENCES [TestComposite] (ID, SiteUrl) ON DELETE CASCADE

, but it gives me this error:

Introducing FOREIGN KEY constraint 'fk_parentid' on table 'TestComposite' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.

SQL Server does support chained CASCADE updates, but does not allow one table to participate more that once in a chain (i. e. does not allow loops).

SQL Server, unlike most other engines, optimizes cascading DML operations to be set-based which requires building a cycle-free DML order (which you can observe in the execution plan). With the loops, that would not be possible.

However, it is possible to define such a constraint without cascading operations, and with a little effort it is possible to delete a whole tree branch at once.

Let's create a sample table:
Read the rest of this entry »

Written by Quassnoi

March 3rd, 2010 at 11:00 pm

Posted in SQL Server