EXPLAIN EXTENDED

How to create fast database queries

Archive for July, 2014

SQL Server: deleting with self-referential FOREIGN KEY, handling loops

with 2 comments

From comments to one of my previous posts: SQL Server: deleting with self-referential FOREIGN KEY

In my case the query goes to infinite loop.

If I use OPTION (MAXRECURSION 365) or OPTION (MAXRECURSION 3267), it fails when reaching highest recursion value.

If I use 0 (which is infinite) the query runs for an hour with no output.

I understand it is a data related issue which is causing infinite loop, but I cannot control the data being sent to the DB.

Any other way we can handle it?

Thanks,
Jay

That's a common situation, and we would sometimes need to clean the mess up (i. e. remove the whole loop).

To do this, we would just need to add a little check condition to the recursive CTE.

We just record the topmost entry (or better say the entry we first use in the loop, as there is obviously no topmost entry), and check if we ever come across it again as we do the recursion.

Once we have it on next iteration, we know it's a loop and it's time to stop.

Here's how we do this:

CREATE TABLE TestComposite (
        id INT NOT NULL, 
        siteUrl NVARCHAR(255) NOT NULL,
        name NVARCHAR(MAX) NOT NULL,
        parentId INT NULL,
        PRIMARY KEY (id, siteUrl),
        FOREIGN KEY (parentId, siteUrl) REFERENCES TestComposite (id, siteUrl)
)
GO

INSERT
INTO    TestComposite
VALUES  (1,  'site1', 'name1', NULL),
        (2,  'site2', 'name2', 4),
        (3,  'site2', 'name3', 2),
        (4,  'site2', 'name4', 3),
        (5,  'site5', 'name5', NULL),
        (6,  'site5', 'name6', 5)
GO

WITH    q (start, id, siteUrl) AS
        (
        SELECT  id, id, siteUrl
        FROM    TestComposite
        WHERE   id = 2
                AND siteUrl = 'site2'
        UNION ALL
        SELECT  start, tc.id, tc.siteUrl
        FROM    q
        JOIN    TestComposite tc
        ON      tc.parentID = q.id
                AND tc.siteUrl = q.siteUrl
        WHERE   tc.id <> q.start
        )
DELETE
FROM    TestComposite
OUTPUT  DELETED.*
WHERE   EXISTS
        (
        SELECT  id, siteUrl
        INTERSECT
        SELECT  id, siteUrl
        FROM    q
        )

We had a loop on 2 -> 3 -> 4 which is gone after the query.

See this query on SQL Fiddle.

Written by Quassnoi

July 10th, 2014 at 9:00 pm

Posted in SQL Server

Top 5 XKCD comics which can illustrate programming questions

with 8 comments

As many of you know, I actively participate on Stack Overflow, the leading Q/A website for software developers.

Kind folks on Stack Overflow have made their data open for examination, and anyone can query their database using this web interface at data.stackexchange.com.

Many of the questions and answers there are illustrated with links to XKCD, the web comics created by Randall Munroe.

So I decided to see which of those comics best illustrate quirks and oddities we keyboard warriors have to deal with in our daily routine.

The query itself is quite simple:

SELECT  link, cnt, id [Post Link], score
FROM    (
        SELECT  link, id, score,
                COUNT(*) OVER (PARTITION BY link) cnt,
                ROW_NUMBER() OVER (PARTITION BY link ORDER BY score DESC) rn
        FROM    (
                SELECT  id, score,
                        'http' + SUBSTRING(start, 1, PATINDEX('%/%', SUBSTRING(start, 13, 30)) + 12)
                FROM    (
                        SELECT  id, score,
                                SUBSTRING(body, PATINDEX('%://xkcd.com/[0-9]%', body), 20) start
                        FROM    (
                                SELECT  TOP 400000
                                        id, body, score
                                FROM    posts
                                ORDER BY
                                        id DESC
                                UNION
                                SELECT  TOP 4000000
                                        postId, text, score
                                FROM    comments
                                ORDER BY
                                        id DESC
                                ) q
                        WHERE   body LIKE '%://xkcd.com/[0-9]%'
                        ) q
                ) q (id, score, link)
        ) q
WHERE   rn = 1
ORDER BY
        cnt DESC

The TOP clauses are there to limit the query size, as their hosting provider does now allow fulltext indexing, and without them the query would just time out. This query only searches recent comments and posts, so some famous and top-voted posts might not be present here. Still, it's a good sample.

You can see all results here, and we'll just discuss the top 5 entries.

#5. Random Number (10 links)

Random Number

Of course the questions which link to it deal with random number generators of any kind and fallacies in their implementation.

Read the rest of this entry »

Written by Quassnoi

July 9th, 2014 at 8:00 pm

Posted in Miscellaneous