Archive for the ‘SQL Server’ Category
From comments to one of my previous posts: SQL Server: deleting with self-referential FOREIGN KEY
In my case the query goes to infinite loop.
If I use
OPTION (MAXRECURSION 365)or
OPTION (MAXRECURSION 3267), it fails when reaching highest recursion value.
If I use 0 (which is infinite) the query runs for an hour with no output.
I understand it is a data related issue which is causing infinite loop, but I cannot control the data being sent to the DB.
Any other way we can handle it?
That's a common situation, and we would sometimes need to clean the mess up (i. e. remove the whole loop).
To do this, we would just need to add a little check condition to the recursive CTE.
We just record the topmost entry (or better say the entry we first use in the loop, as there is obviously no topmost entry), and check if we ever come across it again as we do the recursion.
Once we have it on next iteration, we know it's a loop and it's time to stop.
Here's how we do this:
CREATE TABLE TestComposite ( id INT NOT NULL, siteUrl NVARCHAR(255) NOT NULL, name NVARCHAR(MAX) NOT NULL, parentId INT NULL, PRIMARY KEY (id, siteUrl), FOREIGN KEY (parentId, siteUrl) REFERENCES TestComposite (id, siteUrl) ) GO INSERT INTO TestComposite VALUES (1, 'site1', 'name1', NULL), (2, 'site2', 'name2', 4), (3, 'site2', 'name3', 2), (4, 'site2', 'name4', 3), (5, 'site5', 'name5', NULL), (6, 'site5', 'name6', 5) GO WITH q (start, id, siteUrl) AS ( SELECT id, id, siteUrl FROM TestComposite WHERE id = 2 AND siteUrl = 'site2' UNION ALL SELECT start, tc.id, tc.siteUrl FROM q JOIN TestComposite tc ON tc.parentID = q.id AND tc.siteUrl = q.siteUrl WHERE tc.id <> q.start ) DELETE FROM TestComposite OUTPUT DELETED.* WHERE EXISTS ( SELECT id, siteUrl INTERSECT SELECT id, siteUrl FROM q )
We had a loop on 2 -> 3 -> 4 which is gone after the query.
See this query on SQL Fiddle.
A friend of mine who owns a stock photo website once told me he had designed an awesome photo auto-tagging algorithm which was 80% correct: just tag all photos cats and have four hits out of five.
But seriously, how do we calculate most used tags on photos?
For those not familiar with different tagging models, I highly recommend this old but very useful article: Tags: Database schemas.
Now, in this tutorial, we'll use the toxi approach to tagging. Later in the article I'll explain why we would choose that approach over the others.
Let us create a simple schema which would allow us to store, tag and sell photos:
You're building a social network and want to store who is friends with whom. OK. Let's assume you want Facebook-style friends: by agreeing to be friends with someone you allow them to be friends with you too. This is called a symmetric relationship and if your social model sticks to that kind of relationship, your application should enforce its symmetry.
How shall we store friends in a database table? There are several options.
A table with one record per friendship and two fields for each of the friends. The person who requested the friendship is stored in the first field:
10. Unique NULL
When declaring UNIQUE constraints on columns, SQL Server treats all NULL values as unique. This means you can't have more than one NULL value in a unique column.
This behavior violates the standard and is specific to SQL Server.
To work around this, implement the constraint by declaring a UNIQUE index which would exclude NULL values:
SQL Server supports a really useful feature called indexed views. This means it lets you materialize results of some queries and build indexes over them. It’s as if you haven’t actually run the query yet - but it’s already complete, stored conveniently on the disk for you, ordered and waiting for you to read its results. Not all queries are eligible but some quite often used by the database developers are.
In this series, I'll describe three scenarios which could really benefit from using indexed views.
Scenario One: Unique multiple columns
Assume you are building a database for multiplayer online tic-tac-toe tournaments. The first thing you’d want would be recording game results: who played with whom and who won. This can be easily achieved with a simple layout:
Now we have additional requirement: no person should be able to play in a tournament more than once. How do we do that?
Read the rest of this entry on tech.pro »
From Stack Overflow:
UPDATE n SET AddressID = a.AddressID FROM #NewAddress n JOIN dbo.Address a ON (a.[LineA] = n.[LineA] OR (a.[LineA] is null AND n.[LineA] is null)) AND (a.[LineB] = n.[LineB] OR (a.[LineB] is null AND n.[LineB] is null)) AND (a.[LineC] = n.[LineC] OR (a.[LineC] is null AND n.[LineC] is null)) WHERE n.Processed = 0
Any ideas on how to UNION or EXISTS this query? This seems to be a pretty common join condition I'm encountering.
JOIN condition: we should join two tables on a common key, also treating NULL values as equal. Normal equality conditions don't work here: in SQL,
NULL values are not considered equal (strictly speaking, their equality is a boolean
NULL rather than true or false).
ANSI standard defines a special predicate for that:
IS NOT DISTINCT FROM, which is supported by PostgreSQL. This treats values the same value as
DISTINCT does: two equal values, as well as two
NULL values, are considered not distinct. MySQL also supports similar operator,
< =>. Unfortunately, SQL Server supports neither.
In SQL Server, we can emulate this by using set operators which distinctify the queries. Set operators work on sets, so we would need some kind of a
SELECT query for them. This would also help us to overcome SQL Server's inability to compare tuples directly. Basically, we need to tell if two tuples:
(a.lineA, a.lineB, a.lineC) and
(n.lineA, n.lineB, n.lineC) are distinct or not.
We can do that by selecting those tuples in dummy queries (those without a
FROM clause), intersecting those queries (which would give us a one-record or zero-record resultset, depending on whether the two tuples are distinct or not) and applying
EXISTS to the resultset.
EXISTS is a boolean predicate and as such can serve as a join condition.
Let's check it. We'll create two tables, fill them with random values (including occasional NULLs here and there):
From Stack Overflow:
I am trying to insert into the target table values from the source table, but also insert the primary key of the source table into the tracking table.INSERT INTO TRN_TEMP (TRN_TRAN_DATE, TRN_DESCRIPTION, TRN_AMT) OUTPUT D.DET_PRIMARY, GETDATE() INTO REGISTER (DET_PRIMARY_LINK, INS_DATE) SELECT D.DET_DATE, 'SOMETEXT', SUM(D.DET_NET) FROM DETAIL D LEFT JOIN REGISTER R ON R.DET_PRIMARY_LINK = D.DET_PRIMARY WHERE R.LINE_ID IS NULL -- TO REMOVE LINES ALREADY PROCESSED GROUP BY D.DET_DATE
I can't see a problem with the text above but I get an error:The multi part identifier 'D.DET_PRIMARY' could not be bound.
I've tried both
DETAIL.DET_DETAILand the error is the same.
This is a limitation of SQL Server which can be easily worked around by substituting
MERGE instead of
Let's create some sample tables and see:
CREATE TABLE source ( id INT NOT NULL PRIMARY KEY, grouper INT NOT NULL, value INT NOT NULL ) CREATE TABLE destination ( id INT NOT NULL IDENTITY PRIMARY KEY, grouper INT NOT NULL, aggregated_value INT NOT NULL ) CREATE TABLE register ( dest_id INT NOT NULL PRIMARY KEY, max_source_id INT NOT NULL UNIQUE, grouper INT NOT NULL UNIQUE, regdate DATE NOT NULL, ) INSERT INTO source VALUES (1, 1, 5), (2, 1, 2), (3, 2, 6), (4, 2, 8)
, and run the query:
MERGE INTO destination d USING ( SELECT grouper, SUM(value), MAX(id) FROM source WHERE grouper NOT IN ( SELECT grouper FROM register ) GROUP BY grouper ) s (grouper, aggregated_value, max_source_id) ON (1 = 0) WHEN NOT MATCHED THEN INSERT (grouper, aggregated_value) VALUES (grouper, aggregated_value) OUTPUT INSERTED.id, s.max_source_id, s.grouper, GETDATE() INTO register; SELECT * FROM register;
You can try it yourself on SQL Fiddle.
As you can see, we are using both destination table's
INSERTED.id, source table's
s.max_source_id, s.grouper and an arbitrary expression
GETDATE() in the
OUTPUT clause which works just alright.
An alert reader may notice that there is another problem with the original query, but this will be covered in the next post. Watch this space!
Answering questions asked on the site.
… I've always thought
UNPIVOTare signs of a poorly designed database. I mean, is there a legitimate use for them if your model is OK?
I've made an actual use for them in a project I've been working on for the last several months (which is partly why there were no updates for so long!)
Part of the project is a task management system where each task has several persons related to it. There can be the creator of the task, the person the task is assigned to, the actual author of the task (on behalf of whom the task is created), and the task can be possible completed by a person or deleted by a person. A total of 5 fields related to persons.
Now, we need to take all tasks within a certain time range and list all people involved in them.
Let's create a sample table and see how would we do that.
I'm finally back from my vacation. Tunisia's great: dates, Carthage, sea and stuff.
Now, to the questions.
Create a table called
Group id prodname 1 X 1 Y 1 Z 2 A 2 B 2 C
The resultset should look like this:
id prodname 1 X,Y,Z 2 A,B,C
Can you please help me to solve the above problem using a recursive CTE?
This is out good old friend,
GROUP_CONCAT. It's an aggregate function that returns all strings within a group, concatenated. It's somewhat different from the other aggregate functions, because, first, dealing with the concatenated string can be quite a tedious task for the groups with lots of records (large strings tend to overflow), and, second, the result depends on the order of the arguments (which is normally not the case for the aggregate functions). It's not a part of a standard SQL and as for now is implemented only by MySQL with some extra vendor-specific keywords (like
ORDER BY within the argument list).
Now, let's see how to do it in SQL Server.
Usually, SQL Server's
FOR XML clause is exploited to concatenate the strings. To do this, we obtain a list of group identifiers and for each group, retrieve all it's product names with a subquery appended with
FOR XML PATH(''). This makes a single
XML column out of the recordset:
Read the rest of this entry »
From Stack Overflow:
I have an SQL Server table defined as below:
TestComposite id (PK) siteUrl (PK) name parentId 1 site1 Item1 NULL 2 site1 Item2 NULL 3 site1 Folder1 NULL 4 site1 Folder1.Item1 3 5 site1 Folder1.Item2 3 6 site1 Folder1.Folder1 3 7 site1 Folder1.Folder1.Item1 6
Items and folders are stored inside the same table
If an item is inside a folder, the
parentIDcolumn is the
idof the folder.
I would like to be able to
DELETE CASCADEitems/folders when I delete a folder.
I tried to define a constraint similar to:ALTER TABLE [TestComposite] ADD CONSTRAINT fk_parentid FOREIGN KEY (ParentID, SiteUrl) REFERENCES [TestComposite] (ID, SiteUrl) ON DELETE CASCADE
, but it gives me this error:Introducing FOREIGN KEY constraint 'fk_parentid' on table 'TestComposite' may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.
SQL Server does support chained
CASCADE updates, but does not allow one table to participate more that once in a chain (i. e. does not allow loops).
SQL Server, unlike most other engines, optimizes cascading DML operations to be set-based which requires building a cycle-free DML order (which you can observe in the execution plan). With the loops, that would not be possible.
However, it is possible to define such a constraint without cascading operations, and with a little effort it is possible to delete a whole tree branch at once.
Let's create a sample table:
Read the rest of this entry »