EXPLAIN EXTENDED

How to create fast database queries

Group based identity

Comments enabled. I *really* need your comment

From Stack Overflow:

I'm working with an application that adds a new row to the database, based on the last row meeting a certain criteria.

Here is an over-simplified visualization:

A1
A2
A3
B1
B2

Using the visualization above, a web page loads up the highest B value, which is 2.

Then, after some time, it wants to insert B3, the next record in the series.

However, it has to check to make sure that someone else didn't do the same thing.

In actuality, each row corresponds to a place that a piece of equipment visited in a factory.

The page (not easy to rewrite), currently moves the equipment to the next step.

Is there a standard pattern for dealing with this type of problem, or do I simply need to lock the table?

This question combines two other frequently asked questions:

  1. How do I make group-based identity (i. e. keep several identity series identified by the value of a certain column)
  2. How do I ensure there will be no identity gaps?

While there are numerious solutions that involve some kind of selecting MAX(column) and updating the row inside a transaction, there is a more simple way using linked lists.

We need to create the table as following:

CREATE SCHEMA [20090615_identity]
CREATE TABLE t_state (
        id INT NOT NULL IDENTITY,
        grouper VARCHAR(20) NOT NULL,
        parent INT NOT NULL
)
 ..GO
ALTER TABLE [20090615_identity].t_state ADD CONSTRAINT pk_state_grouper_id PRIMARY KEY (grouper, id)
CREATE UNIQUE INDEX ix_state_grouper_parent ON [20090615_identity].t_state (grouper, parent)
GO
BEGIN TRANSACTION

INSERT
INTO    [20090615_identity].t_state (grouper, parent)
SELECT   'A', 0
UNION ALL
SELECT   'A', 1
UNION ALL
SELECT   'A', 2
UNION ALL
SELECT   'B', 0
UNION ALL
SELECT   'B', 4

COMMIT
GO

We have two UNIQUE indexes here: a PRIMARY KEY on (grouper, id) and a UNIQUE INDEX on (grouper, parent).

Such design ensures that within one grouper there will be at most one child for each id.

An id field has a single continuous identity whose value does not depend on grouper.

To obtain the serial numbers within each grouper, we use the following query:

WITH    q (id, grouper, parent, step) AS
        (
        SELECT  id, grouper, parent, 1
        FROM    [20090615_identity].t_state
        WHERE   parent = 0
        UNION ALL
        SELECT  m.id, m.grouper, m.parent, q.step + 1
        FROM    q
        JOIN    [20090615_identity].t_state m
        ON      m.grouper = q.grouper
                AND m.parent = q.id
        )
SELECT  grouper, step
FROM    q                                        
ORDER BY
        grouper, step

grouper step
A 1
A 2
A 3
B 1
B 2
5 rows fetched in 0.0003s (0.0010s)

To insert a new value into any group, we issue the following query:

INSERT
INTO    [20090615_identity].t_state (grouper, parent)
SELECT  grouper, id
FROM    [20090615_identity].t_state so
WHERE   grouper = 'B'
        AND NOT EXISTS
        (
        SELECT  1
        FROM    [20090615_identity].t_state si
        WHERE   si.grouper = so.grouper
                AND si.parent = so.id
        )

This query searches for the last value in the linked list (that is the value that has no children) and uses it as a parent for the newly inserted record.

Note that we don't even care for actual value of id: it's an auxiliary column to enforce a UNIQUE constraint. We do not directly search for it or even SELECT its value.

After the INSERT completes, we get new resultset:

WITH    q (id, grouper, parent, step) AS
        (
        SELECT  id, grouper, parent, 1
        FROM    [20090615_identity].t_state
        WHERE   parent = 0
        UNION ALL
        SELECT  m.id, m.grouper, m.parent, q.step + 1
        FROM    q
        JOIN    [20090615_identity].t_state m
        ON      m.grouper = q.grouper
                AND m.parent = q.id
        )
SELECT  grouper, step
FROM    q                                        
ORDER BY
        grouper, step
grouper step
A 1
A 2
A 3
B 1
B 2
B 3
6 rows fetched in 0.0003s (0.0014s)

If two transactions will try to insert two different children into one group at one time, one of the following will occur:

  • If the LOCK_TIMEOUT is 0, the second transaction will fail immediately
  • If the first transaction commits within the LOCK_TIMEOUT time since the INSERT command in the second transaction begins, the second transaction will rerun the query and insert the second child (after the one inserted by the first transaction)

One should use LOCK_TIMEOUT to control the desired behavior of the application.

To delete the last value from each group, use this command:

WITH    q AS
        (
        SELECT  *
        FROM    [20090615_identity].t_state so
        WHERE   so.grouper = 'B'
                AND NOT EXISTS
                (
                SELECT  1
                FROM    [20090615_identity].t_state si
                WHERE   si.grouper = so.grouper
                        AND si.parent = so.id
                )
        )
DELETE
FROM    q

Again, we don't need to know or use the actual value of id.

Though the actual value of id is deleted and will never be reused, the maximum value of step that we select as a pseudocolumn in the recursive CTE will be decreased after DELETE and increased again after an INSERT.

This solution has some drawbacks, the most serious of them being inability to select more than 100 items per group (this is a limitation put by default value of MAXRECURSION).

However, for a table of factory items (which most probably implies lots of grouper's and a few states per grouper), this should not be a problem.

Written by Quassnoi

June 15th, 2009 at 11:00 pm

Posted in SQL Server

Leave a Reply