EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Things SQL needs: determining range cardinality. You're welcome to read and comment on it.

Endless loops

Imagine that we are developing an email server that keeps the outgoing messages in a database.

The server is always busy sending the messages if the outgoing queue is not empty. It may have up to 10 connections at once.

If the message is failed to be sent, it’s reinserted to the end of the queue until the retry count is exceeded.

As soon as all messages from a batch succeed or fail to be sent, next 10 messages from the queue are fetched.

How do we write a query to fetch a batch of messages from the queue?

This all can be done in a single query run in a loop. No range checking, no reinserting.

Let’s create the sample table:

Table creation details

SELECT  *
FROM    t_queue
id header attempt
1 Message 1 0
2 Message 2 0
3 Message 3 0
4 Message 4 0
5 Message 5 0
6 Message 6 0
7 Message 7 0
8 Message 8 0
9 Message 9 0
10 Message 10 0
11 Message 11 0
12 Message 12 0
13 Message 13 0
14 Message 14 0
15 Message 15 0
16 Message 16 0
17 Message 17 0
18 Message 18 0
19 Message 19 0
20 Message 20 0
20 rows fetched in 0.0006s (0.0018s)

We have a queue of 20 fresh messages, none of them having been attempted to send.

Let’s select a batch of the messages to be sent. We need the messages for which the retry counter had not been exceeded, i. e. those having attempt < 3.

We initialize a session variable:

SET @r := -1;

and issue the following query:

SELECT  q.*
FROM    (
        SELECT  @r AS _current,
                @r := COALESCE(
                (
                SELECT  id
                FROM    t_queue
                WHERE   id > _current
                        AND attempt < 3
                ORDER BY
                        id
                LIMIT 1
                ),
                (
                SELECT  MIN(id)
                FROM    t_queue
                WHERE   attempt < 3
                )
                ) AS nid
        FROM    t_queue
        LIMIT 10
        ) ids, t_queue q
WHERE   id = nid
FOR UPDATE
id header attempt
1 Message 1 0
2 Message 2 0
3 Message 3 0
4 Message 4 0
5 Message 5 0
6 Message 6 0
7 Message 7 0
8 Message 8 0
9 Message 9 0
9 rows fetched in 0.0003s (0.0067s)

This selects a list of first 10 messages.

Now, if we just reissue the exactly same query, we get:

id header attempt
11 Message 11 0
12 Message 12 0
13 Message 13 0
14 Message 14 0
15 Message 15 0
16 Message 16 0
17 Message 17 0
18 Message 18 0
19 Message 19 0
20 Message 20 0
10 rows fetched in 0.0004s (0.0068s)

Exactly same query, but the results are different.

Now let's assume that messages 5, 10 and 15 failed to be sent and update their attempts to 3's:

UPDATE t_queue
SET    attempt = 3
WHERE  id IN (5, 10, 15)

, and rerun the batch query. We get the following result:

id header attempt
1 Message 1 0
2 Message 2 0
3 Message 3 0
4 Message 4 0
6 Message 6 0
7 Message 7 0
8 Message 8 0
9 Message 9 0
11 Message 11 0
12 Message 12 0
10 rows fetched in 0.0004s (0.0068s)

The query has wrapped around automatically.

If we repeat the query again, we will get:

id header attempt
13 Message 13 0
14 Message 14 0
16 Message 16 0
17 Message 17 0
18 Message 18 0
19 Message 19 0
20 Message 20 0
1 Message 1 0
2 Message 2 0
3 Message 3 0
10 rows fetched in 0.0003s (0.0059s)

The query makes a rollover right inside the resultset: it selects 1 after 20.

So we can basically just run the query inside an endless loop without having to bother about rewinding, rollovers etc. The query will just select 10 next messages again and again.

So how does it work?

The principle is simple:

  1. We use t_queue in the inner subquery as a dummy rowsource, limiting rows as necessary
  2. We keep current id in the session variable @r
  3. For each row in the dummy rowset, we get the id next to @r that satisfies the conditions (attempt < 3)
  4. If there is no next id for a given @r, we select the minimal id satisfying the conditions, thus making a rollover
  5. Finally, we join t_queue on this next id and mark it FOR UPDATE

As the intermediate result is stored in @r between the queries, we assign @r only once (right after connecting), and just run the batch query over and over again.

Written by Quassnoi

April 3rd, 2009 at 11:00 pm

Posted in MySQL

Comments are closed.