EXPLAIN EXTENDED

How to create fast database queries

Mixed ASC/DESC sorting in MySQL

with 10 comments

From Stack Overflow:

I have a table of blog posts, each with a foreign key back to it's author. There are less than 15,000 entries in this table.

This query scans over 19,000 rows (per EXPLAIN), requires a filesort (that might be regular MySQL behavior), and takes over 400ms to return 5 rows. possibly because of the complicated WHERE used to check if the item is actually published.

SELECT  blog_post.id,
blog_post.title,
blog_post.author_id,
blog_post.has_been_fact_checked,
blog_post.published_date,
blog_post.ordering,
auth_user.username,
auth_user.email
FROM    blog_post
INNER JOIN
auth_user
ON      auth_user.id = blog_post.author_id
WHERE   blog_post.is_approved = 1
AND blog_post.has_been_fact_checked = 1
AND blog_post.published_date <=  '2010-10-25 22:40:05'
ORDER BY
        blog_post.published_date DESC,
        blog_post.ordering ASC,
        blog_post.id DESC
LIMIT 5


How can I wrangle this query under control?

This query is quite simple: a filtering condition with two equalities and a range and an order by. The range in the filter fits the ORDER BY perfectly, so the whole query could be executed using an index scan without any filesorts.

The only problem is that we have the mixed directions in ORDER BY, and MySQL does not support ASC / DESC clauses for the indexes.

With a simple table redesign, the problem could easily be solved: we would just need to reverse the order in the ordering column, say, by creating its copy and storing negative values in it. This way, we could just create a composite index (with all columns ascending) and rewrite the query to use the reverse column instead. That's what many engines do, MediaWiki (which Wikipedia runs on) being one of the most well-known examples.

This solution is nice, I hear people saying, but requires a database redesign which as we all know is never as simple as some development pundits on their blogs seem to think.

OK, this is a good point. Let's see what we can do with the current design, and, as always, create a sample table to do it:

Table creation details

There are 1,000 authors and 500,000 blog posts in the respective tables. The blog posts intentionally have duplicates on published_date.

The original query:

SELECT  blog_post.id,
blog_post.title,
blog_post.author_id,
blog_post.has_been_fact_checked,
blog_post.published_date,
blog_post.ordering,
auth_user.username,
auth_user.email
FROM    blog_post
INNER JOIN
auth_user
ON      auth_user.id = blog_post.author_id
WHERE   blog_post.is_approved = 1
AND blog_post.has_been_fact_checked = 1
AND blog_post.published_date <= '2010-10-25 22:40:05'
ORDER BY
        blog_post.published_date DESC,
        blog_post.ordering ASC,
        blog_post.id DESC
LIMIT 5

id title author_id has_been_fact_checked published_date ordering username email
156410 Post 156410 984 1 2010-10-25 22:00:00 1 author984 author984@example.com
451417 Post 451417 140 1 2010-10-25 22:00:00 8 author140 author140@example.com
262749 Post 262749 719 1 2010-10-25 22:00:00 19 author719 author719@example.com
415157 Post 415157 430 1 2010-10-25 22:00:00 22 author430 author430@example.com
307578 Post 307578 611 1 2010-10-25 22:00:00 72 author611 author611@example.com
5 rows fetched in 0.0004s (0.4558s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE blog_post ref ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 2 const,const 189693 100.00 Using where; Using filesort
1 SIMPLE auth_user eq_ref PRIMARY PRIMARY 4 20101102_desc.blog_post.author_id 1 100.00
select `20101102_desc`.`blog_post`.`id` AS `id`,`20101102_desc`.`blog_post`.`title` AS `title`,`20101102_desc`.`blog_post`.`author_id` AS `author_id`,`20101102_desc`.`blog_post`.`has_been_fact_checked` AS `has_been_fact_checked`,`20101102_desc`.`blog_post`.`published_date` AS `published_date`,`20101102_desc`.`blog_post`.`ordering` AS `ordering`,`20101102_desc`.`auth_user`.`username` AS `username`,`20101102_desc`.`auth_user`.`email` AS `email` from `20101102_desc`.`blog_post` join `20101102_desc`.`auth_user` where ((`20101102_desc`.`auth_user`.`id` = `20101102_desc`.`blog_post`.`author_id`) and (`20101102_desc`.`blog_post`.`has_been_fact_checked` = 1) and (`20101102_desc`.`blog_post`.`is_approved` = 1) and (`20101102_desc`.`blog_post`.`published_date` <= '2010-10-25 22:40:05')) order by `20101102_desc`.`blog_post`.`published_date` desc,`20101102_desc`.`blog_post`.`ordering`,`20101102_desc`.`blog_post`.`id` desc limit 5

As the asker said, this query takes 455 ms and the filesort is used.

Now, as we cannot use a single index on mixed ASC / DESC ordering conditions, we should squeeze the most from our existing index.

The longest common prefix of the index and the ORDER BY expression (which implicitly includes all columns from the WHERE clause) is (is_approved, has_been_fact_checked, published_date).

What does it mean? It means that the chunks of data sharing the distinct values of these three columns will always go in the same order both in the index and in the query, though the order may and will differ within each chunk.

See the queries below, first one with the initial ORDER BY sorting:

SELECT  published_date, id, ordering
FROM    blog_post
WHERE   is_approved = 1
AND has_been_fact_checked = 1
ORDER BY
published_date DESC,
ordering ASC,
id DESC
LIMIT 16
published_date id ordering
2010-11-02 00:00:00 111429 97
 
2010-11-01 23:00:00 48638 6
2010-11-01 23:00:00 245245 7
2010-11-01 23:00:00 153099 19
2010-11-01 23:00:00 334624 33
2010-11-01 23:00:00 200844 45
2010-11-01 23:00:00 163114 52
2010-11-01 23:00:00 312847 54
2010-11-01 23:00:00 205531 72
2010-11-01 23:00:00 265795 94
 
2010-11-01 22:00:00 409475 24
2010-11-01 22:00:00 213483 45
2010-11-01 22:00:00 72571 57
 
2010-11-01 21:00:00 102956 32
2010-11-01 21:00:00 422579 48
2010-11-01 21:00:00 280297 98

, the second one with the index sorting:

SELECT  published_date, id, ordering
FROM    blog_post
WHERE   is_approved = 1
AND has_been_fact_checked = 1
ORDER BY
published_date DESC,
ordering DESC,
id DESC
LIMIT 16
published_date id ordering
2010-11-02 00:00:00 111429 97
 
2010-11-01 23:00:00 265795 94
2010-11-01 23:00:00 205531 72
2010-11-01 23:00:00 312847 54
2010-11-01 23:00:00 163114 52
2010-11-01 23:00:00 200844 45
2010-11-01 23:00:00 334624 33
2010-11-01 23:00:00 153099 19
2010-11-01 23:00:00 245245 7
2010-11-01 23:00:00 48638 6
 
2010-11-01 22:00:00 72571 57
2010-11-01 22:00:00 213483 45
2010-11-01 22:00:00 409475 24
 
2010-11-01 21:00:00 280297 98
2010-11-01 21:00:00 422579 48
2010-11-01 21:00:00 102956 32

As you can see, the chunks of data occupy the same places within both recordsets. Post dated 2010-11-02 00:00:00 is at row 1. Those dated 2010-11-01 23:00:00 are at rows 210 etc. The orders differ only within the chunks, but not between chunks.

This means that if we select the same chunks as the original query does and reorder them, we get the same order.

And the chunks we can select quite efficiently using the index, since their order, as we shown above, is the same for both queries.

The original query had LIMIT 5, this means we have to select at most 5 chunks. To do this, we just need to count off 5 distinct dates satisfying the condition.

This can be done using a technique I described in one of my earlier posts:

Selecting 5 dates

Here's a query to select 5 distinct dates less than the given one:

SELECT  bp.published_date
FROM    blog_post bp
WHERE   bp.is_approved = 1
AND bp.has_been_fact_checked = 1
AND bp.published_date &lt; '2010-10-25 22:40:05'
        AND bp.id =
        (
        SELECT  id
        FROM    blog_post bpi
        WHERE   bpi.is_approved = 1
                AND bpi.has_been_fact_checked = 1
                AND bpi.published_date = bp.published_date
        ORDER BY
                bpi.is_approved DESC, bpi.has_been_fact_checked DESC, bpi.published_date DESC
        LIMIT 1
        )
ORDER BY
        bp.is_approved DESC, bp.has_been_fact_checked DESC, bp.published_date DESC
LIMIT 5
published_date
2010-10-25 22:00:00
2010-10-25 21:00:00
2010-10-25 20:00:00
2010-10-25 19:00:00
2010-10-25 18:00:00
5 rows fetched in 0.0001s (0.0018s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY bp range ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 189693 100.00 Using where; Using index
2 DEPENDENT SUBQUERY bpi ref ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 const,const,20101102_desc.bp.published_date 1 100.00 Using where; Using index
Field or reference '20101102_desc.bp.published_date' of SELECT #2 was resolved in SELECT #1
select `20101102_desc`.`bp`.`published_date` AS `published_date` from `20101102_desc`.`blog_post` `bp` where ((`20101102_desc`.`bp`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bp`.`is_approved` = 1) and (`20101102_desc`.`bp`.`published_date` < '2010-10-25 22:40:05') and (`20101102_desc`.`bp`.`id` = (select `20101102_desc`.`bpi`.`id` from `20101102_desc`.`blog_post` `bpi` where ((`20101102_desc`.`bpi`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpi`.`is_approved` = 1) and (`20101102_desc`.`bpi`.`published_date` = `20101102_desc`.`bp`.`published_date`)) order by `20101102_desc`.`bpi`.`is_approved` desc,`20101102_desc`.`bpi`.`has_been_fact_checked` desc,`20101102_desc`.`bpi`.`published_date` desc limit 1))) order by `20101102_desc`.`bp`.`is_approved` desc,`20101102_desc`.`bp`.`has_been_fact_checked` desc,`20101102_desc`.`bp`.`published_date` desc limit 5

This uses the index access and is very efficient.

Selecting chunks of data

Now we need to select all records within the chunks defined by the dates. This can be done with a simple BETWEEN condition. The upper bound of the BETWEEN will be the date that we used as the parameter to the query, and the lower one will be the last date of those selected on the previous stage.

Since it should be a scalar, we will just replace LIMIT 5 (which returns 5 records) with LIMIT 4, 1 (which returns the last of these 5 records) and put it into a scalar subquery:

SELECT  *
FROM    blog_post bpo
WHERE   is_approved = 1
AND has_been_fact_checked = 1
AND published_date BETWEEN
(
SELECT  bp.published_date
FROM    blog_post bp
WHERE   bp.is_approved = 1
AND bp.has_been_fact_checked = 1
AND bp.published_date &lt; '2010-10-25 22:40:05'
                AND bp.id =
                (
                SELECT  id
                FROM    blog_post bpi
                WHERE   bpi.is_approved = 1
                        AND bpi.has_been_fact_checked = 1
                        AND bpi.published_date = bp.published_date
                ORDER BY
                        bpi.is_approved DESC, bpi.has_been_fact_checked DESC, bpi.published_date DESC
                LIMIT 1
                )
        ORDER BY
                bp.is_approved DESC, bp.has_been_fact_checked DESC, bp.published_date DESC
        LIMIT 4, 1
        )
        AND '2010-10-25 22:40:05'
id title is_approved has_been_fact_checked published_date ordering author_id
418694 Post 418694 1 1 2010-10-25 18:00:00 16 579
354679 Post 354679 1 1 2010-10-25 18:00:00 44 733
480570 Post 480570 1 1 2010-10-25 18:00:00 70 62
402339 Post 402339 1 1 2010-10-25 18:00:00 85 498
282148 Post 282148 1 1 2010-10-25 18:00:00 87 152
358065 Post 358065 1 1 2010-10-25 18:00:00 88 301
 
328191 Post 328191 1 1 2010-10-25 19:00:00 33 627
95345 Post 95345 1 1 2010-10-25 19:00:00 39 497
73886 Post 73886 1 1 2010-10-25 19:00:00 77 1
 
29391 Post 29391 1 1 2010-10-25 20:00:00 13 259
632 Post 632 1 1 2010-10-25 20:00:00 15 952
 
93117 Post 93117 1 1 2010-10-25 21:00:00 65 773
278821 Post 278821 1 1 2010-10-25 21:00:00 72 49
 
156410 Post 156410 1 1 2010-10-25 22:00:00 1 984
451417 Post 451417 1 1 2010-10-25 22:00:00 8 140
262749 Post 262749 1 1 2010-10-25 22:00:00 19 719
415157 Post 415157 1 1 2010-10-25 22:00:00 22 430
225161 Post 225161 1 1 2010-10-25 22:00:00 72 398
307578 Post 307578 1 1 2010-10-25 22:00:00 72 611
426932 Post 426932 1 1 2010-10-25 22:00:00 92 64
20 rows fetched in 0.0012s (0.0041s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY bpo range ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 20 75.00 Using where
2 SUBQUERY bp range ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 189693 100.00 Using where; Using index
3 DEPENDENT SUBQUERY bpi ref ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 20101102_desc.bp.published_date 1 100.00 Using where; Using index
Field or reference '20101102_desc.bp.published_date' of SELECT #3 was resolved in SELECT #2
select `20101102_desc`.`bpo`.`id` AS `id`,`20101102_desc`.`bpo`.`title` AS `title`,`20101102_desc`.`bpo`.`is_approved` AS `is_approved`,`20101102_desc`.`bpo`.`has_been_fact_checked` AS `has_been_fact_checked`,`20101102_desc`.`bpo`.`published_date` AS `published_date`,`20101102_desc`.`bpo`.`ordering` AS `ordering`,`20101102_desc`.`bpo`.`author_id` AS `author_id` from `20101102_desc`.`blog_post` `bpo` where ((`20101102_desc`.`bpo`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpo`.`is_approved` = 1) and (`20101102_desc`.`bpo`.`published_date` between (select `20101102_desc`.`bp`.`published_date` from `20101102_desc`.`blog_post` `bp` where ((`20101102_desc`.`bp`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bp`.`is_approved` = 1) and (`20101102_desc`.`bp`.`published_date` < '2010-10-25 22:40:05') and (`20101102_desc`.`bp`.`id` = (select `20101102_desc`.`bpi`.`id` from `20101102_desc`.`blog_post` `bpi` where ((`20101102_desc`.`bpi`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpi`.`is_approved` = 1) and (`20101102_desc`.`bpi`.`published_date` = `20101102_desc`.`bp`.`published_date`)) order by `20101102_desc`.`bpi`.`is_approved` desc,`20101102_desc`.`bpi`.`has_been_fact_checked` desc,`20101102_desc`.`bpi`.`published_date` desc limit 1))) order by `20101102_desc`.`bp`.`is_approved` desc,`20101102_desc`.`bp`.`has_been_fact_checked` desc,`20101102_desc`.`bp`.`published_date` desc limit 4,1) and '2010-10-25 22:40:05'))

This query returns (almost instantly) 20 records in 5 chunks of different size (split for readability). Note that the chunks are arranged in order reverse to the order the dates were returned on the previous step: we omitted the ORDER BY clause and, by default, the index is traversed in forward (ASC) direction.

Ordering and joining

Now, all that's left to do is to add the correct ORDER BY condition to rearrange the data within the chunks, join the authors table and apply the LIMIT:

SELECT  bpo.id,
bpo.title,
bpo.author_id,
bpo.has_been_fact_checked,
bpo.published_date,
bpo.ordering,
au.username,
au.email
FROM    blog_post bpo
JOIN    auth_user au
ON      au.id = bpo.author_id
WHERE   bpo.is_approved = 1
AND bpo.has_been_fact_checked = 1
AND bpo.published_date BETWEEN
(
SELECT  bp.published_date
FROM    blog_post bp
WHERE   bp.is_approved = 1
AND bp.has_been_fact_checked = 1
AND bp.published_date &lt; '2010-10-25 22:40:05'
                AND bp.id =
                (
                SELECT  id
                FROM    blog_post bpi
                WHERE   bpi.is_approved = 1
                        AND bpi.has_been_fact_checked = 1
                        AND bpi.published_date = bp.published_date
                ORDER BY
                        bpi.is_approved DESC, bpi.has_been_fact_checked DESC, bpi.published_date DESC
                LIMIT 1
                )
        ORDER BY
                bp.is_approved DESC, bp.has_been_fact_checked DESC, bp.published_date DESC
        LIMIT 4, 1
        )
        AND '2010-10-25 22:40:05'
ORDER BY
        bpo.published_date DESC,
        bpo.ordering ASC,
        bpo.id DESC
LIMIT 5
id title author_id has_been_fact_checked published_date ordering username email
156410 Post 156410 984 1 2010-10-25 22:00:00 1 author984 author984@example.com
451417 Post 451417 140 1 2010-10-25 22:00:00 8 author140 author140@example.com
262749 Post 262749 719 1 2010-10-25 22:00:00 19 author719 author719@example.com
415157 Post 415157 430 1 2010-10-25 22:00:00 22 author430 author430@example.com
307578 Post 307578 611 1 2010-10-25 22:00:00 72 author611 author611@example.com
5 rows fetched in 0.0004s (0.0044s)
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY bpo range ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 20 100.00 Using where; Using filesort
1 PRIMARY au eq_ref PRIMARY PRIMARY 4 20101102_desc.bpo.author_id 1 100.00
2 SUBQUERY bp range ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 189693 100.00 Using where; Using index
3 DEPENDENT SUBQUERY bpi ref ix_blogpost_approved_checked_published_ordering_id ix_blogpost_approved_checked_published_ordering_id 11 20101102_desc.bp.published_date 1 100.00 Using where; Using index
Field or reference '20101102_desc.bp.published_date' of SELECT #3 was resolved in SELECT #2
select `20101102_desc`.`bpo`.`id` AS `id`,`20101102_desc`.`bpo`.`title` AS `title`,`20101102_desc`.`bpo`.`author_id` AS `author_id`,`20101102_desc`.`bpo`.`has_been_fact_checked` AS `has_been_fact_checked`,`20101102_desc`.`bpo`.`published_date` AS `published_date`,`20101102_desc`.`bpo`.`ordering` AS `ordering`,`20101102_desc`.`au`.`username` AS `username`,`20101102_desc`.`au`.`email` AS `email` from `20101102_desc`.`blog_post` `bpo` join `20101102_desc`.`auth_user` `au` where ((`20101102_desc`.`au`.`id` = `20101102_desc`.`bpo`.`author_id`) and (`20101102_desc`.`bpo`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpo`.`is_approved` = 1) and (`20101102_desc`.`bpo`.`published_date` between (select `20101102_desc`.`bp`.`published_date` from `20101102_desc`.`blog_post` `bp` where ((`20101102_desc`.`bp`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bp`.`is_approved` = 1) and (`20101102_desc`.`bp`.`published_date` < '2010-10-25 22:40:05') and (`20101102_desc`.`bp`.`id` = (select `20101102_desc`.`bpi`.`id` from `20101102_desc`.`blog_post` `bpi` where ((`20101102_desc`.`bpi`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpi`.`is_approved` = 1) and (`20101102_desc`.`bpi`.`published_date` = `20101102_desc`.`bp`.`published_date`)) order by `20101102_desc`.`bpi`.`is_approved` desc,`20101102_desc`.`bpi`.`has_been_fact_checked` desc,`20101102_desc`.`bpi`.`published_date` desc limit 1))) order by `20101102_desc`.`bp`.`is_approved` desc,`20101102_desc`.`bp`.`has_been_fact_checked` desc,`20101102_desc`.`bp`.`published_date` desc limit 4,1) and '2010-10-25 22:40:05')) order by `20101102_desc`.`bpo`.`published_date` desc,`20101102_desc`.`bpo`.`ordering`,`20101102_desc`.`bpo`.`id` desc limit 5

The records have been rearranged (using a filesort) and a LIMIT was applied.

Since the records are only taken from 5 chunks with 20 records in total, the filesort is not a problem at all and the query completes in only 4 ms.

Conclusion

MySQL does not support ASC / DESC clauses in the indices.

However, with an ORDER BY / LIMIT condition that mixes the orders of columns, it is possible to use an index.

To do that, we should limit the recordset to the minimal set that would still retain the order on the leading columns that share one direction. This can be done by selecting distinct set of those leading columns up to the original limit, and filtering on them using the index.

Then we can revert to a filesort to order this (much smaller) set on the remaining columns, which would be much more efficient.

Written by Quassnoi

November 2nd, 2010 at 11:00 pm

Posted in MySQL

10 Responses to 'Mixed ASC/DESC sorting in MySQL'

Subscribe to comments with RSS

  1. What are the advantages and disadvantages of using PostgreSQL over MySQL?…

    First off, I believe that whatever you’re doing, you’ll probably be fine using either. That said, I’m generally more frustrated when using MySQL than when using PG. Here’s a sample of the problems I’ve encountered from using both MySQL and PG. I h…

    Quora

    23 Jun 11 at 13:49

  2. Why do people typically choose PostgreSQL over MySQL?…

    First off, I believe that whatever you’re doing, you’ll probably be fine using either. That said, I’m generally more frustrated when using MySQL than when using PG. Here’s a sample of the problems I’ve encountered from using both MySQL and PG. I h…

    Quora

    23 Jun 11 at 13:59

  3. Why don’t you just make your ordering column negative, then you can use a covering index for all columns. Ordering by a negative column DESC, is, in practice…”ascending”.

    Steve

    6 Feb 12 at 23:36

  4. @Steve: have you read the intro? From “with a simple table redesign, the problem could easily be solved: we would just need to reverse the order in the ordering column …” and on?

    Quassnoi

    6 Feb 12 at 23:39

  5. Ah man…. sorry… I tend to skim read when it’s approaching midnight….

    Steve

    8 Feb 12 at 00:19

  6. “The maximal common prefix of the index and the ORDER BY condition is (is_approved, has_been_fact_checked, published_date).”

    Should that read “…and the WHERE condition”? I’m struggling to understand it otherwise.

    Ralph Corderoy

    2 Feb 13 at 18:02

  7. @Ralph: the WHERE filters are constant, that’s why ORDER BY implicitly orders on them too.

    It’s actually ORDER BY is_approved DESC, has_been_fact_checked DESC, published_date DESC, ordering ASC, id ASC.

    Only is_approved and has_been_fact_checked are handled by the WHERE condition (they are constants within the resultset) and you don’t need to mention them explicitly in the ORDER BY.

    Quassnoi

    2 Feb 13 at 18:29

  8. Thanks Quassnoi, that helps. When MySQL is deciding which indexes are helpful does it consider the WHERE, ORDER BY, or both together? It sounds like the last of these from what you’re saying.

    Ralph Corderoy

    2 Feb 13 at 22:54

  9. It considers both, of course, and also GROUP BY and LIMIT be they used in the query.

    The set of columns used in the SELECT list also matters.

    Quassnoi

    3 Feb 13 at 02:53

  10. This was very helpfull! I have implemented the reverse ordering column to solve the mixed ASC and DESC order optimization issue. Awesome! Thanks!

    Jeroen

    14 Jan 14 at 19:14

Leave a Reply