EXPLAIN EXTENDED

How to create fast database queries

10 things in MySQL (that won’t work as expected)

with 45 comments

(I just discovered cracked.com)

#10. Searching for a NULL

SELECT  *
FROM    a
WHERE   a.column = NULL

In SQL, a NULL is never equal to anything, even another NULL. This query won't return anything and in fact will be thrown out by the optimizer when building the plan.

When searching for NULL values, use this instead:

SELECT  *
FROM    a
WHERE   a.column IS NULL

#9. LEFT JOIN with additional conditions

SELECT  *
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
WHERE   b.column = 'something'

A LEFT JOIN is like INNER JOIN except that it will return each record from a at least once, substituting missing fields from b with NULL values, if there are no actual matching records.

The WHERE condition, however, is evaluated after the LEFT JOIN so the query above checks column after it had been joined. And as we learned earlier, no NULL value can satisfy an equality condition, so the records from a without corresponding record from b will unavoidably be filtered out.

Essentially, this query is an INNER JOIN, only less efficient.

To match only the records with b.column = 'something' (while still returning all records from a), this condition should be moved into ON clause:

SELECT  *
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
        AND b.column = 'something'

#8. Less than a value but not a NULL

Quite often I see the queries like this:

SELECT  *
FROM    b
WHERE   b.column < 'something'
        AND b.column IS NOT NULL
&#91;/sourcecode&#93;

This is actually not an error: this query is valid and will do what's intended. However, <code>IS NOT NULL</code> here is redundant.

If <code>b.column</code> is a <code>NULL</code>, then <code>b.column < 'something'</code> will never be satisfied, since any comparison to <code>NULL</code> evaluates to a boolean <code>NULL</code> and does not pass the filter.

It is interesting that this additional <code>NULL</code> check is never used for <q>greater than</q> queries (like in <code>b.column > 'something'</code>).

This is because <code>NULL</code> go first in <code>ORDER BY</code> in <strong>MySQL</strong> and hence are incorrectly considered <q>less</q> than any other value by some people.

This query can be simplified:


SELECT  *
FROM    b
WHERE   b.column < 'something'
&#91;/sourcecode&#93;

and will still never return a <code>NULL</code> in <code>b.column</code>.

<h3 class="cracked">#7. Joining on NULL</h3>

<img src="https://explainextended.com/wp-content/uploads/2010/11/MG_3163-e1288839302867.jpg" alt="" title="Helicopter" width="700" height="467" class="aligncenter size-full wp-image-5105 noborder" />


SELECT  *
FROM    a
JOIN    b
ON      a.column = b.column

When column is nullable in both tables, this query won't return a match of two NULLs for the reasons described above: no NULLs are equal.

Here's a query to do that:

SELECT  *
FROM    a
JOIN    b
ON      a.column = b.column
        OR (a.column IS NULL AND b.column IS NULL)

MySQL's optimizer treats this as an equijoin and provides a special join condition, ref_or_null.

#6. NOT IN with NULL values

SELECT  a.*
FROM    a
WHERE   a.column NOT IN
        (
        SELECT column
        FROM    b
        )

This query will never return anything if there is but a single NULL in b.column. As with other predicates, both IN and NOT IN against NULL evaluate to NULL.

This should be rewritten using a NOT EXISTS:

SELECT  a.*
FROM    a
WHERE   NOT EXISTS
        (
        SELECT NULL
        FROM    b
        WHERE   b.column = a.column
        )

Unlike IN, EXISTS always evaluates to either true or false.

#5. Ordering random samples

SELECT  *
FROM    a
ORDER BY
        RAND(), column
LIMIT 10

This query attempts to select 10 random records ordered by column.

ORDER BY orders the output lexicographically: that is, the records are only ordered on the second expression when the values of the first expression are equal.

However, the results of RAND() are, well, random. It's infeasible that the values of RAND() will match, so ordering on column after RAND() is quite useless.

To order the randomly sampled records, use this query:

SELECT  *
FROM    (
        SELECT  *
        FROM    mytable
        ORDER BY
                RAND()
        LIMIT 10
        ) q
ORDER BY
        column

#4. Sampling arbitrary record from a group

This query intends to select one column from each group (defined by grouper)

SELECT  DISTINCT(grouper), a.*
FROM    a

DISTINCT is not a function, it's a part of SELECT clause. It applies to all columns in the SELECT list, and the parentheses here may just be omitted. This query may and will select the duplicates on grouper (if the values in at least one of the other columns differ).

Sometimes, it's worked around using this query (which relies on MySQL's extensions to GROUP BY):

SELECT  a.*
FROM    a
GROUP BY
        grouper

Unaggregated columns returned within each group are arbitrarily taken.

At first, this appears to be a nice solution, but it has quite a serious drawback. It relies on the assumption that all values returned, though taken arbitrarily from the group, will still belong to one record.

Though with current implementation is seems to be so, it's not documented and can be changed in any moment (especially if MySQL will ever learn to apply index_union after GROUP BY). So it's not safe to rely on this behavior.

This query would be easy to rewrite in a cleaner way if MySQL supported analytic functions. However, it's still possible to make do without them, if the table has a PRIMARY KEY defined:

SELECT  a.*
FROM    (
        SELECT  DISTINCT grouper
        FROM    a
        ) ao
JOIN    a
ON      a.id = 
        (
        SELECT  id
        FROM    a ai
        WHERE   ai.grouper = ao.grouper
        LIMIT 1
        )

#3. Sampling first record from a group

This is a variation of the previous query:

SELECT  a.*
FROM    a
GROUP BY
        grouper
ORDER BY
        MIN(id) DESC

Unlike the previous query, this one attempts to select the record holding the minimal id.

Again: it is not guaranteed that the unaggregated values returned by a.* will belong to a record holding MIN(id) (or even to a single record at all).

Here's how to do it in a clean way:

SELECT  a.*
FROM    (
        SELECT  DISTINCT grouper
        FROM    a
        ) ao
JOIN    a
ON      a.id = 
        (
        SELECT  id
        FROM    a ai
        WHERE   ai.grouper = ao.grouper
        ORDER BY
                ai.grouper, ai.id
        LIMIT 1
        )

This query is just like the previous one but with ORDER BY added to ensure that the first record in id order will be returned.

#2. IN and comma-separated list of values

This query attempts to match the value of column against any of those provided in a comma-separated string:

SELECT  *
FROM    a
WHERE   column IN ('1, 2, 3')

This does not work because the string is not expanded in the IN list.

Instead, if column column is a VARCHAR, it is compared (as a string) to the whole list (also as a string), and of course will never match. If column is of a numeric type, the list is cast into the numeric type as well (and only the first item will match, at best).

The correct way to deal with this query would be rewriting it as a proper IN list

SELECT  *
FROM    a
WHERE   column IN (1, 2, 3)

, or as an inline view:

SELECT  *
FROM    (
        SELECT  1 AS id
        UNION ALL
        SELECT  2 AS id
        UNION ALL
        SELECT  3 AS id
        ) q
JOIN    a
ON      a.column = q.id

, but this is not always possible.

To work around this without changing the query parameters, one can use FIND_IN_SET:

SELECT  *
FROM    a
WHERE   FIND_IN_SET(column, '1,2,3')

This function, however, is not sargable and a full table scan will be performed on a.

#1. LEFT JOIN with COUNT(*)

SELECT  a.id, COUNT(*)
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
GROUP BY
        a.id

This query intends to count number of matches in b for each record in a.

The problem is that COUNT(*) will never return a 0 in such a query. If there is no match for a certain record in a, the record will be still returned and counted.

COUNT should be made to count only the actual records in b. Since COUNT(*), when called with an argument, ignores NULLs, we can pass b.a to it. As a join key, it can never be a null in an actual match, but will be if there were no match:

SELECT  a.id, COUNT(b.a)
FROM    a
LEFT JOIN
        b
ON      b.a = a.id
GROUP BY
        a.id

P.S. In case you were wondering: no, the pictures don't have any special meaning. I just liked them.

Written by Quassnoi

November 3rd, 2010 at 11:00 pm

Posted in MySQL

45 Responses to '10 things in MySQL (that won’t work as expected)'

Subscribe to comments with RSS

  1. I like your posts.

    Anyways about this post I specifically liked
    – #8. Less than a value but not a NULL
    – P.S. In case you were wondering: no, the pictures don’t have any special meaning. I just liked them.

    I have to say:
    For point #5. Ordering random samples
    amm…I think if we use the optimized way, we don’t need to bother about “expected from mysql”:

    SELECT * FROM a where id>= floor(rand() * (select max(id) from a)) order by column limit 10;

    Thanks.

    Kedar

    4 Nov 10 at 09:44

  2. From the ten things only

    #3. Sampling first record from a group

    and

    #4. Sampling arbitrary record from a group

    are MySQL specific, all other items are about general SQL behavior.

    For most of the NULL related issues raised there *is* a MySQL specific solution though: the operator which behaves like regular = in most cases, but returns TRUE instead of NULL when comparing two NULL values as

    NULL NULL

    http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#operator_equal-to

    It helps to think of NULL as “something i don’t know anything about besides the fact that i don’t know anything about it (although i may know its data type)”

    For something you don’t know anything about it simply does not make sense to ask if its larger, smaller or equal to something else, no matter what you know about that something else. The result of such a comparison *has to be* ‘something i don’t know about’
    aka. NULL again … and that’s how SQL NULL is supposed to work in *every* SQL dialect

    hartmut

    4 Nov 10 at 10:03

  3. mysql> select ‘1’ in (‘1,2,3’);
    +——————+
    | ‘1’ in (‘1,2,3’) |
    +——————+
    | 0 |
    +——————+
    1 row in set (0.00 sec)

    mysql> select 1 in (‘1,2,3’);
    +—————-+
    | 1 in (‘1,2,3’) |
    +—————-+
    | 1 |
    +—————-+
    1 row in set, 1 warning (0.00 sec)

    so in part #2. IN and comma-separated list of values
    , the column is NOT cast to varchar.

    huarong

    4 Nov 10 at 10:25

  4. mysql> select 1 in (‘1,2,3’);
    +—————-+
    | 1 in (‘1,2,3’) |
    +—————-+
    | 1 |
    +—————-+
    1 row in set, 1 warning (0.00 sec)

    mysql> show warnings;
    +———+——+——————————————-+
    | Level | Code | Message |
    +———+——+——————————————-+
    | Warning | 1292 | Truncated incorrect DOUBLE value: ‘1,2,3’ |
    +———+——+——————————————-+
    1 row in set (0.00 sec)

    actually the string has been casted to DOUBLE value.

    huarong

    4 Nov 10 at 10:35

  5. Hi!

    nice post. But, I agree with Hartmut – most items in here are about SQL’s NULL. I wouldn’t say it offers unexpected behaviour, it’s just that you need to be aware of its semantics. Anyway.

    In Hartmuts comment, it seems the operator he mentioned is stripped away due to html escaping, but I think he means “the NULL-safe equals to operator”:

    <=>

    (see http://dev.mysql.com/doc/refman/5.1/en/comparison-operators.html#operator_equal-to)

    Roland Bouman

    4 Nov 10 at 12:19

  6. Nice

    kisglaci

    4 Nov 10 at 12:33

  7. Hi,

    Good writeup. With regard to #4, see a single-query solution: Selecting a specific non aggregated column data in GROUP BY

    Shlomi Noach

    4 Nov 10 at 20:05

  8. For #3, couldn’t you just avoid the subquery by doing a left join, a la http://www.xaprb.com/blog/2007/03/14/how-to-find-the-max-row-per-group-in-sql-without-subqueries/ ? Something like

    SELECT a1.*
    FROM a as a1
    LEFT JOIN a as a2
    ON a1.grouper = a2.grouper
    AND a1.val > a2.val
    WHERE a2.val IS NULL

    ggunson

    5 Nov 10 at 03:24

  9. @Kedar: the distribution of random values on such a query will be greatly skewed.

    Quassnoi

    5 Nov 10 at 14:02

  10. @huarong: you are right, will correct it in the post. Thanks!

    Quassnoi

    5 Nov 10 at 14:03

  11. @Shlomi: nice solution, though it will be inefficient (if feasible at all) if the groups are large.

    Quassnoi

    5 Nov 10 at 14:04

  12. @ggunson: you can, but it will be less efficient.

    Quassnoi

    5 Nov 10 at 14:05

  13. What’s with the random photos? Quite distracting…

    kats

    8 Nov 10 at 07:11

  14. Nice post. I’m already aware of some situations you described, and will put in practice the other ones. :)

    Thx,
    Pampa

    Fernando Martin

    8 Nov 10 at 07:13

  15. @kats: you can always disable pictures in your browser.

    Quassnoi

    8 Nov 10 at 12:51

  16. LOL Quassnoi. :-P

    Anyway, looking forward to more of your great posts… I thought more were going to be coming from a couple months ago when you posted that you had a backlog of questions to answer? You mentioned like 6-8 names. :-)

    Matt

    8 Nov 10 at 18:04

  17. @Matt: some are answered privately, the other ones are yet to answer. I’ve almost oversold my consulting services for the past four months and had no time for blogging and participating on Stack Overflow :) Thanks and stay subscribed :)

    Quassnoi

    8 Nov 10 at 18:13

  18. Nice post. As for the second situation “#2. IN and comma-separated list of values”:

    >SELECT *
    >FROM a
    >WHERE column IN (‘1, 2, 3’)

    Why quote all values at the same time? To achieve what you want one can write:

    SELECT *
    FROM a
    WHERE column IN (‘1’, ‘2’, ‘3’)

    So nothing special here. The same result can be achieved using query:

    SELECT *
    FROM a
    WHERE column = ‘1’ OR column = ‘2’ OR column = ‘3’

    geniuss

    10 Nov 10 at 14:09

  19. @geniuss: the other 9 queries are also wrong. Aren’t you going to repost solutions for them as well?

    Quassnoi

    10 Nov 10 at 14:21

  20. @Quassnoi

    Well, I agree but others are not obvious at first glance and that case is really simple.

    geniuss

    10 Nov 10 at 15:03

  21. @geniuss: to me, all 10 queries are obviously wrong. But, unfortunately, not everybody is so smart as you and me are.

    Here are just three random links from StackOverflow:

    http://stackoverflow.com/questions/4037145/mysql-how-to-select-rows-where-value-is-in-array
    http://stackoverflow.com/questions/3946831/mysql-where-problem
    http://stackoverflow.com/questions/3734161/mysql-select-where-a-in-field

    , all asking how to deal with situation described in #2.

    The goal of the article was to tell how to deal with common faults, and #2 is quite common (though still obvious to a seasoned developer).

    Quassnoi

    10 Nov 10 at 15:08

  22. If that is the case then ok ;)

    geniuss

    10 Nov 10 at 15:24

  23. nice :D

    AbiusX

    28 Mar 11 at 06:40

  24. Usefull. Thx!

    Kcko

    20 Apr 11 at 18:21

  25. about #10: “nooooothing compares, nothing compares.. to null”

    Sinead O'Connor

    27 Jul 11 at 01:46

  26. @Sinead: it does, in special cases.

    Quassnoi

    27 Jul 11 at 02:41

  27. Really useful. Thanks a lot.

    Saji Nediyanchath

    20 Dec 11 at 08:31

  28. Is it possible to do
    >SELECT *
    >FROM a
    >WHERE column IN (3, 1, 2)

    and get the values back in the list order? MySQL seems to always order ascending – I’m guessing it’s to do with the way it searches the database for the list values

    Tim

    12 Jun 12 at 14:20

  29. @Tim: this way:

    SELECT  a.*
    FROM    (
            SELECT  3 id, 1 o
            UNION ALL
            SELECT  1 id, 2 o
            UNION ALL
            SELECT  2 id, 3 o
            ) q
    JOIN    a
    ON      a.column = q.id
    ORDER BY
            o
    

    Quassnoi

    13 Jun 12 at 10:05

  30. Hmm .. easy to do with a few fixed values, a little harder with a random number of items to search for.
    But thanks for the pointer

    Tim

    13 Jun 12 at 15:26

  31. Nice work but the picture titles and descriptions are really confusing. Other than that, a great photo blog.

    Andy

    20 Aug 12 at 21:36

  32. Thanx alot!!! I was breaking my nuts on a query and your point #9 & #1 helped me out. Phewww!!

    akC

    24 Dec 12 at 14:41

  33. Thanks for a great post. Great explanation on #2 – I’ve been using “WHERE IN” for years and ran up against a situation where it wasn’t working consistently–this helped me devise a solution.

    David Albert

    9 Jul 13 at 07:34

  34. Last pic is taken in the center of Saint-Petersburg, Russia. Zinger’s House. VKHQ.

    Alexander

    15 Sep 14 at 17:48

  35. @Alexander: what’s VKHQ?

    Quassnoi

    15 Sep 14 at 20:48

  36. For me, these are interesting observations that still apply. But how they differ from expected (ie. behaviour of DB2, Oracle, SQLServer) is not clear. Prticularly for 2. this is very much expected.

    However we could have considered the “surprises” at a fundamental level.

    Number one issue for me.
    – trailing spaces dropped from strings.

    If I populate a char(5) index column with :
    UNHEX(‘FFFF202020’)

    .. the last three bytes are lost.

    This requires dropping the same from selection criteria to match or padding the column value and is particularly messy for hashes or binaries.

    I don’t know anywhere else this is considered normal.

    Matthew

    3 Nov 14 at 21:05

  37. I disagree with “#9. LEFT JOIN with additional conditions”

    You want records where b.column = ‘something’ or b.column is null, so just write it.

    name

    28 Jul 15 at 12:23

  38. @name: what exactly is it you don’t agree with?

    Quassnoi

    28 Jul 15 at 18:07

  39. Thanks so much! #9, the left join with conditions, was driving me crazy giving me the wrong results until I altered how I was writing the conditions, following your post. After I did that, it worked great!

    Domini

    4 Aug 15 at 00:45

  40. For #8 too, beware of removing “redundant” clauses without looking at the performance result. Quite often the extra criteria is there to influence an access path or “strongly suggest” a particular index be used. One of the risks of this is there may be no effect on dev sized data used in test but a real benefit on the full sized production database. In these cases you would probably have an in house DBA reviewing the changes in any case as the queries concerned will not be as simple as this example.

    Matthew

    4 Aug 15 at 01:01

  41. Hi!Thanks for providing such a great article.
    These are just basic things for dba’s but very powerful.
    WE all need them in later stages also.

    Anil Joshi

    19 Aug 15 at 12:54

  42. Thank you for the great info! #9 was tripping me for almost a day (I am sort of new to SQL, but as far as I was concerned the left join x on y where id=1 should be working!). Your article saved me much grief. The other points might save me some more in the near future too. So, again, thank you.

    erosan

    20 Feb 16 at 23:08

  43. > In case you were wondering: no, the pictures don’t have any special meaning. I just liked them.

    Uh, really? They look totally out of context! I was trying to make sense, I even thought the picture for left join is used because it has a big railing on the left side that is joined with the road in the middle. Honestly, you could put them in a separate post and just link it here.

    AD

    6 May 18 at 05:06

  44. Great insight. Some how it is strange that the list conveniently ends at 10. I suppose that is why it is the TOP 10. More sure would be interesting.

    me

    11 May 18 at 18:47

  45. FIND_IN_SET saved me thanks

    Vanawy

    7 May 21 at 15:37

Leave a Reply