EXPLAIN EXTENDED

How to create fast database queries

My latest article on SQL in general: Happy New Year!. You're welcome to read and comment on it.

Matching 3 of 4

From Stack Overflow:

Say I have a query like this:

SELECT  *
FROM    my_table
WHERE   name = "john doe"
        AND phone = "8183321234"
        AND email = "johndoe@yahoo.com"
        AND address = "330 some lane"

But say I only need 3 conditions out of the 4 to match.

I know I can write a very long query with several OR‘s, but I was wondering if there was a feature for this?

A nice and elegant solution proposed by Tomalak says:

SELECT  *
FROM    my_table
WHERE   CASE WHEN name = "john doe"           THEN 1 ELSE 0 END +
        CASE WHEN phone = "8183321234"        THEN 1 ELSE 0 END +
        CASE WHEN email = "johndoe@yahoo.com" THEN 1 ELSE 0 END +
        CASE WHEN address = "330 some lane"   THEN 1 ELSE 0 END
        >= 3

, but, as noted by the author, it does not use the indexes.

Since the values seem to be quite selective, it will be better to use indexes on them.

Let’s create sample tables:

Table creation details

Here we have 100 distinct values on each of the fields (except phone, which has 90).

The straightforward query:

SELECT  *
FROM    t_info t
WHERE   CASE WHEN name  = 'Eve Chianese' THEN 1 ELSE 0 END +
        CASE WHEN phone = '+15558000042' THEN 1 ELSE 0 END +
        CASE WHEN email = '42@example.com' THEN 1 ELSE 0 END +
        CASE WHEN address = '42 North Lane' THEN 1 ELSE 0 END
        >= 3
id name phone email address
176846 Eve Chianese +15558000066 42@example.com 42 North Lane
445548 Eve Chianese +15558000099 42@example.com 42 North Lane
988383 Alice de Matteo +15558000042 42@example.com 42 North Lane
3 rows fetched in 0.0002s (3.0312s)

uses full table scan:

EXPLAIN EXTENDED
SELECT  *
FROM    t_info t
WHERE   CASE WHEN name  = 'Eve Chianese' THEN 1 ELSE 0 END +
        CASE WHEN phone = '+15558000042' THEN 1 ELSE 0 END +
        CASE WHEN email = '42@example.com' THEN 1 ELSE 0 END +
        CASE WHEN address = '42 North Lane' THEN 1 ELSE 0 END
        >= 3
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t ALL 1000204 100.00 Using where
1 row fetched in 0.0003s (0.0030s)

and runs for more than 3 seconds.

To enable indexes we need:

  1. Make a range scan of each of the conditions
  2. UNION ALL the resultsets (it’s UNION ALL the resultsets, not UNION all the resultsets!)
  3. COUNT(*) the matches grouped by id
  4. Filter out those id‘s with COUNT(*) < 3
  5. Join with the original table

Here's the query:

SELECT  i.*
FROM    (
        SELECT  id, COUNT(*)
        FROM    (
                SELECT  id
                FROM    t_info t
                WHERE   name  = 'Eve Chianese'
                UNION ALL
                SELECT  id
                FROM    t_info t
                WHERE   phone = '+15558000042'
                UNION ALL
                SELECT  id
                FROM    t_info t
                WHERE   email = '42@example.com'
                UNION ALL
                SELECT  id
                FROM    t_info t
                WHERE   address = '42 North Lane'
                ) q
        GROUP BY
                id
        HAVING  COUNT(*) >= 3
        ) dq
JOIN    t_info i
ON      i.id = dq.id
id name phone email address
176846 Eve Chianese +15558000066 42@example.com 42 North Lane
445548 Eve Chianese +15558000099 42@example.com 42 North Lane
988383 Alice de Matteo +15558000042 42@example.com 42 North Lane
3 rows fetched in 0.0002s (0.1472s)

This runs much faster (0.15 seconds) and of course uses all four indexes:

EXPLAIN EXTENDED
SELECT  i.*
FROM    (
        SELECT  id, COUNT(*)
        FROM    (
                SELECT  id
                FROM    t_info t
                WHERE   name  = 'Eve Chianese'
                UNION ALL
                SELECT  id
                FROM    t_info t
                WHERE   phone = '+15558000042'
                UNION ALL
                SELECT  id
                FROM    t_info t
                WHERE   email = '42@example.com'
                UNION ALL
                SELECT  id
                FROM    t_info t
                WHERE   address = '42 North Lane'
                ) q
        GROUP BY
                id
        HAVING  COUNT(*) >= 3
        ) dq
JOIN    t_info i
ON      i.id = dq.id
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <derived2> ALL 3 100.00
1 PRIMARY i eq_ref PRIMARY PRIMARY 4 dq.id 1 100.00
2 DERIVED <derived3> ALL 40977 100.00 Using temporary; Using filesort
3 DERIVED t ref ix_info_name ix_info_name 602 12450 100.00 Using where; Using index
4 UNION t ref ix_info_phone ix_info_phone 62 8330 100.00 Using where; Using index
5 UNION t ref ix_info_email ix_info_email 602 10048 100.00 Using where; Using index
6 UNION t ref ix_info_address ix_info_address 602 9616 100.00 Using where; Using index
UNION RESULT <union3,4,5,6> ALL
8 rows fetched in 0.0008s (0.1483s)

As was proposed by Brent Baisley, it's actually possible to make this query more simple by using INDEX MERGE:

EXPLAIN EXTENDED
SELECT  *
FROM    t_info t
WHERE   (
        name  = 'Eve Chianese'
        OR phone = '+15558000042'
        OR email = '42@example.com'
        OR address = '42 North Lane'
        )
        AND
        CASE WHEN name  = 'Eve Chianese' THEN 1 ELSE 0 END +
        CASE WHEN phone = '+15558000042' THEN 1 ELSE 0 END +
        CASE WHEN email = '42@example.com' THEN 1 ELSE 0 END +
        CASE WHEN address = '42 North Lane' THEN 1 ELSE 0 END
        >= 3
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t index_merge ix_info_name,ix_info_phone,ix_info_email,ix_info_address ix_info_name,ix_info_phone,ix_info_email,ix_info_address 602,62,602,602 40444 100.00 Using union(ix_info_name,ix_info_phone,ix_info_email,ix_info_address); Using where
1 row fetched in 0.0004s (0.0032s)

Unfortunately, as I already mentioned in one of the previos articles, INDEX MERGE implementation is not very efficient in MySQL, that's why this query is more slow than that with UNION ALL:

SELECT  *
FROM    t_info t
WHERE   (
        name  = 'Eve Chianese'
        OR phone = '+15558000042'
        OR email = '42@example.com'
        OR address = '42 North Lane'
        )
        AND
        CASE WHEN name  = 'Eve Chianese' THEN 1 ELSE 0 END +
        CASE WHEN phone = '+15558000042' THEN 1 ELSE 0 END +
        CASE WHEN email = '42@example.com' THEN 1 ELSE 0 END +
        CASE WHEN address = '42 North Lane' THEN 1 ELSE 0 END
        >= 3
id name phone email address
176846 Eve Chianese +15558000066 42@example.com 42 North Lane
445548 Eve Chianese +15558000099 42@example.com 42 North Lane
988383 Alice de Matteo +15558000042 42@example.com 42 North Lane
3 rows fetched in 0.0002s (0.3427s)

Summary: for a query that requries some of the indexed values to match, it's better to use UNION ALL of single matches and filter by COUNT(*)'ing the matched records.

Written by Quassnoi

April 17th, 2009 at 11:00 pm

Posted in MySQL

Comments are closed.