Selecting timestamps for a time zone
From Stack Overflow:
How can I reuse a computed column in an SQL query in MySQL?
My query is something like this:
SELECT CONVERT_TZ( IF(timestamp_start > last_update, timestamp_start, last_update), 'GMT', user.timezone ) AS time_usr_tz FROM schedule JOIN user ON schedule.user_id = user.user_id WHERE CONVERT_TZ( IF(timestamp_start > last_update, timestamp_start, last_update), 'GMT', user.timezone ) < CURRENT_TIMESTAMP()As you can see, the
CONVERT_TZ (…)part is repeated.This is only a sample query. Actually, I have to use that computed column several times. So if I make change in one place, I have to change in many places. And the size of the query becomes scary, too.
Is there any way to avoid such duplication?
The duplication part is simple. We just need to wrap the query in a subquery or use HAVING clause, like this:
SELECT *
FROM (
SELECT CONVERT_TZ(
IF(timestamp_start > last_update, timestamp_start, last_update),
'GMT',
user.timezone
) AS time_usr_tz
FROM schedule
JOIN user
ON schedule.user_id = user.user_id
) q
WHERE time_usr_tz < CURRENT_TIMESTAMP()
, or this:
SELECT CONVERT_TZ(
IF(timestamp_start > last_update, timestamp_start, last_update),
'GMT',
user.timezone
) AS time_usr_tz
FROM schedule
JOIN user
ON schedule.user_id = user.user_id
HAVING time_usr_tz < CURRENT_TIMESTAMP()
But what I'd really like to write about today is the performance of such a query.
This query looks for the list of schedules and finds the schedules the were to be started by the time the query is issued.
The problem is that the schedule timestamps are not bound to any timezone, and should be converted to the user's timezone to find out if they're are due or not.
That can be seen as a design flaw, but in fact that can be exactly what a user needs. Say, a user wants to be woken up every day at 9:00 and reminded to take a pill at 13:00, but if she goes from Washington, D.C. to Honolulu, she may want all her alarm clocks to change from EST to HST. Instead of updating all the timestamps it's of course easier to update a single field in her profile.
The query uses columns from both tables inside a function in the WHERE condition, that's why no indexes can be used to filter the timestamps.
If schedule is large enough, this can be a problem for a busy service.
Now, let's create the sample tables and see what can be done. For the sake of simplicity, we'll leave only timestamp_start column in the schedules, which is enough to illustrate the point.
Table creation details
The tables above contain 1,000,000 rows in schedule and 10,000 rows in user. Timezones are uniformly distributed over all possible values from UTC -12:00 (Baker Island) to UTC +14:00 (Line Islands).
The original query:
SELECT *
FROM (
SELECT s.id, s.user_id,
CONVERT_TZ(timestamp_start, '+00:00', timezone) AS time_usr_tz
FROM schedule s
JOIN user u
ON s.user_id = u.user_id
) q
WHERE time_usr_tz >= '2009-04-25 10:00:00'
| id | user_id | time_usr_tz | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 94569 | 4570 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 123654 | 3655 | 2009-04-25 12:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 198401 | 8402 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 478347 | 8348 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 488454 | 8455 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 502076 | 2077 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 688301 | 8302 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 822783 | 2784 | 2009-04-25 12:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 960232 | 233 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9 rows fetched in 0.0005s (4.5625s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
runs for almost 5 seconds, and uses FULL TABLE SCAN on schedule:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | <derived2> | ALL | 1000000 | 100.00 | Using where | ||||
| 2 | DERIVED | s | ALL | 1003088 | 100.00 | |||||
| 2 | DERIVED | u | eq_ref | PRIMARY | PRIMARY | 4 | 20090425_timestamp.s.user_id | 1 | 100.00 |
How can we deal with it?
First, we can change the expression a little. We don't need both columns to be in the formula. Instead, we can change the expression so that only timestamp_start will stay on the left side of the equation and the index on (timestamp_start, user_id) can be applied:
SELECT s.id, s.user_id,
CONVERT_TZ(timestamp_start, '+00:00', timezone) AS time_usr_tz
FROM user u
STRAIGHT_JOIN schedule s
ON s.user_id <= u.user_id
AND s.user_id >= u.user_id
AND timestamp_start >= CONVERT_TZ('2009-04-25 10:00:00', timezone, '+00:00')
| id | user_id | time_usr_tz | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 960232 | 233 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 502076 | 2077 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 822783 | 2784 | 2009-04-25 12:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 123654 | 3655 | 2009-04-25 12:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 94569 | 4570 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 688301 | 8302 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 478347 | 8348 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 198401 | 8402 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 488454 | 8455 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9 rows fetched in 0.0003s (0.7344s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | u | ALL | PRIMARY | 10225 | 100.00 | ||||
| 1 | SIMPLE | s | ALL | ix_schedule_user_timestamp,ix_schedule_timestamp | 998125 | 100.00 | Range checked for each record (index map: 0x6) |
This query uses users as a leading table and an index scan on schedule on for each record from users.
Note that we used s.user_id <= u.user_id AND s.user_id >= u.user_id instead of mere s.user_id = u.user_id: this is required to force MySQL to use RANGE CHECKING FOR EACH RECORD.
This query is much faster than before, but this approach required a full scan on users anyway, and 10,000 rows is still many rows, and 734 milliseconds is way too long.
Can we improve this query yet a little more?
Yes, we can.
If we look into the original query, we will see that for each record from schedule and corresponding record from user we need to perform some calculation on timestamp_start and timezone, and compare the resulting value to the constant. The fact of calculation being performed hinders the index from being used.
But what is this calculation, in fact? It just adds or subtracts some hours from timestamp_start, according to the timezone in the user's profile. This calculation cannot return anything less than -12:00 or more than +14:00 from the value stored in timestamp_start, since there are no places on Earth where time differs from the UTC more than these two extreme cases.
Since we need all timestamps that, beign converted to user's time zone, are less than a given current date, we can safely add a coarse filtering condition to our query that filters out all timestamp_start's earlier than 2009-04-25 10:00:00 -12:00, since in no case our expression can satisfy something earlier than that.
We can use this coarse filter to make the index usable, and then fine-filter the results just like before:
SELECT *
FROM (
SELECT s.id, s.user_id,
CONVERT_TZ(timestamp_start, '+00:00', timezone) AS time_usr_tz
FROM schedule s
JOIN user u
ON u.user_id = s.user_id
WHERE s.timestamp_start >= CONVERT_TZ('2009-04-25 10:00:00', '+00:00', '-12:00')
) q
WHERE time_usr_tz >= '2009-04-25 10:00:00'
| id | user_id | time_usr_tz | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 198401 | 8402 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 688301 | 8302 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 94569 | 4570 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 123654 | 3655 | 2009-04-25 12:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 478347 | 8348 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 488454 | 8455 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 502076 | 2077 | 2009-04-25 10:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 822783 | 2784 | 2009-04-25 12:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 960232 | 233 | 2009-04-25 11:00:00 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| 9 rows fetched in 0.0003s (0.0031s) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | <derived2> | ALL | 59 | 100.00 | Using where | ||||
| 2 | DERIVED | s | range | ix_schedule_user_timestamp,ix_schedule_timestamp | ix_schedule_timestamp | 8 | 59 | 100.00 | Using where | |
| 2 | DERIVED | u | eq_ref | PRIMARY | PRIMARY | 4 | 20090425_timestamp.s.user_id | 1 | 100.00 |
By adding the seemingly redundant expression (that in fact serves as an efficient coarse filter), we make the index usable and the overall query time drops from 4,562 ms to 31 ms.
More than 100 times improvement in speed.
Subscribe in a reader