Selecting last forum posts
Comments enabled. I *really* need your comment
From Stack Overflow:
I built a custom forum for my site using MySQL.
The listing page is essentially a table with the following columns: Topic, Last Updated, and Replies.
The DB table has the following columns:
id name body date topic_id A topic has the
topic_id
of 0, and replies have thetopic_id
of their parent topic.How do I query 20 last updated topics efficiently?
We have two kinds of topics here: those that had been answered and those that had not.
The queries for these two kinds of topics should be optimized using different methods and it's better to query for them using two different queries merged later using UNION ALL
:
- To find last 20 replies, we will employ the technique described in this article:
This technique uses a composite index on
(date, id)
to find 20 latest replies to distincttopic_id
's:- First, it scans the index on
(date, id)
to retrieve records with a non-zerotopic_id
in descending date order. - For each record found on step 1, it scans the index on
(date, id)
again in descending order and finds the first record with the sametopic_id
. - If the
id
's of records found on steps 1 and 2 match, this means that the record found on step 1 is the latest for thistopic_id
and we should return it.
- First, it scans the index on
- To find last 20 unreplied topics, we just use a good old
NOT EXISTS
.In this case a composite index on
(topic_id, date)
will come handy: we will need to select the last20
unreplied topics, and this index will be useful to implement both filtering ontopic_id
and ordering bydate DESC
. - Finally, we will merge these resultsets using
UNION ALL
and limit the result.
Now, let's create a sample table:
CREATE TABLE filler ( id INT NOT NULL PRIMARY KEY AUTO_INCREMENT ) ENGINE=Memory; CREATE TABLE wp_pod_tbl_forum ( id INT NOT NULL PRIMARY KEY, name VARCHAR(50) NOT NULL, body VARCHAR(1000) NOT NULL, date DATETIME NOT NULL, topic_id INT NOT NULL, email VARCHAR(100), KEY ix_forum_topic_date (topic_id, date), KEY ix_forum_date (date) ) ENGINE=InnoDB DEFAULT CHARSET=UTF8; DELIMITER $$ CREATE PROCEDURE prc_filler(cnt INT) BEGIN DECLARE _cnt INT; SET _cnt = 1; WHILE _cnt <= cnt DO INSERT INTO filler SELECT _cnt; SET _cnt = _cnt + 1; END WHILE; END $$ DELIMITER ; START TRANSACTION; CALL prc_filler(100000); COMMIT; INSERT INTO wp_pod_tbl_forum (id, name, body, date, topic_id) SELECT id, CONCAT('Post ', id), RPAD(CONCAT('Message ', id), 1000, ' '), CAST('2009-07-07 00:00:00' AS DATETIME) - INTERVAL CASE WHEN id <= 10 THEN 0 ELSE RAND(20090707) * 100000 END MINUTE, CASE WHEN id <= 1000 THEN 0 ELSE FLOOR(RAND(20090707 << 1) * 990) + 10 END FROM filler; [/sourcecode] We have a random-filled forum table with <strong>100,000</strong> records. The first <strong>1,000</strong> records are topics, all other are replies. First <strong>10</strong> topics are unreplied. Post dates are completely random except for the first <strong>9</strong> unreplied topics, which are the latest records in the whole table. Now, let's compose the query. This query selects the <code>topic_id</code> and the date of <strong>20</strong> last replies to <strong>20</strong> distinct topics: SELECT topic_id, date AS last_reply FROM wp_pod_tbl_forum fo WHERE id = ( SELECT id FROM wp_pod_tbl_forum fp WHERE fp.topic_id = fo.topic_id ORDER BY fp.date DESC, fp.id DESC LIMIT 1 ) AND fo.topic_id <> 0 ORDER BY fo.date DESC, fo.id DESC LIMIT 20
topic_id | last_reply | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
700 | 2009-07-06 23:59:00 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
405 | 2009-07-06 23:58:00 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
329 | 2009-07-06 23:43:00 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
606 | 2009-07-06 23:43:00 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
20 rows fetched in 0.0005s (0.0065s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | fo | index | ix_forum_topic_date | ix_forum_date | 8 | 39 | 128228.20 | Using where | |
2 | DEPENDENT SUBQUERY | fp | index | ix_forum_topic_date | ix_forum_date | 8 | 1 | 4800.00 | Using where |
To find the number of replies and the topic name, we need to join it with the table again and calculate the total number of replies.
The latter is better done in a SELECT
list subquery rather than in a GROUP
'ed aggregate:
SELECT id, name, last_reply, ( SELECT COUNT(*) FROM wp_pod_tbl_forum fc WHERE fc.topic_id = fl.topic_id ) AS replies FROM ( SELECT topic_id, date AS last_reply FROM wp_pod_tbl_forum fo WHERE id = ( SELECT id FROM wp_pod_tbl_forum fp WHERE fp.topic_id = fo.topic_id ORDER BY fp.date DESC, fp.id DESC LIMIT 1 ) AND fo.topic_id <> 0 ORDER BY fo.date DESC, fo.id DESC LIMIT 20 ) fl JOIN wp_pod_tbl_forum ft ON ft.id = fl.topic_id
id | name | last_reply | replies | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
700 | Post 700 | 2009-07-06 23:59:00 | 101 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
405 | Post 405 | 2009-07-06 23:58:00 | 132 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
329 | Post 329 | 2009-07-06 23:43:00 | 96 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
606 | Post 606 | 2009-07-06 23:43:00 | 124 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
20 rows fetched in 0.0007s (0.0100s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived3> | ALL | 20 | 100.00 | |||||
1 | PRIMARY | ft | eq_ref | PRIMARY | PRIMARY | 4 | fl.topic_id | 1 | 100.00 | |
3 | DERIVED | fo | index | ix_forum_topic_date | ix_forum_date | 8 | 39 | 128228.20 | Using where | |
4 | DEPENDENT SUBQUERY | fp | index | ix_forum_topic_date | ix_forum_date | 8 | 1 | 4800.00 | Using where | |
2 | DEPENDENT SUBQUERY | fc | ref | ix_forum_topic_date | ix_forum_topic_date | 4 | fl.topic_id | 48 | 100.00 | Using index |
Now, let's find the records for last 20 unanswered topics. We do it by using NOT EXISTS
:
SELECT id, name, date AS last_reply, 0 AS replies FROM wp_pod_tbl_forum t WHERE NOT EXISTS ( SELECT NULL FROM wp_pod_tbl_forum r WHERE r.topic_id = t.id ) AND t.topic_id = 0 ORDER BY date DESC, id DESC LIMIT 20
id | name | last_reply | replies | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | Post 10 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | Post 9 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | Post 8 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | Post 7 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | Post 6 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | Post 5 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | Post 4 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | Post 3 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | Post 2 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | Post 1 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
10 rows fetched in 0.0004s (0.0253s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | t | ref | ix_forum_topic_date | ix_forum_topic_date | 4 | const | 2000 | 100.00 | Using where |
2 | DEPENDENT SUBQUERY | r | ref | ix_forum_topic_date | ix_forum_topic_date | 4 | 20090707_group.t.id | 48 | 100.00 | Using index |
There are but 10 unreplied topics in the table.
Note that we are selecting a constant zero as replies
, since there are no actual replies to these topics, but UNION ALL
requires us to have this column.
Finally, we merge these two queries:
SELECT id, name, last_reply, ( SELECT COUNT(*) FROM wp_pod_tbl_forum fc WHERE fc.topic_id = fl.topic_id ) AS replies FROM ( SELECT topic_id, date AS last_reply FROM wp_pod_tbl_forum fo WHERE id = ( SELECT id FROM wp_pod_tbl_forum fp WHERE fp.topic_id = fo.topic_id ORDER BY fp.date DESC, fp.id DESC LIMIT 1 ) AND fo.topic_id <> 0 ORDER BY fo.date DESC, fo.id DESC LIMIT 20 ) fl JOIN wp_pod_tbl_forum ft ON ft.id = fl.topic_id UNION ALL SELECT * FROM ( SELECT id, name, date AS last_reply, 0 AS replies FROM wp_pod_tbl_forum t WHERE NOT EXISTS ( SELECT NULL FROM wp_pod_tbl_forum r WHERE r.topic_id = t.id ) AND t.topic_id = 0 ORDER BY date DESC, id DESC LIMIT 20 ) fu ORDER BY last_reply DESC, id DESC LIMIT 20
id | name | last_reply | replies | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | Post 10 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
9 | Post 9 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
8 | Post 8 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7 | Post 7 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
6 | Post 6 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
5 | Post 5 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
4 | Post 4 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3 | Post 3 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2 | Post 2 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1 | Post 1 | 2009-07-07 00:00:00 | 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
700 | Post 700 | 2009-07-06 23:59:00 | 101 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
405 | Post 405 | 2009-07-06 23:58:00 | 132 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
296 | Post 296 | 2009-07-06 23:58:00 | 94 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
312 | Post 312 | 2009-07-06 23:56:00 | 92 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
160 | Post 160 | 2009-07-06 23:55:00 | 90 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
974 | Post 974 | 2009-07-06 23:54:00 | 104 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
129 | Post 129 | 2009-07-06 23:54:00 | 98 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
836 | Post 836 | 2009-07-06 23:53:00 | 99 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
664 | Post 664 | 2009-07-06 23:53:00 | 112 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
245 | Post 245 | 2009-07-06 23:52:00 | 113 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
20 rows fetched in 0.0009s (0.0311s) |
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived3> | ALL | 20 | 100.00 | |||||
1 | PRIMARY | ft | eq_ref | PRIMARY | PRIMARY | 4 | fl.topic_id | 1 | 100.00 | |
3 | DERIVED | fo | index | ix_forum_topic_date | ix_forum_date | 8 | 39 | 128228.20 | Using where | |
4 | DEPENDENT SUBQUERY | fp | index | ix_forum_topic_date | ix_forum_date | 8 | 1 | 4800.00 | Using where | |
2 | DEPENDENT SUBQUERY | fc | ref | ix_forum_topic_date | ix_forum_topic_date | 4 | fl.topic_id | 48 | 100.00 | Using index |
5 | UNION | <derived6> | ALL | 10 | 100.00 | |||||
6 | DERIVED | t | ref | ix_forum_topic_date | ix_forum_topic_date | 4 | 2000 | 100.00 | Using where | |
7 | DEPENDENT SUBQUERY | r | ref | ix_forum_topic_date | ix_forum_topic_date | 4 | 20090707_group.t.id | 48 | 100.00 | Using index |
UNION RESULT | <union1,5> | ALL | Using filesort |
This query completes in 30 ms, which is allright even for a heavy loaded forum.
Note that this time hardly depends on the total number of columns in the system:
- Both subqueries select first 20 values using appropriate indexes
NOT EXISTS
uses the index on(topic_id, date)
and stops on the first row- The only filesort we see in the plan sorts the final 40 records in constant time (and the modern computers are fast in sorting 40 records like these)
So as soon as the topics replies are balanced (i. e. you don't need to dig too deep into reply history to find 20 distinct topics), and there are no overgrown topics (which would hamper COUNT(*)
performance), this query will complete in constant time which will most probably be dozens of milliseconds.