Archive for March 18th, 2010
Answering questions asked on the site.
Vlad Enache asks:
In MySQL I have a table called
meaningswith three columns:
meanings person word meaning 1 1 4 1 2 19 1 2 7 1 3 5
wordhas 16 possible values,
A person assigns one or more meanings to each word. In the sample above, person 1 assigned two meanings to word 2.
There will be thousands of persons.
I need to find the top three meanings for each of the 16 words, with their frequencies. Something like:
- word 1: meaning 5 (35% of people), meaning 19 (22% of people), meaning 2 (13% of people)
- word 2: meaning 8 (57%), meaning 1 (18%), meaning 22 (7%)
Is it possible to solve this with a single MySQL query?
This task is a typical greatest-n-per-group problem.
Earlier I described some solutions to it, one using session variables, and another one using
LIMIT in a subquery. However, these solutions imply that the records are taken from a single table, while this task needs to retrieve three greatest aggregates, not three greatest records. It is not recommended to mix the session variables with the
JOINs, and using the
LIMIT solution would be inefficient, since the aggregates can not be indexed.
Some databases, namely, PostgreSQL, used to exploit the array functionality for this task (before the window functions were introduced in 8.4).
Unfortunately, MySQL does not support arrays, but we can emulate this behavior using string functions and
Let's create a sample table:
Read the rest of this entry »