Archive for March 11th, 2009
Analytic functions: optimizing SUM and ROW_NUMBER
Comments enabled. I *really* need your comment
In the previous articles I wrote about emulating numerous analytic function in MySQL.
Using methods described above, it's possible to emulate almost all analytic functions present in Oracle and SQL Server.
Here are these methods in a nutshell:
- Select all table rows ordered by
PARTITION BY
columns, then byORDER BY
columns of the analytic function - Track the grouing sets by using session variables initialized in the first subquery
- If the analytic function needs some precalculations to be evaluated (like, count of the rows in the grouping set, sum of the values etc), join the table with the precalculated aggregates
- Use state session variables to calculate the analytic function and store intermediate values between rows
- Initialize state session variables whenever the grouping set changes
This may sound confusing, but if you take a look on the examples from the previous articles, it will become clear as a bell.
This methods work and work well, if you need to select all rows from the tables.
But what if you need to implement some filtering? Do we really need to count millions of rows if we need first three? Do we really need to inspect all rows to find a maximum if we have an index?
Of course, no.
Analytic functions can be optimized as well as any other queries.
Read the rest of this entry »