Archive for the ‘MySQL’ Category
Counting page views
Comments enabled. I *really* need your comment
From Stack Overflow:
I have tables called
pages
,page_views
,page_items
andpage_votes
.The latter three tables contain a
page_id
foreign key in order to keep a record of each individual view, item and vote that belongs to a page.When I query a page, I also want to retrieve
COUNT(*)
frompage_views
,COUNT(*)
frompage_items
andSUM(vote)
frompage_votes
.I have pasted a query below. It retrieves the total number of views.
SELECT Page.*, COUNT(*) AS views FROM pages AS Page INNER JOIN page_views AS PageView ON Page.id = PageView.page_id GROUP BY Page.id ORDER BY views DESC LIMIT 10 OFFSET 0I've made various attempts to add items and votes to it, but the result is either a syntax error or views/items/votes returned as an identical and
wrongnumber, probably due to the way I am joining.How can I add items and votes to this query?
On a first thought, it's simple:
Read the rest of this entry »
Aggregated hours
Comments enabled. I *really* need your comment
From Stack Overflow:
I would like to know the sum of all web services completions per hour of day.
Obviously, this can be easily done with
SUM()
andGROUP BY
:SELECT hour, SUM(calls) FROM sample s GROUP BY hour;
hour SUM(calls) 0 634 1 642 2 633 3 624 4 420 5 479 6 428 7 424 8 473 9 434 10 485 11 567 12 526 13 513 14 555 15 679 16 624 17 796 18 752 19 843 20 827 21 774 22 647 23 533 My problem is that in old sets, the web service calls in the hours
[00-11]
were already summed up.The simple statement as listed above would therefore lead to
SELECT hour, SUM(calls) FROM sample s GROUP BY hour;
hour SUM(calls) 0 6234 12 526 13 513 14 555 15 679 16 624 17 796 18 752 19 843 20 827 21 774 22 647 23 533 This is an undesirable result. To make the old sets
[00,12,...,23]
comparable to the new sets[00,01,...,23]
, I would like to have one statement that averages the value of[00]
and distributes it over the missing hours, e.g.:
hour SUM(calls) 0 6243/12 1 6243/12 12 526 23 533 I can easily do this using temporary tables or views, but i don't know how to accomplish this without them.
In this rowset, we have a row that needs to be split into 12 another rows.
Read the rest of this entry »
Finding incomplete orders
Comments enabled. I *really* need your comment
Imagine we are keeping an online shop and want to find the customers that don't have complete orders.
We'll make the structure of the orders a little bit complex:
- Each customer may have a number of baskets
- Each basket will have a list of positions in it
- Each position has a number of discounts
- An order is considered complete, when all entites are present: there is at least one basket, all baskets should have at least one position, and each position should have at least one discount
We will keep the data in four tables as following:
Read the rest of this entry »
Analytic functions: optimizing LAG, LEAD, FIRST_VALUE, LAST_VALUE
Comments enabled. I *really* need your comment
In the previous article I wrote about optimized emulation of the analytic functions in MySQL.
Now, let's try to optimize LAG
, LEAD
, FIRST_VALUE
and LAST_VALUE
.
Imagine we have a table that keeps actions of a PC in an online game. This table has the following design:
- Action id
- PC id
- Current PC level
- Current PC score
- Action data
For each action, the current level and current score of the PC are keeped. The table, of course, is designed in such a bad way just to illustrate our task :)
Now, for first 2 player characters, we need to know first 2 actions performed on first 2 levels; how much score did these actions yield; and how much score left to reach the next level.
With analytic functions, it would be done the following way:
Read the rest of this entry »
Analytic functions: optimizing SUM and ROW_NUMBER
Comments enabled. I *really* need your comment
In the previous articles I wrote about emulating numerous analytic function in MySQL.
Using methods described above, it's possible to emulate almost all analytic functions present in Oracle and SQL Server.
Here are these methods in a nutshell:
- Select all table rows ordered by
PARTITION BY
columns, then byORDER BY
columns of the analytic function - Track the grouing sets by using session variables initialized in the first subquery
- If the analytic function needs some precalculations to be evaluated (like, count of the rows in the grouping set, sum of the values etc), join the table with the precalculated aggregates
- Use state session variables to calculate the analytic function and store intermediate values between rows
- Initialize state session variables whenever the grouping set changes
This may sound confusing, but if you take a look on the examples from the previous articles, it will become clear as a bell.
This methods work and work well, if you need to select all rows from the tables.
But what if you need to implement some filtering? Do we really need to count millions of rows if we need first three? Do we really need to inspect all rows to find a maximum if we have an index?
Of course, no.
Analytic functions can be optimized as well as any other queries.
Read the rest of this entry »
Analytic functions: FIRST_VALUE, LAST_VALUE, LEAD, LAG
In the previous articles I wrote about emulating some of the analytic functions in MySQL.
Today, I'll write about four more userful functions: FIRST_VALUE
, LAST_VALUE
, LEAD
and LAG
.
These functions also do not have aggregate analogs.
FIRST VALUE(column)
returns the value of column from the first row of the grouping set.
LAST_VALUE(column)
returns the value of column from the last row of the grouping set.
This can be illustrated by the following query:
Read the rest of this entry »
Analytic functions: NTILE
Comments enabled. I *really* need your comment
In the previous article we dealt with analytic functions SUM
, AVG
and ROW_NUMBER()
.
Now we will try to emulate NTILE
.
NTILE(N)
is a special function that has no aggregate analog. It divides each grouping set of rows into N
subranges, based on ORDER BY
clause, and returns the subrange number for each row.
Read the rest of this entry »
Analytic functions: SUM, AVG, ROW_NUMBER
In one of the previous articles I wrote about emulating some of analytic functions in MySQL.
Now, I'd like to cover this question more extensively.
A quick reminder: an analytic function is a function that behaves like an aggregate function with one exception: aggregate function returns one last row for each aggregated set, while an analytic function returns intermediate results too.
An analytic function can be made out of almost all aggregate functions by adding keyword OVER
to them with two additional clauses: PARTITION BY
and ORDER BY
.
PARTITION BY
is analog of GROUP BY
. ORDER BY
defines order in which the intermediate rows will be evaluated.
The behaviour of analytic functions can probably be best illustrated with an example:
Read the rest of this entry »
Selecting friends
If you are building a Yet Another Great Social Network Service to beat MySpace, you'll certainly need to keep a list of friends there, so that Alice may communicate in private with Bob, and they both can show pictures to Chris, and Eve cannot eavesdrop on them and the rest of them can do all these kinds of things these people are supposed to do.
On most networks, friendship is an irreflexive symmetric binary relation:
- Symmetric means that if Alice is a friend of Bob, then Bob is a friend of Alice too.
- Irreflexive means that Alice is never a friend to herself.
As it's a many-to-many relation, we sure need a separate table for it.
But how will we keep it? Should we keep the relation in the table as is (i. e. two separate rows for Alice/Bob and Bob/Alice), or keep just one row and reconstruct the relation using the set operators?
Let's check.
Read the rest of this entry »
Advanced row sampling
Yesterday I wrote an article on how to emulate analytiс function ROW_NUMBER()
that is present in SQL Server and Oracle, but absent in MySQL.
Today, we will try to optimize this query.
Read the rest of this entry »