Archive for 2009
Matching 3 of 4
Comments enabled. I *really* need your comment
From Stack Overflow:
Say I have a query like this:
SELECT * FROM my_table WHERE name = "john doe" AND phone = "8183321234" AND email = "johndoe@yahoo.com" AND address = "330 some lane"But say I only need 3 conditions out of the 4 to match.
I know I can write a very long query with several
OR
's, but I was wondering if there was a feature for this?
A nice and elegant solution proposed by Tomalak says:
SELECT * FROM my_table WHERE CASE WHEN name = "john doe" THEN 1 ELSE 0 END + CASE WHEN phone = "8183321234" THEN 1 ELSE 0 END + CASE WHEN email = "johndoe@yahoo.com" THEN 1 ELSE 0 END + CASE WHEN address = "330 some lane" THEN 1 ELSE 0 END >= 3
, but, as noted by the author, it does not use the indexes.
Since the values seem to be quite selective, it will be better to use indexes on them.
Let's create sample tables:
Read the rest of this entry »
Missing entries
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a case where I want to choose any database entry that have an invalid
Country
,Region
, orArea
ID
.By invalid, I mean an
ID
for a country or region or area that no longer exists in my tables.I have four tables:
Properties
,Countries
,Regions
,Areas
.I was thinking to do it like this:
SELECT * FROM Properties WHERE CountryID NOT IN ( SELECT CountryID FROM Countries ) OR RegionID NOT IN ( SELECT RegionID FROM Regions ) OR AreaID NOT IN ( SELECT AreaID FROM Areas )Now, is my query right?
We have two options here:
- Use a
LEFT OUTER JOIN
with the three dimension tables (Countries
,Regions
andAreas
), filtering out non-NULL
values - Use a
NOT IN
/NOT EXISTS
approach shown above
LEFT JOIN
intuitively seems to be more elegant and more efficient than subqueries. But is it really?
Read the rest of this entry »
Keeping rows
It's very simple to delete, say, 10 top rows in MySQL
:
DELETE FROM t_deleter ORDER BY id LIMIT 10
What if we want to delete all rows except the first 10?
Read the rest of this entry »
Emulating FULL OUTER JOIN in MySQL
Comments enabled. I *really* need your comment
It is widely known that MySQL lacks support for FULL OUTER JOIN
.
A common solution to work around this is using a UNION ALL
to union two result sets from a LEFT JOIN
and a RIGHT JOIN
of two tables, adding join_column IS NULL
condition to the latter.
Though it works, it's quite inefficient on large tables when used with ORDER BY … LIMIT
queries, as it uses a filesort
.
Let's create two sample tables and OUTER JOIN
them together.
The tables are quite simple: each of them contains a million of multiples of 13 and 17, respectively, thus making each of LEFT JOIN
, RIGHT JOIN
and INNER JOIN
of these tables non-empty:
Read the rest of this entry »
GROUP_CONCAT in Oracle 10g
MySQL has a nice aggregate function called GROUP_CONCAT
, which concatenates all strings in a group in given order, separating them with a given separator.
In one of the previous articles, I wrote about emulating this function in PostgreSQL.
Now, we'll try to emulate this function in Oracle 10g.
There are numerous solutions using Oracle's hierarchical function SYS_CONNECT_BY_PATH
, but they're bad in the following ways:
- Not all queries can be rewritten using
CONNECT BY
- Even if they can,
CONNECT BY
performs poorly on some conditions
Instead, we will emulate this function using Oracle's MODEL
clause.
Read the rest of this entry »
Banning IP’s
Many webmasters and network service providers keep a database of IP addresses that are not allowed to access their resources.
Such a database can be efficiently queried using MySQL
's spatial abilities.
Let's create and fill the sample tables:
Read the rest of this entry »
Endless loops
Comments enabled. I *really* need your comment
Imagine that we are developing an email server that keeps the outgoing messages in a database.
The server is always busy sending the messages if the outgoing queue is not empty. It may have up to 10 connections at once.
If the message is failed to be sent, it's reinserted to the end of the queue until the retry count is exceeded.
As soon as all messages from a batch succeed or fail to be sent, next 10 messages from the queue are fetched.
How do we write a query to fetch a batch of messages from the queue?
This all can be done in a single query run in a loop. No range checking, no reinserting.
Read the rest of this entry »
Emulating SKIP SCAN
Comments enabled. I *really* need your comment
In the previous article I wrote about ranged conditions (<
, >
, <=
, >=
or BETWEEN
) on one field combined with ORDER BY
another field.
We have to choose an index either for filtering or for sorting, as a single index cannot be used for both purposes.
Well, actually, it can.
There is a special access method called SKIP SCAN
, designed just for these cases. This method can be used when a filtering condition applied to a secondary column of a composite index. Like, we have and index on (orderer, ranger)
and search for ranger < 10
.
This method scans the index bottom to top, selecting each possible orderer
values. For each orderer
, it applies the filtering condition on ranger
, thus obtaining a valid range
condition on the index that can be iterated. As soon and this range is iterated, the method defines the range for the next orderer
, skipping all other values (hence the name).
SKIP SCAN
does not perform as well as a simple index scan, but a query can benefit from it is there are few orderer
's in the table.
Oracle supports this method directly, but MySQL
, of course, not. SKIP SCAN
, though, can be easily emulated.
Read the rest of this entry »
Choosing index
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a table with roughly 100,000 blog postings, linked to a table with 50 feeds via an
1:n
relationship.When I query both tables with a
SELECT
statement, ordered by a datetime field of the postings table, MySQL always usesfilesort
, resulting in very slow query times (more than 1 second).SELECT `postings`.`id`, UNIX_TIMESTAMP(postings.post_date) as post_date, `postings`.`link`, `postings`.`title`, `postings`.`author`, `postings`.`excerpt`, `postings`.`long_excerpt`, `feeds`.`title` AS feed_title, `feeds`.`website` AS feed_website FROM `postings` JOIN `feeds` ON `feeds`.`id` = `postings`.`feed_id` WHERE `feeds`.`type` = 1 AND `postings`.`user_offtopic_count` < 10 AND `postings`.`is_active` = 1 ORDER BY `postings`.`post_date` desc LIMIT 15
This is a nice question that illustrates how to choose which indexes to create.
At first blush it seems to be a job for a composite index on (is_active, user_offtopic_count, post_date)
. Really, we could use the first two fields to filter on and the third field to order by.
But the problem here is that the query uses a range
condition on postings.user_offtopic_count
. In this case, the index cannot be used for ordering.
Read the rest of this entry »
Matching whole sets
Comments enabled. I *really* need your comment
From Stack Overflow:
Given the following:
declare @a table ( pkid int, value int ) declare @b table ( otherID int, value int ) insert into @a values (1, 1000) insert into @a values (1, 1001) insert into @a values (2, 1000) insert into @a values (2, 1001) insert into @a values (2, 1002) insert into @b values (-1, 1000) insert into @b values (-1, 1001) insert into @b values (-1, 1002), how do I query for all the values in
@a
that completely match up with@b
?
{@a.pkid = 1, @b.otherID = -1}
would not be returned (only 2 of 3 values match)
{@a.pkid = 2, @b.otherID = -1}
would be returned (3 of 3 values match)
We need to compare each set of values from @a
to each set of @b
and return the id
's of non-matching sets.
Two sets are considered matching when for each row from the first set there is an equal row from the second set and vise versa. That is, both the values and the count of the rows should match in both sets.
First thing that comes to mind is to calculate COUNT(*)
's of both sets, then calculate the COUNT(*)
for the result of both sets joined on value
and make sure all three are equal.
But we don't actually need to COUNT(*)
all values. We just need to find out the first row from either of the subsets that has no match in another subset. As soon as we find such a row, we can tell that the sets don't match; if we don't find one, then the sets do match.
Read the rest of this entry »