Archive for April, 2009
Counting missing rows: using DISTINCT
Comments enabled. I *really* need your comment
This is the 2nd of 5 articles covering implementation of NOT IN
predicate in several RDBMS'es:
NOT IN
in MySQLNOT IN
in MySQL (usingDISTINCT
)NOT IN
in SQL ServerNOT IN
in OracleNOT IN
in PostgreSQL
Today, I will tell about some useful tricks that help to optimize these subqueries even more.
Let's create some sample tables:
Read the rest of this entry »
Counting missing rows
Comments enabled. I *really* need your comment
From Stack Overflow:
In my first programming job I was told that a query within the
IN()
predicate gets executed for every row contained in the parent query, and therefore usingIN
should be avoided.For example, given the query:
SELECT COUNT(*) FROM Table1 WHERE Table1Id NOT IN ( SELECT Table1Id FROM Table2 WHERE id_user = 1 )
Rows Inner subquery executions 10 10 100 100 1000 1000 10000 10000 Is this correct? How does the
IN
predicate actually work?
This is the 1st of 5 articles covering implementation of NOT IN
predicate in several RDBMS'es:
NOT IN
in MySQLNOT IN
in MySQL (usingDISTINCT
)NOT IN
in SQL ServerNOT IN
in OracleNOT IN
in PostgreSQL
In this article I will describe how MySQL implements this predicate.
Let's create the sample tables:
Read the rest of this entry »
Matching 3 of 4
Comments enabled. I *really* need your comment
From Stack Overflow:
Say I have a query like this:
SELECT * FROM my_table WHERE name = "john doe" AND phone = "8183321234" AND email = "johndoe@yahoo.com" AND address = "330 some lane"But say I only need 3 conditions out of the 4 to match.
I know I can write a very long query with several
OR
's, but I was wondering if there was a feature for this?
A nice and elegant solution proposed by Tomalak says:
SELECT * FROM my_table WHERE CASE WHEN name = "john doe" THEN 1 ELSE 0 END + CASE WHEN phone = "8183321234" THEN 1 ELSE 0 END + CASE WHEN email = "johndoe@yahoo.com" THEN 1 ELSE 0 END + CASE WHEN address = "330 some lane" THEN 1 ELSE 0 END >= 3
, but, as noted by the author, it does not use the indexes.
Since the values seem to be quite selective, it will be better to use indexes on them.
Let's create sample tables:
Read the rest of this entry »
Missing entries
Comments enabled. I *really* need your comment
From Stack Overflow:
I have a case where I want to choose any database entry that have an invalid
Country
,Region
, orArea
ID
.By invalid, I mean an
ID
for a country or region or area that no longer exists in my tables.I have four tables:
Properties
,Countries
,Regions
,Areas
.I was thinking to do it like this:
SELECT * FROM Properties WHERE CountryID NOT IN ( SELECT CountryID FROM Countries ) OR RegionID NOT IN ( SELECT RegionID FROM Regions ) OR AreaID NOT IN ( SELECT AreaID FROM Areas )Now, is my query right?
We have two options here:
- Use a
LEFT OUTER JOIN
with the three dimension tables (Countries
,Regions
andAreas
), filtering out non-NULL
values - Use a
NOT IN
/NOT EXISTS
approach shown above
LEFT JOIN
intuitively seems to be more elegant and more efficient than subqueries. But is it really?
Read the rest of this entry »
Keeping rows
It's very simple to delete, say, 10 top rows in MySQL
:
DELETE FROM t_deleter ORDER BY id LIMIT 10
What if we want to delete all rows except the first 10?
Read the rest of this entry »
Emulating FULL OUTER JOIN in MySQL
Comments enabled. I *really* need your comment
It is widely known that MySQL lacks support for FULL OUTER JOIN
.
A common solution to work around this is using a UNION ALL
to union two result sets from a LEFT JOIN
and a RIGHT JOIN
of two tables, adding join_column IS NULL
condition to the latter.
Though it works, it's quite inefficient on large tables when used with ORDER BY … LIMIT
queries, as it uses a filesort
.
Let's create two sample tables and OUTER JOIN
them together.
The tables are quite simple: each of them contains a million of multiples of 13 and 17, respectively, thus making each of LEFT JOIN
, RIGHT JOIN
and INNER JOIN
of these tables non-empty:
Read the rest of this entry »
GROUP_CONCAT in Oracle 10g
MySQL has a nice aggregate function called GROUP_CONCAT
, which concatenates all strings in a group in given order, separating them with a given separator.
In one of the previous articles, I wrote about emulating this function in PostgreSQL.
Now, we'll try to emulate this function in Oracle 10g.
There are numerous solutions using Oracle's hierarchical function SYS_CONNECT_BY_PATH
, but they're bad in the following ways:
- Not all queries can be rewritten using
CONNECT BY
- Even if they can,
CONNECT BY
performs poorly on some conditions
Instead, we will emulate this function using Oracle's MODEL
clause.
Read the rest of this entry »
Banning IP’s
Many webmasters and network service providers keep a database of IP addresses that are not allowed to access their resources.
Such a database can be efficiently queried using MySQL
's spatial abilities.
Let's create and fill the sample tables:
Read the rest of this entry »
Endless loops
Comments enabled. I *really* need your comment
Imagine that we are developing an email server that keeps the outgoing messages in a database.
The server is always busy sending the messages if the outgoing queue is not empty. It may have up to 10 connections at once.
If the message is failed to be sent, it's reinserted to the end of the queue until the retry count is exceeded.
As soon as all messages from a batch succeed or fail to be sent, next 10 messages from the queue are fetched.
How do we write a query to fetch a batch of messages from the queue?
This all can be done in a single query run in a loop. No range checking, no reinserting.
Read the rest of this entry »
Emulating SKIP SCAN
Comments enabled. I *really* need your comment
In the previous article I wrote about ranged conditions (<
, >
, <=
, >=
or BETWEEN
) on one field combined with ORDER BY
another field.
We have to choose an index either for filtering or for sorting, as a single index cannot be used for both purposes.
Well, actually, it can.
There is a special access method called SKIP SCAN
, designed just for these cases. This method can be used when a filtering condition applied to a secondary column of a composite index. Like, we have and index on (orderer, ranger)
and search for ranger < 10
.
This method scans the index bottom to top, selecting each possible orderer
values. For each orderer
, it applies the filtering condition on ranger
, thus obtaining a valid range
condition on the index that can be iterated. As soon and this range is iterated, the method defines the range for the next orderer
, skipping all other values (hence the name).
SKIP SCAN
does not perform as well as a simple index scan, but a query can benefit from it is there are few orderer
's in the table.
Oracle supports this method directly, but MySQL
, of course, not. SKIP SCAN
, though, can be easily emulated.
Read the rest of this entry »