Archive for May 7th, 2010
Things SQL needs: MERGE JOIN that would seek
One of the most known and least used join algorithms in SQL engines is MERGE JOIN
.
This algorithm operates on two sorted recordsets, keeping two pointers that chase each other.
The Wikipedia entry above describes it quite well in terms of algorithms. I'll just make an animated GIF to make it more clear:
This is quite a nice and elegant algorithm, which, unfortunately, has two major drawbacks:
- It needs the recordsets to be sorted
- Even with the recordsets sorted, it is no better than a
HASH JOIN
The sorting part is essential for this algorithm and there is nothing that can be done with it: the recordsets should be sorted, period. Databases, however, often provide the records in the sorted order: from clustered tables, indexes, previously sorted and ordered subqueries, spool tables etc.
But even when the recordsets are already sorted, on equijoins the MERGE JOIN
is hardly faster than a HASH JOIN
.