EXPLAIN EXTENDED

How to create fast database queries

Archive for February 26th, 2010

Efficient circle distance testing

Comments enabled. I *really* need your comment

Answering questions asked on the site.

eptil asks:

I am using SQL Server 2008, but not the spatial features.

I have a table with few entries, only 40,000. There is an id INT PRIMARY KEY column and two columns storing a 2d coordinate, both decimals.

I would like to find all the records that do not have other records within a given radius. The query I am using at the moment is:

SELECT  id, x, y
FROM    mytable t1
WHERE   (
        SELECT  COUNT(*)
        FROM    mytable t2
        WHERE   ABS(t1.x - t2.x) < 25
                AND ABS(t1.y - t2.y) < 25
        ) = 1


This is taking 15 minutes to run at times.

Is there a better way?

Of course using spatial abilities would be a better way, but it is possible to make do with plain SQL. This will also work in SQL Server 2005.

In most database engines, the spatial indexes are implemented as the R-Tree structures. SQL Server, however, uses another approach: surface tesselation.

Basically, it divides the surface into a finite number of tiles, each assigned with a unique number. The identifiers of tiles covered by the object are stored as keys of a plain B-Tree index.

When SQL Server's optimizer sees a geometrical predicate against an indexed column, it calculates the numbers of tiles that possibly can satisfy this predicate. Say, if the tiles are defined as squares with side 1, the predicate column.STDistance(@mypoint) < 2 can only be satisfied by the objects within 2 tiles away from @mypoint's tile. This gives a square of 25 tiles with @mypoint's tile in the center. The tile numbers can be found and searched for using the index. Exact filtering condition is then applied to each candidate value returned by the index.

Same solution can be used in our case even without the spatial functions. Comparing tile numbers is an equijoin and hash join method is eligible for this operation. We can even choose the tiling algorithm individually for each query, since we don't have to store the tile identifiers in the table, and the hash table will be built dynamically anyway.

Let's create a sample table and see how it works:
Read the rest of this entry »

Written by Quassnoi

February 26th, 2010 at 11:00 pm

Posted in SQL Server