How to create fast database queries

Shared Plan and Algorithm Network Cache (SPANC)

with 5 comments

Due to the nature of my work I have to deal with various database systems.

While SQL is more or less standardized, the optimizers are implemented differently in the different systems. Some systems cannot join tables with anything other than nested loops, the other can only GROUP BY using a sort, etc.

So when you write a join in, say, MySQL, you cannot expect it to be a sort merge join (and you should consider this fact when designing the query). Or, when you write a DISTINCT in SQL Server, you can't expect a loose index scan. These are limitations put by their optimizers.

However, in the last three months I noticed a great improvement in the queries where I could not expect any.

It started when I tried to debug this in SQL Server:

FROM    orderItem

while yielded this plan:

Similar results were obtained on Oracle:

Plan hash value: 1345318323

| Id  | Operation                                                         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT                                                  |             |       |   200 |     2  (50)| 00:00:01 |
|   1 |  REMOTE SPANC QUERY (SQLSERVER, MYSQL, POSTGRESQL, STACKOVERFLOW) |             |       |   200 |     2  (50)| 00:00:01 |

, MySQL:

| id | select_type | table     | type  | possible_keys | key     | key_len | ref  | rows    | Extra                                               |
|  1 | SIMPLE      | orderItem | spanc | NULL          | ALL     | NULL    | NULL |         | Using Oracle, PostgreSQL, SQL Server, StackOverflow |

and PostgreSQL:

Seq Scan on OrderItem  (cost=0.00..6.44 width=4)
 -> Remote Scan on SPANC (Oracle, MySQL, SQL Server, StackOverflow)   (cost=0.00..100.00 width=4)

Network analysis has shown weird encrypted activity between the servers in my internal network which host SQL Server, Oracle, PostgreSQL and MySQL servers.

Ultimately, there was unencrypted activity outside of the internal network which turned out to be an HTTP POST request followed by several GET polls to http://stackoverflow.com/questions/5518080/distinct-optimization.

It seems that the developers of major database systems agreed to share the knowledge about the most efficient query plans in some kind of a distributed storage (which probably is called SPANC as we can see in the query plans) and provide an interface to access each other's systems.

It also seems that these systems treat Stack Overflow as an external optimization engine where the most experienced developers can build their plans for them in a most efficient way.

I would be glad to have further clarification from the companies staff.

This also begs a question: how many of regular Stack Overflow participants are in fact query engines disguised as curious fellow developers?

It would be definitely nice to know.

Written by Quassnoi

April 1st, 2011 at 11:00 pm

Posted in Miscellaneous

5 Responses to 'Shared Plan and Algorithm Network Cache (SPANC)'

Subscribe to comments with RSS

  1. Liked


    2 Apr 11 at 01:01

  2. @Pentium10: I’ve seen your posts on Stack Overflow several times, and this fact, along with your nickname, may evoke some thoughts!


    2 Apr 11 at 01:07

  3. Ah, April 1st is indeed a special day in the tech world.


    27 Apr 11 at 11:18

  4. Alex, I did not see your posts for long time. I really misted it.

    What this is an April joke? :)

    Hope that everything OK


    leonard strashnoy

    29 Jun 11 at 03:24

  5. @leonard: yes, everything’s fine, just been busy as a beaver, thanks!

    No, this is not an April joke, this is serious. Just try it!


    29 Jun 11 at 12:50

Leave a Reply