EXPLAIN EXTENDED

How to create fast database queries

Schema junk

with one comment

Many tasks in SQL world can be efficiently performed using stored procedures.

All database management systems of the Big Four (that is Oracle, SQL Server, MySQL and PostgreSQL) have a concept of a stored procedure: a little server-managed program written in a procedural language that allows running SQL statements and returning results.

They are widely recommended for use, since they allow easy separation of business rules from data storage rules, fine-grained security, integrity control and all other things software analysts mumble for their living.

People widely accept this theory and if you ask anyone should I use the stored procedures?, the answer you will get will be a unanimous yes.

The problem is that people rarely use them in fact in real applications. Everybody knows they should but nobody really does.

More than that: despite the fact that stored procedures in fact can be more efficient than plain queries (even for one-time tasks), people often prefer the latter. Writing a procedure to obtain a resultset seems to be an overkill from the point of view of most developers, they rather prefer doing it with a plain SQL query.

Why so?

Let me tell you another boring story. Don't worry, it won't take long.

When I was seventeen I built my first home LAN with one of my friends.

As you may suggest, we were not particularly wealthy that time, so that LAN didn't use fancy things like a server, router or a network switch.

Instead, we used a piece of 50 Ω thin Ethernet cable I found in my father's garage, two cheapest NE2000 clones, a pair of plugs and two self-made terminators. Lunacy of paying two dollars for a thing you can make yourself from a 5-cent resistor was of course avoided.

The LAN worked well for the most of winter, but on April 12th, 1998 a heavy snowfall hit central Russia and the cable broke under the weight of snow.

By that time two more friends wished to join the LAN.

As you may know, a thin Ethernet LAN works only when all computers are properly connected, and if but a single computer disconnects the whole net becomes unusable. It's not a problem when you have two computers but gets annoying when you cannot play Quake with one of your friends because the other one decided to install his new S3 Trio64V+ right this Sunday.

I decided to purchase UTP-5 and a cheapest 10-Mbit hub instead.

It was the first time I saw two things that amazed me: RJ45 connectors and nylon zip ties.

What was so special with them is that if you did something wrong, you didn't have a second chance to fix it. There is a quantum leap between the states and you better do everything right.

A BNC connector used by thin Ethernet should be soldered to the cable, and even if you suck at soldering you can try as many times as you want. You can keep soldering and desoldering the connector, removing excess tin and flux and finally you get something that doesn't immediately break (usually due to large quantities of electrical tape).

It's not the same with RJ-45: the plug is gone that very moment you squeeze the pliers, and you better place the wires right. You cannot undo your action.

Same with zip ties: a tie is gone when you pull its tail through the ratchet. If you left a wire out, well, too bad, you've just wasted a tie.

Ties and connectors of course are very cheap, but people just hate to waste things, no matter how cheap they are. When you try to light a fire with a match and it goes out, you feel frustration — much deeper frustration than if you would just have to strike the lighter one more time. It's like if you were an Ice Age hunter and that fire was your tribe's last hope and now they all are going to die and it's all because you suck at making fire.

That's why most people feel better when they use reusable things like velcro tie wraps.

However, it's not the wasting of priceless resources that make you feel better. A lighter can run out of gas just as well, and the solderable connector can be ruined with overheating (and soldering requires electricity of course).

There are two real reasons.

The first reason is that even if you waste the thing, you need put yet some more effort to make things as they were before.

Before you had crimped the connector onto the cable, you had a piece of stripped cable. After you had crimped the connector onto the cable, you have a piece of cable with a wasted connector on it connected the wrong way. It didn't get better, not even the same. It got worse. You have to reach for some other tool to cut if off and strip the cable again just to revert the situation.

Soldering a connector seems to be harder to do, but much easier to undo. At least you are already holding the damn iron, you don't need to put it down and reach for another tool.

The second reason is that resources like gas and electricity are uncountable, while connectors and cable ties are countable.

For obvious psychological reasons, it's more easy to waste uncountable resources, the ones that should be measured, not counted.

When you have twenty cigarettes and break one, you are left with nineteen cigarettes. But when you have a pocketful of tobacco and spill some, you still have a pocketful of tobacco.

It's interesting that having large quantities of countable items turns them into uncountables. System administrators that deal a lot with nylon ties do not treat them as many ties but rather as lots of consumables

Surprisingly, same rules apply to programming, despite the programming objects being not the real objects but rather the fluctuations of magnetic fields or electric potentials.

That's why:

  • Shell scripts feel easier to use than compiled programs
  • Lambda expressions feel easier to use than the function pointers
  • CTE's and subqueries feel easier to use than views, and, finally,
  • Queries feel easier to use than stored procedures

It's not that the former ones are easier to create. It's that the former ones are easier to get rid of.

Dropping a stored procedure, even a one-time stored procedure you won't ever need again, feels like a waste, something that you shouldn't do.

And it's just harder to do, it requires more work and it's not that kind of work you're doing automatically.

To delete your query, you can just hit Ctrl-A and Del, a thing that you probably do hundred times a day. You possess an automated skill in doing that.

To delete a stored procedure, you should either issue a command which starts with DROP (this alone rings a bell in the brain of most programmers), or point your mouse to the tree, right-click for the context menu to pop up, choose drop and give a right answer to the confirmation message box.

Not so hard in fact but hard enough to keep some users from ever creating it.

Second, a stored procedure is an object in the database which takes a row in the data dictionary and a file to keep in the version control. A query usually is a piece of code which is uncountable, as in much code, and a procedure is an object, as first-class object, as in many procedures.

And having 5 unused objects in the database doesn't feels quite the same as having a source file 100 lines longer.

5 unused objects make the database have 5 unused objects more than it should, which doesn't feel right and requires an effort to clean, a thing people generally avoid unconsciously even before cluttering the database.

100 unused lines of code make a heap of code a bigger heap of code, which is OK.

Now, what to do?

First, queries are really easier to use. And probably they will suffice for you. For some applications it's really OK not to have stored procedures in the database.

Second, Oracle and SQL Server have concept of anonymous stored procedures. You don't need to create an object in the database to perform some kinds of tasks and still be able to use procedural language statements.

Creating and removing an anonymous procedure is as easy as creating and removing a query: Ctrl-A, Del and no side effects.

However, there are cases when anonymous procedures cannot be used. Say, MySQL and PosgreSQL don't support them at all. Or maybe you need to join the resultset with a table (which requires creating a TVF in SQL Server or a pipelined function in Oracle).

Facilitating usage of stored procedures and functions is in fact simple. You should make creating and removing your stored procedures as easy as creating and removing code.

For one of my Delphi / Oracle projects, I made the following things:

  • Put all packages into a set of files with a certain extension for them to be in one place
  • Wrote a parser to create Object Pascal classes and methods from the package definitions
  • Made a custom build step that updated the packages on the test database during the build

Here you are: instead of alt-tabbing between Delphi and PL/SQL Developer and pixelhunting the object tree with your mouse to do something with a package, you just select and edit the text, i. e. you do that very thing that do when writing all other code.

You don't have to waste your brain resources on switching the tasks and rather concentrate on the tasks themselves. As easy as a velcro tie.

Other good thing is that all objects are in one place so you don't have to worry about dependencies. When you remove a stored procedure from code, it gets removed from the database on the next build just as well.

Oracle tracks package dependencies, so when you remove a procedure used by some other code, be it client-side or server-side, you get the error in either case.

While most developers agree that the updates to the database should be performed only using the stored procedures, there are two opinions on whether read-only queries should be allowed from the client applications or not.

Using packages and stored procedures for all data access, including SELECT queries, has these advantages:

  • You can completely remove the method from your database access layer that lets you run the random queries
  • You can revoke all permissions on tables from the users that are able to connect to the system, leaving permissions only on the packages

This will leave but a single entry point to your database. You can create more complex data access rules and perform auditing tasks.

However, this makes creating dynamic queries and custom repors a large pain, so it's not a best option if your application relies on them heavily.

Summary:

With a little effort of customizing your development framework, using stored procedures can be a fun. They can be made just as easy to add or delete as any function or method in your code, facilitating the development process and allowing you to squeeze more from your database system.

Written by Quassnoi

July 5th, 2009 at 11:00 pm

Posted in Miscellaneous

One Response to 'Schema junk'

Subscribe to comments with RSS

  1. I’ve written a few small- to medium-sized web-based, database-backed applications over the past several years. I have discovered (which I had long hypothesized) that using stored procedures makes development substantially easier and more efficient both in human and computer terms. My early workflow didn’t easily integrate stored procedures into the development, but once I committed myself to doing so, I’ll never go back.

    Thank you for your contributions!

    Gary Chambers

    12 Mar 10 at 21:12

Leave a Reply