EXPLAIN EXTENDED

How to create fast database queries

Click

with 2 comments

Back in the end of 90's I had been employed with a small startup which was founded to develop a restaurant management system.

The job listing they posted said PostgreSQL wanted (sic). And PostgreSQL for sure was something I knew about. You see, I already had experience with PostgreSQL by that time. I was able to install PostgreSQL, create a database, run a query against it and even create a trigger in C (I didn't yet know of pl/pgSQL).

So I applied for the job, got hired and started coding.

Coding what you may ask?

Well, I had no idea. And in fact, nobody did. There were only two requirements for the system.

First, it had to work somehow with a Casio Cassiopeia, which allowed a wireless network module to be installed and used. A waiter had to be able to take your order and it had to get to the kitchen instantly.

Second one was personal. The founders of the company (and me too) used to dine in a small cafe at the basement of the office building. They had a menu, but it didn't correlate with reality. You literally couldn't get anything that was in the menu. They were constantly out of meat, out of tomatoes, out of potatoes (potatoes!). Instead they put a small hand-written piece of paper into the menu which listed everything the had at the time (a soup and a something else), that's if they were in good mood. If they were not, you just placed an order, and a waiter came back in 10 minutes to tell you it was beyond their ability to get some tomatoes and cucumbers to chop you some salad, try ordering something else, thank you.

In our system, that was never ever going to happen. Never. If the kitchen was short of something they needed to cook you a dish you ordered, the system would warn the waiter instantly, and she would tell you to order something else right this minute, that's instead of going to the kitchen and back.

It had to be a killer feature.

And more than that. You could program some replacements into the recipe. For instance, red onions can be used instead of white ones to make a salad, so even if the kitchen was out of white onions, you could still get your salad, but not pilaf (for which red onions are just not good).

Of course, there also had to be some inventory control, reports and a front office, but just to make these two features work.

With this in mind, the development started. And I think I don't remember any other task more fascinating than that.

I had to make PostgreSQL client library to compile for Windows CE. To make the handheld to reconnect whenever it loses network. To create a best replacement algorithm for complex dishes (did you know it's NP-complete?). To make a thermal receipt printer work using a multiport MOXA RS-232 serial adapter under Linux.

These all were interesting, fascinating tasks.

And the best part of all of these tasks was that everything seemed so… logical. It fitted. It made a click when got into the place.

A client library used some console I/O functions (like printf) that were not even declared in Windows CE SDK. It didn't have <stdio.h>, since it didn't have a console to write to. You had to #define everything to be a nop. It's a routine, but as soon as the code compiles, it just works. You hear a click, an audible click. You issue the query to a Linux box, get a result back and display it on the handheld. You may even do it while standing in the hallway.

A printer uses a poorly documented protocol and the MOXA's driver is a pain to install. But once installed, it appears as a /dev/ttyS0. You can do ps ax > /dev/ttyS0. Click. You get a narrow sheet of thermal paper with the list of running programs printed on it.

You then put some hex characters to a file using hexedit, then send the file back to the terminal using ps ax | cat - cut.txt > /dev/ttyS0. Click. You get a list of your running programs on the paper and the paper is cut with the printer's cutter.

Everything fits. Everything works and you know why.

Well, almost everything. Everything, but one thing.

SQL.

I just didn't get it. Yes, I knew that you had to make a foreign key so that you just put the id's of things instead of things themselves. But I just could not make it work right.

I had a book on SQL (don't remember author or title), and I had to look into this book every time I had to make a query. I usually managed to make it work after an hour or two, and it even returned correct results. Well, almost correct. Some rows were duplicated, there were some extra fields, but it was allright, I could always process the duplicates and extra fields in the code.

It made no click, though. It's as if you hammered a screw or drilled a hole in a piece of IKEA furniture. It's strong and sturdy, but you cannot fight the feeling you do something wrong.

I wondered what was good in that SQL. Sure, you can put all your data into a table, which is better than using a plain file that is hard to parse. You can even use a WHERE clause to filter some data, which is also nice. INSERT and DELETE — fine, no messing with file offsets.

But what was the point of using multiple tables in the FROM clause had always been an enigma to me. FROM table1, table2 ? It's like opening two files at once. No sane person does it. No OS has a function to do it. The book said WHERE is for filtering, and what's the point of WHERE it this case? I don't want to filter anything, just grab all records but with id's substituted with actual values. The book said it's possible, why don't I get it?

I almost wanted to switch to plain files, but I just did have to find out how to beat the damned thing. And the job listing said PostgreSQL, after all.

One evening I had been fighting for against some query for two hours and was almost banging my head against the wall. Tables. Filters. Structured query language, my ass. Could I please just get my data back and process it in a loop?

At some point I issued a query like this:

SELECT  *
FROM    t_ingredient, t_price

There were exactly 2 records in each of the tables. And the id's turned out to be 1 and 2 in the first table, and 3 and 4 in the second one.

And here's what I got:

ingedient_id ingedient_name price_id price_ingredient price_value
1 Tomatoes 3 1 10.00
2 Cucumbers 3 1 10.00
1 Tomatoes 4 2 15.00
2 Cucumbers 4 2 15.00

This is just pointless, I thought. There are two rows in each of the tables, why does it return four? Why are all the records mixed and twisted together? And what is that cartesian product warning I get?

And finally it got to me that cartesian is adjective derived from Descartes. And the set theory they taught me in the college popped into my head.

And I saw when I issue SELECT * FROM table1, table2, PostgreSQL kindly calculates this cartesian product for me. And everything I need to do is apply a filter to the product which will select the matching rows from the product. Those rows that have the id's matching. It's a filter. So logical.

And you know, you can actually apply any filter, not just equality. You can select, well, everything. Just everything, right there, with a single statement.

And this was a doggone loud 140 dB click that makes your ears bleed.

It's a very strong emotion, hearing this click. Very. I don't know the name for it, but you probably know what I'm talking about. And if I were Quentin Tarantino, I wouldn't inject Uma Thurman with adrenaline, I would rather make her learn SQL.

I could hardly sleep that night. How smart a tool it was, that SQL! All nice, and set based, and you could do everything, everything with it, and this is just with three simple clauses.

And they even wrote a book. A book on what, the three clauses? SELECT, FROM and WHERE? Any idiot can work them out, no book is needed for that. This is just elemenary school arithmetics, so simple. You join, filter and choose the columns.

The next morning I came to work and started coding. I could hardly believe it was the same room: everything looked so different. I couldn't wait to complete a report I had started the previous day: I already knew it how to do it using pure SQL. I was actually sorry I couldn't type faster. And here it is: all nice, and no missing rows, and all columns in place, and everything on the server side.

I checked the report code in and the next report I had to write was something about daily receipts. It involved calculating some cumulative sums: for each day I had to display cash received during the day and total cash received so far, which was the sum of the previous rows.

Imagine what? I couldn't do it with SQL. There was no such filter and no such column permutation. You can select a first row or a second row but you cannot compare them or add them together. What kind of a moron invented a language that doesn't allow you make such a simple thing? You cannot even compare the sets to each other, because if you put the table name into the FROM clause twice, it raises an error.

I don't remember how exactly I realized there's a possibility to use aliases to make self joins, but I did. Click. Now this SQL thing seemed reasonable again.

You remember the killer features of the system? Handhelds and instant warnings they were.

But our first customer was a fast food chain: they didn't need handhelds since they took the orders at the counter. We had to rewrite the whole thing to work on desktop computers using a touch screen.

And this ingredient replacement thing was of course implemented too (using SQL of course) and it did work.

Oh, yes, it did.

When the first customer came in (literally first, number one), he ordered a cake and a can of Cola.

It was OK with the cake, but there was no Cola. Nobody bothered to enter the Cola they bought into the system. The fridge was full of cans, the shelves were full of cans, but the stored procedure said clearly: 0 cans in stock. No product, no sale; no sale, no bill; no bill, no money.

They couldn't even open the money box: the system opened it automatically when you placed an order. And you couldn't: there was no Cola in stock.

I told the people at the counter to write the orders down on paper and use the calculators, and run two simple SQL statements. First one: each and every dish is made of exactly one ml of plain water. Second one: there are 1,000,000 liters of water in stock.

It fixed the situation, but the next day SQL procedures (stress tested under heavy load of 100 random orders) started to leak. It took a minute for the stored procedure to complete and the money box to open (with people waiting in a line). And there was nothing to be done about it: it was just the way SQL works. Damn stupid thing.

But who need these replacements to be calculated anyway? We can just collect the data and make reports later. It doesn't have to be online, and SQL allows the data to be collected without processing merely by disabling the triggers. Click. What a nice tool!

Now, the reports are slow and a menu takes 18 hours to print. Stupid thing. Wait. Don't hammer everything into one query, use the procedures. Click. Nice.

PostgreSQL just can't handle all the data. Stupid. Use Oracle. Click. Brilliant!

Oracle's even more laggy. Damn. Collect statistics. Click. Awesome!

And what really amazes me about SQL is that despite 10 years that passed since the time I learned of the WHERE conditions, I still hear these clicks when I don't expect to hear them.

It's not the clicks themselves that amaze me. It's that they're always unexpected.

With C, or C#, or PHP, or whatever else, the problems are or course interesting, but they are solvable.

Maybe I spend too much time with SQL (or too little time with other technologies), but whenever I need to write a DLL, or an assembly, or a website, I know that I'll meet some interesting problems on my way, and spend some time on research, and these problems will be solved this way or another, and I know it in advance.

It's just a matter of time and effort to get to the right thing.

But it's not the same with SQL.

People always come up with the problems I've never even thought would exist.

And no, it cannot be solved using SQL, no way, sorry.

Well, maybe, only if we… wait…

Click.

Written by Quassnoi

June 21st, 2009 at 11:00 pm

Posted in Miscellaneous

2 Responses to 'Click'

Subscribe to comments with RSS

  1. Wow, you made me a little teary about SQL. Thanks for the story.

    David

    20 Jul 18 at 02:06

  2. Great post.

    Royvia

    13 Sep 19 at 18:02

Leave a Reply