SQL Server: finding date closest to a birthdate at EXPLAIN EXTENDED

The problem is with dynamically getting upcoming what we call namedays in Europe.

Name day is a tradition in many countries in Europe and Latin America of celebrating on a particular day of the year associated with the one's given name.

Here's a sampe name day table (I store the date in this format: 1900-08-22, but we really only need month and day):

First name Name day

Bob 1900-04-22

Bob 1900-09-04

Frank 1900-01-02

The trick is that there might be multiple name days for each name, and someones name day is always the first one found after ones birthday.

So if Bob was born on August 5th, his name day would fall on September 4th, but if we were born after Sep 4th, his name day would be on April 22nd.

We need to find when a given person celebrates his/her name day based on the first name and birthdate.

What I need is an SQL Server query that will be able to get me name days for the people in my database.

First name	Name day
Bob	1900-04-22
Bob	1900-09-04
Frank	1900-01-02

We actually need to make the calendar to wrap around: to build a condition that would select 1904, Jan 1st after 1904, Dec 31th.

We can do it using a COALESCE on the simple queries in OUTER APPLY clause.

Let's create sample tables:

Table creation details

CREATE SCHEMA [20090821_nameday]
CREATE TABLE [20090821_nameday].t_name (
        id INT NOT NULL PRIMARY KEY,
        first_name VARCHAR(20) NOT NULL UNIQUE
)
CREATE TABLE [20090821_nameday].t_user (
        id INT NOT NULL PRIMARY KEY,
        first_name VARCHAR(20) NOT NULL,
        birth_date DATETIME NOT NULL
)
CREATE TABLE [20090821_nameday].t_nameday (
        first_name VARCHAR(20) NOT NULL,
        nameday DATETIME NOT NULL,
        PRIMARY KEY(first_name, nameday),
        CHECK(YEAR(nameday) = 1904)
)
GO
BEGIN TRANSACTION
SELECT  RAND(20090821)
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 30
BEGIN
        WITH    rows AS 
                (
                SELECT  1 AS letter,
                        CAST(RAND() * 11 AS INTEGER) + 1 AS consonant,
                        CAST(RAND() * 5 AS INTEGER) + 1 AS vowel
                UNION ALL
                SELECT  2 AS letter,
                        CAST(RAND() * 11 AS INTEGER) + 1 AS consonant,
                        CAST(RAND() * 5 AS INTEGER) + 1 AS vowel
                UNION ALL
                SELECT  3 AS letter,
                        CAST(RAND() * 11 AS INTEGER) + 1 AS consonant,
                        CAST(RAND() * 5 AS INTEGER) + 1 AS vowel
                UNION ALL
                SELECT  4 AS letter,
                        CAST(RAND() * 11 AS INTEGER) + 1 AS consonant,
                        0 AS vowel
                )
        INSERT
        INTO    &#91;20090821_nameday&#93;.t_name
        SELECT  @cnt,
                (
                SELECT  CASE letter WHEN 1 THEN SUBSTRING('BCDGLMNPSTV', consonant, 1) ELSE SUBSTRING('bcdglmnpstv', consonant, 1) END +
                        CASE letter WHEN 4 THEN 'ius' ELSE SUBSTRING('aeiou', vowel, 1) END AS &#91;text()&#93;
                FROM    rows
                FOR XML PATH('')
                )
        SET @cnt = @cnt + 1
END
SET @cnt = 0
WHILE @cnt < 450
BEGIN
        INSERT
        INTO    &#91;20090821_nameday&#93;.t_nameday
        SELECT  first_name,
                CAST('1904-01-01' AS DATETIME) +
                CAST(CAST(366 AS FLOAT) / 15 * (@cnt / 30) AS INTEGER) +
                CAST(
                RAND() *
                (
                CAST(CAST(366 AS FLOAT) / 15 * ((@cnt / 30) + 1) AS INTEGER) - 
                CAST(CAST(366 AS FLOAT) / 15 * (@cnt / 30) AS INTEGER)
                ) AS INTEGER)
        FROM    &#91;20090821_nameday&#93;.t_name
        WHERE   id = @cnt % 30 + 1
        SET @cnt = @cnt + 1
END
SET @cnt = 0
WHILE @cnt < 30000
BEGIN
        INSERT
        INTO    &#91;20090821_nameday&#93;.t_user
        SELECT  @cnt, first_name,
                CAST('1920-01-01' AS DATETIME) + CAST(RAND() * 25000 AS INTEGER)
        FROM    &#91;20090821_nameday&#93;.t_name
        WHERE   id = @cnt % 30 + 1
        SET @cnt = @cnt + 1
END
COMMIT
&#91;/sourcecode&#93;
</div>

Here we have a set of <strong>30</strong> randomly-generated Latin-like names, <strong>15</strong> random name days for each of these names and <strong>30,000</strong> users:

<a href="#" onclick="xcollapse('X4639');return false;"><strong>See the list of funny names</strong></a>
<br />
<div id="X4639" style="display: none; ">

SELECT  *
FROM    [20090821_nameday].t_name

id	first_name
14	Bitedudius
28	Debabopius
20	Debaculius
25	Depunonius
19	Donumicius
2	Gemevubius
9	Genocotius
10	Gobesuvius
4	Gotapogius
24	Gunucopius
29	Latudebius
16	Legepimius
1	Levotepius
15	Lomipapius
30	Mebusapius
26	Mipomebius
21	Mudumebius
23	Mumapunius
11	Nuducepius
3	Pasavatius
17	Pelelalius
6	Pogibotius
18	Pusigidius
27	Sevonolius
5	Sugusedius
7	Tadecunius
13	Temupudius
8	Viculegius
22	Visetitius
12	Vopetepius

For each user, we have to select the first name day that comes after his birthday (his because it was more simple to make all these names male, no offence, gals).

We first need to find out his birthday as of 1904. To do this, we need to subtract the difference between his year of birth and 1904 using DATEADD.

Then, we select the first name day for his name that is more or equal to his birthday.

However, there can be no nameday if the user's birthday is close enough to the end of year. So if it is, we just need to select first name day of the year for the user's first name, and this will be the name day for this user. We will do it in a subquery which will be a second argument to COALESCE function. This subquery will be evaluated only if no name day will be found using the first method.

And here's the query:

SELECT  *
FROM    [20090821_nameday].t_user u
OUTER APPLY
        (
        SELECT  COALESCE(
                (
                SELECT  TOP 1 nameday
                FROM    [20090821_nameday].t_nameday nd
                WHERE   nd.first_name = u.first_name
                        AND nd.nameday >= DATEADD(year, 1904 - YEAR(u.birth_date), u.birth_date)
                ORDER BY
                        nd.nameday
                ),
                (
                SELECT  TOP 1 nameday
                FROM    [20090821_nameday].t_nameday nd
                WHERE   nd.first_name = u.first_name
                ORDER BY
                        nd.nameday
                )
                ) AS nameday
        ) dates

See the query results

id	first_name	birth_date	nameday
0	Levotepius	1968-11-12 00:00:00.000	1904-11-21 00:00:00.000
1	Gemevubius	1941-07-29 00:00:00.000	1904-08-30 00:00:00.000
2	Pasavatius	1928-03-01 00:00:00.000	1904-03-06 00:00:00.000

29997	Debabopius	1940-05-21 00:00:00.000	1904-05-28 00:00:00.000
29998	Latudebius	1931-07-26 00:00:00.000	1904-08-16 00:00:00.000
29999	Mebusapius	1936-06-17 00:00:00.000	1904-06-29 00:00:00.000
30000 rows fetched in 1.2157s (0.0030s)

Table 't_nameday'. Scan count 60000, logical reads 120383, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 't_user'. Scan count 1, logical reads 184, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 500 ms,  elapsed time = 1113 ms.

One more question: why did I use year 1904 to keep the name day dates instead of 1900 proposed by the author?

This is simple: 1900 was not a leap year, since it's divisible by 100 but not by 400. So the name day that falls on Feb 29th cannot be stored as a DATETIME with year 1900.

1904, on the other hand, was a leap year and had Feb 29th.

Written by Quassnoi

August 21st, 2009 at 11:00 pm

Posted in SQL Server

EXPLAIN EXTENDED

SQL Server: finding date closest to a birthdate

Leave a Reply

Subscribe

Subscribe by email

Contacts

Should I?

Recent articles

Calendar

Archives

Categories

Stack Overflow