EXPLAIN EXTENDED

How to create fast database queries

Sargability of monotonic functions: example

with 2 comments

In my previous article I presented a proposal to add sargability of monotonic functions into the SQL engines.

In a nutshell: a monotonic function is a function that preserves the order of the argument so that it gives the larger results for the larger values of the argument. It is easy to prove that a B-tree with each key replaced by the result of the function will remain the valid B-Tree and hence can be used to search for ranges of function results just like it is used to search for ranges of values.

With a little effort, a B-Tree can also be used to search for the ranges of piecewise monotonic functions: those whose domain can be split into a number of continuous pieces with the function being monotonic within each piece (but it may be not monotonic and even not continuous across the pieces).

In this article, I'll demonstrate the algorithm to do that (implemented in pure SQL on PostgreSQL).

I will show how the performance of simple query

SELECT  *
FROM    t_sine
WHERE   SIN(value) BETWEEN 0.1234 AND 0.1235

could be improved if the sargability of monotonic functions had been implemented in the optimizer.

To do this, I will create a sample table:

Table creation details

This table contains 1,000,000 records with value randomly distributed from 0 to 101.

To select the records we need, we can use a very simple and straightforward query:

SELECT  *
FROM    t_sine
WHERE   SIN(value) BETWEEN 0.4452 AND 0.4453

View query details

which returns 31 records in 3.25 seconds.

The query uses a full table scan with the filter applied to each record.

Let's try to improve it.

Function description

According to the notation I proposed in the previous article, the monotony of the function SIN() should be described as this:

SIN(arg FLOAT) MONOTONIC PIECEWISE
DEFINED BY FLOOR(arg / PI() + 0.5)
CASE PIECE % 2
WHEN 0 THEN DECREASING INVERSE PIECE * PI() + ASIN(RESULT)
ELSE INCREASING INVERSE PIECE * PI() - ASIN(RESULT)
END

This means that:

  1. The function is piecewise monotonic,

  2. The pieces are defined by the function FLOOR(arg / PI() + 0.5) (which essentially returns the number of the half-wave the argument belongs too),

  3. The function monotony varies depending on the piece,

  4. On odd pieces, the function increases,

  5. On even pieces, the function decreases.

  6. A single inverse expression is provided for each monotony

Note that mathematically the function is strictly monotonic on each of its pieces. However, due to the rounding errors, different arguments can yield same function results, so the function value may map back to a range of the arguments rather than a single value.

In theory, it is possible to write a single expression which would map the function's result to the pair of values defining the beginning and the end of such a range. However, the expression would be quite complex. So for illustration purposes I'll make do with a single inverse function that yields an approximation of the back mapping. To find the exact range, some extra effort will be required.

Building the pieces

The function is piecewise monotonic and the pieces are defined by a function. For the pieces to be continuous, the function that defines them should be itself monotonic over all its domain.

The function that defines the pieces is FLOOR(arg / PI() + 0.5).

It is a superposition of the three functions:

  • OPERATOR_DIVISION(arg1 FLOAT, arg2 FLOAT)
    MONOTONIC OVER (arg1)
    CASE WHEN arg2 > 0 THEN INCREASING INVERSE RESULT * arg2
    WHEN arg2 = 0 THEN UNDEFINED
    WHEN arg2 < 0 THEN DECREASING INVERSE RESULT * arg2 END

  • OPERATOR_PLUS(arg1 FLOAT, arg2 FLOAT) MONOTONIC
    OVER (arg1) STRICTLY INCREASING INVERSE RESULT - arg2,
    OVER (arg2) STRICTLY INCREASING INVERSE RESULT - arg1

  • FLOOR(arg FLOAT) MONOTONIC INCREASING
    INVERSE
    FROM RESULT EXACT
    TO RESULT + 1 EXACT EXCLUDE

which are, given the values of the constants provided in the secondary arguments, are increasing over the argument. As we know from math, a superposition of monotonic functions is also monotonic.

Each function is defined with a single inverse condition which maps the result of the function back to a value near the range of the arguments yielding the result. The exact range has to be sought for using index seek over (hopefully) not too many records.

Sequentially applying the inverse expressions of each of the constituent functions to the result of the piece defining function, we get the following inverse expression for the latter:

  1. FLOOR(OPERATOR_PLUS(OPERATOR_DIVISION(arg, PI()), 0.5)) = PIECE
  2. OPERATOR_PLUS(OPERATOR_DIVISION(arg, PI()), 0.5) ∈ [ PIECE, PIECE + 1 )
  3. OPERATOR_DIVISION(arg, PI()) ∈ [ ≈(PIECE - 0.5), ≈((PIECE + 1) - 0.5) ]
  4. arg ∈ [ ≈((PIECE - 0.5) * PI()), ≈(((PIECE + 1) - 0.5) * PI()) ]

For each piece, we how have a pair of values approximately defining the range of values belonging to the piece.

To find out the exact bounds, we need to do the following:

  1. Calculate the piece for the minimal value

  2. Find the approximate upper bound for the piece.

  3. Scanning the keys to the left, find the rightmost key to the left of the upper bound that belongs to the current (or previous) piece.

  4. Scanning the keys to the right, find the first key of the next piece.

  5. Scanning a single key to the left, find the last key of the current piece.

  6. Recursively repeat steps 1 to 5, taking the first value the next piece calculated on step 4 as a seed for the step 1, until step 4 fails (which means that the pieces are over).

This procedure guarantees that we always get the correct bounds even with the inexact inverse value, since it correctly handles both overflow and underflow of the inverse value, as show on the pictures below:

Overflow

Underflow

Here's a query that selects the first and the last key of each piece:

View the query

Written by Quassnoi

February 23rd, 2010 at 11:00 pm

Posted in PostgreSQL

2 Responses to 'Sargability of monotonic functions: example'

Subscribe to comments with RSS

  1. Why not use functional index?

    create index sin_idx on t_sin(sin(value));

    satek

    14 Oct 10 at 16:08

  2. @satek: because this way you will need a separate index for each function you are going to filter on in your query.

    The point is that you can filter on sin(value), as well as any other monotonic function, using an index on value.

    Read the original article: https://explainextended.com/2010/02/19/things-sql-needs-sargability-of-monotonic-functions/

    Quassnoi

    14 Oct 10 at 16:33

Leave a Reply