PostgreSQL's transition relations.


Great news! PostgreSQL 10 is out. Release notes look amazing and there are a lot of new, wonderful changes and additions. In my personal list, the most desirable are improvements related to parallel queries which have been introduced in 9.6. In the new Postgres, more operations can be executed in parallel, such as B-tree index scan, bitmap heap scan, merge joins and non-correlated subqueries. Of course, new Postgres has other great features such as tables’ publication/subscription based on logical replication and declarative partitioning.

Here, I would like to shed some light on another novation related to the trigger functions - transition relations. What does the official documentation say about transition relations? These are sets of rows that include all of the rows inserted, deleted, or modified by the current SQL statement. Key point here is - all. With transition relations, triggers have a global overview of statement’s output, not just one row at a time. Note, that transition relations are only used in conjunction with AFTER statement triggers.

Below I'm going to consider a simple example where it can be used.
Here are two tables, the first one is the insert-only table with historical data, the second - stores average aggregates based on historical data from the first table.

# CREATE TABLE trendbar (
        period INTEGER,
        create_date TIMESTAMP,
        open BIGINT,
        high BIGINT,
        low BIGINT,
        close BIGINT,
        volume BIGINT,
        symbol_id BIGINT
);
# CREATE TABLE trendbar_avg (
        create_date TIMESTAMP,
        avg_open NUMERIC(10,2),
        avg_high NUMERIC(10,2),
        avg_low NUMERIC(10,2),
        avg_close NUMERIC(10,2),
        sum_volume BIGINT,
        symbol_list BIGINT[]
);


Let's imagine that our application populates this table with the following query:

# INSERT INTO trendbar
        SELECT
                1 AS period,
                now() AS create_date,
                (random() * 10000)::bigint AS open,
                (random() * 10000)::bigint AS high,
                (random() * 1000)::bigint AS low,
                (random() * 10000)::bigint AS close,
                (random() * 100000)::bigint AS volume,
                (random() * 9+1)::bigint AS symbol_id
        FROM generate_series(1,10);


Number of inserted rows may vary, and after each insert operation we need to calculate some aggregates which are used by another application,a report or something else.

Currently, the only way to “reach” rows being acted in the source table is to use FOR EACH ROW trigger. Quite often series of changes on the source table will act on just a single row in the aggregated table. In FOR EACH ROW case, database is forced to re-calculate aggregated value and update that single row multiple times - as many as there are operations in the source. This, unfortunately, leads to the table (and indexes) to bloat. Transition relations don’t have this disadvantage and allow to cumulatively handle rows in that source.

This task is easily solved using AFTER STATEMENT trigger with transition table!

As usual, before adding trigger, a trigger function has to be created. In this simple function we use SELECT query which reads rows from transition relation and generates aggregated values. Transition table is named as "inserted", but it isn’ta strict convention and you are able to rename it.

# CREATE OR REPLACE FUNCTION fn_tg_update_avg() RETURNS trigger as
$$
BEGIN
IF TG_OP = 'INSERT' 
THEN EXECUTE 
'INSERT INTO trendbar_avg 
SELECT create_date, avg(open) avg_open, avg(high) avg_high, avg(low) avg_low, avg(close) avg_close, sum(volume) sum_volume, array_agg(symbol_id order by symbol_id) symbol_list
FROM inserted
GROUP BY create_date ';
RETURN NEW;
ELSE 
RAISE EXCEPTION 'Only INSERT operations allowed';
RETURN NULL;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;

When function is created, the last step is to create a trigger. In the trigger declaration we must specify REFERENCING keyword and transition table. Here, I used NEW TABLE keyword because I've just insert rows. With UPDATE or DELETE operations it's possible to use transition tables that contain old rows’ versions - before DELETE or UPDATE operation. Also, there is one thing to keep in mind, transition relations can’t be specified for triggers with more than one event, for example it isn't possible to handle INSERT or UPDATE or DELETE commands using one trigger.

# CREATE TRIGGER tg_trendbar_update_averages AFTER INSERT ON trendbar REFERENCING NEW TABLE inserted FOR EACH STATEMENT EXECUTE PROCEDURE fn_tg_update_avg();

It's quite simple. Now try inserting new data into the first table with INSERT query which used above and then check the second one.

# INSERT INTO trendbar
        SELECT
                1 AS period,
                now() AS create_date,
                (random() * 10000)::bigint AS open,
                (random() * 10000)::bigint AS high,
                (random() * 1000)::bigint AS low,
                (random() * 10000)::bigint AS close,
                (random() * 100000)::bigint AS volume,
                (random() * 9+1)::bigint AS symbol_id
        FROM generate_series(1,10);

# SELECT * FROM trendbar_avg ORDER BY create_date;
        create_date        | avg_open| avg_high| avg_low| avg_close| sum_volume |      symbol_list        
---------------------------+---------+---------+--------+----------+------------+------------------------
2017-10-06 11:44:07.190285 | 4992.50 | 6660.60 | 530.90 |  5609.60 |     632176 | {1,3,4,4,5,6,7,8,8,9}
2017-10-06 11:44:17.341953 | 4943.30 | 3435.70 | 654.50 |  4545.20 |     497750 | {3,3,4,5,5,6,7,8,9,9}
2017-10-06 11:44:20.485938 | 4416.40 | 6137.10 | 508.80 |  5575.70 |     472368 | {3,6,7,7,8,8,9,9,9,10}

It works! So transition tables look neat and useful in cases when data should be post-processed after its modification.

Another usage for this type of triggers is the cases when you would like to know the exact number of rows for large tables. Usually, we would create each row trigger and update table while maintaining exact values for “count ()”.Using transition relations, Postgres allows you to calculate changes for “count ()” only once in each statement trigger and it helps to reduce the load on the database in case of FOR EACH ROW statement triggers.
If you have any other ideas please do share them in the comments.

And all in all, enjoy PostgreSQL 10!

Comments

  1. I'm really excited about this. I hope that some more work goes into making really powerful incrementally refreshed materialized views now that this is in.

    ReplyDelete

Post a Comment

Popular Posts