Upgrading inheritance partitioning made easy.

An alternative way to upgrade to PostgreSQL 10 native partitioning using ALTER TABLEs and PLPGSQL.

Today's post is a bit spontaneous, because a few days ago I read the post about PostgreSQL's new native table partitioning explaining how to migrate from an old inheritance partitioning to the new one.

Author of the post proposed using pg_dump and pg_restore utilities. This way is quite simple and requires minimal effort. However, in some instances, especially in case of large partitions, dumping partition's data, restoring and creating indexes might take a long time.


In this post, I would like to offer an alternative way for partitioning upgrade. This method based on some ALTER commands wrapped into PLPGSQL procedure. Of course, I should note this technique only work if you already upgraded to PostgreSQL 10.
Let's consider a classic example, with master table and several tables which partitioned using timestamp column. Using the following commands it's possible to create our example tables.
# CREATE TABLE events (
    id serial primary key,
    cta_point integer,
    decay_factor integer,
    created_at timestamp without time zone);
# CREATE TABLE events_201708 (LIKE events INCLUDING ALL) INHERITS (events);
# CREATE TABLE events_201709 (LIKE events INCLUDING ALL) INHERITS (events);
# CREATE TABLE events_201710 (LIKE events INCLUDING ALL) INHERITS (events);
# CREATE TABLE events_201711 (LIKE events INCLUDING ALL) INHERITS (events);

Next step is to create trigger function and the trigger itself which would use our function when new records are inserted into the table.

# CREATE OR REPLACE FUNCTION fn_insert_events() RETURNS TRIGGER AS
    $function$
    BEGIN 
    EXECUTE format($$INSERT INTO %I VALUES ($1.*)$$, 'events_'||to_char(NEW.created_at, 'YYYYMM')) USING NEW;
    RETURN NULL;
    END;
    $function$ LANGUAGE plpgsql;
# CREATE TRIGGER events_trigger BEFORE INSERT ON events FOR EACH ROW EXECUTE PROCEDURE fn_insert_events();

As last step, let's add some generated data into our partitions.

# INSERT INTO events (cta_point,decay_factor,created_at)
SELECT (random() * 80 + 60)::int,(random() * 100 + 400)::int,i
FROM generate_series('2017-08-01'::timestamp,'2017-11-30','10 minutes') gs(i);


Check that everything is ok with data distribution. The master table should not contain any rows and partitions should be evenly filled.
# \dt+
                         List of relations
 Schema |     Name      | Type  |   Owner  |   Size  | Description
--------+---------------+-------+----------+---------+-------------
 public |        events | table | postgres | 0 bytes |
 public | events_201708 | table | postgres |  256 kB |
 public | events_201709 | table | postgres |  248 kB |
 public | events_201710 | table | postgres |  256 kB |
 public | events_201711 | table | postgres |  240 kB |

When test partitions are ready we can proceed to preparing the upgrade procedure.

Declarative partitioning in PostgreSQL 10 supports not only CREATE TABLE syntax, but ALTER TABLE as well. Mainly, it gives an opportunity to attach or detach tables like a partition to an existing table. So, there is no barriers to handle all partitions, switch off inheritance and attach to the new master table. This is in general, but how does it actually work?

Of course, first, we must create new master table using CREATE TABLE and, specify, basic partitioning rule - using RANGE or LIST. In our case, RANGE would be an appropriate choice. Next step, is to get the list of all source table partitions which can be obtained from pg_class and pg_inherits system tables. When the list is ready we should disable inheritance and attach partition to the new master table.

There are a few points of note here. First, when we attach partitions to master table we must specify min and max values which are then used like borders for the inserted rows. This values can be obtained from partitions' names - in my practice, partitions' names indirectly point to intervals of stored rows, the most obvious examples are daily or monthly partitions in following format: YYYYMMDD, YYYY_MM, YYYY_MM_DD or similar. Tables' names are not universal though so it doesn’t guarantee 100% accuracy, but it's an easy way to identify intervals and works in most cases.

Thus, we need to extract required information from table's name and translate it into values acceptable by ALTER TABLE command. This can be done using different string or date/time builtin functions.

Another key point is sequence handling. If original table uses sequence, this one also should be attached to the new table.

Well, it seems there are no additional pitfalls to fall into and all these ideas can be implemented in the following function.

# \i ~/Temp/fn_upgrade_partitioning.sql
CREATE FUNCTION

When function is created, we can upgrade our existing partitioning. The function requires 3 arguments: the name of the master table which should be upgraded; the name of the column which used as partition key; and last the name of the column with sequence.

# SELECT fn_upgrade_partitioning('events', 'created_at', 'id');
NOTICE: events_201708 reattached from events to events_new
NOTICE: events_201709 reattached from events to events_new
NOTICE: events_201710 reattached from events to events_new
NOTICE: events_201711 reattached from events to events_new
NOTICE: partitioning for events upgraded.
fn_upgrade_partitioning
-------------------------
(1 row)
Following the upgrade, let's check that generated data inserted without problems.

# INSERT INTO events (cta_point,decay_factor,created_at)
SELECT (random() * 80 + 60)::int,(random() * 100 + 400)::int,i
FROM generate_series('2017-08-01'::timestamp,'2017-11-30','5 minutes') gs(i);
INSERT 0 34849


Sizes and data distribution also look fine.
# \dt+
                         List of relations
 Schema |     Name      | Type  |   Owner  |   Size  | Description
--------+---------------+-------+----------+---------+-------------
 public |        events | table | postgres | 0 bytes |
 public | events_201708 | table | postgres |  712 kB |
 public | events_201709 | table | postgres |  688 kB |
 public | events_201710 | table | postgres |  712 kB |
 public | events_201711 | table | postgres |  664 kB |


As mentioned earlier it's fast and doesn’t require data dump and load. Of course, it requires some PLPGSQL skills but this is a very useful skill to have. Using PLPGSQL and making tiny changes you can adapt the function to your particular partitioning and upgrade to new PostgreSQL 10 declarative partitioning fast.

Hope you enjoyed this post. If you tried alternative ways of migration from an old inheritance partitioning to the new one let me know in comments!

Comments

  1. I will add one note: ALTER TABLE ATTACH PARTITION can use already existed exclude constraints to verify partitioning key in new partition.

    Usually partitions has check constraint. For example this:
    create table events_201708 (LIKE events INCLUDING ALL, check(created_at >= '2017-08-01' and created_at < '2017-09-01')) INHERITS (events);
    In this case reattach table as partition with commands
    alter table events_201708 no inherit events;
    alter table events_master attach partition events_201708 for values from ('2017-08-01') to ('2017-09-01');
    performs very quick and do not rescan table data.

    So with using "alter table no inherit" + "alter table attach partition" migration can be done in few seconds regardless of real table sizes.
    But note: partition key must be NOT NULL in inherited table to speed up constraint check.

    ReplyDelete
  2. All partition tables might not have sequence, So better we should make sequence part for "fn_upgrade_partitioning()" optional . Right?

    ReplyDelete
    Replies
    1. Yes, of course. In real world, each case might have little differences, so "fn_upgrade_partitioning()" isn't a universal swiss knife for all cases.

      Delete

Post a Comment

Popular Posts