This page in other versions: 0.9 / 1.0 / 1.0.3

8.1. Executing DDL on BDR systems

A BDR group is not the same as a standalone PostgreSQL server. It is based on asynchronous multi-master replication without a central locking and transaction co-ordinator. This has important implications when executing DDL.

To safely manipulate the database schema in an asynchronous multimaster setup, all pending changes have to be replicated first. Otherwise it is possible that a row being replicated contains data for a row that has been dropped, or has no data for a row that is marked NOT NULL. More complex cases also exist. To handle this problem, BDR acquires a so-called DDL lock the first time in a transaction where schema changes are made.

Acquiring the global DDL lock requires contacting all nodes in a BDR group, asking them to replicate all pending changes, and prevent further changes from being made. Once all nodes are fully caught up, the originator of the DDL lock is free to perform schema changes and replicate them to the other nodes. While the global DDL lock is held by a node, no nodes may perform any DDL or make any changes to rows.

This means that schema changes, unlike with data changes, can only be performed while all configured nodes are reachable. If DDL has to be performed while a node is down, it has to be removed from the configuration (using bdr.bdr_part_by_node_names) first.

DDL is a heavier weight operation than on standalone PostgreSQL. Performing DDL on any node will cancel (abort) currently running transactions on all nodes with an ERROR, and will reject new DML (INSERT, UPDATE and DELETE on all nodes with an ERROR with SQLSTATE 55P03:

    ERROR: Database is locked against DDL operations
   

This continues until the DDL operation has replicated to all nodes, been applied, and all nodes have confirmed to the DDL originator that the changes have been applied. All DDL and DML will ERROR, even if it does not affect the objects the currently in-progress DDL is modifying.

Because DDL is disruptive in BDR, transactions can't do DDL that requires a heavy global lock by default. This is controlled by the bdr.permit_ddl_locking setting. If set to false (the default), any command that would acquire the global DDL lock is rejected with an ERROR instead. This helps prevent unintended global DDL lock acquisitions.

To minimise the impact of DDL, transactions performing DDL should be short, should not be combined with lots of row changes, and should avoid long running foreign key lookups.

If DDL is holding the system up for too long, it is possible and safe to cancel the DDL on the originating node like you would cancel any other statement, e.g. with Control-C in psql or with pg_cancel_backend.

Once the DDL operation has committed on the originating node, you cannot cancel or abort it. You must wait for it to apply successfully on all other nodes and for them to replay confirmation. This is why it is important to keep DDL transactions short and fast.