PostgreSQL BiDirectional Replication

As you can understand from my previous blogs I am really into PostgreSQL.

Previously we run Debezium in Embedded mode. Behind the scenes Debezium consumes the changes that were committed to the transaction log. This happens by utilising the logical decoding feature of PostgreSQL.

In this blog we shall focus on replication and more specific bidirectional replication. To achieve bidirectional replication in PostgreSQL we need the module pglogical. You might wonder the deference between logical decoding and pglogical. Essentially logical decoding has its origins from PgLocigal. View PgLocial as a more featureful module while logical decoding is embedded to a PostgreSQL distribution.

We will create a custom PostgreSQL Docker image and install PgLogical.

# Use the official PostgreSQL image as base
FROM postgres:15
USER root
RUN apt-get update; apt-get install postgresql-15-pglogical -y
USER postgres

Also we need to have a PostgreSQL configuration that will enable PgLogical replication and the conflict resolution.

listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB
wal_level = logical
max_wal_senders = 3
track_commit_timestamp = on
shared_preload_libraries = 'pglogical'
pglogical.conflict_resolution = 'first_update_wins'

Let’s break this down. We added pglogical and we enabled track_commit_timestamp. By enabling this parameter PostgreSQL tracks the commit time of transactions. This will be crucial for the conflict resolution strategy.
Now let’s see the conflict resolution. We selected ‘first_update_wins’, therefore in case of two transactions operating on the same row, the transaction that finished first will be the one to be considered.

Bidirectional replication is setup upon a table. Since we use Docker we shall provide an initialization script to PostgreSQL.

The script will:

  • Enable pglogical
  • Create the table
  • Add a target node
  • Insert the row we shall run tests upon
#!/bin/bash
set -e

psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
  ALTER SYSTEM RESET shared_preload_libraries;
  CREATE EXTENSION pglogical;

  create schema test_schema;
  create table test_schema.employee(
          id  SERIAL PRIMARY KEY,
          firstname   TEXT    NOT NULL,
          lastname    TEXT    NOT NULL,
          email       TEXT    not null,
          age         INT     NOT NULL,
          salary         real,
          unique(email)
      );

  SELECT pglogical.create_node(
      node_name := '$TARGET',
      dsn := 'host=$TARGET port=5432 dbname=$POSTGRES_DB user=$POSTGRES_USER password=$POSTGRES_PASSWORD');

  SELECT pglogical.replication_set_add_table('default', 'test_schema.employee', true);

  insert into test_schema.employee (id,firstname,lastname,email,age,salary) values (1,'John','Doe 1','john1@doe.com',18,1234.23);


EOSQL

Let’s create the instances now using docker compose.

version: '3.1'

services:
  postgres-a:
    build: ./pglogicalimage
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      TARGET: postgres-b
    volumes:
      - ./config/postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5431:5432
  postgres-b:
    build: ./pglogicalimage
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      TARGET: postgres-a
    volumes:
      - ./config/postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5432:5432

We can get our instances up and running by issuing

docker compose up

Docker Compose V2 is out there with many good features, you can find more about it on the book I authored:
A Developer’s Essential Guide to Docker Compose
.

Since both instances are up and running we need to enable the replication. Therefore we shall subscribe the nodes to each other.

Execute on the first node

SELECT pglogical.create_subscription(
  subscription_name := 'postgres_b',
  provider_dsn := 'host=postgres-b port=5432 dbname=postgres user=postgres password=postgres',
  synchronize_data := false,
  forward_origins := '{}' );

Execute at the second node

SELECT pglogical.create_subscription(
  subscription_name := 'postgres_a',
  provider_dsn := 'host=postgres-a port=5432 dbname=postgres user=postgres password=postgres',
  synchronize_data := false,
  forward_origins := '{}' );

You can use any PostgreSQL client that suits you. Alternatively you can just use the psql client that comes packaged with the Docker Images.
For example:

Login to the first node

docker compose exec postgres-a psql  --username postgres --dbname postgres

Login to the second node

docker compose exec postgres-b psql  --username postgres --dbname postgres

Let’s see how conflict resolution will work now.

On the first node we shall run the following snippet

BEGIN;
UPDATE test_schema.employee SET lastname='first wins';

#before committing start transaction on postgres-b

COMMIT;

Don’t press commit immediately, instead take the time and before you commit the transaction start the following transaction on the second node.

BEGIN;
UPDATE test_schema.employee SET lastname='second looses';

#make sure transaction on node postgres-a is committed first.

COMMIT;

This transaction will be committed after the transaction that takes places in postgres-a.

Let’s check the logs on postgres-a-1

postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] LOG:  CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: keep_local.
postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] DETAIL:  existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=748,origin=0,timestamp=2024-05-01 07:10:42.269227+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:45.125791+00,commit_lsn=0/16181C0
postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] CONTEXT:  apply UPDATE from remote relation test_schema.employee in commit before 0/16181C0, xid 747 committed at 2024-05-01 07:10:45.125791+00 (action #2) from node replorigin 1

The transaction that took place on postgres-a finished first. Postgres-a received the replication data from the transaction of node postgres-b. A comparison was issued on the commit timestamp, because the commit timestamp of the transaction on postgres-a was earlier the resolution was to keep the local changes.

We can see the reverse on postgres-b


postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] LOG: CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: apply_remote.
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] DETAIL: existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=747,origin=0,timestamp=2024-05-01 07:10:45.125791+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:42.269227+00,commit_lsn=0/1618488
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] CONTEXT: apply UPDATE from remote relation test_schema.employee in commit before 0/1618488, xid 748 committed at 2024-05-01 07:10:42.269227+00 (action #2) from node replorigin 1

Let’s check the result in the database.

postgres=# SELECT*FROM test_schema.employee;
 id | firstname |  lastname  |     email     | age | salary  
----+-----------+------------+---------------+-----+---------
  1 | John      | first wins | john1@doe.com |  18 | 1234.23

As expected the first transaction is the one that stayed.
To wrap it up:

  • We started two transactions in parallel
  • We changed the same row
  • We accepted the changes of the transaction that finished first

That’s it. Hope you had some fun and now you have another tool for y0ur needs.