
pgsql: add dynamic replication mode management for out-of-cluster standbys #2142

Open
y-ikeda-ha wants to merge 2 commits into ClusterLabs:main from y-ikeda-ha:pgsql-external-standby-sync-mode

Conversation

@y-ikeda-ha

Summary

This PR adds a new feature to the pgsql resource agent that dynamically manages the replication mode of PostgreSQL standbys connecting from outside the Pacemaker cluster, targeting multi-site disaster recovery (DR) use cases.

Motivation

In multi-site DR architectures where independent Pacemaker HA clusters run at separate sites, PostgreSQL data is replicated from the primary site to the DR site using synchronous streaming replication.

The current pgsql RA has no mechanism to manage replication connections from PostgreSQL instances outside the cluster. Administrators must manually change synchronous_standby_names to enable synchronous replication with the DR site. When the out-of-cluster synchronous standby disconnects, client transactions hang until an administrator manually reverts the configuration.

Solution

A new optional parameter external_standby_node_list is introduced.
When set, the RA automatically:

  1. Detects connections from listed external nodes via pg_stat_replication during the monitor action
  2. Adds connected nodes to synchronous_standby_names (using FIRST N (...) syntax when multiple sync targets exist)
  3. Removes disconnected nodes from synchronous_standby_names, preventing client transaction hangs without administrator intervention
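The detection step can be sketched roughly as follows. This is a hypothetical illustration, not the RA's actual code: in the real agent the connected standby names would come from a query such as `psql -Atc "SELECT application_name FROM pg_stat_replication"`, while here the result is hard-coded to keep the sketch self-contained.

```shell
#!/bin/sh
# Hypothetical sketch of the monitor-time decision, not the RA's actual
# code. The connected names would normally come from pg_stat_replication;
# a hard-coded stand-in is used here.
external_standby_node_list="dr-standby1 dr-standby2"
connected_standbys="standby1 dr-standby1"   # stand-in for the query result

# Keep only external nodes that are currently connected.
sync_externals=""
for node in $external_standby_node_list; do
    case " $connected_standbys " in
        *" $node "*) sync_externals="$sync_externals $node" ;;
    esac
done
sync_externals=${sync_externals# }

echo "$sync_externals"   # dr-standby1
```

Only `dr-standby1` survives the intersection: `dr-standby2` is pre-registered but not currently connected, so it is left out of the sync set until it actually appears in `pg_stat_replication`.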

Use case

Normal operation:

[Primary Site]                       [DR Site]
  Pacemaker Cluster                    Pacemaker Cluster
  +-----------------------+            +-----------------------+
  | primary1      (PRI)   |  sync rep  | dr-standby1     (HS)  |
  | standby1      (HS)    | -------->  |         |             |
  +-----------------------+            |  async rep (cascade)  |
                                       |         v             |
                                       | dr-standby2     (HS)  |
                                       +-----------------------+

When dr-standby1 fails:

[Primary Site]                       [DR Site]
  Pacemaker Cluster                    Pacemaker Cluster
  +-----------------------+            +-----------------------+
  | primary1      (PRI)   |  sync rep  | dr-standby1  (FAILED) |
  | standby1      (HS)    | -------->  |                       |
  +-----------------------+     |      | dr-standby2     (HS)  |
                                |      +-----------------------+
                                |              ^
                                +--------------+
node_list="primary1 standby1"
external_standby_node_list="dr-standby1 dr-standby2"

In this topology:

  • standby1 is an in-cluster synchronous standby managed by the existing node_list parameter.
  • dr-standby1 connects from outside the primary site's cluster via synchronous replication. It is listed in external_standby_node_list.
  • dr-standby2 normally replicates asynchronously from dr-standby1 (cascading replication) and does not connect directly to the primary. However, it is also listed in external_standby_node_list so that if dr-standby1 fails, dr-standby2 can connect directly to the primary and be automatically promoted to synchronous standby.

Key behaviors:

  1. When dr-standby1 connects to the primary, the RA adds it to synchronous_standby_names automatically.
  2. When dr-standby1 disconnects, the RA removes it, preventing transaction hangs.
  3. If dr-standby2 then connects directly to the primary (as a failover within the DR site), the RA detects this and adds dr-standby2 to synchronous_standby_names automatically.

This means external_standby_node_list serves as a pre-registered list of potential sync standby nodes — nodes do not need to be connected at the time of configuration.
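As a configuration sketch for the topology above (illustrative only; mandatory pgsql parameters such as pgctl, pgdata, and master_ip are omitted, and the exact pcs invocation depends on the pcs version), the new parameter would sit alongside the existing node_list:

```shell
# Illustrative pcs command; real deployments need the full set of
# required pgsql parameters (pgctl, pgdata, master_ip, ...).
pcs resource create pgsql ocf:heartbeat:pgsql \
    rep_mode="sync" \
    node_list="primary1 standby1" \
    external_standby_node_list="dr-standby1 dr-standby2" \
    promotable
```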

Changes

This PR contains two commits:

Commit 1: pgsql: enhance set_sync_mode to support multiple sync standby targets

Refactors set_sync_mode() as a prerequisite:

  • Accepts a space-separated list of node names (previously single node only)
  • Generates FIRST N (...) syntax when there are 2+ sync targets
  • Adds idempotency check to skip unnecessary pg_ctl reload
  • Parses both FIRST N (...) and plain quoted format from rep_mode.conf

No behavioral change when called with a single node argument (existing usage).
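The value-formatting part of this refactor can be sketched like so. `format_sync_standby_names` is an illustrative name, not the RA's actual function, and using N equal to the full node count in `FIRST N` is a simplification; the real agent may choose a smaller N.

```shell
#!/bin/sh
# Hypothetical sketch: one node yields the plain quoted form, two or
# more yield the PostgreSQL 9.6+ FIRST N (...) form.
format_sync_standby_names() {
    # Intentional word split: the argument is a space-separated node list.
    set -- $1
    if [ $# -le 1 ]; then
        printf '%s\n' "$1"
        return
    fi
    n=$#
    list="\"$1\""
    shift
    for node in "$@"; do
        list="$list, \"$node\""
    done
    printf 'FIRST %s (%s)\n' "$n" "$list"
}

format_sync_standby_names "standby1"
# standby1
format_sync_standby_names "standby1 dr-standby1"
# FIRST 2 ("standby1", "dr-standby1")
```

The single-node path emits exactly the node name, which is why existing single-target callers see no behavioral change.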

Commit 2: pgsql: add external_standby_node_list for out-of-cluster sync replication management

Adds the new feature:

  • New parameter external_standby_node_list (optional, default: empty)
  • Modified control_slave_status() to evaluate external nodes and make a consolidated sync mode decision
  • Warning log when a synchronous connection from an external node is lost
  • Variable initialization in validate_ocf_check_level_10()

Backward compatibility

  • When external_standby_node_list is not set (default), behavior is identical to the existing implementation
  • Designed for rep_mode="sync" configurations
  • FIRST N syntax requires PostgreSQL 9.6+; single-target mode works with PostgreSQL 9.1+
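For reference, the two synchronous_standby_names formats involved look like this (illustrative values):

```
# Single sync target, plain quoted form (PostgreSQL 9.1+):
synchronous_standby_names = 'standby1'

# Two or more sync targets, FIRST N form (PostgreSQL 9.6+):
synchronous_standby_names = 'FIRST 2 ("standby1", "dr-standby1")'
```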

Testing

Tested with:

  • Red Hat Enterprise Linux release 9.6
  • pacemaker-2.1.9-1.el9.x86_64
  • postgresql17-17.6-1PGDG.rhel9.x86_64

Tested topology: primary1 (PRI) + standby1 (sync HS, in-cluster) + dr-standby1 (sync HS, external) + dr-standby2 (async HS, cascading from dr-standby1).

Test scenarios:

  • dr-standby1 connects → automatically added to synchronous_standby_names
  • dr-standby1 disconnects → automatically removed (no transaction hang)
  • dr-standby1 fails, dr-standby2 connects directly to primary → automatically added to synchronous_standby_names
  • In-cluster standby + external standby connected simultaneously → FIRST N (...) syntax generated
  • external_standby_node_list not set → identical behavior to current code

AI disclosure

This PR description and commit messages were written with the assistance of Claude (Anthropic). The code itself was designed and implemented by the author. See the Assisted-by: trailer in each commit message.

Refactor set_sync_mode() to handle multiple synchronous standby nodes:

- Accept a space-separated list of node names as the argument
- Generate FIRST N (...) syntax for synchronous_standby_names when
  there are two or more sync targets
- Add idempotency check: skip configuration reload when the current
  settings already match the desired state
- Parse both FIRST N (...) format and plain quoted format from
  rep_mode.conf for comparison

This prepares for multi-target sync replication scenarios and also
reduces unnecessary pg_ctl reloads in the existing single-target case.

No behavioral change when called with a single node argument
(existing usage).

Assisted-by: Claude (Anthropic)
pgsql: add external_standby_node_list for out-of-cluster sync replication management

In multi-site disaster recovery architectures where independent
Pacemaker clusters run at separate sites, the pgsql RA needs to
manage synchronous replication connections from PostgreSQL instances
outside the local cluster.

Without this feature, administrators must manually modify
synchronous_standby_names to enable synchronous replication with
DR-site standbys. When such a standby disconnects, client transactions
hang until manual intervention.

Add a new optional parameter "external_standby_node_list" that
specifies standby node names connecting from outside the cluster:

- During monitor (control_slave_status), the RA checks
  pg_stat_replication for both in-cluster and external nodes
- Connected external nodes are added to synchronous_standby_names
- Disconnected external nodes are removed automatically, preventing
  transaction hangs
- A warning is logged when an external sync connection is lost

When external_standby_node_list is not set (default), behavior is
identical to the existing implementation.

Tested-on: RHEL 9.6, Pacemaker 2.1.9, PostgreSQL 17.6

Assisted-by: Claude (Anthropic)
@knet-jenkins

knet-jenkins bot commented Mar 31, 2026

Can one of the project admins check and authorise this run please: https://haci.fast.eng.rdu2.dc.redhat.com/job/resource-agents/job/resource-agents-pipeline/job/PR-2142/1/input
