pgsql: Fixed an issue where tmpdir was not created when rep_mode was set to slave. #2071

Draft
yiida-ha wants to merge 1 commit into ClusterLabs:main from yiida-ha:develop/fix_rep_mode_slave

Conversation

@yiida-ha

@yiida-ha yiida-ha commented Sep 4, 2025

Problem Details

Starting a pgsql resource with rep_mode="slave" fails with the following error:

Failed Resource Actions:
  * remote-site-pgsql_start_0 on dr-standby1 'error' (1): call=34, status='complete', exitreason='Can't create recovery.conf.', last-rc-change='Thu Sep  4 11:59:06 2025', queued=0ms, exec=309ms

Environment

  • PostgreSQL 17
  • Resource setting: rep_mode="slave"

Solution

In the current code, the tmp directory is not created when rep_mode="slave".
The existing code creates the tmp directory only if the result of is_replication() is true within the validate_ocf_check_level_10 function.
To resolve this issue, I modified the validate_ocf_check_level_10 function to create the tmp directory even when rep_mode="slave" is set.
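
The change in condition can be sketched as follows. This is only an illustration: the body of is_replication() is an assumed shape inferred from the behaviour described above (false for "none" and "slave"), the two creates_tmpdir_* functions are hypothetical names, and none of this is copied from heartbeat/pgsql.

```shell
#!/bin/sh
# Assumed shape of the agent's replication check: true for every rep_mode
# except "none" and "slave". This is a guess based on the behaviour described
# in this PR, not a copy of heartbeat/pgsql.
is_replication() {
    [ "$OCF_RESKEY_rep_mode" != "none" ] && [ "$OCF_RESKEY_rep_mode" != "slave" ]
}

# Hypothetical predicates contrasting the old and the proposed condition.
creates_tmpdir_before_fix() {
    is_replication
}
creates_tmpdir_after_fix() {
    is_replication || [ "$OCF_RESKEY_rep_mode" = "slave" ]
}

# Record, for each mode, whether tmpdir would be created before and after.
report=""
for OCF_RESKEY_rep_mode in none slave sync; do
    creates_tmpdir_before_fix && before=yes || before=no
    creates_tmpdir_after_fix && after=yes || after=no
    report="$report$OCF_RESKEY_rep_mode:$before->$after "
done
echo "$report"
```

Under these assumptions only the "slave" row changes, which matches the failure reported above.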

@knet-jenkins

knet-jenkins bot commented Sep 4, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/1/input

Comment thread heartbeat/pgsql Outdated
rc=$?
if [ $rc -eq 1 ]||[ $rc -eq 2 ]; then # PosrgreSQL 12 or later.
if ! mkdir -p $OCF_RESKEY_tmpdir || ! chown $OCF_RESKEY_pgdba $OCF_RESKEY_tmpdir || ! chmod 700 $OCF_RESKEY_tmpdir; then
ocf_exit_reason "Can't create directory $OCF_RESKEY_tmpdir or it is not readable by $OCF_RESKEY_pgdba"
Contributor

It should say "writable", and should only be run during the start-action.

Author

Thank you for your comment.

I modified the code based on the following reasoning:

  • The existing code already included a similar process for creating tmpdir within validate_ocf_check_level_10(), so I aligned with that approach.
  • As you mentioned, if it should only run during the start action, should I move the above code to the start process as well?
  • Additionally, I slightly modified the code so it only runs when rep_mode="slave" is set.

Contributor

You should move it to the start-action, as we don't need it for regular actions (if it's needed for monitor or similar, we might want to change the logic in those actions).

You'll also have to cover "promoted" or what term they are using in the latest PostgreSQL releases.

Author

Thank you for your comment.
As requested, I've moved the tmpdir creation process to the start action.
cd8e58f
Based on my research, all write operations to tmpdir within the start action were consolidated in make_recovery_conf(), so I've grouped the processing there.
How does this approach look?

@yiida-ha yiida-ha force-pushed the develop/fix_rep_mode_slave branch from c86b25d to 0c3f2cf Compare September 12, 2025 02:07
@knet-jenkins

knet-jenkins bot commented Sep 12, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/2/input

@knet-jenkins

knet-jenkins bot commented Sep 17, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/3/input

@yiida-ha yiida-ha force-pushed the develop/fix_rep_mode_slave branch from 9829572 to cd8e58f Compare September 18, 2025 01:28
@knet-jenkins

knet-jenkins bot commented Sep 18, 2025

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/4/input

Comment thread heartbeat/pgsql Outdated
if [ $rc -eq 1 ]||[ $rc -eq 2 ]; then # PosrgreSQL 12 or later.
if ! mkdir -p $OCF_RESKEY_tmpdir || ! chown $OCF_RESKEY_pgdba $OCF_RESKEY_tmpdir || ! chmod 700 $OCF_RESKEY_tmpdir; then
ocf_exit_reason "Can't create directory $OCF_RESKEY_tmpdir or it is not readable by $OCF_RESKEY_pgdba"
return $OCF_ERR_PERM
Contributor

I think you can remove the outer if[] here, and use [ "$OCF_RESKEY_rep_mode" = "slave" ] && return $OCF_ERR_PERM || return $OCF_ERR_GENERIC here instead.
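
One possible reading of this suggestion, as a standalone sketch: prepare_tmpdir is a hypothetical helper name, the OCF_* constants are defined by hand so the snippet runs outside Pacemaker (the values are the standard OCF return codes), and the error is printed with echo where the agent would call ocf_exit_reason. The message uses "writable", following the earlier review comment.

```shell
#!/bin/sh
# Stand-ins for values normally provided by the OCF shell library
# (assumption: this sketch runs outside Pacemaker).
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_PERM=4

# prepare_tmpdir is a hypothetical helper, not a function in heartbeat/pgsql.
prepare_tmpdir() {
    if ! mkdir -p "$OCF_RESKEY_tmpdir" ||
       ! chown "$OCF_RESKEY_pgdba" "$OCF_RESKEY_tmpdir" ||
       ! chmod 700 "$OCF_RESKEY_tmpdir"; then
        echo "Can't create directory $OCF_RESKEY_tmpdir or it is not writable by $OCF_RESKEY_pgdba" >&2
        # Choose the return code from rep_mode instead of wrapping the whole
        # block in an outer if, as the review comment proposes.
        [ "$OCF_RESKEY_rep_mode" = "slave" ] && return $OCF_ERR_PERM || return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}
```

With this shape the directory is prepared for every rep_mode, and only the error code differs between "slave" and the other modes. Whether that is the intended behaviour is for the reviewers to confirm.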

@yiida-ha yiida-ha force-pushed the develop/fix_rep_mode_slave branch from cd8e58f to 5a3a98a Compare February 26, 2026 05:20
Comment thread heartbeat/pgsql Outdated
}

make_recovery_conf() {
if ! mkdir -p $OCF_RESKEY_tmpdir || ! chown $OCF_RESKEY_pgdba $OCF_RESKEY_tmpdir || ! chmod 700 $OCF_RESKEY_tmpdir; then
Contributor

Doesn't this fail if the directory already exists?

Author

Thanks for the comment.
With the -p option, mkdir succeeds (exit code 0) even if the directory already exists.
The ! inverts that result, so ! mkdir -p XXX returns exit code 1 in that case.
Because the left-hand side of || is non-zero, the chain continues to chown and chmod; the error branch runs only if one of the three commands fails.
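
The exit-status behaviour described here can be checked with a small standalone script. It is a sketch for illustration only: it uses a throwaway directory from mktemp and leaves out chown so that it runs without special privileges.

```shell
#!/bin/sh
dir=$(mktemp -d)          # a directory that already exists

mkdir -p "$dir"
plain=$?                  # status of mkdir -p on an existing directory

! mkdir -p "$dir"
inverted=$?               # the same status after negation with !

# In an || list, a non-zero status on the left lets the right side run,
# so chmod is still executed even though the directory already existed.
if ! mkdir -p "$dir" || ! chmod 700 "$dir"; then
    branch="error"
else
    branch="ok"
fi

echo "plain=$plain inverted=$inverted branch=$branch"
# prints: plain=0 inverted=1 branch=ok
```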

@yiida-ha yiida-ha force-pushed the develop/fix_rep_mode_slave branch from 5a3a98a to 56db6ed Compare February 27, 2026 02:28
@knet-jenkins

knet-jenkins bot commented Feb 27, 2026

Can one of the project admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents/job/resource-agents-pipeline/job/PR-2071/6/input

@yiida-ha yiida-ha marked this pull request as draft February 27, 2026 02:35
@yiida-ha
Author

Sorry, I have moved this back to draft.
I will mark it ready for review after testing.

Comment thread heartbeat/pgsql
check_stat_temp_directory

if [ "$OCF_RESKEY_rep_mode" = "slave" ]; then
if ! mkdir -p $OCF_RESKEY_tmpdir || ! chown $OCF_RESKEY_pgdba $OCF_RESKEY_tmpdir || ! chmod 700 $OCF_RESKEY_tmpdir; then
Contributor

Based on the description in the metadata you can probably move this to the beginning of pgsql_start().

Maybe the metadata needs updating as well, as it sounds like it's not optional for replication anymore.
