Fix Cloudberry restore point test race condition in WAL archiving #35
Draft
Copilot wants to merge 4 commits into
…int test Co-authored-by: chipitsine <2217296+chipitsine@users.noreply.github.com>
Co-authored-by: chipitsine <2217296+chipitsine@users.noreply.github.com>
Co-authored-by: chipitsine <2217296+chipitsine@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix cloudberry test failures in wal-g" to "Fix Cloudberry restore point test race condition in WAL archiving" on Feb 9, 2026
3633c51 to 81395bd
The Cloudberry test fails when checking for archived WAL files after `create-restore-point` commands. The test expected ≥2 WAL files per segment but found only 1 for the coordinator (seg-1).

Root Cause
`create-restore-point` triggers `pg_switch_wal()` to close the current WAL segment, but PostgreSQL's archiver executes `archive_command` asynchronously. The fixed 5-second sleep was insufficient for the archiver to complete before test verification.

Changes
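Because the archiver runs asynchronously, a check like this is usually written as a poll with an upper bound rather than a single fixed sleep. The following is a minimal sketch of that idea and is not taken from this PR's diff: the helper name `wait_for_count` and its argument order are hypothetical, and the probe command that counts archived WAL files is left to the caller.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): poll a command that prints a number
# until that number reaches EXPECTED, or give up after MAX_ATTEMPTS tries.
# Usage: wait_for_count EXPECTED MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for_count() {
  local expected=$1 max_attempts=$2 interval=$3
  shift 3
  local attempt count
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    # Treat a failing or non-numeric probe as "nothing archived yet".
    count=$("$@" 2>/dev/null) || count=0
    count=${count//[[:space:]]/}
    [[ $count =~ ^[0-9]+$ ]] || count=0
    # 10# forces base 10 so a value such as "08" is not parsed as octal.
    if ((10#$count >= expected)); then
      return 0
    fi
    # Sleep only between attempts, not after the final one.
    if ((attempt < max_attempts)); then
      sleep "$interval"
    fi
  done
  return 1
}
```

With the numbers quoted in this description, a call might look like `wait_for_count 2 60 5 count_archived_wal seg-1`, where `count_archived_wal` is an assumed function that prints how many WAL files have been archived for the given segment. The helper returns as soon as the threshold is met, so the full wait of roughly five minutes is only paid when archiving is genuinely stuck.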
Retry loop with a fixed polling interval (up to 60 attempts, 5 s apart):
Delay between restore point creations:
`create-restore-point`

Original prompt
This section details the original issue you should resolve
<issue_title>[BUG] cloudberry test failed</issue_title>
<issue_description>### Database name
cloudberry
### WAL-G Version
master
### Describe your problem
wal-g_cloudberry_tests | 20260209:12:09:39:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Starting gpstop with args: -a -M fast
wal-g_cloudberry_tests | 20260209:12:09:39:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Gathering information and validating the environment...
wal-g_cloudberry_tests | 20260209:12:09:39:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Obtaining Cloudberry Coordinator catalog information
wal-g_cloudberry_tests | 20260209:12:09:39:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
wal-g_cloudberry_tests | 20260209:12:09:39:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Cloudberry Version: 'postgres (Apache Cloudberry) 2.1.0-incubating build dev'
wal-g_cloudberry_tests | 20260209:12:09:39:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Commencing Coordinator instance shutdown with mode='fast'
wal-g_cloudberry_tests | 20260209:12:09:39:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Coordinator segment instance directory=/usr/local/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
wal-g_cloudberry_tests | 20260209:12:09:40:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Attempting forceful termination of any leftover coordinator process
wal-g_cloudberry_tests | 20260209:12:09:40:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Terminating processes for segment /usr/local/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1
wal-g_cloudberry_tests | 20260209:12:09:40:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-No standby coordinator host configured
wal-g_cloudberry_tests | 20260209:12:09:40:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Targeting dbid [2, 3, 4] for shutdown
wal-g_cloudberry_tests | 20260209:12:09:40:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...
wal-g_cloudberry_tests | 20260209:12:09:40:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-0.00% of jobs completed
wal-g_cloudberry_tests | 20260209:12:09:42:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-100.00% of jobs completed
wal-g_cloudberry_tests | 20260209:12:09:42:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-----------------------------------------------------
wal-g_cloudberry_tests | 20260209:12:09:42:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:- Segments stopped successfully = 3
wal-g_cloudberry_tests | 20260209:12:09:42:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:- Segments with errors during stop = 0
wal-g_cloudberry_tests | 20260209:12:09:42:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-----------------------------------------------------
wal-g_cloudberry_tests | 20260209:12:09:42:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Successfully shutdown 3 of 3 segment instances
wal-g_cloudberry_tests | 20260209:12:09:42:141081 gpstop:0530ff5b9cb9:gpadmin-[INFO]:-Database successfully shutdown with no errors reported
wal-g_cloudberry_tests | + start_cluster
wal-g_cloudberry_tests | + /usr/local/gpdb_src/bin/gpstart -a -t 180
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Starting gpstart with args: -a -t 180
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Gathering information and validating the environment...
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Cloudberry Binary Version: 'postgres (Apache Cloudberry) 2.1.0-incubating build dev'
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Cloudberry Catalog Version: '302502091'
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Starting Coordinator instance in admin mode
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-CoordinatorStart pg_ctl cmd is env GPSESSID=0000000000 GPERA=None $GPHOME/bin/pg_ctl -D /usr/local/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1 -l /usr/local/gpdb_src/gpAux/gpdemo/datadirs/qddir/demoDataDir-1/log/startup.log -w -t 180 -o " -p 7000 -c gp_role=utility " start
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Obtaining Cloudberry Coordinator catalog information
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Obtaining Segment details from coordinator...
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Setting new coordinator era
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Coordinator Started...
wal-g_cloudberry_tests | 20260209:12:09:42:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Shutting down coordinator
wal-g_cloudberry_tests | 20260209:12:09:46:141423 gpstart:0530ff5b9cb9:gpadmin-[INFO]:-Commencing parallel segment instance startup, please wait...
...