Add CDC file cleanup watchdog #29
Draft
mble wants to merge 9 commits into
Parses duration strings like "30s", "15m", "2h" into seconds. A bare number with no suffix is treated as seconds. Returns false on parse error. This will be used by the --cleanup-min-age CLI flag for the CDC file cleanup watchdog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
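For reviewers, the parsing rules in this commit message can be sketched as below. This is a minimal illustration, not the code in this PR: the name `parse_duration_sketch` is hypothetical, and only the `s`, `m`, and `h` suffixes named above are handled.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: parse "30s", "15m", "2h", or a bare number of
 * seconds into *seconds. Returns false on any parse error.
 */
static bool
parse_duration_sketch(const char *str, uint64_t *seconds)
{
	/* require a leading digit so "", "-5" and " 5" are rejected */
	if (str == NULL || seconds == NULL || !isdigit((unsigned char) str[0]))
	{
		return false;
	}

	char *end = NULL;
	errno = 0;
	unsigned long long value = strtoull(str, &end, 10);

	if (errno != 0)
	{
		return false;
	}

	uint64_t multiplier = 1;

	if (*end != '\0')
	{
		switch (*end)
		{
			case 's': multiplier = 1; break;
			case 'm': multiplier = 60; break;
			case 'h': multiplier = 3600; break;
			default: return false;
		}

		/* nothing may follow the single-character suffix */
		if (end[1] != '\0')
		{
			return false;
		}
	}

	/* refuse values that would overflow once converted to seconds */
	if (value > UINT64_MAX / multiplier)
	{
		return false;
	}

	*seconds = (uint64_t) value * multiplier;
	return true;
}
```

A caller would pass the raw option value, for example `parse_duration_sketch(optarg, &minAgeSeconds)`, and report a usage error when it returns false.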
Add CDC file cleanup configuration options to CopyDBOptions struct and wire them into cli_copy_db_getopts (clone/follow) and cli_stream_getopts (stream subcommands). These flags accept human-readable values using the existing cli_parse_bytes_pretty and cli_parse_duration parsers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the cleanup subprocess to the processArray in both follow_wait_subprocesses and follow_terminate_subprocesses so it gets proper signal handling and waitpid management. Start the cleanup watchdog in followDB after the catchup subprocess, gated on cleanupThresholdBytes > 0. Subprocesses with pid <= 0 are skipped automatically, so an unconfigured cleanup process does not interfere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
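The "pid <= 0 is skipped" behaviour can be illustrated with a simplified wait loop. This is a sketch under assumptions: the struct and function names are hypothetical, and it uses a blocking `waitpid` for brevity, which may differ from how the follow code actually waits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical stand-in for one entry of the subprocess array. */
typedef struct SubProcessSketch
{
	const char *name;
	pid_t pid;          /* <= 0 means the subprocess was never started */
	bool exited;
	int returnCode;
} SubProcessSketch;

/*
 * Wait for every started subprocess and return how many were reaped.
 * Entries with pid <= 0 (e.g. an unconfigured cleanup process) are skipped.
 */
static int
wait_subprocesses_sketch(SubProcessSketch *procs, size_t count)
{
	int reaped = 0;

	for (size_t i = 0; i < count; i++)
	{
		if (procs[i].pid <= 0)
		{
			continue;
		}

		int status = 0;

		if (waitpid(procs[i].pid, &status, 0) == procs[i].pid)
		{
			procs[i].exited = true;
			procs[i].returnCode = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
			reaped++;
		}
	}

	return reaped;
}
```

With this shape, leaving the cleanup entry at its zero-initialised pid is enough to keep it out of both the wait and terminate paths.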
Add ld_cleanup.c/h with the core cleanup logic that runs as a forked subprocess. The watchdog periodically scans the CDC directory, identifies applied .json/.sql files (LSN < replay_lsn), and deletes the oldest first when total applied file bytes exceed the configured threshold. Respects a minimum age floor unless disk pressure requires overriding it. Replace the follow_start_cleanup stub in follow.c with a call to cdc_cleanup_loop. The Makefile picks up the new source automatically via its wildcard pattern. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
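The selection rules described here can be sketched as two small helpers. Everything below is illustrative: the type and function names are hypothetical, and "disk pressure" is modelled as a boolean supplied by the caller because the commit message does not say how it is detected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical view of one CDC file found during a directory scan. */
typedef struct CdcFileSketch
{
	uint64_t lsn;       /* LSN associated with the file */
	uint64_t bytes;     /* file size on disk */
	time_t mtime;       /* last modification time */
} CdcFileSketch;

/* Sum the sizes of files that are fully applied (lsn < replayLSN). */
static uint64_t
applied_bytes_sketch(const CdcFileSketch *files, size_t count,
					 uint64_t replayLSN)
{
	uint64_t total = 0;

	for (size_t i = 0; i < count; i++)
	{
		if (files[i].lsn < replayLSN)
		{
			total += files[i].bytes;
		}
	}

	return total;
}

/*
 * A file may be deleted only when it is applied, and either it is older
 * than the minimum age or disk pressure overrides the age floor.
 */
static bool
may_delete_sketch(const CdcFileSketch *file, uint64_t replayLSN,
				  time_t now, uint64_t minAgeSeconds, bool diskPressure)
{
	if (file->lsn >= replayLSN)
	{
		return false;
	}

	if (diskPressure)
	{
		return true;
	}

	return now >= file->mtime &&
		   (uint64_t) (now - file->mtime) >= minAgeSeconds;
}
```

The watchdog would only start deleting when `applied_bytes_sketch(...)` exceeds the configured threshold, and would test each candidate with `may_delete_sketch(...)` first.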
Verifies that pgcopydb follow works correctly with --cleanup-threshold and --cleanup-min-age flags, that the cleanup subprocess doesn't crash or interfere with the apply pipeline, and that follow reaches endpos and exits cleanly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Apply citus_indent formatting (move && to end of line, add braces, fix argument alignment)
- Add IGNORE-BANNED for qsort() in ld_cleanup.c
- Regenerate clone.rst and follow.rst with new cleanup options

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace qsort (banned API) with repeated linear min-scan to find the oldest file each iteration. The I/O cost of unlink dominates, so the O(n*k) scan cost is negligible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
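The repeated min-scan can be sketched as follows. This is an illustration rather than the code in `ld_cleanup.c`: the names are hypothetical, "oldest" is assumed to mean lowest LSN, and the actual `unlink` call is left as a comment so the example has no side effects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical deletion candidate (already known to be applied). */
typedef struct CandidateSketch
{
	uint64_t lsn;
	uint64_t bytes;
	bool removed;
} CandidateSketch;

/*
 * Remove the oldest candidates (lowest LSN first) until the remaining
 * total is at or below thresholdBytes. Each iteration does a linear scan
 * for the minimum instead of sorting the array up front.
 */
static size_t
trim_oldest_first_sketch(CandidateSketch *files, size_t count,
						 uint64_t thresholdBytes)
{
	uint64_t totalBytes = 0;
	size_t removedCount = 0;

	for (size_t i = 0; i < count; i++)
	{
		if (!files[i].removed)
		{
			totalBytes += files[i].bytes;
		}
	}

	while (totalBytes > thresholdBytes)
	{
		size_t oldest = count;  /* count means "nothing found yet" */

		for (size_t i = 0; i < count; i++)
		{
			if (files[i].removed)
			{
				continue;
			}

			if (oldest == count || files[i].lsn < files[oldest].lsn)
			{
				oldest = i;
			}
		}

		if (oldest == count)
		{
			break;  /* no candidates left */
		}

		/* a real implementation would unlink(2) the file here */
		files[oldest].removed = true;
		totalBytes -= files[oldest].bytes;
		removedCount++;
	}

	return removedCount;
}
```

For n candidates and k deletions this performs about n*k comparisons, which matches the cost stated in the commit message.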
Summary
During long-running CDC with high write volume, pgcopydb accumulates `.json` and `.sql` files in the CDC directory indefinitely, even after they have been applied. This can exhaust disk space.

This PR adds a cleanup watchdog subprocess that periodically deletes applied CDC files based on a configurable size threshold and minimum age floor:

- `--cleanup-threshold` (e.g. `10GB`): maximum total size of applied CDC files to retain; the oldest applied files are deleted when it is exceeded. Set to `0` to disable (default).
- `--cleanup-min-age` (e.g. `15m`): minimum age before an applied file is eligible for deletion. Defaults to 15 minutes when a threshold is set. Overridden under disk pressure.

How it works
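A new `cleanupFollowSubProcess` runs alongside prefetch/transform/catchup. Every 30 seconds it:

- Reads `replay_lsn` from the sentinel table
- Identifies `.json`/`.sql` files whose WAL segment LSN is below `replay_lsn` (fully applied)
- Deletes the oldest applied files first while their total size exceeds the threshold, subject to the minimum age floor

Safety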
- Only files below `replay_lsn` are deleted; `replay_lsn` is only advanced after apply finishes with a file
- `--cleanup-threshold 0` (default) disables the watchdog entirely, so the change is backward compatible

Test plan
- Run `pgcopydb follow --cleanup-threshold 100MB --cleanup-min-age 1m` against a write-heavy source (e.g. `pgbench`); confirm CDC files are cleaned up after apply advances past them
- Confirm `ps aux | grep pgcopydb` shows the `pgcopydb: follow cleanup` subprocess
- Restart with `--resume`; confirm correct pickup without data loss
- Confirm `--cleanup-threshold 0` (default) does not fork the cleanup subprocess
- Run the `tests/cdc-cleanup/` integration test
- Run `tests/cdc-wal2json/` to confirm no regressions
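For context on the "WAL segment LSN" comparison mentioned under How it works, here is one way a file name could be mapped to an LSN. This is a sketch under assumptions, not the PR's code: it assumes CDC files are named after the 24-hex-digit WAL segment they cover (timeline, log, segment), and the function name is hypothetical. The segment size is passed in; 16 MB is the PostgreSQL default.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WAL_SEGMENT_NAME_LEN 24

/*
 * Hypothetical sketch: compute the start LSN of the WAL segment named by
 * the first 24 hex digits of filename ("TTTTTTTTLLLLLLLLSSSSSSSS[.ext]").
 */
static bool
wal_segment_start_lsn_sketch(const char *filename, uint64_t walSegSize,
							 uint64_t *lsn)
{
	if (filename == NULL || lsn == NULL || walSegSize == 0 ||
		strlen(filename) < WAL_SEGMENT_NAME_LEN)
	{
		return false;
	}

	for (size_t i = 0; i < WAL_SEGMENT_NAME_LEN; i++)
	{
		if (!isxdigit((unsigned char) filename[i]))
		{
			return false;
		}
	}

	/* only an extension (or nothing) may follow the segment name */
	if (filename[WAL_SEGMENT_NAME_LEN] != '\0' &&
		filename[WAL_SEGMENT_NAME_LEN] != '.')
	{
		return false;
	}

	char logHex[9] = { 0 };
	char segHex[9] = { 0 };

	/* skip the 8-digit timeline; it does not affect the LSN */
	memcpy(logHex, filename + 8, 8);
	memcpy(segHex, filename + 16, 8);

	uint64_t log = strtoul(logHex, NULL, 16);
	uint64_t seg = strtoul(segHex, NULL, 16);
	uint64_t segmentsPerXLogId = UINT64_C(0x100000000) / walSegSize;

	*lsn = (log * segmentsPerXLogId + seg) * walSegSize;
	return true;
}
```

The result is the segment's start LSN; whether the watchdog compares the start or the end of the segment against `replay_lsn` is not stated in this description, so the comparison itself is left to the caller.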