
pmseries, libpcp_web: add --gc mode to remove expired series data #2575

Draft
natoscott wants to merge 1 commit into main from series-gc

@natoscott
Member

Summary

  • Metric time series ancillary keys (pcp:desc:series:, pcp:series:metric.name:, pcp:series:label.*.value:, etc.) accumulated indefinitely after stream TTL expiry, causing unbounded key server growth
  • Adds pmSeriesGC() to libpcp_web and pmseries --gc / --dryrun to scan for expired series and remove all associated keys asynchronously
  • Adds three new pmproxy MMV metrics: series.gc.calls, series.gc.scanned, series.gc.cleaned
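The summary says the key removals run asynchronously and feed three MMV counters. The sketch below shows one common way to get both: a reference-counted "baton" that completes only after the last outstanding reply arrives, and then folds the run's totals into cumulative counters. Every name here (gc_baton, gc_stats, baton_start, baton_reference, baton_dereference, record_done) is hypothetical and illustrative. This is not the libpcp_web baton API, and the counters only mirror the names series.gc.calls, series.gc.scanned and series.gc.cleaned; they are not MMV-backed.

```c
/* Hypothetical cumulative counters, named after series.gc.calls/scanned/cleaned. */
typedef struct {
    unsigned long calls, scanned, cleaned;
} gc_stats;

typedef void (*gc_done_cb)(int status, void *arg);

/* Hypothetical baton: tracks outstanding async replies for one GC run. */
typedef struct {
    unsigned int refcount;   /* replies still pending, plus the caller's own hold */
    int status;              /* first non-zero error seen, else 0 */
    unsigned long scanned, cleaned;
    gc_stats *stats;
    gc_done_cb done;
    void *arg;
} gc_baton;

static void
baton_start(gc_baton *b, gc_stats *stats, gc_done_cb done, void *arg)
{
    b->refcount = 1;         /* caller's hold: completion cannot fire mid-issue */
    b->status = 0;
    b->scanned = b->cleaned = 0;
    b->stats = stats;
    b->done = done;
    b->arg = arg;
    stats->calls++;
}

/* Take one reference per async request issued. */
static void baton_reference(gc_baton *b) { b->refcount++; }

/* Drop one reference per reply (or when the caller releases its hold). */
static void
baton_dereference(gc_baton *b, int status)
{
    if (status != 0 && b->status == 0)
        b->status = status;  /* remember the first error only */
    if (--b->refcount == 0) {
        /* Last reference gone: fold this run's totals into the counters, then notify. */
        b->stats->scanned += b->scanned;
        b->stats->cleaned += b->cleaned;
        b->done(b->status, b->arg);
    }
}

/* Example completion callback recording how often it fired and with what status. */
typedef struct { int fired, status; } gc_done_record;

static void
record_done(int status, void *arg)
{
    gc_done_record *r = arg;
    r->fired++;
    r->status = status;
}
```

Holding an extra reference while requests are still being issued means a reply that arrives early cannot trigger completion before every request has been sent.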

Details

When pcp:values:series:H expires (via stream.expire, default 86400s), the raw data is reclaimed but ~10 ancillary keys per series remain. --gc scans pcp:desc:series:*, checks whether each series' stream is still alive, and for dead series removes:

  • pcp:desc:series:H, pcp:metric.name:series:H, pcp:instances:series:H, pcp:labelvalue:series:H, pcp:labelflags:series:H
  • Membership in pcp:series:metric.name:*, pcp:series:inst.name:*, pcp:series:label.<name>.value:*, pcp:series:context.name:*
  • Orphaned pcp:inst:series:* hashes when their inst-name set becomes empty
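The membership cleanup and orphan detection above can be modelled in memory. This sketch assumes each reverse-index set (for example a pcp:series:inst.name:* key) holds series hashes as members. It removes a dead series from every set, SREM-style, and reports which sets became empty so a caller could then delete the associated pcp:inst:series:* hash. The rev_set type, set_remove and gc_unlink_series are hypothetical. The sketch does not model how an emptied inst-name set maps to a specific inst hash, and it does not talk to a key server.

```c
#include <stddef.h>
#include <string.h>

#define SET_MAX 16

/* Hypothetical in-memory stand-in for one reverse-index set. */
typedef struct {
    const char *name;              /* e.g. a pcp:series:inst.name:* key */
    const char *members[SET_MAX];  /* series hashes */
    size_t n;
} rev_set;

/* Remove 'hash' from set s, like SREM. Returns 1 if it was a member. */
static int
set_remove(rev_set *s, const char *hash)
{
    size_t i;

    for (i = 0; i < s->n; i++) {
        if (strcmp(s->members[i], hash) == 0) {
            s->members[i] = s->members[--s->n];  /* swap-remove; order not preserved */
            return 1;
        }
    }
    return 0;
}

/*
 * Drop a dead series from every reverse-index set. emptied[i] is set to 1
 * when this removal left sets[i] empty (an orphan candidate); the return
 * value is the number of such sets.
 */
static size_t
gc_unlink_series(rev_set *sets, size_t nsets, const char *hash, int *emptied)
{
    size_t i, orphans = 0;

    for (i = 0; i < nsets; i++) {
        emptied[i] = 0;
        if (set_remove(&sets[i], hash) && sets[i].n == 0) {
            emptied[i] = 1;
            orphans++;
        }
    }
    return orphans;
}
```

Only sets that actually lost a member are flagged. A set that was already empty is therefore not reported again on a later pass.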

Output goes through the standard info callback (stdout for CLI, pmproxy log for timer use). A --dryrun flag logs what would be removed without writing anything.
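This sketch shows how a --dryrun flag can gate writes while both modes report through one info callback. For each series it builds the five per-series keys named in the list above from the series hash H, logs each one, and calls a delete hook only when dryrun is off. The types and functions (gc_run, gc_visit, gc_sink, sink_info, sink_delete) are hypothetical stand-ins, not pmSeriesGC() internals. The "identified" counter is an assumed name for series found to be dead and is not claimed to match series.gc.cleaned. Set-membership and orphan cleanup are left out.

```c
#include <stdio.h>

/* Per-series keys listed in the PR description; the series hash H is appended. */
static const char *gc_prefixes[] = {
    "pcp:desc:series:", "pcp:metric.name:series:", "pcp:instances:series:",
    "pcp:labelvalue:series:", "pcp:labelflags:series:",
};
#define GC_NPREFIXES (sizeof(gc_prefixes) / sizeof(gc_prefixes[0]))

typedef void (*gc_info_cb)(const char *msg, void *arg);
typedef void (*gc_delete_cb)(const char *key, void *arg);

typedef struct {
    int dryrun;
    gc_info_cb info;          /* e.g. stdout for a CLI, a log file for a daemon */
    gc_delete_cb del;         /* hypothetical key-deletion hook */
    void *arg;
    unsigned long scanned;    /* series visited */
    unsigned long identified; /* series found dead */
} gc_run;

/* Visit one series; 'alive' is the outcome of the stream liveness check. */
static void
gc_visit(gc_run *run, const char *hash, int alive)
{
    char key[128], msg[192];
    size_t i;

    run->scanned++;
    if (alive)
        return;
    run->identified++;
    for (i = 0; i < GC_NPREFIXES; i++) {
        snprintf(key, sizeof key, "%s%s", gc_prefixes[i], hash);
        snprintf(msg, sizeof msg, "%s %s",
                 run->dryrun ? "would remove" : "removing", key);
        run->info(msg, run->arg);
        if (!run->dryrun)
            run->del(key, run->arg);  /* the only write path */
    }
}

/* Example sinks that count activity instead of printing or writing. */
typedef struct { int infos, deletes; } gc_sink;
static void sink_info(const char *msg, void *arg) { (void)msg; ((gc_sink *)arg)->infos++; }
static void sink_delete(const char *key, void *arg) { (void)key; ((gc_sink *)arg)->deletes++; }
```

Because the delete hook sits behind a single dryrun check, a dry run follows exactly the same scan and logging path as a real run, minus the writes.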

Tested against a local Valkey instance: 9547 series scanned, 8402 identified for cleanup.

Known limitation: SCAN is sent to the first node only (keySlotsRequestFirstNode); full cluster support is future work.
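SCAN is cursor-based: the client starts at cursor 0 and keeps reissuing the command until the server returns cursor 0 again. In a cluster, each node only iterates its own keyspace, which is why sending SCAN to one node can miss series whose keys live on other nodes. The mock below imitates that loop for a single node, with a MATCH-style prefix filter for pcp:desc:series:. mock_node, mock_scan and scan_all_desc are hypothetical. A real cursor is an opaque value rather than an array index, and real SCAN may return a key more than once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one node's keyspace. */
typedef struct {
    const char **keys;
    size_t nkeys;
} mock_node;

/*
 * Imitates SCAN <cursor> MATCH <prefix>* COUNT <count>: examine up to
 * 'count' keys from 'cursor', return the matches in 'out', and return the
 * next cursor, or 0 once the keyspace has been fully iterated.
 */
static unsigned long
mock_scan(const mock_node *node, unsigned long cursor, const char *prefix,
          size_t count, const char **out, size_t *nout)
{
    size_t i = (size_t)cursor, examined = 0, plen = strlen(prefix);

    *nout = 0;
    while (i < node->nkeys && examined < count) {
        if (strncmp(node->keys[i], prefix, plen) == 0)
            out[(*nout)++] = node->keys[i];
        i++;
        examined++;
    }
    return i < node->nkeys ? (unsigned long)i : 0;
}

/* Full iteration of one node: loop until the cursor comes back as 0. */
static size_t
scan_all_desc(const mock_node *node, size_t count, const char **found, size_t max)
{
    const char *batch[64];
    unsigned long cursor = 0;
    size_t nbatch, i, total = 0;

    if (count == 0)
        count = 1;                /* guarantee forward progress */
    if (count > 64)
        count = 64;               /* batch buffer size */
    do {
        cursor = mock_scan(node, cursor, "pcp:desc:series:", count, batch, &nbatch);
        for (i = 0; i < nbatch && total < max; i++)
            found[total++] = batch[i];
    } while (cursor != 0);
    return total;
}
```

Extending this to a cluster would mean running one such loop per node, each with its own cursor.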

Test plan

  • pmseries --gc --dryrun — verify series listed without any key server writes
  • Load archives, let stream.expire elapse (or set to 1s for testing), run pmseries --gc — verify ancillary keys removed
  • Check series.gc.* MMV metrics visible via pmproxy after a GC run
  • Confirm --gc is mutually exclusive with --load, --query, --values, --window
  • Confirm --dryrun without --gc is rejected with a clear error
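The last two test-plan items describe an option-compatibility rule, sketched below as a pure function over a hypothetical flags struct. The gc_opts type, gc_check_opts and the error strings are illustrative assumptions. They are not pmseries' actual option parsing or messages.

```c
#include <stddef.h>

/* Hypothetical flag set, one field per command-line mode. */
typedef struct {
    int gc, dryrun, load, query, values, window;
} gc_opts;

/* Returns NULL when the combination is acceptable, else an error message. */
static const char *
gc_check_opts(const gc_opts *o)
{
    if (o->dryrun && !o->gc)
        return "--dryrun requires --gc";
    if (o->gc && (o->load || o->query || o->values || o->window))
        return "--gc cannot be combined with --load, --query, --values or --window";
    return NULL;
}
```

Keeping the check separate from argument parsing makes each test-plan case a one-line assertion.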

🤖 Generated with Claude Code

Metric time series data expires via the pcp:values:series: stream TTL
(stream.expire, default 86400s), but the ~10 ancillary keys per series
(pcp:desc:series:, pcp:series:metric.name:, pcp:series:label.*.value:,
etc.) accumulated indefinitely, causing unbounded key server growth.

Add pmSeriesGC() to libpcp_web and a --gc / --dryrun mode to pmseries
that scans pcp:desc:series:* for all known series, checks whether each
stream is still alive, and for stale series removes:

  - pcp:desc:series:H, pcp:metric.name:series:H
  - pcp:instances:series:H, pcp:labelvalue:series:H, pcp:labelflags:series:H
  - membership in pcp:series:metric.name:*, pcp:series:inst.name:*,
    pcp:series:label.<name>.value:*, pcp:series:context.name:*
  - orphaned pcp:inst:series:* hashes when their inst-name set empties

All key server operations are fully async using the existing
baton/callback infrastructure.  Three new pmproxy MMV metrics track GC
activity: series.gc.calls, series.gc.scanned, series.gc.cleaned.

Output uses the standard info callback so results appear on stdout for
the CLI and in the pmproxy log when invoked from a timer.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
natoscott requested a review from lmchilton April 27, 2026 20:51