Skip to content

feat(profiling): add GC observability collector#18566

Draft
vlad-scherbich wants to merge 8 commits into
mainfrom
vlad/profiling-gc-collector
Draft

feat(profiling): add GC observability collector#18566
vlad-scherbich wants to merge 8 commits into
mainfrom
vlad/profiling-gc-collector

Conversation

@vlad-scherbich

Copy link
Copy Markdown
Contributor

Adds GCCollector that hooks gc.callbacks to measure per-generation pause durations, emits alloc samples for collected objects, and records explicit gc.collect() call counts per flush interval. Controlled via DD_PROFILING_GC_ENABLED (default: true).

Description

Testing

Risks

Additional Notes

Adds GCCollector that hooks gc.callbacks to measure per-generation pause
durations, emits alloc samples for collected objects, and records explicit
gc.collect() call counts per flush interval. Controlled via
DD_PROFILING_GC_ENABLED (default: true).
@datadog-datadog-prod-us1

datadog-datadog-prod-us1 Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Pipelines  Tests

Fix all issues with BitsAI or with Cursor

⚠️ Warnings

🚦 42 Pipeline jobs failed

System Tests | tracer-release / End-to-end #1 / uds-flask 1   View in Datadog   GitHub Actions

🧪 1 Test failed

tests.test_data_integrity.Test_TraceUniqueness.test_trace_ids[uds-flask] from system_tests_suite   View in Datadog (Fix with Cursor)
ValueError: Found duplicated trace id 6235347653552129956 in ./logs/interfaces/library/00107__v0.4_traces.json and ./logs/interfaces/library/00102__v0.4_traces.json

self = <tests.test_data_integrity.Test_TraceUniqueness object at 0x7f954c8a9760>

    def test_trace_ids(self):
>       interfaces.library.assert_trace_id_uniqueness()

tests/test_data_integrity.py:19: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

...

System Tests | tracer-release / End-to-end #2 / uwsgi-poc 2   View in Datadog   GitHub Actions

🧪 2 Tests failed

tests.test_standard_tags.Test_StandardTagsNetworkClientIp.test_network_client_ip[uwsgi-poc] from system_tests_suite   View in Datadog (Fix with Cursor)
AssertionError: assert '43.43.43.43' != '43.43.43.43'
 +  where '43.43.43.43' = <tests.test_standard_tags.Test_StandardTagsNetworkClientIp object at 0x7f4c9ff4fcb0>.PUBLIC_IP

self = <tests.test_standard_tags.Test_StandardTagsNetworkClientIp object at 0x7f4c9ff4fcb0>

    def test_network_client_ip(self):
        """Test network.client.ip is reported and different from http.client_ip."""
>       self._test()

tests/test_standard_tags.py:245: 
...
tests.test_standard_tags.Test_StandardTagsNetworkClientIp.test_network_client_ip_with_attack[uwsgi-poc] from system_tests_suite   View in Datadog (Fix with Cursor)
AssertionError: assert '43.43.43.43' != '43.43.43.43'
 +  where '43.43.43.43' = <tests.test_standard_tags.Test_StandardTagsNetworkClientIp object at 0x7f4c9ff4fda0>.PUBLIC_IP

self = <tests.test_standard_tags.Test_StandardTagsNetworkClientIp object at 0x7f4c9ff4fda0>

    def test_network_client_ip_with_attack(self):
        """Test network.client.ip is reported on ASM attacks. This is a special case to map the legacy behavior where this header would only be added on attacks, and not the general case."""
>       self._test()

tests/test_standard_tags.py:252: 
...

System Tests | tracer-release / End-to-end #5 / flask-poc 5   View in Datadog   GitHub Actions

🧪 1 Test failed

tests.test_data_integrity.Test_TraceUniqueness.test_trace_ids[flask-poc] from system_tests_suite   View in Datadog (Fix with Cursor)
ValueError: Found duplicated trace id 6008680841824951237 in ./logs/interfaces/library/00069__v0.4_traces.json and ./logs/interfaces/library/00064__v0.4_traces.json

self = <tests.test_data_integrity.Test_TraceUniqueness object at 0x7fe511a15280>

    def test_trace_ids(self):
>       interfaces.library.assert_trace_id_uniqueness()

tests/test_data_integrity.py:19: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

...

View all 42 failed jobs.

❄️ 11 New flaky tests detected

test_explicit_count_resets_on_snapshot[py3.10] from test_gc.py   View in Datadog (Fix with Cursor)
module 'ddtrace.profiling.collector' has no attribute 'gc'

New test introduced in this PR is flaky.

test_explicit_count_resets_on_snapshot[py3.9] from test_gc.py   View in Datadog (Fix with Cursor)
module 'ddtrace.profiling.collector' has no attribute 'gc'

New test introduced in this PR is flaky.

View in Flaky Test Management

ℹ️ Info

🔄 Datadog auto-retried 1 job - 0 passed on retry View in Datadog

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: dd1b04a | Docs | Datadog PR Page | Give us feedback!

@vlad-scherbich vlad-scherbich added the Profiling Continous Profling label Jun 10, 2026
…mport

test_capture_sampler_pure_python_fallback pops and reimports
ddtrace.profiling.collector, creating a fresh package object that lacks
the 'gc' submodule attribute.  mock.patch("ddtrace.profiling.collector.gc.ddup")
in test_gc.py then fails with AttributeError because __import__ on an
already-cached submodule does not re-attach it to the new parent object.

Two-part fix:
1. test_collector.py: re-attach all already-cached direct submodules to
   the freshly-reimported package in the finally block.
2. test_gc.py: switch all mock.patch string-path usages to
   mock.patch.object(_gc_module, "ddup") which targets the imported
   module object directly, bypassing the parent-package attribute chain.
@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codeowners resolved as

tests/profiling/collector/test_gc.py                                    @DataDog/profiling-python

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new profiling collector to observe Python garbage collection activity, exposing GC pause durations and related GC signals through the existing ddup profiling pipeline, gated by DD_PROFILING_GC_ENABLED.

Changes:

  • Introduces GCCollector that hooks gc.callbacks and patches gc.collect() to capture per-generation GC pauses and explicit collection counts.
  • Wires the collector into the profiler startup flow and adds a new profiling config namespace (profiling.gc.enabled) plus supported-configuration metadata for DD_PROFILING_GC_ENABLED.
  • Adds a new GC collector test suite and adjusts an import-related test cleanup to better restore re-imported package state.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
ddtrace/profiling/collector/gc.py New GC profiling collector that emits GC pause samples and an interval “config/count” sample.
ddtrace/profiling/profiler.py Registers GCCollector with the profiler when enabled.
ddtrace/internal/settings/profiling.py Adds ProfilingConfigGC and includes it in profiling config + config_str().
ddtrace/internal/settings/_supported_configurations.py Adds DD_PROFILING_GC_ENABLED to supported env-var list.
supported-configurations.json Documents the new DD_PROFILING_GC_ENABLED configuration.
tests/profiling/collector/test_gc.py New unit/integration tests for the GC collector behavior and config gating.
tests/profiling/collector/test_collector.py Ensures re-imported ddtrace.profiling.collector package re-attaches already-imported submodules during cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +52 to +55
def _patched_collect(self, generation: int = 2) -> int:
self._explicit_count += 1
return self._orig_collect(generation)

Comment thread ddtrace/profiling/collector/gc.py
Comment on lines +75 to +78
handle2 = ddup.SampleHandle()
handle2.push_alloc(collected, 1)
handle2.push_frame(frame_name, "gc", 0, gen)
handle2.push_monotonic_ns(time.monotonic_ns())
Comment thread ddtrace/profiling/collector/gc.py Outdated
Comment on lines +85 to +107
thresholds = gc.get_threshold()
enabled = gc.isenabled()
freeze_count = gc.get_freeze_count() if hasattr(gc, "get_freeze_count") else 0
stats = gc.get_stats()
total_collections = sum(s.get("collections", 0) for s in stats)

handle = ddup.SampleHandle()
# Use count field to carry explicit gc.collect() tally for this interval.
# A zero walltime with count > 0 is the established pattern for pure-count
# samples (same as lock release-time samples with zero duration).
handle.push_walltime(0, explicit)
handle.push_frame("gc.config", "gc", 0, 0)
handle.push_monotonic_ns(time.monotonic_ns())
handle.flush_sample()

LOG.debug(
"GCCollector snapshot: enabled=%s thresholds=%s freeze=%d total_collections=%d explicit_collect=%d",
enabled,
thresholds,
freeze_count,
total_collections,
explicit,
)
Comment on lines +502 to +508
enabled = DDConfig.v(
bool,
"enabled",
default=True,
help_type="Boolean",
help="Whether to enable the GC collector (pause durations, collection counts, freeze status).",
)
Comment on lines +32 to +38
def test_gc_callbacks_registered() -> None:
col = GCCollector()
assert col._on_gc not in gc.callbacks
col.start()
assert col._on_gc in gc.callbacks
col.stop()
assert col._on_gc not in gc.callbacks
Comment on lines +41 to +47
def test_gc_collect_not_patched_after_stop() -> None:
orig = gc.collect
col = GCCollector()
col.start()
assert gc.collect is not orig
col.stop()
assert gc.collect is orig
Comment thread tests/profiling/collector/test_gc.py Outdated
Comment on lines +50 to +58
def test_explicit_count_increments() -> None:
col = GCCollector()
col.start()
try:
assert col._explicit_count == 0
gc.collect()
assert col._explicit_count == 1
gc.collect(0)
assert col._explicit_count == 2
Comment thread tests/profiling/collector/test_gc.py Outdated
Comment on lines +63 to +74
def test_explicit_count_resets_on_snapshot() -> None:
col = GCCollector()
col.start()
try:
gc.collect()
gc.collect()
assert col._explicit_count == 2
with mock.patch.object(_gc_module, "ddup") as mock_ddup:
mock_handle = mock.MagicMock()
mock_ddup.SampleHandle.return_value = mock_handle
col.snapshot()
assert col._explicit_count == 0
Comment thread tests/profiling/collector/test_gc.py Outdated
Comment on lines +161 to +175
def test_snapshot_emits_config_sample() -> None:
col = GCCollector()
col.start()
try:
gc.collect()
gc.collect()

with mock.patch.object(_gc_module, "ddup") as mock_ddup:
mock_handle = mock.MagicMock()
mock_ddup.SampleHandle.return_value = mock_handle
col.snapshot()

mock_handle.push_walltime.assert_called_once_with(0, 2)
mock_handle.push_frame.assert_called_once_with("gc.config", "gc", 0, 0)
mock_handle.flush_sample.assert_called_once()
@vlad-scherbich vlad-scherbich requested a review from Copilot June 10, 2026 18:25
@vlad-scherbich

Copy link
Copy Markdown
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fbbc19f5cd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

gc.callbacks.remove(self._on_gc)
except ValueError:
pass
gc.collect = self._orig_collect

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard gc.collect restoration against later patches

When another library or test replaces gc.collect while the profiler is running, stopping this collector unconditionally restores the function captured at profiler startup and silently discards that later patch. Because this collector is enabled by default and mutates a process-wide stdlib function, this can break code that wraps gc.collect after profiling starts; only restore when gc.collect is still this collector's wrapper, or otherwise chain/coordinate the wrapper safely.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment on lines +54 to +57
def _patched_collect(self, generation: int = 2) -> int:
with self._count_lock:
self._explicit_count += 1
return self._orig_collect(generation)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Profiling Continous Profling

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants