tracing(csharp): introduce Connection/Statement lifetime activities for hierarchical traces#501
Draft
eric-wang-1990 wants to merge 2 commits into
Draft
tracing(csharp): introduce Connection/Statement lifetime activities for hierarchical traces#501eric-wang-1990 wants to merge 2 commits into
eric-wang-1990 wants to merge 2 commits into
Conversation
… for #477) Captures every Activity emitted on AdbcDrivers.* during a SELECT 1 workflow and asserts the trace-tree shape required by #477: - DatabricksConnection.Lifetime span exists and is a root - DatabricksStatement.Lifetime is a child of ConnectionLifetime, same TraceId - At least one per-call activity is a child of each Lifetime - A single TraceId covers the entire connection lifetime Before the fix the first assertion fails with "no DatabricksConnection.Lifetime span found" — the captured 24 activities split across 9 distinct TraceIds, confirming the #477 baseline (12,958 TraceIds / 37,950 spans in MemoryStress, 95% throwaway single-span roots). Co-authored-by: Isaac
…ities Mirrors the object hierarchy (Connection owns Statement owns calls) in the Activity tree so every per-call activity emitted on AdbcDrivers.Databricks during one connection shares a single TraceId — restoring trace-tree visualization on fleet dashboards. Implementation: - DatabricksConnection opens a long-lived "DatabricksConnection.Lifetime" Activity at the end of its constructor (after ValidateProperties succeeds). The sync call site DatabricksDatabase.Connect then runs OpenAsync().Wait() and ApplyServerSidePropertiesAsync().Wait() in the same async context, so those per-call activities chain naturally via Activity.Current — no extra plumbing required for them. - DatabricksStatement opens a "DatabricksStatement.Lifetime" Activity in its constructor, parented EXPLICITLY to the connection's lifetime context via the ActivityContext overload of StartActivity. The explicit parent guards against cross-async-context construction where Activity.Current on the thread executing the Statement constructor would not observe the connection-open context. - Both lifetime Activities live on a STATIC ActivitySource so they survive TracingConnection.Dispose, which disposes the per-instance source mid-Dispose (and would otherwise silently drop the Stop() call to listeners). - DatabricksConnection.Dispose stops its lifetime Activity AFTER base.Dispose has run DisposeClient + DELETE_SESSION telemetry, so those tail spans chain underneath the root. Idempotent stop via _lifetimeActivityStopped. - DatabricksStatement.Dispose stops its lifetime Activity AFTER base.Dispose has issued CLOSE_STATEMENT telemetry. Idempotent. Constraints: - No submodule changes. The per-call activities in HiveServer2 already use Activity.Current as their parent, so they pick up the lifetime parent automatically without code change. - Existing OperationName values are unchanged; only parent relationships and TraceIds change. Test passes against a live Thrift connection: every span in a SELECT 1 workflow now shares one TraceId, ConnectionLifetime is the only root span on AdbcDrivers.Databricks, and DisposeClient / Dispose / ExecuteStatement all correctly chain under their respective lifetime activities. Closes #477 Co-authored-by: Isaac
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Issue #477 shows every top-level driver call creates a fresh root
Activitywith a new
TraceId. The MemoryStress baseline measured 12,316 of 12,958TraceIds (95%) as single-span throwaway
ReadNextRecordBatchAsyncroots,breaking trace-tree visualization on fleet dashboards. The
ConcurrentDisposeTestsevidence confirms everyDatabricksCompositeReader.DisposeandHiveServer2Connection.DisposeClientspan has
ParentSpanId=0, so even within one connection Dispose cannot belinked back to the Execute that spawned the cleanup.
Design
Mirror the existing object hierarchy (Connection owns Statement owns calls)
in the Activity tree by introducing two long-lived "lifetime" Activities:
Existing per-call activities need no code change:
StartActivityalreadyuses
Activity.Currentas the parent, and the lifetime Activities are onthe stack throughout the synchronous Database.Connect flow and all
subsequent operations.
Implementation choices
Static ActivitySource for the lifetime spans. The per-instance
ActivitySourceheld byActivityTraceis disposed insideTracingConnection.DisposebeforeDatabricksConnection.Disposereturns;listeners stop receiving notifications once the source is disposed. A
process-wide
static readonlysource survives connection disposal and keepsnotifying listeners. (This was caught by the failing test — first attempt
used the per-instance source and the ConnectionLifetime
Stop()silentlydropped.)
Explicit parent context for Statement → Connection.
Activity.CurrentisAsyncLocal-scoped. IfCreateStatementruns on adifferent async continuation than where the connection was opened, naïve
reliance on
Activity.Currentwould not see the connection's Lifetime.DatabricksStatementtherefore passesconnection.LifetimeActivity?.Contextexplicitly to
StartActivity, making parenting robust regardless of asynccontext.
Stop order matters. ConnectionLifetime is stopped after
base.Disposeruns
DisposeClient+ emits DELETE_SESSION telemetry, so those tail spanschain underneath the lifetime root. Statement lifetime is stopped after
base.Disposeruns the CLOSE_STATEMENT telemetry. BothStop()calls areidempotent via a
_lifetimeActivityStoppedguard.Lifetime opens after
ValidateProperties. If the parser throws (e.g.ArgumentExceptionfrom a badadbc.databricks.fetch_heartbeat_intervalvalue), no half-open Activity is leaked — the constructor never reaches the
StartActivitycall.Tests
ConnectionStatementLifetimeTraceTests.ConnectionLifetime_ParentsStatementAndCalls_Issue477drives a SELECT 1 workflow through a Thrift connection, captures every
Activity emitted on
AdbcDrivers.*, and asserts:DatabricksConnection.Lifetimeexists and is a root spanDatabricksStatement.Lifetimeis a child of ConnectionLifetime,same TraceId
(proves the lifetime is on the Activity.Current stack during sync
Database.Connect)
(proves the explicit parent context works)
AdbcDrivers.DatabrickssourceBefore the fix all 5 assertions fail (no Lifetime spans exist at all,
and the captured 24 activities split across 9 distinct TraceIds — the
exact tracing(csharp): no driver-session root span — every top-level call creates a fresh TraceId #477 baseline).
After the fix every span shares one TraceId and the parent relationships
match the diagram above.
Regression sweep
All targeted regressions pass against a live
pecotestingThrift connection:DatabricksConnectionTestStatementTestsCloseOperationE2ETestTelemetryBaselineTestsConcurrentDisposeTestsTelemetry(broader sweep)No existing test required modification — the lifetime spans don't disturb
any assertion that depended on per-call activity names, status codes,
events, or tags. Tests that checked "is this span a root" continue to
hold because they only checked specific operation names; the new Lifetime
spans are net-additive.
Files touched
csharp/src/DatabricksConnection.cs(+45 net)csharp/src/DatabricksStatement.cs(+27 net)csharp/test/E2E/Telemetry/ConnectionStatementLifetimeTraceTests.cs(+255, new file)No submodule changes. No new public API surface beyond
internal LifetimeActivityon the connection for the statement constructor to consume.
Closes #477
This pull request and its description were written by Isaac.