fix(csharp): tag download_slot_acquired with wait_duration_ms#495
Merged
Conversation
…nt (failing for #483) The cloudfetch.download_slot_acquired event currently carries only chunk_index and is_sequential_mode. Under CloudFetchStressTests it fires 252 times with no timing information, so an operator looking at the trace cannot tell whether each slot was acquired immediately or whether the call blocked waiting for another download to finish. This test fails RED today (24 events emitted, none has wait_duration_ms) and will pass once the slot-wait is wrapped in a Stopwatch and the elapsed ms is attached to the event (issue #483).
…ration_ms Previously the event fired with no timing information, so slot contention was invisible to operators. Under CloudFetchStressTests the event fired 252 times with only chunk_index and is_sequential_mode, so operators had no way to distinguish "acquired instantly" from "queued behind 3 other downloads." Now wrap the SemaphoreSlim.WaitAsync in a Stopwatch and attach wait_duration_ms (long, milliseconds) to the event so dashboards can surface "downloads are slow because they're queued." Also include queue_depth (count of items still pending in the download queue at the moment of acquisition) — it's free to compute alongside the stopwatch snapshot and gives the event a second contention signal. Verified end-to-end against a 24-file CloudFetch download (SELECT * FROM main.tpcds_sf100_delta.store_sales LIMIT 1000000): the first chunks acquire slots in 0ms while later chunks wait 50-600ms behind the 3-parallel-download cap, matching expected backpressure. Closes #483
msrathore-db
approved these changes
Jun 1, 2026
msrathore-db
left a comment
Collaborator
There was a problem hiding this comment.
Verified live: test passes; wait_duration_ms and queue_depth are emitted on the download_slot_acquired event. Clean change. LGTM. (Tiny nit, non-blocking: queue_depth is snapshotted just before WaitAsync rather than at acquisition — cosmetic.)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
cloudfetch.download_slot_acquiredpreviously fired with onlychunk_indexandis_sequential_mode, so slot contention was invisible in traces. UnderCloudFetchStressTeststhe event fired 252 times with no way to distinguish "acquired instantly" from "queued behind another download."SemaphoreSlim.WaitAsyncin aStopwatchand attacheswait_duration_ms(long, milliseconds) plusqueue_depth(count of pending downloads at acquisition) to the event.Why both tags?
The issue calls for
wait_duration_msas the primary signal and listsqueue_depthas an optional follow-up. I included both because_downloadQueue.Countwas already used in the surrounding code (seecloudfetch.download_loop_startat line 326) — capturing it under the stopwatch snapshot is a single extra field load with no new locking, so leaving it out would be more work to justify than to keep.Test (RED -> GREEN)
CloudFetchE2ETest.DownloadSlotAcquired_EmitsWaitDurationMs_Issue483runs a 24-file CloudFetch download (SELECT * FROM main.tpcds_sf100_delta.store_sales LIMIT 1000000withMaxBytesPerFile=10MB) and asserts that every capturedcloudfetch.download_slot_acquiredevent carries a non-negative integerwait_duration_mstag.chunk_index/is_sequential_mode, nowait_duration_ms._maxParallelDownloads=3cap):CloudFetchE2ETestsuite (21 tests) passes after the fix.Files changed
csharp/src/Reader/CloudFetch/CloudFetchDownloader.cs(+23/-2)csharp/test/E2E/CloudFetchE2ETest.cs(+139)This pull request and its description were written by Isaac.