Skip to content

fix(dedupjoin): restore parallel probe for REPLACE INTO OldColCapture#24179

Merged
mergify[bot] merged 2 commits intomatrixorigin:mainfrom
ck89119:fix-replace-dedup-capture-parallel
Apr 24, 2026
Merged

fix(dedupjoin): restore parallel probe for REPLACE INTO OldColCapture#24179
mergify[bot] merged 2 commits intomatrixorigin:mainfrom
ck89119:fix-replace-dedup-capture-parallel

Conversation

@ck89119
Copy link
Copy Markdown
Contributor

@ck89119 ck89119 commented Apr 23, 2026

What type of PR is this?

  • BUG
  • Improvement

Which issue(s) this PR fixes:

issue #23946

What this PR does / why we need it:

Background

PR #24153 introduced an OldColCapture mechanism in DEDUP JOIN to merge REPLACE INTO's two main-table scans into one. Each worker recorded probe-side old-column values into per-worker capturedVecs keyed by build bucket, intended to be emitted alongside the build row at finalize.

That introduced a latent bug: parallel probe workers kept capturedVecs and captured as container-local state, but the finalize() multi-worker protocol only merged the matched bitmap through ap.Channel. Non-merger workers' captured values were silently dropped, so rows matched by those workers would emit NULL (or stale) in the placeholder slot.

Commit 7cc544d2d (fix(replace): disable parallel probe for DEDUP JOIN with OldColCapture) worked around this by forcing Mcpu = 1 on every probe scope whenever OldColCapture was active (pkg/sql/compile/compile.go). Shuffle + capture additionally panics NYI. The REPLACE INTO merged-scan path has thus been running without any probe-side parallelism since.

This PR

Restore parallelism by upgrading the finalize merge protocol.

  1. New WorkerJoinMsg struct in pkg/sql/colexec/dedupjoin/types.go carries {matched, captured, capturedVecs}. DedupJoin.Channel becomes chan *WorkerJoinMsg.
  2. Non-merger workers relinquish ownership of their capture state to the merger when sending; the merger:
    • Or's in the matched bitmap (existing behavior).
    • Walks the worker's captured bitmap and, for each bucket not already present in its own captured, copies per-column values from the worker's capturedVecs into its own. First-wins semantics across workers — any one captured value for a bucket is semantically equivalent since HashOnUnique gives a 1:1 bucket↔build-row mapping.
    • Frees the worker's capturedVecs after merging (ownership was transferred).
  3. Remove the Mcpu = 1 forcing in compile.go so broadcast DedupJoin with OldColCapture runs parallel again. The shuffle + capture NYI panic stays — cross-CN pipeline channel semantics for capturedVecs are out of scope here and will be addressed separately.
  4. The defensive capture-field copy in dupOperator (also from 7cc544d2d) is retained as it's a prerequisite for parallel clones.

Tests

Added unit tests in pkg/sql/colexec/dedupjoin/join_test.go:

  • TestMergeCaptured_DisjointBuckets — parallel workers capture different buckets; after merge, merger owns the union.
  • TestMergeCaptured_FirstWinsOnConflict — when both workers captured the same bucket, merger keeps its own value.
  • TestMergeCaptured_EmptyWorkerMsg — worker with empty capture doesn't corrupt merger state.
  • TestWorkerJoinMsg_ChannelRoundTrip — full channel send/receive + merge + free cycle without leaks.
  • TestReceiveWorkerMsg_ContextCancel / _ChannelClose — receive helper respects context cancellation and closed channel.

Existing end-to-end TestDedupJoinCapture{,PartialMatch,Reset} unchanged, all pass. go test -race clean. make static-check clean.

Benchmark

YCSB workload-replace (100K ops, updateproportion=1, zipfian, threads=16, single CN, one iteration):

metric baseline (main) this PR delta
RUN Throughput (ops/sec) 1384.64 1561.67 +12.8%
UPDATE avg latency (us) 11255.67 9978.82 -11.3%
UPDATE p95 (us) 22527 18223 -19.1%
UPDATE p99 (us) 53471 40895 -23.5%
UPDATE max (us) 593407 260095 -56.2%

Tail latency improvements (p99 -23%, max -56%) match the expected effect of restoring parallel probe: reduced queueing at the DEDUP JOIN stage under the high-conflict zipfian workload.

Follow-ups (separate issues / PRs)

Special notes for your reviewer:

The cross-worker first-wins semantic for probe-side same-key duplicates is equivalent to the existing intra-worker first-wins: the captured value is a property of the build-side bucket (main-table old row), not of which probe row matched first. If reviewers want to eliminate the non-determinism entirely, we'd need to serialize capture writes, which defeats the purpose of parallelism.

The REPLACE INTO merged main-table scan path (introduced in matrixorigin#24153) broke
parallel DEDUP JOIN probe: each worker kept capturedVecs / captured in
container-local state, but the finalize() channel protocol only merged
the matched bitmap. Non-merger workers' captures were silently dropped,
which matrixorigin#24044 (actually 7cc544d) worked around by forcing Mcpu=1 on
probe scopes whenever OldColCapture was active. That workaround wiped
out the parallelism gains matrixorigin#24153 was meant to deliver.

Fix by extending the finalize merge protocol:

- Replace chan *bitmap.Bitmap with chan *WorkerJoinMsg carrying
  {matched, captured, capturedVecs}. Non-merger workers transfer
  capture ownership to the merger; the merger folds them in via
  mergeCaptured() using first-wins semantics across workers.
- Remove the Mcpu=1 forcing in compile.go so broadcast DedupJoin with
  OldColCapture runs parallel again. Shuffle + capture stays NYI
  (separate follow-up once cross-CN channel semantics are designed).

Tests: adds targeted unit tests for mergeCaptured (disjoint buckets,
first-wins conflict, empty worker msg), channel round-trip with
ownership transfer, and context-cancel / channel-close behavior.
Existing end-to-end capture tests still pass.

Benchmark (YCSB workload-replace, 100K ops, threads=16, zipfian,
single CN, one iteration):

| metric              | main    | fix     | delta    |
|---------------------|---------|---------|----------|
| Throughput (ops/s)  | 1384.64 | 1561.67 |  +12.8%  |
| UPDATE p95 (us)     |   22527 |   18223 |  -19.1%  |
| UPDATE p99 (us)     |   53471 |   40895 |  -23.5%  |

Related: matrixorigin#23946

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mergify mergify Bot added the queued label Apr 24, 2026
@mergify
Copy link
Copy Markdown
Contributor

mergify Bot commented Apr 24, 2026

Merge Queue Status

  • Entered queue2026-04-24 03:32 UTC · Rule: main
  • Checks passed · in-place
  • Merged2026-04-24 04:29 UTC · at c74cbf44b078f9be756bbff5a0bef5330cc9f551 · squash

This pull request spent 56 minutes 21 seconds in the queue, including 56 minutes 10 seconds running CI.

Required conditions to merge
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]
  • #review-threads-unresolved = 0 [🛡 GitHub branch protection]
  • branch-protection-review-decision = APPROVED [🛡 GitHub branch protection]
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-neutral = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
    • check-skipped = Matrixone Standlone CI / Multi-CN e2e BVT Test on Linux/x64(LAUNCH, PROXY)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH, PESSIMISTIC)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / SCA Test on Ubuntu/x86
    • check-neutral = Matrixone CI / SCA Test on Ubuntu/x86
    • check-skipped = Matrixone CI / SCA Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone CI / UT Test on Ubuntu/x86
    • check-neutral = Matrixone CI / UT Test on Ubuntu/x86
    • check-skipped = Matrixone CI / UT Test on Ubuntu/x86
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-neutral = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
    • check-skipped = Matrixone Compose CI / multi cn e2e bvt test docker compose(Optimistic/PUSH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-neutral = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
    • check-skipped = Matrixone Standlone CI / e2e BVT Test on Linux/x64(LAUNCH,Optimistic)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-neutral = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
    • check-skipped = Matrixone Upgrade CI / Compatibility Test With Target on Linux/x64(LAUNCH)
  • any of [🛡 GitHub branch protection]:
    • check-success = Matrixone Utils CI / Coverage
    • check-neutral = Matrixone Utils CI / Coverage
    • check-skipped = Matrixone Utils CI / Coverage

@mergify mergify Bot merged commit d1f2d8b into matrixorigin:main Apr 24, 2026
24 of 25 checks passed
@mergify mergify Bot removed the queued label Apr 24, 2026
@ck89119 ck89119 deleted the fix-replace-dedup-capture-parallel branch April 24, 2026 06:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/bug Something isn't working kind/enhancement size/M Denotes a PR that changes [100,499] lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants