Follow-up: remove interleave panic recovery after Arrow 58.1.0#21436

Merged
xudong963 merged 3 commits into apache:main from xudong963:xudong963/panic-detect-fix
Apr 8, 2026

@xudong963 xudong963 commented Apr 7, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

PR #20922 ("Fix sort merge interleave overflow") added a temporary catch_unwind shim around Arrow's interleave call because the upstream implementation still panicked on offset overflow at the time.

Arrow 58.1.0 includes apache/arrow-rs#9549, which returns ArrowError::OffsetOverflowError directly instead of panicking. DataFusion main now depends on that release, so the panic recovery path is no longer needed and only broadens the set of failures we might accidentally treat as recoverable.
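
To illustrate why a panic-recovery path can treat too many failures as recoverable, here is a minimal std-only sketch. Everything in it is a hypothetical stand-in: `SketchError`, `interleave_panicking`, `interleave_fallible`, and `recover_overflow_from_panic` are not Arrow or DataFusion APIs, and the message-based classification is an assumption, since this thread does not show how the removed shim inspected panics.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Hypothetical stand-in for the two outcomes discussed here;
/// this is not Arrow's `ArrowError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    OffsetOverflow,
    Other(String),
}

/// Assumed limit for arrays that use i32 offsets.
const MAX_OFFSET: usize = i32::MAX as usize;

/// Models the older upstream behavior: overflow is reported by panicking.
fn interleave_panicking(total_bytes: usize) -> usize {
    if total_bytes > MAX_OFFSET {
        panic!("offset overflow");
    }
    total_bytes
}

/// Models the newer behavior: overflow is an ordinary error value.
fn interleave_fallible(total_bytes: usize) -> Result<usize, SketchError> {
    if total_bytes > MAX_OFFSET {
        return Err(SketchError::OffsetOverflow);
    }
    Ok(total_bytes)
}

/// A catch_unwind shim only has the panic payload to go on, so it has to
/// guess from the message whether the panic was an offset overflow.
fn recover_overflow_from_panic(
    f: impl FnOnce() -> usize,
) -> Result<usize, SketchError> {
    catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Panic payloads are usually `&str` or `String`.
        let msg = payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .or_else(|| payload.downcast_ref::<String>().cloned())
            .unwrap_or_default();
        if msg.contains("overflow") {
            SketchError::OffsetOverflow
        } else {
            SketchError::Other(msg)
        }
    })
}
```

With this sketch, `recover_overflow_from_panic(|| -> usize { panic!("attempt to add with overflow") })` is classified as `OffsetOverflow` even though it models an unrelated arithmetic bug, whereas `interleave_fallible` can only report an overflow when it actually detects one.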

What changes are included in this PR?

  • Remove the temporary panic-catching wrapper from
    BatchBuilder::try_interleave_columns.
  • Keep the existing retry logic, but trigger it only from the returned
    OffsetOverflowError.
  • Replace the panic-specific unit tests with a direct error-shape assertion.
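
The "retry only on the returned error" idea from the list above can be sketched roughly as follows. The names `is_offset_overflow` and `retry_interleave` appear in this PR's discussion, but the signatures, the `SketchError` type, and the halving strategy below are assumptions made for illustration, not DataFusion's actual implementation.

```rust
/// Hypothetical stand-in for the error type; not Arrow's `ArrowError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    OffsetOverflow,
    Other(String),
}

/// Only the explicit overflow variant counts as recoverable.
fn is_offset_overflow(error: &SketchError) -> bool {
    matches!(error, SketchError::OffsetOverflow)
}

/// Calls `interleave` with `rows` rows. On an offset overflow it retries
/// with half as many rows (an assumed strategy); any other error is
/// returned unchanged. On success it returns the row count that worked
/// together with the produced value.
fn retry_interleave<T>(
    mut rows: usize,
    mut interleave: impl FnMut(usize) -> Result<T, SketchError>,
) -> Result<(usize, T), SketchError> {
    loop {
        match interleave(rows) {
            Ok(value) => return Ok((rows, value)),
            // Shrink the request and try again.
            Err(e) if is_offset_overflow(&e) && rows > 1 => rows /= 2,
            // Not an overflow, or nothing left to shrink: give up.
            Err(e) => return Err(e),
        }
    }
}
```

Because the retry is keyed on an error value rather than on a caught panic, a failure such as `SketchError::Other` passes straight through to the caller instead of being retried.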

Are these changes tested?

Yes.

  • cargo test -p datafusion-physical-plan sorts::builder -- --nocapture
  • cargo test -p datafusion-physical-plan sorts:: -- --nocapture
  • ./dev/rust_lint.sh

Are there any user-facing changes?

No.

@github-actions github-actions bot added the physical-plan (Changes to the physical-plan crate) label Apr 7, 2026
@xudong963 xudong963 changed the title from "fix: remove interleave panic recovery after Arrow 58.1.0" to "Follow-up: remove interleave panic recovery after Arrow 58.1.0" Apr 7, 2026
@xudong963
Member Author

cc @kosiew given that you have the context for this

@xudong963 xudong963 requested a review from kosiew April 7, 2026 09:01
Contributor

@alamb alamb left a comment

Thanks for the follow up @xudong963

I agree it would be good to have @kosiew take a look at this one too

Contributor

@kosiew kosiew left a comment

@xudong963

Looks good overall. Nice cleanup and alignment with the newer Arrow behavior. I have a couple of small suggestions that could make the intent clearer and help future-proof the tests.

.unwrap_err();

assert!(is_offset_overflow(&error));
fn test_is_offset_overflow_matches_arrow_error() {

Would it make sense to move this coverage up a layer and exercise BatchBuilder::build_record_batch, or even the sort-preserving merge drain path, using a real ArrowError::OffsetOverflowError?

Right now the tests only stub retry_interleave, so they might miss regressions if interleave starts surfacing the overflow differently again.
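
One way to picture the suggested higher-level coverage is a test that drives the builder layer through a real overflow check instead of a stubbed closure. The sketch below is std-only and fully hypothetical: `interleave_lengths` and `build_batch` are stand-ins for Arrow's `interleave` and for `BatchBuilder::build_record_batch`, and the halving fallback is an assumption about the retry strategy.

```rust
/// Hypothetical stand-in for `ArrowError::OffsetOverflowError`.
#[derive(Debug, PartialEq)]
enum SketchError {
    OffsetOverflow(usize),
}

/// Stand-in for the real interleave: sums the value lengths and reports
/// an overflow once the total no longer fits in an i32 offset.
fn interleave_lengths(value_lens: &[usize]) -> Result<usize, SketchError> {
    let mut total = 0usize;
    for len in value_lens {
        total = total.saturating_add(*len);
        if total > i32::MAX as usize {
            return Err(SketchError::OffsetOverflow(total));
        }
    }
    Ok(total)
}

/// Stand-in for the builder layer: keeps halving the number of rows until
/// the real overflow check passes, and returns (rows emitted, total bytes).
fn build_batch(value_lens: &[usize]) -> Result<(usize, usize), SketchError> {
    let mut rows = value_lens.len();
    loop {
        match interleave_lengths(&value_lens[..rows]) {
            Ok(total) => return Ok((rows, total)),
            Err(SketchError::OffsetOverflow(_)) if rows > 1 => rows /= 2,
            Err(e) => return Err(e),
        }
    }
}
```

A test at this level, for example `build_batch(&[1usize << 30; 4])`, only passes if the error produced by the lower layer still has the shape the retry logic matches on, which is the kind of regression a stubbed `retry_interleave` test would not notice. The lengths are plain numbers here, so no large buffers are allocated.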

  .map(|(_, batch)| batch.column(column_idx).as_ref())
  .collect();
- recover_offset_overflow_from_panic(|| interleave(&arrays, indices))
+ interleave(&arrays, indices).map_err(Into::into)

Maybe worth adding a short comment here noting that this now relies on Arrow 58.1.0+ returning OffsetOverflowError directly.

That would make the cleanup easier to understand in this file, especially since the removed shim was guarding this exact call site.
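
As a rough picture of what such a commented call site could look like, here is a self-contained sketch. The two error enums are simplified stand-ins rather than the real `ArrowError` and `DataFusionError` types, the `interleave` function is a toy with an assumed size-based signature, and the comment wording is only a suggestion.

```rust
/// Simplified stand-in for Arrow's error type.
#[derive(Debug, PartialEq)]
enum SketchArrowError {
    OffsetOverflowError(usize),
}

/// Simplified stand-in for DataFusion's error type.
#[derive(Debug, PartialEq)]
enum SketchDataFusionError {
    ArrowError(SketchArrowError),
    Execution(String),
}

impl From<SketchArrowError> for SketchDataFusionError {
    fn from(e: SketchArrowError) -> Self {
        SketchDataFusionError::ArrowError(e)
    }
}

/// Toy interleave: takes a total byte count instead of arrays and indices.
fn interleave(total_bytes: usize) -> Result<usize, SketchArrowError> {
    if total_bytes > i32::MAX as usize {
        return Err(SketchArrowError::OffsetOverflowError(total_bytes));
    }
    Ok(total_bytes)
}

fn try_interleave_columns(
    total_bytes: usize,
) -> Result<usize, SketchDataFusionError> {
    // Relies on Arrow 58.1.0+ returning OffsetOverflowError directly
    // instead of panicking, so no catch_unwind shim is needed here.
    interleave(total_bytes).map_err(Into::into)
}

/// The retry path keys off this predicate, which still matches after the
/// error has been converted by `map_err(Into::into)`.
fn is_offset_overflow(e: &SketchDataFusionError) -> bool {
    matches!(
        e,
        SketchDataFusionError::ArrowError(
            SketchArrowError::OffsetOverflowError(_)
        )
    )
}
```

The point of the sketch is that `map_err(Into::into)` only wraps the error; the overflow variant stays visible to `is_offset_overflow`, while an unrelated error such as `Execution` does not match.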

@xudong963
Member Author

Thanks @alamb @kosiew

@xudong963 xudong963 added this pull request to the merge queue Apr 8, 2026
Merged via the queue into apache:main with commit 603bfb4 Apr 8, 2026
35 checks passed
@xudong963 xudong963 deleted the xudong963/panic-detect-fix branch April 8, 2026 06:32
Dandandan pushed a commit to Dandandan/arrow-datafusion that referenced this pull request Apr 8, 2026
…e#21436)
