Track cleanup ownership for Fragment::add_columns after outer commit failures #7231

@yyzhao2025

Summary

Dataset::add_columns now tracks newly written, uncommitted fragments and cleans their files if the final merge commit fails.

However, there is still a fragment-level cleanup ownership gap. FileFragment::add_columns calls schema_evolution::add_columns_to_fragments, but it currently discards the returned cleanup metadata (fragments_to_cleanup) and only returns (Fragment, Schema).

As a result, a fragment-level caller can write new column files successfully and then fail during the outer Operation::Merge commit, with no record of which files it owns and therefore no safe way to clean up the uncommitted files written by the fragment-level add-columns operation.

Context

The current PR fixes cleanup for dataset-level add_columns failure paths.

This issue tracks the broader follow-up for fragment-level callers such as LanceFragment.merge_columns, where the add-columns work may succeed first and the outer commit may fail later.

Reproducer

A minimal reproducer is:

  1. Create a dataset with multiple fragments.
  2. Call fragment.merge_columns(...) (or FileFragment::add_columns(...)) on one fragment.
  3. Observe that new column data files are written successfully.
  4. Advance the dataset version through another operation.
  5. Attempt to commit the fragment-level merge with a stale read_version.
  6. The outer merge commit fails, but the files written by the fragment-level add-columns operation remain in the dataset directory.
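The sequence above can be illustrated with a toy model that uses only the Python standard library. This is not the Lance API: `ToyDataset`, `StaleVersionError`, and `reproduce` are hypothetical names invented for the sketch, and the version check is a simplification of real commit conflict handling. It only shows why a file written before a failed commit ends up orphaned.

```python
import tempfile
from pathlib import Path


class StaleVersionError(Exception):
    """Raised when a commit is attempted against an outdated read_version."""


class ToyDataset:
    """Hypothetical stand-in for a dataset: a directory plus a version counter."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.version = 1
        self.committed = set()  # file names referenced by some committed version

    def write_column_file(self, name: str) -> str:
        # Step 3: the data file is written before any commit is attempted.
        (self.root / name).write_bytes(b"new column data")
        return name

    def commit(self, files, read_version: int) -> None:
        # Step 5: reject commits that were prepared against an older version.
        if read_version != self.version:
            raise StaleVersionError(f"read {read_version}, latest {self.version}")
        self.committed.update(files)
        self.version += 1


def reproduce(root: Path):
    ds = ToyDataset(root)
    read_version = ds.version
    new_file = ds.write_column_file("frag0_new_col.bin")  # fragment-level write
    ds.commit([], read_version=ds.version)  # step 4: another operation advances the version
    try:
        ds.commit([new_file], read_version=read_version)  # stale outer merge commit
    except StaleVersionError:
        pass  # the caller sees the failure but has no cleanup information
    # Step 6: files on disk that no committed version references.
    return sorted(p.name for p in root.iterdir() if p.name not in ds.committed)


with tempfile.TemporaryDirectory() as tmp:
    orphans = reproduce(Path(tmp))
    print(orphans)  # ['frag0_new_col.bin']
```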

Expected Behavior

If a fragment-level add-columns operation writes new files successfully but the later outer merge commit fails, callers should have a safe way to clean up only the files newly written by that failed operation.

Cleanup must not delete:

  • pre-existing data files already referenced by the original fragment
  • external blob source files
  • files belonging to unrelated committed versions
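One way to read these constraints is that cleanup should work from an explicit list of files the operation created, never from a directory scan. The sketch below assumes a hypothetical `WriteRecord` type and `files_safe_to_delete` helper (neither exists in Lance); the subtraction of protected paths is a defensive check on top of the written-files list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRecord:
    """Hypothetical ownership record for one fragment-level add-columns call."""

    written: frozenset           # paths this operation created
    preexisting: frozenset       # paths the original fragment already referenced
    external_sources: frozenset  # blob sources the operation read but does not own


def files_safe_to_delete(record: WriteRecord, committed: frozenset) -> set:
    """Return only paths created by this operation and protected by nothing else."""
    protected = record.preexisting | record.external_sources | committed
    return set(record.written - protected)


record = WriteRecord(
    written=frozenset({"data/new_col.lance"}),
    preexisting=frozenset({"data/original.lance"}),
    external_sources=frozenset({"blobs/source.bin"}),
)
print(files_safe_to_delete(record, committed=frozenset({"data/other.lance"})))
# {'data/new_col.lance'}
```

Starting from `written` means a file can only be deleted if the failed operation recorded creating it, so the three protected categories are excluded by construction rather than by filtering.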

Why This Is Separate From The Current PR

The current PR is intentionally scoped to dataset-level add_columns cleanup.

Fixing this fragment-level case likely requires exposing or preserving cleanup ownership information across the FileFragment::add_columns boundary, which is a broader API / ownership follow-up than the current dataset-level cleanup fix.

Possible Directions

Some possible approaches:

  • Add a cleanup-aware fragment-level API that returns both the new fragment result and cleanup metadata for newly written, uncommitted files.
  • Introduce an internal guard / token object that can be used by the outer caller to clean up if the later merge commit fails.
  • Preserve the existing API for compatibility and add a new lower-level API for callers that manage their own outer commit lifecycle.
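The guard / token direction could look roughly like the following sketch, written in Python for brevity. `UncommittedFilesGuard` and `mark_committed` are hypothetical names, not part of any existing API; the guard deletes only the paths it was handed, and only if the outer commit was never marked successful.

```python
import tempfile
from pathlib import Path


class UncommittedFilesGuard:
    """Hypothetical token that owns newly written files until the outer commit succeeds."""

    def __init__(self, paths):
        self._paths = [Path(p) for p in paths]
        self._committed = False

    def mark_committed(self) -> None:
        # After a successful commit the files belong to the dataset.
        self._committed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if not self._committed:
            for path in self._paths:
                path.unlink(missing_ok=True)  # delete owned files only
        return False  # never swallow the commit error


with tempfile.TemporaryDirectory() as tmp:
    new_file = Path(tmp) / "new_col.bin"
    new_file.write_bytes(b"x")
    try:
        with UncommittedFilesGuard([new_file]):
            raise RuntimeError("simulated commit conflict")
    except RuntimeError:
        pass
    print(new_file.exists())  # False
```

In Rust the same shape would more naturally be a type whose `Drop` implementation performs the cleanup unless the guard has been explicitly disarmed, though asynchronous file deletion may make an explicit cleanup method preferable.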

Acceptance Criteria

  • Fragment-level callers can safely clean up newly written add-columns files after an outer commit failure.
  • Cleanup only removes files written by the failed fragment-level operation.
  • External blob source files are preserved.
  • Existing public APIs remain compatible, or any API expansion has a clear migration path.
  • A regression test covers the following sequence:
    • fragment-level add-columns succeeds,
    • a later outer merge commit fails,
    • uncommitted files written by the fragment-level operation are cleaned up.
