fix: preserve namespace read storage options #5159

Open
LuciferYang wants to merge 3 commits into lance-format:main from LuciferYang:fix/issue-70-namespace-read-storage-options
LuciferYang commented Jun 4, 2026

Summary

  • fix namespace-backed reads to request vended storage credentials from describe_table()
  • merge namespace storage options with user-provided storage_options, with namespace values taking precedence
  • reopen the resolved table URI on Ray workers with the merged storage options
  • preserve namespace table id, integer table version, and namespace-managed versioning when reconstructing worker datasets
  • add regression tests for namespace reads, direct MinIO-style URI reads, and worker reconstruction
  • update namespace docs/examples to use namespace_impl / namespace_properties and call out the MinIO S3 API endpoint

Semantics

For namespace reads, the driver resolves the table once through the namespace before creating Ray read tasks. The resolved URI and merged storage options are then captured in each read task so workers can open the same table location without losing object-store credentials.
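
A minimal sketch of the merge-and-capture step described above, not the code in this PR. The names `merge_storage_options`, `ResolvedTable`, and `make_read_task` are hypothetical, and the dataset opener is passed in as a plain callable so the example does not assume any particular Lance or Ray signature.

```python
from dataclasses import dataclass


def merge_storage_options(user_options, namespace_options):
    """Merge both option dicts; namespace values win on key conflicts."""
    merged = dict(user_options or {})
    merged.update(namespace_options or {})
    return merged


@dataclass(frozen=True)
class ResolvedTable:
    """Result of resolving the table once on the driver (hypothetical type)."""
    uri: str
    storage_options: dict


def make_read_task(resolved, open_dataset):
    """Build a read task that carries the resolved URI and merged options.

    `open_dataset` stands in for whatever the worker uses to open the table.
    """
    def read_task():
        # The worker reopens the same location with the same credentials.
        return open_dataset(resolved.uri, storage_options=resolved.storage_options)

    return read_task
```

Because the resolved URI and options travel inside the task, a worker does not need to contact the namespace again to recover credentials.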

If a namespace returns managed_versioning=True, workers pass namespace_client_managed_versioning=True when reopening the dataset. Integer dataset_options["version"] values are included in the namespace describe request; non-integer versions are still passed to Lance when opening the dataset.
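
The version and versioning rules above can be sketched as two small helpers. This is an illustration under stated assumptions, not the PR's implementation: `describe_version` and `worker_open_kwargs` are hypothetical names, excluding `bool` from "integer" is an assumption, and keeping an integer version in the worker's options as well is also an assumption.

```python
def describe_version(dataset_options):
    """Return the version to include in the namespace describe request.

    Only integer versions are forwarded; anything else yields None.
    """
    version = (dataset_options or {}).get("version")
    # Assumption: bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, int) and not isinstance(version, bool):
        return version
    return None


def worker_open_kwargs(dataset_options, managed_versioning):
    """Build the keyword arguments a worker uses when reopening the dataset."""
    kwargs = dict(dataset_options or {})  # the version, if any, is passed through
    if managed_versioning:
        kwargs["namespace_client_managed_versioning"] = True
    return kwargs
```

For example, `{"version": "v1"}` produces no version in the describe request, but `"v1"` still reaches the dataset open call through the worker's keyword arguments.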

The dependency lower bound is set to lance-namespace>=0.7.6, which is the locked version that already includes vend_credentials and managed_versioning support.

Validation

  • python -m pytest tests/test_datasource_namespace_options.py tests/test_basic_read_write.py::TestNamespaceReadWrite -q
  • python -m ruff check lance_ray/datasource.py tests/test_datasource_namespace_options.py
  • python -m ruff format --check lance_ray/datasource.py tests/test_datasource_namespace_options.py
  • local MinIO smoke test with namespace write/read using endpoint=http://localhost:9000

Fixes #70

github-actions Bot commented Jun 4, 2026

Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

LuciferYang changed the title from "Fix namespace read storage options" to "fix: preserve namespace read storage options" Jun 4, 2026
github-actions Bot added the bug label Jun 4, 2026
@LuciferYang

Author

Verified the #70 path with a local MinIO smoke test: namespace-backed write/read using the S3 API endpoint http://localhost:9000 and Lance storage_options for credentials. The read path returned the expected rows through Ray workers.

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Could not read the data in MinIO by lance-ray