fix: preserve namespace read storage options #5159
Open
LuciferYang wants to merge 3 commits into
Contributor: ACTION NEEDED. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Author: Verified the #70 path with a local MinIO smoke test: namespace-backed write/read using the S3 API endpoint.
Summary
- Merge the `storage_options` returned by `describe_table()` with the caller's `storage_options`, with namespace values taking precedence
- … `namespace_impl`/`namespace_properties` and call out the MinIO S3 API endpoint

Semantics
For namespace reads, the driver resolves the table once through the namespace before creating Ray read tasks. The resolved URI and merged storage options are then captured in each read task so workers can open the same table location without losing object-store credentials.
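The driver/worker split above, together with the version handling described next, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the lance_ray implementation: `DescribeResult`, `plan_read`, `fake_reader`, and the request dict shape are hypothetical stand-ins, and a plain function call takes the place of a Ray read task opening a Lance dataset.

```python
from dataclasses import dataclass, field
from functools import partial
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class DescribeResult:
    """Hypothetical stand-in for a namespace describe_table() response."""
    location: str
    storage_options: Dict[str, str] = field(default_factory=dict)
    managed_versioning: bool = False


def merge_storage_options(user_opts: Optional[Dict[str, str]],
                          ns_opts: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Start from the caller's options; namespace values win on key conflicts."""
    merged = dict(user_opts or {})
    merged.update(ns_opts or {})
    return merged


def build_describe_request(table_id: List[str],
                           dataset_options: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical request shape: only integer versions go to the namespace."""
    request: Dict[str, Any] = {"id": list(table_id)}
    version = (dataset_options or {}).get("version")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(version, int) and not isinstance(version, bool):
        request["version"] = version
    return request


def worker_open_kwargs(resolved: DescribeResult,
                       user_storage_options: Optional[Dict[str, str]],
                       dataset_options: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Everything a worker needs to reopen the table without a namespace call."""
    kwargs: Dict[str, Any] = {
        "uri": resolved.location,
        "storage_options": merge_storage_options(
            user_storage_options, resolved.storage_options),
    }
    version = (dataset_options or {}).get("version")
    if version is not None:
        # Assumption: integer and non-integer versions are both forwarded
        # to the dataset open; only integers were sent in the describe call.
        kwargs["version"] = version
    if resolved.managed_versioning:
        kwargs["namespace_client_managed_versioning"] = True
    return kwargs


def plan_read(describe_table: Callable[[Dict[str, Any]], DescribeResult],
              table_id: List[str],
              user_storage_options: Optional[Dict[str, str]],
              dataset_options: Optional[Dict[str, Any]],
              fragment_ids: List[int],
              reader: Callable[..., Any]) -> List[Callable[[], Any]]:
    # Resolve the table exactly once, on the driver...
    resolved = describe_table(build_describe_request(table_id, dataset_options))
    open_args = worker_open_kwargs(resolved, user_storage_options, dataset_options)
    # ...then capture the resolved URI and merged options in every task.
    return [partial(reader, fragment_id=f, **open_args) for f in fragment_ids]


def fake_reader(fragment_id, uri, storage_options, **extra):
    """Stand-in for the worker-side open/scan; it just echoes its inputs."""
    return fragment_id, uri, storage_options, extra


if __name__ == "__main__":
    def fake_describe(request):
        # Illustrative values only; a real namespace may vend credentials here.
        return DescribeResult("s3://bucket/table.lance",
                              {"aws_access_key_id": "vended"},
                              managed_versioning=True)

    tasks = plan_read(fake_describe, ["ns", "table"],
                      {"aws_access_key_id": "user",
                       "endpoint": "http://localhost:9000"},
                      {"version": 3}, [0, 1], fake_reader)
    print(tasks[0]())
```

Capturing the already-resolved arguments in each task (here via `functools.partial`) is what keeps workers from needing their own namespace round trip, and it is why credentials returned by the namespace survive into the workers.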
If a namespace returns `managed_versioning=True`, workers pass `namespace_client_managed_versioning=True` when reopening the dataset. Integer `dataset_options["version"]` values are included in the namespace describe request; non-integer versions are still passed to Lance when opening the dataset.

The dependency lower bound is set to `lance-namespace>=0.7.6`, which is the locked version that already includes `vend_credentials` and `managed_versioning` support.

Validation
- `python -m pytest tests/test_datasource_namespace_options.py tests/test_basic_read_write.py::TestNamespaceReadWrite -q`
- `python -m ruff check lance_ray/datasource.py tests/test_datasource_namespace_options.py`
- `python -m ruff format --check lance_ray/datasource.py tests/test_datasource_namespace_options.py`
- Local MinIO smoke test with `endpoint=http://localhost:9000`

Fixes #70