Skip to content

fix(connectors): support default credential provider chain for iceberg sink#3045

Open
gomnitrix wants to merge 7 commits intoapache:masterfrom
gomnitrix:fix-2911
Open

fix(connectors): support default credential provider chain for iceberg sink#3045
gomnitrix wants to merge 7 commits intoapache:masterfrom
gomnitrix:fix-2911

Conversation

@gomnitrix
Copy link
Copy Markdown
Contributor

@gomnitrix gomnitrix commented Mar 29, 2026

Which issue does this PR close?

Closes #2911

Rationale

Production environments prefer the default credential provider chain over static keys, which the Iceberg Sink connector currently lacks support for.

What changed?

The Iceberg Sink connector previously failed to initialize if store_access_key_id and store_secret_access_key were missing from the config, blocking the use of standard identity-based auth. This fix makes these config fields optional, allowing the underlying OpenDAL seamlessly fall back to the default credential provider chain. An isolated integration test was also added to verify this fallback behavior using environment variables.

Local Execution

  • Passed
  • Pre-commit hooks ran

E2E Validation Results

The fallback mechanism has been verified through:

1. Automated Integration Test
Added a new test fixture: IcebergEnvAuthFixture and a corresponding integration test case: iceberg_sink_uses_default_credential_chain. It simulates the environment by omitting config credentials, injecting AWS_ environment variables, and validating whether the test data successfully flushes to MinIO using the fallback chain.

2. Manual Local Verification
I also successfully verified this end-to-end. Providing credentials exclusively via the environment triggered the fallback correctly and successfully routed Iggy messages to the catalog:

Click to view the local verification logs
# 1. The connector initializes and successfully falls back to default credentials:

2026-04-04T11:28:28.680638Z  INFO connector: connector_target="iggy_connector_iceberg_sink::sink" Opened Iceberg sink connector with ID: 1 for URL: http://rest:8181. No explicit credentials provided, falling back to default credential provider chain
2026-04-04T11:28:28.680661Z  INFO connector: connector_target="iggy_connector_iceberg_sink::sink" Configuring Iceberg catalog with the following config:
-region: us-east-1
-url: http://minio:9000
-store class: S3
-catalog type: REST

2026-04-04T11:28:28.692411Z  INFO connector: connector_target="iggy_connector_iceberg_sink::router::static_router" Static router found 1 tables on iceberg catalog from 1 tables declared
2026-04-04T11:28:28.692612Z  INFO iggy_connectors::sink: Sink container with name: Iceberg sink (iceberg), initialized successfully with ID: 1.

# 2. Sending a test message via CLI to the running server:

$ cargo run --bin iggy -- -u iggy -p password message send test_stream test_topic "{\"id\": 1, \"name\": \"hello_iggy\"}"
Sent messages to topic with ID: test_topic and stream with ID: test_stream

# 3. The connector immediately processes and routes it to Iceberg:

2026-04-04T11:29:54.643126Z  INFO connector: connector_target="iggy_connector_iceberg_sink::router::static_router" Routed 1 messages to iceberg table messages successfully

AI Usage

  1. Which tools? Gemini 3.1 Pro
  2. Scope of usage? Paired to implement the iceberg_sink_uses_default_credential_chain integration test.
  3. How did you verify the generated code works correctly? Verified the test logic and ensured it correctly polls Iceberg snapshots without explicit static access key config.
  4. Can you explain every line of the code if asked? yes

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.54%. Comparing base (051475a) to head (514350d).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3045      +/-   ##
============================================
+ Coverage     70.52%   70.54%   +0.01%     
  Complexity      943      943              
============================================
  Files          1115     1115              
  Lines         95388    95445      +57     
  Branches      72589    72646      +57     
============================================
+ Hits          67275    67330      +55     
+ Misses        25634    25633       -1     
- Partials       2479     2482       +3     
Components Coverage Δ
Rust Core 70.58% <100.00%> (+0.01%) ⬆️
Java SDK 62.30% <ø> (ø)
C# SDK 69.42% <ø> (+0.01%) ⬆️
Python SDK 81.43% <ø> (ø)
Node SDK 91.40% <ø> (ø)
Go SDK 38.97% <ø> (ø)
Files with missing lines Coverage Δ
core/connectors/sinks/iceberg_sink/src/lib.rs 100.00% <ø> (ø)
core/connectors/sinks/iceberg_sink/src/props.rs 97.29% <100.00%> (+9.06%) ⬆️
core/connectors/sinks/iceberg_sink/src/sink.rs 100.00% <ø> (ø)

... and 13 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@gomnitrix gomnitrix force-pushed the fix-2911 branch 2 times, most recently from 1ae3328 to b324786 Compare March 29, 2026 01:46
@hubcio
Copy link
Copy Markdown
Contributor

hubcio commented Mar 30, 2026

@EdgarModesto23 would you mind checking this?

@EdgarModesto23
Copy link
Copy Markdown
Contributor

@hubcio sure!

"s3.secret-access-key".to_string(),
config.store_secret_access_key.clone(),
);
if let Some(access_key_id) = &config.store_access_key_id {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_props_s3() handles each credential independently with separate if let Some(...) blocks. the partial-credential invariant (both-or-neither) is only enforced in sink.rs::open(), but init_props is pub - a future caller could bypass validation and get a silently half-configured props map. consider making init_props pub(crate) or adding both-or-neither validation here as defense-in-depth.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the both-or-neither validation to get_props_s3. Thanks for pointing out!

async fn setup() -> Result<Self, TestBinaryError> {
let inner = IcebergPreCreatedFixture::setup().await?;
// Set credentials before test server initialization.
unsafe {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe { std::env::set_var } without cleanup or safety justification. the tokio runtime is already running at this point, so concurrent env reads from other threads are UB. no Drop impl means these vars leak into subsequent tests.

the fix is straightforward: add AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY to the HashMap returned by connectors_runtime_envs() instead - the harness already passes that map to the child process via Command::envs() at connectors_runtime.rs:157, eliminating the unsafe entirely.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a serious issue. I missed the concurrency risks while trying to figure out how the test environment handles variables. I'll spend more time digging into the implementation to avoid similar mistakes in the future. I've fixed it and thanks for the heads-up!

error!(
"Partially configured Iceberg credentials. You must provide both store_access_key_id and store_secret_access_key, or omit both."
);
return Err(Error::InvalidConfig);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error::InvalidConfig produces a bare "Invalid config" message. Error::InvalidConfigValue(String) exists (used by influxdb connectors) and carries a description - use it so the error is self-explanatory without needing to correlate with the error!() log above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switched to Error::InvalidConfigValue, I think it's now much cleaner!

enabled = true
version = 0
name = "Iceberg sink"
path = "../../target/release/libiggy_connector_iceberg_sink"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path points to target/release but ENV_SINK_PATH always overrides to debug at runtime. not broken, but misleading. this is the same pattern as the original config.toml so not introduced by this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the path to debug in this config to match the actual test environment and avoid confusion.

use crate::{IcebergSinkConfig, IcebergSinkStoreClass, IcebergSinkTypes};

#[test]
fn test_get_props_s3() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test covers both-None and both-Some but not the mixed case (one Some, one None). if validation moves into get_props_s3 per the above comment, this test should verify the error path too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the unit tests to cover the partial credential scenarios. Since adding the new cases made the test a bit long and bloated, I also split it up. Thanks!

harness: &TestHarness,
fixture: IcebergEnvAuthFixture,
) {
let client = harness.root_client().await.unwrap();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bare .unwrap() calls here and at lines 220, 221, 227, 232 - the existing tests in this file consistently use .expect("...") with context messages.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I replaced all unwrap() calls with expect() to provide better error context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support default credential provider chain for Iceberg sink storage authentication

3 participants