Add S3 blob storage with cashier billing to ic-gateway#193

Open
shilingwang wants to merge 25 commits into main from shiling/blob-storage

Conversation

@shilingwang
Contributor

@shilingwang shilingwang commented Apr 17, 2026

#NODE-1941

Summary

  • Adds a full blob storage API to ic-gateway, enabling upload/download of content-addressed blobs backed by a single AWS S3 bucket with per-owner billing through the cashier canister.
  • Introduces new /v1/ HTTP endpoints for blob metadata, chunk operations, and owner data management, gated behind --s3-endpoint and --cashier-canister-id CLI flags.
  • Integrates billing (budget checks, usage reporting) via a CashierConnector that caches budgets locally and flushes usage counters periodically, wired into ic-gateway's existing TaskManager and HealthManager.

New modules

  • src/s3/ — S3 client abstraction (BucketLike trait, AWSBucket impl, RamFakeBucket for dev), config
  • src/cashier/ — CashierClient (4 canister calls: whoami, pricelist, budget, usage reporting), CashierConnector (local billing cache + periodic flush)
  • src/storage/ — Shared types (blob metadata, hash tree, chunk constants), S3 key paths, IC egress certificate auth
  • src/routing/storage/ — Axum handlers + router for all /v1/ endpoints

HTTP endpoints (under /v1/)

  • HEAD /v1/blob — Blob metadata headers (size, content type)
  • GET /v1/blob — Download blob with Range header support
  • GET /v1/blob-tree — Raw blob metadata JSON
  • PUT /v1/blob-tree — Upload blob metadata (with IC egress cert auth)
  • GET /v1/chunk — Download a single chunk
  • PUT /v1/chunk — Upload a single chunk (SHA-256 verified)
  • DELETE /v1/owner — Delete all data for an owner (host-gated)

Design decisions

  • Single S3 bucket: One bucket configured via CLI, no multi-bucket routing. Simpler than the multi-instance model in object-storage.
  • Billing gated: Storage routes are only mounted when both --s3-endpoint and --cashier-canister-id are provided. Without either, ic-gateway serves only normal IC traffic.
  • Budget caching: Per-owner budgets cached for 30s to avoid hitting the cashier canister on every request. Usage counters flushed every 10s.
  • IC egress auth: PUT /blob-tree verifies an OwnerEgressSignature certificate from the request body. Bypassable with --fake-ingress-auth for local dev.

@shilingwang shilingwang marked this pull request as ready for review April 17, 2026 14:53
@shilingwang shilingwang requested a review from a team as a code owner April 17, 2026 14:53

@frankdavid frankdavid left a comment


Are there any plans for testing?

@shilingwang shilingwang requested a review from frankdavid April 28, 2026 08:26
) -> impl Stream<Item = Result<bytes::Bytes, std::io::Error>> + Send + 'static {
stream::iter(parts)
.map(move |part| fetch_chunk(state.clone(), owner, part))
.buffered(CHUNK_DOWNLOAD_PARALLELISM)

I thought about it again, and this may be exploited by malicious clients. A malicious client could request a file and read just a single byte of it (e.g. by terminating the connection early), yet we'd load up to 8 MB every time regardless. Also, if clients are slow (or act slow by intentionally dropping packets / delaying ACKs), memory usage can blow up: 10,000 connections would require roughly 80 GB of RAM.
It'd be nice to do some benchmarking, e.g. how fast the connection to AWS is. If it's as fast as the connection between the client and the gateway, buffering may not even be necessary.

.blob_tree
.root_hash()
.ok_or_else(|| StorageError::Forbidden("blob tree has no root hash".into()))?
.to_string();
Collaborator

Do we really need to stringify it here? Can't we check it directly? I guess it's because OwnerEgressSignature has it as a string?

/// Errors from S3 storage operations.
#[derive(Debug)]
pub enum StorageError {
AwsS3(String),
Collaborator

Why do we have only AWS specific errors?

Client::from_conf(s3_config)
}

/// Ensure the bucket exists, creating it if necessary. Probe intelligent tiering.
Collaborator

Does it really probe tiering?

HeadBucketError::NotFound(_) => false,
other => return Err(StorageError::AwsS3(other.to_string())),
},
Err(e) => return Err(StorageError::AwsS3(format!("{}", DisplayErrorContext(e)))),
Collaborator

Here and everywhere: just DisplayErrorContext(e).into() might be nicer?

.body
.collect()
.await
.map(|b| Some(b.to_vec()))
Collaborator

Maybe we should change the trait to work with Bytes everywhere? That way we won't have to convert between the Bytes the S3 client returns and Vec, and vice versa, which would save us allocations and CPU cycles.

request: &GetBudgetRequestV1,
) -> Result<GetBudgetResult, Error> {
let encoded_args =
candid::encode_args((request,)).context("failed to encode budget_get_v1 args")?;
Collaborator

nit: I think we can drop the budget_get_v1 from the context here, and everywhere else the same way. Where the error happens is inferred anyway.

client: Arc<CashierClient>,
gateway_id: GatewayId,
pricelist: Pricelist,
budgets: RwLock<HashMap<Principal, CachedBudget>>,
Collaborator

I think it's best to use DashMap here instead. Or even better - Moka cache with a TTL.

CashierUnavailable(String),
}

impl std::fmt::Display for BillingError {
Collaborator

Use thiserror instead of manual impl


impl fmt::Debug for CashierConnector {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("CashierConnector")
Collaborator

nit: maybe just use normal write! with some formatting?

client: Arc<CashierClient>,
gateway_name: Option<String>,
) -> Result<Self, Error> {
let principal = client.principal()?;
Collaborator

add error context here and below?

@shilingwang
Contributor Author

I lost access, so I cannot easily push any changes. I'll create a new PR. @blind-oracle @frankdavid

@shilingwang
Contributor Author

Thank you for contributing! Unfortunately this repository does not accept external contributions yet.

We are working on enabling this by aligning our internal processes and our CI setup to handle external contributions. However this will take some time to set up so in the meantime we unfortunately have to close this Pull Request.

We hope you understand and will come back once we accept external PRs.

— The DFINITY Foundation

@r-birkner could you help me change the permission of this Repo?

let mut budgets = self.budgets.write().await;
let cached = budgets
.get_mut(owner)
.expect("cache entry exists after refresh");
Collaborator

Avoid expect since it would panic

.is_none_or(|c| c.fetched_at.elapsed() >= BUDGET_TTL)
};

if needs_refresh {
Collaborator

Hmm, I'm not sure, but don't we have a race condition here with flush_usage? We run it periodically, and if we haven't flushed the usage yet (e.g. we had some debits), don't we just overwrite the budget with a fresh copy from the cashier?

if divisor == 0 {
return 0;
}
// Promote to `i128` so `quantity (u64) * cost (i64)` cannot overflow
Collaborator

Why don't we just error out when we get absurd inputs, like a negative cost?


fn int_to_i64(v: &Int) -> i64 {
// Int is arbitrary precision; clamp to i64 range for local budget math.
v.0.to_string().parse::<i64>().unwrap_or(i64::MAX)
Collaborator

Stringifying the BigInt for every conversion is a big overhead. Why don't we just operate on BigInts directly, without converting to i64? That would probably make life easier (e.g. no need to worry about overflows below).


type S = Arc<StorageState>;

const BODY_READ_TIMEOUT: Duration = Duration::from_secs(60);
Collaborator

Make configurable or use some already present CLI option

/// download. `buffered(N)` preserves source order, so the response body stays
/// strictly sequential while we prefetch ahead. Bounds peak per-download
/// memory at `CHUNK_DOWNLOAD_PARALLELISM * 1 MiB` (~8 MiB).
const CHUNK_DOWNLOAD_PARALLELISM: usize = 8;
Collaborator

Also make configurable
