Return 5xx instead of 404 when named blob metadata exists but storage replicas return NOT_FOUND#3234

Draft
abha-mutalik wants to merge 2 commits into linkedin:master from abha-mutalik:named-blob-503-on-storage-missing

@abha-mutalik
Contributor

When a named blob's metadata is resolved successfully via idConverter but the storage layer subsequently returns BlobNotFound on every replica, NonBlockingRouter now translates the resolved error from RouterErrorCode.BlobDoesNotExist to RouterErrorCode.AmbryUnavailable. This surfaces as HTTP 503 (retryable) rather than HTTP 404 (authoritative), since the metadata says the blob exists and a missing storage response is therefore transient (replication lag, in-flight delete race, or replica outage).

  • Wrap callback in getBlob(RestRequest, ...) to apply translation only after idConverter.convert() succeeded.
  • Add namedBlobMetadataExistsButStorageNotFoundCount counter for visibility into how often this inconsistency occurs.
  • Fix pre-existing operations-counter underflow in the same method's idConverter-failure branch (no matching increment).
  • Add parameterized tests for both the translation and the metadata-not-found regression case.
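
The flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `NamedBlobGetSketch`, the `Optional`-based conversion result, the `storage` function, and the nested `RouterErrorCode`/`RouterException` types are simplified, hypothetical stand-ins for the real Ambry classes. It assumes the operations counter is incremented only once an operation is actually submitted, so the conversion-failure branch leaves it alone.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Simplified, hypothetical stand-ins; not Ambry's actual classes or signatures. */
final class NamedBlobGetSketch {
  enum RouterErrorCode { BlobDoesNotExist, AmbryUnavailable }

  static final class RouterException extends Exception {
    final RouterErrorCode errorCode;

    RouterException(String message, RouterErrorCode errorCode) {
      super(message);
      this.errorCode = errorCode;
    }
  }

  // Stand-in for the router's in-flight operations counter.
  final AtomicInteger currentOperationsCount = new AtomicInteger();
  // Stand-in for the namedBlobMetadataExistsButStorageNotFoundCount metric.
  final AtomicLong metadataExistsButStorageNotFoundCount = new AtomicLong();

  /**
   * @param resolvedBlobId result of the (assumed) ID conversion; empty when the metadata lookup failed.
   * @param storage returns null when the blob was read, or the error reported by the replicas.
   */
  void getBlob(Optional<String> resolvedBlobId, Function<String, RouterException> storage,
      BiConsumer<String, Exception> callback) {
    if (!resolvedBlobId.isPresent()) {
      // Conversion failed before any operation was counted: leave the counter untouched
      // and pass the authoritative not-found error through unchanged.
      callback.accept(null,
          new RouterException("Named blob metadata not found", RouterErrorCode.BlobDoesNotExist));
      return;
    }
    currentOperationsCount.incrementAndGet();
    RouterException storageError = storage.apply(resolvedBlobId.get());
    currentOperationsCount.decrementAndGet();
    // Translation is applied only on this branch, where metadata resolution succeeded.
    translating(callback).accept(storageError == null ? "blob-bytes" : null, storageError);
  }

  /** Wraps a callback so that BlobDoesNotExist is reported as AmbryUnavailable. */
  <T> BiConsumer<T, Exception> translating(BiConsumer<T, Exception> delegate) {
    return (result, exception) -> {
      Exception toReport = exception;
      if (exception instanceof RouterException
          && ((RouterException) exception).errorCode == RouterErrorCode.BlobDoesNotExist) {
        metadataExistsButStorageNotFoundCount.incrementAndGet();
        toReport = new RouterException(
            "Named blob metadata exists but storage returned BlobNotFound for the resolved blob ID.",
            RouterErrorCode.AmbryUnavailable);
      }
      delegate.accept(result, toReport);
    };
  }
}
```

In this sketch a missing-metadata request still completes with `BlobDoesNotExist`, while a resolved ID that storage cannot find completes with `AmbryUnavailable` and bumps the counter once.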

Summary

Testing Done

…ts but storage replicas return NOT_FOUND

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter

codecov-commenter commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 12.50000% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.25%. Comparing base (52ba813) to head (739f9b5).
⚠️ Report is 376 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ava/com/github/ambry/router/NonBlockingRouter.java | 0.00% | 14 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3234       +/-   ##
=============================================
- Coverage     64.24%   51.25%   -12.99%     
+ Complexity    10398     8666     -1732     
=============================================
  Files           840      931       +91     
  Lines         71755    79406     +7651     
  Branches       8611     9500      +889     
=============================================
- Hits          46099    40701     -5398     
- Misses        23004    35330    +12326     
- Partials       2652     3375      +723     


…tFoundCache test

- Document the named-blob translation contract on the Router.getBlob(RestRequest, ...) interface so downstream Router implementations are aware that BlobDoesNotExist may be translated to AmbryUnavailable after a successful IdConverter resolution.
- Soften the translated RouterException message: drop "treating as transient" (which can read as false reassurance in logs) and replace it with the factual "Named blob metadata exists but storage returned BlobNotFound for the resolved blob ID."
- Add a parameterized test covering the notFoundCache short-circuit path. getBlobHelper completes that path through completeOperation(...) rather than BlobOperationCallbackWrapper, and the wrapped callback must still translate BlobDoesNotExist to AmbryUnavailable when metadata exists.
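
A rough sketch of why the short-circuit path still translates, assuming the translation lives on the callback handed to the helper rather than on the per-operation wrapper. All names here (`NotFoundCacheSketch`, `rememberNotFound`, `getNamedBlob`, and the simplified `getBlobHelper`/`completeOperation` signatures) are hypothetical stand-ins, and the replica path is collapsed to an immediate success for brevity.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiConsumer;

/** Simplified, hypothetical stand-ins; not Ambry's actual classes or signatures. */
final class NotFoundCacheSketch {
  enum RouterErrorCode { BlobDoesNotExist, AmbryUnavailable }

  static final class RouterException extends Exception {
    final RouterErrorCode errorCode;

    RouterException(String message, RouterErrorCode errorCode) {
      super(message);
      this.errorCode = errorCode;
    }
  }

  // Blob IDs that storage recently reported as missing.
  private final Set<String> notFoundCache = new HashSet<>();

  void rememberNotFound(String blobId) {
    notFoundCache.add(blobId);
  }

  /** Stand-in for getBlobHelper: a cache hit completes the callback directly. */
  void getBlobHelper(String blobId, BiConsumer<String, Exception> callback) {
    if (notFoundCache.contains(blobId)) {
      // Short-circuit: no replica requests are sent and no per-operation wrapper is involved.
      completeOperation(callback, null,
          new RouterException("Blob ID is in the not-found cache", RouterErrorCode.BlobDoesNotExist));
      return;
    }
    // Replica path elided in this sketch; pretend the read succeeded.
    completeOperation(callback, "blob-bytes", null);
  }

  void completeOperation(BiConsumer<String, Exception> callback, String result, Exception exception) {
    callback.accept(result, exception);
  }

  /** Named-blob entry point, reached only after the ID was resolved from metadata. */
  void getNamedBlob(String resolvedBlobId, BiConsumer<String, Exception> callback) {
    // The callback is wrapped before it reaches the helper, so both completion paths translate.
    getBlobHelper(resolvedBlobId, (result, exception) -> {
      Exception toReport = exception;
      if (exception instanceof RouterException
          && ((RouterException) exception).errorCode == RouterErrorCode.BlobDoesNotExist) {
        toReport = new RouterException(
            "Named blob metadata exists but storage returned BlobNotFound for the resolved blob ID.",
            RouterErrorCode.AmbryUnavailable);
      }
      callback.accept(result, toReport);
    });
  }
}
```

Because the wrapping happens at the entry point, a cached not-found for a resolved named blob reaches the caller as `AmbryUnavailable` in this sketch, which is the behavior the new test is described as covering.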

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>