Skip to content

fix(tsdb): Do not try to ship TSDBs that cannot be opened upon start#21769

Merged
chaudum merged 1 commit intomainfrom
chaudum/update-9297
May 8, 2026
Merged

fix(tsdb): Do not try to ship TSDBs that cannot be opened upon start#21769
chaudum merged 1 commit intomainfrom
chaudum/update-9297

Conversation

@chaudum
Copy link
Copy Markdown
Contributor

@chaudum chaudum commented May 6, 2026

Summary

On TSDB shipper start the manager tries to load all local TSDB files and hand them to the shipper, which eventually uploads them to object storage.

This change fixes a bug that causes a nil pointer dereference when a TSDB file fails to open but is nevertheless added to the shipper.

What this PR does / why we need it:
Closes #9297


Note

Medium Risk
Changes TSDB shipper startup behavior to continue on per-file failures and return an aggregated error, which could alter failure visibility and startup success conditions.

Overview
On tsdbManager.Start, failures to open local TSDB files no longer cause partially-initialized indices to be handed to the shipper; broken TSDBs are skipped and a clearer warning is logged.

Startup now accumulates load/add errors via multierror and continues scanning remaining TSDBs, returning the aggregated error at the end (and reporting it in the final "loaded leftover local indices" log).

Reviewed by Cursor Bugbot for commit 59f718c. Bugbot is set up for automated code reviews on this repo. Configure here.

On TSDB shipper start the manager tries to load all local TSDB files and
hand them to the shipper, which eventually uploads them to object storage.

This change fixes a bug that causes a nil pointer dereference when a
TSDB file fails to open but is nevertheless added to the shipper.
@chaudum chaudum requested a review from a team as a code owner May 6, 2026 06:56
multiErr.Add(err)
loadingErrors++
return err
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a change of behavior, should we treat both loading and adding to index failures are errors and fail the start-up?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not it does not return on first error, but continues the operation with the remaining TSDBs. However, the error is still returned at the end.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a need to load more TSDBs if we are going to fail the service anyway?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least we should try IMO

Copy link
Copy Markdown
Contributor

@ashwanthgoli ashwanthgoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@chaudum chaudum merged commit d4ce29a into main May 8, 2026
94 checks passed
@chaudum chaudum deleted the chaudum/update-9297 branch May 8, 2026 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants