Skip to content

[DB-2137] Report disk usage metrics for index and log drives#5640

Open
timothycoleman wants to merge 1 commit into
masterfrom
timothycoleman/multi-disk-metrics
Open

[DB-2137] Report disk usage metrics for index and log drives#5640
timothycoleman wants to merge 1 commit into
masterfrom
timothycoleman/multi-disk-metrics

Conversation

@timothycoleman

Copy link
Copy Markdown
Contributor

The *-sys-disk metric (exposed on /metrics) previously reported total/used bytes only for the database drive. This PR extends it to also report the index and log drives when they live on different physical drives, so operators running a split-disk layout get free/used visibility for all three.

The sys-disk metric only reported total/used bytes for the database
drive. Extend it to also report the index and log drives when they
live on different physical drives.

CreateDiskMetric now takes a list of paths and a drive-info resolver.
It resolves each path to its drive and registers one total/used pair
per distinct drive, deduplicating by disk name so paths sharing a
mount (the common default layout) don't emit duplicate (kind, disk)
series. When everything is on one drive the output is unchanged.

The index-path expression is computed once in ClusterVNode and reused.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@linear-code

linear-code Bot commented Jun 11, 2026

Copy link
Copy Markdown

DB-2137

@timothycoleman timothycoleman marked this pull request as ready for review June 15, 2026 10:24
@timothycoleman timothycoleman requested a review from a team as a code owner June 15, 2026 10:24
Copilot AI review requested due to automatic review settings June 15, 2026 10:24
@qodo-code-review

qodo-code-review Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Code Review by Qodo

🐞 Bugs (2) 📘 Rule violations (0)

Grey Divider


Remediation recommended

1. Drive dedupe locks-in fallback 🐞 Bug ☼ Reliability
Description
SystemMetrics.CreateDiskMetric deduplicates drives using a one-time call to
getDriveInfo(path).DiskName during metric registration; if that call returns a fallback DiskName
(e.g., "Unknown" from DriveStats on exception), later paths can be skipped and their disk series
will never be registered. This can permanently under-report split-disk layouts when drive resolution
fails transiently at startup for some paths.
Code

src/KurrentDB.Core/Metrics/SystemMetrics.cs[R57-65]

+		var seenDisks = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
+		foreach (var path in paths) {
+			var diskName = getDriveInfo(path).DiskName;
+			if (!seenDisks.Add(diskName))
+				continue;

-		dims.Register(SystemTracker.DriveUsedBytes, GenMeasure(info => info.UsedBytes));
-		dims.Register(SystemTracker.DriveTotalBytes, GenMeasure(info => info.TotalBytes));
+			var debouncedDriveInfo = Functions.Debounce(() => getDriveInfo(path), timeout);
+			dims.Register(SystemTracker.DriveUsedBytes, GenMeasure(debouncedDriveInfo, info => info.UsedBytes));
+			dims.Register(SystemTracker.DriveTotalBytes, GenMeasure(debouncedDriveInfo, info => info.TotalBytes));
Evidence
The implementation deduplicates at registration using getDriveInfo(path).DiskName, but DriveStats
explicitly returns DiskName="Unknown" on exceptions; if multiple paths hit that fallback during
registration they will be collapsed and skipped, preventing later emission of those disks’ series.

src/KurrentDB.Core/Metrics/SystemMetrics.cs[52-66]
src/KurrentDB.SystemRuntime/Diagnostics/DriveStats.cs[13-33]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`CreateDiskMetric` resolves each path to `DiskName` once at registration to deduplicate, which can permanently skip registering series for some disks if the initial resolution returns a fallback name (e.g., `"Unknown"`). Because the set of registered measurement callbacks becomes fixed, later successful resolutions will never add the missing series.

### Issue Context
`DriveStats.GetDriveInfo` catches exceptions and returns `new DriveData("Unknown", 0, 0)`, so transient resolution failures can cause multiple distinct paths to collapse into the same `DiskName` during bootstrap.

### Fix Focus Areas
- src/KurrentDB.Core/Metrics/SystemMetrics.cs[52-66]
- src/KurrentDB.SystemRuntime/Diagnostics/DriveStats.cs[13-33]

### Suggested fix approach
Refactor disk metric creation so *drive resolution and dedup happen inside the observable callback* (per scrape), not at registration time. For example:
- Build a list of debounced `Func<DriveData>` for each configured path.
- In the observable callback, call each debounced function, then `DistinctBy`/`GroupBy` on `DiskName` to produce one `DriveData` per disk.
- Emit `used` and `total` measurements for each distinct disk that is enabled by config.
This preserves the “deduplicate same disk” behavior while ensuring transient startup failures don’t permanently suppress other disks.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Informational

2. Case-insensitive disk dedupe 🐞 Bug ≡ Correctness
Description
CreateDiskMetric uses StringComparer.OrdinalIgnoreCase when deduplicating DiskName values, which can
incorrectly merge distinct mount points on case-sensitive platforms and drop metrics for one disk.
This behavior is new in the multi-path implementation because it determines whether additional disk
series are registered.
Code

src/KurrentDB.Core/Metrics/SystemMetrics.cs[57]

+		var seenDisks = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
Evidence
The dedup key is a HashSet<string> built with OrdinalIgnoreCase, and DiskName comes from
drive/mount identifiers produced by DriveStats based on DriveInfo enumeration; treating those
identifiers as case-insensitive can merge distinct mounts on case-sensitive systems.

src/KurrentDB.Core/Metrics/SystemMetrics.cs[55-61]
src/KurrentDB.SystemRuntime/Diagnostics/DriveStats.cs[15-28]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Disk deduplication uses `StringComparer.OrdinalIgnoreCase` unconditionally. On Unix-like systems, mount point paths are case-sensitive, so different disks/mounts that differ only by case could be merged and one disk’s metrics would be dropped.

### Issue Context
`DiskName` is derived from `DriveInfo.Name` and is treated as the identity key for the `(kind, disk)` series.

### Fix Focus Areas
- src/KurrentDB.Core/Metrics/SystemMetrics.cs[55-61]
- src/KurrentDB.SystemRuntime/Diagnostics/DriveStats.cs[15-28]

### Suggested fix approach
Use a platform-appropriate comparer for `seenDisks`, e.g.:
- Windows: `StringComparer.OrdinalIgnoreCase`
- Non-Windows: `StringComparer.Ordinal`
This keeps Windows drive-letter behavior intact while avoiding accidental merges on case-sensitive systems.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@qodo-code-review

Copy link
Copy Markdown
Contributor

PR Summary by Qodo

Report disk usage metrics for index and log drives
✨ Enhancement 🧪 Tests 🕐 10-20 Minutes

Grey Divider

Walkthroughs

Description
• Extends *-sys-disk metric to report total/used bytes for index and log drives in addition to the
  database drive, enabling visibility for split-disk layouts.
• CreateDiskMetric now accepts a list of paths and a getDriveInfo resolver; it deduplicates by
  disk name so paths sharing a mount point don't emit duplicate metric series.
• indexPath is computed once in ClusterVNode and passed through MetricsBootstrapper.Bootstrap
  alongside the new logPath parameter.
• Adds two new unit tests: one verifying deduplication when all paths share a drive, and one
  verifying separate series are emitted per distinct drive.
Diagram
graph TD
    A["ClusterVNode"] -->|"indexPath, logPath"| B["MetricsBootstrapper"]
    B -->|"[dbPath, indexPath, logPath]"| C["SystemMetrics"]
    C -->|"per distinct drive"| D["ObservableGauge\n*-sys-disk-bytes"]
    C -->|"resolve path"| E["DriveStats.GetDriveInfo"]
    E --> F[("DriveData\n(DiskName, Total, Available)")]
    F --> C

    subgraph Legend
      direction LR
      _svc["Service/Class"] ~~~ _db[("Data Record")] ~~~ _metric["Metric Output"]
    end
Loading
High-Level Assessment

The approach is optimal. Injecting the drive-info resolver as a Func keeps the metric logic testable without real filesystem access, and the HashSet deduplication is the simplest correct solution for the common single-disk case. Alternatives such as pre-grouping paths by drive before calling Bootstrap, or exposing a separate metric per drive type (db/index/log), were likely considered but would add complexity without benefit — the current design is transparent to callers and backward-compatible when all paths share one drive.

Grey Divider

File Changes

Enhancement (2)
SystemMetrics.cs Extend CreateDiskMetric to accept multiple paths with per-drive deduplication +14/-6

Extend CreateDiskMetric to accept multiple paths with per-drive deduplication

• The 'CreateDiskMetric' signature changes from a single 'dbPath' string to 'IReadOnlyList<string> paths' plus an injectable 'Func<string, DriveData> getDriveInfo'. It iterates the paths, resolves each to its drive name, and skips duplicates via a 'HashSet<string>'. Each distinct drive gets its own debounced resolver and registers independent 'DriveUsedBytes'/'DriveTotalBytes' gauges tagged with '(kind, disk)'.

src/KurrentDB.Core/Metrics/SystemMetrics.cs


MetricsBootstrapper.cs Pass indexPath and logPath into Bootstrap and forward all three paths to CreateDiskMetric +4/-1

Pass indexPath and logPath into Bootstrap and forward all three paths to CreateDiskMetric

• The 'Bootstrap' method gains two new parameters: 'indexPath' and 'logPath'. The 'CreateDiskMetric' call is updated to pass '[dbConfig.Path, indexPath, logPath]' and the real 'DriveStats.GetDriveInfo' resolver, enabling multi-drive metric emission in production.

src/KurrentDB.Core/MetricsBootstrapper.cs


Refactor (1)
ClusterVNode.cs Compute indexPath earlier and pass it with logPath to MetricsBootstrapper +2/-2

Compute indexPath earlier and pass it with logPath to MetricsBootstrapper

• The 'indexPath' local variable is moved from 'StartSubsystems' scope to the constructor body so it can be passed to 'MetricsBootstrapper.Bootstrap' alongside 'options.Logging.Log' as the new 'logPath' argument. The duplicate declaration inside 'StartSubsystems' is removed.

src/KurrentDB.Core/ClusterVNode.cs


Tests (1)
SystemMetricsTests.cs Add unit tests for multi-drive deduplication and per-drive metric emission +68/-1

Add unit tests for multi-drive deduplication and per-drive metric emission

• Updates the existing 'CreateDiskMetric' call in the test fixture to use the new list-based signature. Adds 'deduplicates_disk_paths_on_the_same_drive' (verifies only 2 measurements for 3 paths on one disk) and 'separate_drives_produce_separate_stats' (verifies 4 measurements with correct values for 2 distinct disks). Extracts a 'CreateSystemMetrics' helper to reduce duplication.

src/KurrentDB.Core.XUnit.Tests/Metrics/SystemMetricsTests.cs


Grey Divider

Qodo Logo

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the *-sys-disk Prometheus metric so it can report disk total/used bytes for the DB, index, and log locations (deduplicated when multiple paths map to the same underlying drive), improving observability for split-disk deployments.

Changes:

  • Extend metrics bootstrapping to pass DB/index/log paths into disk metrics registration.
  • Refactor SystemMetrics.CreateDiskMetric to accept multiple paths plus an injected drive-info resolver and deduplicate by resolved drive.
  • Add unit tests validating drive deduplication and multi-drive emission behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
src/KurrentDB.Core/MetricsBootstrapper.cs Pass DB/index/log paths into system disk metrics registration.
src/KurrentDB.Core/Metrics/SystemMetrics.cs Generalize disk metric creation to multiple paths and dedupe per distinct drive.
src/KurrentDB.Core/ClusterVNode.cs Wire index/log paths into metrics bootstrap and reuse the computed index path.
src/KurrentDB.Core.XUnit.Tests/Metrics/SystemMetricsTests.cs Add coverage for deduplication and multi-drive metric series behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +52 to 53
public void CreateDiskMetric(string metricName, IReadOnlyList<string> paths, Func<string, DriveData> getDriveInfo, Dictionary<SystemTracker, string> dimNames) {
var dims = new Dimensions<SystemTracker, long>(config, dimNames, tag => new());
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants