Migrate Quickwit metrics to metrics-rs #6374

Open
shuheiktgw wants to merge 56 commits into main from migrate-quickwit-metrics-to-metrics-rs
Conversation

shuheiktgw (Collaborator) commented May 4, 2026

Summary

Migrates Quickwit metrics to a metrics-rs based infrastructure.

Prometheus metrics are now exported through PrometheusRecorder, while OTLP metrics are exported through the OpenTelemetry metrics recorder. Existing metric definitions and call sites have been moved to typed quickwit-metrics handles/macros where applicable.

Quickwit-specific metric logic remains in quickwit-common/src/metrics.rs, while the generic metrics declaration/registration/cache layer is split into the new quickwit-metrics crate.

Review Guide

The diff is large, but most of it is mechanical: existing metrics were converted to static definitions using quickwit-metrics, and call sites were updated accordingly. The core changes are concentrated in the following areas.

quickwit/quickwit-cli/src/logger.rs

Metrics initialization was updated for the metrics-rs migration.

  • Uses PrometheusRecorder for Prometheus export.
  • Uses OpenTelemetryRecorder for OTLP metrics export.
  • Uses fanout routing so Prometheus and OTLP metrics can both receive Quickwit metrics.
  • Keeps invariant metrics routed to the DogStatsD recorder.
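
The fanout routing listed above can be pictured with a small, std-only sketch. The `Sink` trait, `MemorySink`, and `Fanout` are hypothetical names invented for illustration; they are not the types used in this PR, where the real exporters are metrics-rs recorders.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for an exporter backend (e.g. Prometheus or OTLP).
pub trait Sink: Send + Sync {
    fn increment_counter(&self, name: &'static str, value: u64);
}

/// In-memory sink used only to make the sketch observable.
#[derive(Default)]
pub struct MemorySink {
    counters: Mutex<HashMap<&'static str, u64>>,
}

impl MemorySink {
    pub fn get(&self, name: &str) -> u64 {
        self.counters.lock().unwrap().get(name).copied().unwrap_or(0)
    }
}

impl Sink for MemorySink {
    fn increment_counter(&self, name: &'static str, value: u64) {
        *self.counters.lock().unwrap().entry(name).or_insert(0) += value;
    }
}

/// Forwards every update to all registered sinks.
pub struct Fanout {
    sinks: Vec<Arc<dyn Sink>>,
}

impl Fanout {
    pub fn new(sinks: Vec<Arc<dyn Sink>>) -> Self {
        Self { sinks }
    }
}

impl Sink for Fanout {
    fn increment_counter(&self, name: &'static str, value: u64) {
        for sink in &self.sinks {
            sink.increment_counter(name, value);
        }
    }
}
```

Because `Fanout` itself implements `Sink`, call sites only ever talk to one object, and adding or removing an exporter does not touch them.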

quickwit/quickwit-cli/src/main.rs

metrics-rs requires the global recorder to be installed before metrics are emitted, so telemetry/metrics initialization was moved earlier in the startup flow.
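
A rough sketch of why the ordering matters, assuming that events emitted before a recorder is installed are simply dropped. `CountingRecorder`, `install_recorder`, `emit`, and `recorded_events` are hypothetical names built on `std::sync::OnceLock`; this is not the metrics-rs API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical recorder: it only counts the events it receives.
#[derive(Default)]
pub struct CountingRecorder {
    events: AtomicU64,
}

static GLOBAL_RECORDER: OnceLock<CountingRecorder> = OnceLock::new();

/// Installs the global recorder; returns false if one was already installed.
pub fn install_recorder() -> bool {
    GLOBAL_RECORDER.set(CountingRecorder::default()).is_ok()
}

/// Emits one event. Without an installed recorder the event is lost.
pub fn emit() {
    if let Some(recorder) = GLOBAL_RECORDER.get() {
        recorder.events.fetch_add(1, Ordering::Relaxed);
    }
}

/// Number of events the installed recorder has seen (0 if none is installed).
pub fn recorded_events() -> u64 {
    GLOBAL_RECORDER
        .get()
        .map_or(0, |recorder| recorder.events.load(Ordering::Relaxed))
}
```

In this model, anything emitted during early startup never reaches an exporter, which is the failure mode that moving initialization earlier avoids.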

quickwit/quickwit-common/src/metrics.rs

This file now keeps only Quickwit-specific metrics logic. The generic Prometheus metric wrappers/factories were removed or moved to quickwit-metrics.

quickwit/quickwit-metrics

Adds the metrics-rs module prepared by @Mallets.

Main features:

  • counter!, gauge!, and histogram! macros.
  • Static metric metadata, labels, and histogram bucket definitions.
  • inventory-based metric metadata enumeration.
  • Call-site and global caching to avoid allocations on hot paths.
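
The two-level caching idea can be sketched as follows. This is a minimal illustration, not the crate's implementation: `counter_for` and `increment` are hypothetical names, the `u64` key stands in for a precomputed name-plus-labels hash, and a `Mutex<HashMap>` replaces the concurrent map a real implementation would more likely use.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical counter handle shared by every call site using the same key.
pub type CounterHandle = Arc<AtomicU64>;

/// Global cache: one handle per key, shared across threads.
fn global_cache() -> &'static Mutex<HashMap<u64, CounterHandle>> {
    static GLOBAL: OnceLock<Mutex<HashMap<u64, CounterHandle>>> = OnceLock::new();
    GLOBAL.get_or_init(|| Mutex::new(HashMap::new()))
}

thread_local! {
    /// Per-thread cache consulted first, so repeated lookups take no lock.
    static LOCAL: RefCell<HashMap<u64, CounterHandle>> = RefCell::new(HashMap::new());
}

/// Returns the handle for `key`, creating it on first use.
pub fn counter_for(key: u64) -> CounterHandle {
    LOCAL.with(|local| {
        let mut local = local.borrow_mut();
        if let Some(handle) = local.get(&key) {
            return handle.clone(); // fast path: thread-local hit
        }
        // Slow path: consult (and possibly populate) the global cache.
        let handle = global_cache()
            .lock()
            .unwrap()
            .entry(key)
            .or_insert_with(|| Arc::new(AtomicU64::new(0)))
            .clone();
        local.insert(key, handle.clone());
        handle
    })
}

/// Increments the counter for `key` and returns the new value.
pub fn increment(key: u64) -> u64 {
    counter_for(key).fetch_add(1, Ordering::Relaxed) + 1
}
```

After the first lookup on a given thread, the hot path is a thread-local map hit plus an `Arc` clone, so it neither allocates nor takes the global lock.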

⚠️ Breaking Changes ⚠️

The gRPC metrics have been renamed because metric names now need to be defined statically. The service name can no longer be embedded dynamically in the metric name, so it has been moved to a service label instead.

The affected metrics are:

  • quickwit_<service>_grpc_requests_total -> quickwit_grpc_requests_total{service="<service>"}
  • quickwit_<service>_grpc_requests_in_flight -> quickwit_grpc_requests_in_flight{service="<service>"}
  • quickwit_<service>_grpc_request_duration_seconds -> quickwit_grpc_request_duration_seconds{service="<service>"}

For example, quickwit_ingest_grpc_requests_total{kind="server"} should now be queried as quickwit_grpc_requests_total{service="ingest", kind="server"}.
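
For anyone updating dashboards or alert rules, the rename can be expressed mechanically. The helper below is a hypothetical migration aid, not part of this PR; it only encodes the three renames listed above.

```rust
/// Suffixes of the three renamed gRPC metrics listed above.
const GRPC_SUFFIXES: [&str; 3] = [
    "grpc_requests_total",
    "grpc_requests_in_flight",
    "grpc_request_duration_seconds",
];

/// Maps an old `quickwit_<service>_grpc_*` name to `(new_name, service)`.
/// Returns `None` for names that do not follow the old pattern.
pub fn migrate_grpc_metric_name(old_name: &str) -> Option<(String, String)> {
    let rest = old_name.strip_prefix("quickwit_")?;
    for suffix in GRPC_SUFFIXES {
        // Peel off the metric suffix, then the underscore that followed the service name.
        let service = rest
            .strip_suffix(suffix)
            .and_then(|prefix| prefix.strip_suffix('_'));
        if let Some(service) = service {
            if !service.is_empty() {
                return Some((format!("quickwit_{suffix}"), service.to_string()));
            }
        }
    }
    None
}
```

The returned service string is the value to put in the new `service` label; any existing labels such as `kind` carry over unchanged.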

shuheiktgw and others added 21 commits May 2, 2026 14:24
Port the metricspp library into the quickwit workspace as a single
crate with type-safe, zero-allocation metric declarations built on the
metrics crate. Includes two-level caching (thread-local + global
DashMap), observable counters/gauges with shadow atomics, RAII
GaugeGuard, Labels<N> templates, inventory-based metric discovery,
integration tests, property-based hash tests, criterion benchmarks,
the http_service example, and the inventory binary.

Made-with: Cursor
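
As a rough illustration of the RAII gauge guard mentioned in the commit above: the guard adds to a gauge when created and subtracts the same amount when dropped. The `Gauge` and `GaugeGuard` types here are simplified, hypothetical stand-ins and may differ from the crate's actual definitions.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical gauge backed by a single atomic.
#[derive(Default)]
pub struct Gauge {
    value: AtomicI64,
}

impl Gauge {
    pub fn get(&self) -> i64 {
        self.value.load(Ordering::Relaxed)
    }

    /// Adds `delta` now and returns a guard that subtracts it again on drop.
    pub fn guard(&self, delta: i64) -> GaugeGuard<'_> {
        self.value.fetch_add(delta, Ordering::Relaxed);
        GaugeGuard { gauge: self, delta }
    }
}

/// Undoes its gauge contribution when it goes out of scope.
pub struct GaugeGuard<'a> {
    gauge: &'a Gauge,
    delta: i64,
}

impl Drop for GaugeGuard<'_> {
    fn drop(&mut self) {
        self.gauge.value.fetch_sub(self.delta, Ordering::Relaxed);
    }
}
```

Tying the decrement to `Drop` means early returns and panics that unwind still restore the gauge, which is the usual reason to prefer a guard over paired increment and decrement calls.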
Move the inventory binary, build.rs (linker flags), and scripts/ from
quickwit-metrics into a dedicated quickwit-metrics-inventory crate.
Re-export `metrics` and `inventory` types via `$crate::__metrics::`
and `$crate::__inventory::` so downstream crates only need
`quickwit-metrics` in their Cargo.toml.

Made-with: Cursor
Replace name/subsystem/module_path fields with a &'static Metadata
reference (provides module_path, target/subsystem, and level) and add
static_labels for compile-time label name/value pairs. Update inventory
output to group metrics by module path, sorted by key name.

Made-with: Cursor
Export Quickwit metrics through the existing OpenTelemetry OTLP exporter path when enabled, while preserving Prometheus and DogStatsD routing. Group the telemetry providers and env-filter reload callback into TelemetryHandle so metrics, traces, and logs are initialized and shut down together.
XOR is self-inverse so duplicate labels cancel each other out
(a ^ a == 0). Wrapping addition (mod 2^64) is still commutative
and associative — preserving order-independence and incremental
composability — but distinct label sets now always produce distinct
hashes.

Co-authored-by: Cursor <cursoragent@cursor.com>
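
The cancellation problem described in the commit above is easy to reproduce. The sketch below combines per-pair hashes with XOR and with wrapping addition; `pair_hash`, `xor_hash`, and `add_hash` are hypothetical helpers, and `DefaultHasher` is only a stand-in for whatever hasher the crate uses. Wrapping addition removes the self-cancellation, although, like any 64-bit hash, it cannot rule out collisions entirely.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes one (name, value) label pair.
fn pair_hash(name: &str, value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    (name, value).hash(&mut hasher);
    hasher.finish()
}

/// Combines pair hashes with XOR: order-independent, but duplicates cancel.
pub fn xor_hash(labels: &[(&str, &str)]) -> u64 {
    labels.iter().fold(0u64, |acc, (n, v)| acc ^ pair_hash(n, v))
}

/// Combines pair hashes with wrapping addition: still order-independent,
/// and a repeated pair no longer cancels itself out.
pub fn add_hash(labels: &[(&str, &str)]) -> u64 {
    labels
        .iter()
        .fold(0u64, |acc, (n, v)| acc.wrapping_add(pair_hash(n, v)))
}
```

With XOR, a label set containing the same pair twice hashes to the same value as the empty set; with wrapping addition it contributes twice the pair hash instead.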
Comment thread quickwit/quickwit-common/src/tower/metrics.rs
Comment thread quickwit/quickwit-common/src/thread_pool.rs Outdated
Comment thread quickwit/quickwit-common/src/thread_pool.rs Outdated
Comment thread quickwit/quickwit-common/src/thread_pool.rs Outdated
Comment thread quickwit/quickwit-metrics/src/gauge.rs Outdated
Mallets and others added 2 commits May 4, 2026 14:45
The EXIT trap now also restores Cargo.lock, which gets modified when
cargo resolves the patched Cargo.toml dependencies.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread quickwit/quickwit-metrics/src/histogram.rs Outdated
Comment thread quickwit/quickwit-metrics/src/histogram.rs Outdated
Mallets and others added 4 commits May 5, 2026 10:43
…uickwit-oss/quickwit into migrate-quickwit-metrics-to-metrics-rs

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	quickwit/quickwit-indexing/src/actors/indexer.rs
#	quickwit/quickwit-metrics/examples/http_service.rs
#	quickwit/quickwit-search/src/leaf.rs
#	quickwit/quickwit-search/src/list_terms.rs
#	quickwit/quickwit-search/src/scroll_context.rs
#	quickwit/quickwit-search/src/search_permit_provider.rs
Co-authored-by: Cursor <cursoragent@cursor.com>
Rename LabelValues::hash → __hash and to_labels → __to_labels to
match the __with_values convention for internal API. Update label_values!
doc to recommend inline "key" => value for single-use labels and clarify
that the labels: macro arm borrows internally.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Rename `Labels<N>` to `LabelNames<N>` (label-name template)
- Rename `LabelValues<N>` to `Labels<N>` (concrete name+value pairs)
- Add `labels!` macro for const-constructible all-static label pairs
- Remove Counter fields from CountingUdpSocket, use statics directly
- Inline get_actor_inboxes_count_gauge_guard into its single call site

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread quickwit/quickwit-indexing/src/source/mod.rs Outdated
Mallets and others added 6 commits May 5, 2026 12:19
… for repeated keys

- Add ACTOR_NAME and COMPONENT LabelNames constants in quickwit-indexing metrics
- Use label_values!(ACTOR_NAME, [...]) and label_values!(COMPONENT, [...]) instead
  of repeating "actor_name" and "component" string literals
- Replace all &quickwit_common::metrics::IN_FLIGHT_* qualified paths with direct
  use imports across quickwit-indexing, quickwit-ingest, and quickwit-serve

Co-authored-by: Cursor <cursoragent@cursor.com>
…onstants

- Replace all `metrics::INGEST_RESULT_*` qualified paths in router.rs
  with direct imports
- Move VALIDITY LabelNames from ingest_v2/metrics.rs to crate-level
  metrics.rs for shared use across ingest_api_service and ingester
- Replace raw "validity" => "value" with label_values!(VALIDITY, [...])
  in ingest_api_service.rs
- Use labels! macro in with_lock_metrics! macro for operation/type labels
- Import IN_FLIGHT_WAL directly in ingest_v2/metrics.rs

Co-authored-by: Cursor <cursoragent@cursor.com>
…private

- Add label_names! macro to replace LabelNames::new([...])
- Make LabelNames::new private (__new, doc(hidden))
- Change label_values! syntax from (NAMES, [v1, v2]) to (names: NAMES, v1, v2)
- Update all call sites across the workspace
- Update docs and examples

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace negative increment pattern with explicit decrement call
in source batch clearing.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
…ncher

The labels: arm in counter!, gauge!, and histogram! now accepts multiple
Labels<N> expressions (e.g. `labels: region_labels, status_labels`).
A recursive __bind_labels! macro binds each expression once, folds hash
and count, and chains iterators — zero allocation on the hot path.

Also adds Labels::iter() returning (&str, &str) pairs, and tests
verifying two/three-label composition and hash equivalence with single
Labels.

Co-authored-by: Cursor <cursoragent@cursor.com>
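
Label composition by iterator chaining, as described in the commit above, might look roughly like this outside of macro context. The `Labels<N>` type below is a simplified, hypothetical stand-in restricted to static strings, and `compose` is an invented helper; the real macro arm folds the hash and count as well.

```rust
/// Hypothetical stand-in for the crate's `Labels<N>`: N name/value pairs.
pub struct Labels<const N: usize>(pub [(&'static str, &'static str); N]);

impl<const N: usize> Labels<N> {
    /// Iterates over the pairs without allocating.
    pub fn iter(&self) -> impl Iterator<Item = (&'static str, &'static str)> + '_ {
        self.0.iter().copied()
    }

    /// Number of pairs, known at compile time.
    pub const fn len(&self) -> usize {
        N
    }
}

/// Chains two label sets into one iterator; no intermediate collection is built.
pub fn compose<'a, const A: usize, const B: usize>(
    a: &'a Labels<A>,
    b: &'a Labels<B>,
) -> impl Iterator<Item = (&'static str, &'static str)> + 'a {
    a.iter().chain(b.iter())
}
```

Composing a one-pair set with a two-pair set yields the same sequence as iterating a single three-pair set, which is the property the commit's tests check.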
Comment thread quickwit/quickwit-search/src/metrics.rs Outdated
pub cancel_cpu_queue: IntCounter,
pub cancel_cpu: IntCounter,
pub success: IntCounter,
pub cancel_before_warmup: MaybeRegisteredCounter,

@shuheiktgw here MaybeRegisteredCounter replaces IntCounter. Could you double-check it?

Collaborator Author

Yes, this is intentional! (I refactored that part a bit, so MaybeRegisteredCounter no longer exists and has been replaced by ScopedCounter.) SplitSearchOutcomeCounters currently records metrics locally in some cases, so we need to support that use case. Previously, we simply returned a non-registered counter, but we can’t keep doing that now that we’ve started using static metrics.

Mallets and others added 10 commits May 5, 2026 14:39
The labels: argument now requires square brackets to visually
distinguish the label list from other macro parameters:
  counter!(parent: FOO, labels: [labels_a, labels_b])

Also migrates sketch_processor.rs to use label composition.

Co-authored-by: Cursor <cursoragent@cursor.com>
  label_values!(ROUTE => method, path)

instead of the previous:

  label_values!(names: ROUTE, method, path)

The => visually connects the LabelNames template to its values,
aligning with the existing "key" => value pattern in inline labels.

Co-authored-by: Cursor <cursoragent@cursor.com>
Collapse unnecessarily multi-line label_names!, labels!, label_values!,
counter!, gauge!, and histogram! invocations onto single or fewer lines
where they fit within ~100 characters.

Co-authored-by: Cursor <cursoragent@cursor.com>
The recursive macro is no longer needed now that the labels: arm uses
[$($labels:expr),+]. Hash, count, and iterator are folded inline via
simple $(...)+ repetition, removing ~60 lines of macro machinery.

Co-authored-by: Cursor <cursoragent@cursor.com>
Switch PartialEq/Hash impls on Counter, Gauge, and Histogram from
cache-key hash comparison to Arc::as_ptr() identity. This eliminates
any collision risk and is semantically correct since the global DashMap
guarantees one Arc per unique name+labels combination.

Add Counter::local() and Gauge::local() for detached noop accumulators
with independent shadow atomics. Rename get_hash() to __hash() and
mark it #[doc(hidden)].

Co-authored-by: Cursor <cursoragent@cursor.com>
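
Pointer-identity equality, as described in the commit above, can be sketched with `Arc::ptr_eq` and `Arc::as_ptr`. The `Counter` below is a simplified, hypothetical type, not the crate's definition; its `local()` constructor only mimics the idea of a detached accumulator with its own storage.

```rust
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical counter handle; clones share the same inner allocation.
#[derive(Clone)]
pub struct Counter {
    inner: Arc<AtomicU64>,
}

impl Counter {
    /// Detached accumulator with its own storage.
    pub fn local() -> Self {
        Self { inner: Arc::new(AtomicU64::new(0)) }
    }

    pub fn increment(&self, value: u64) {
        self.inner.fetch_add(value, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.inner.load(Ordering::Relaxed)
    }
}

// Equality and hashing follow the allocation's address, not its contents.
impl PartialEq for Counter {
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}

impl Eq for Counter {}

impl Hash for Counter {
    fn hash<H: Hasher>(&self, state: &mut H) {
        Arc::as_ptr(&self.inner).hash(state);
    }
}
```

Two handles compare equal only when they are clones of the same `Arc`, so equality never depends on a hash of the name and labels. The scheme is sound as long as the cache hands out exactly one `Arc` per unique metric, which is the guarantee the commit relies on.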
Add a literal arm to labels! that uses SharedString::const_str() for
const-compatible label construction. Use LabelNames constants (OUTCOME,
ACTION, COMPONENT_NAME, COMPONENT_CAPACITY_POLICY) in quickwit-storage
to eliminate repeated label key strings.

Co-authored-by: Cursor <cursoragent@cursor.com>
Make `metrics` a `pub mod` instead of re-exporting individual symbols
from the crate root. Internal consumers now import directly from
`crate::metrics::`, external consumers from `quickwit_storage::metrics::`.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove #![allow(missing_docs)], add proper rustdoc to CacheMetrics,
SingleCacheMetrics, their methods, and the four public cache statics.
Tighten field visibility to pub(crate) where only internal access is needed.

Co-authored-by: Cursor <cursoragent@cursor.com>

Counter now exposes Counter::local(). I think LocalCounter and ScopedCounter can be removed now.

self.underlying.with_label_values(&label_values)
}
}
static PROMETHEUS_HANDLE: OnceLock<PrometheusHandle> = OnceLock::new();

I feel we could organize the code a bit better and make it easier to explore. One of the benefits of this PR is that it allows multiple exporters to be supported through a fanout bridge. Would it make sense to have a dedicated module, e.g. quickwit-metrics-exporters, where we group all of them? What do you think?

Comment thread quickwit/quickwit-common/src/metrics.rs Outdated
prometheus::register(collector).expect("failed to register counter vec");

IntGaugeVec { underlying }
pub fn register_info(name: &'static str, help: &'static str, kvs: BTreeMap<&'static str, String>) {

I'm trying to understand the goal of this piece of code here and it's not clear to me. @shuheiktgw could you clarify it?

Collaborator Author

The diff is a bit misleading, but the register_info function already existed before this PR, so I left it as-is. I believe it was originally added to provide a “fake” counter that exposes Quickwit build information, such as the commit hash.

Looking at it now, it seems the metric is not being registered correctly, so I need to fix that. For backward compatibility, I believe we still need to keep supporting it. What do you think?

https://github.com/quickwit-oss/quickwit/blob/main/quickwit/quickwit-common/src/metrics.rs#L78-L86

Collaborator Author

Fixed in 8211251.


Same comment as #6374 (comment)


2 participants