Migrate Quickwit metrics to metrics-rs #6374
Port the metricspp library into the quickwit workspace as a single crate with type-safe, zero-allocation metric declarations built on the metrics crate. Includes two-level caching (thread-local + global DashMap), observable counters/gauges with shadow atomics, RAII GaugeGuard, Labels<N> templates, inventory-based metric discovery, integration tests, property-based hash tests, criterion benchmarks, the http_service example, and the inventory binary. Made-with: Cursor
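The RAII gauge guard mentioned above can be illustrated with a small std-only sketch. The `Gauge` and `GaugeGuard` types below are hypothetical stand-ins, not the crate's actual definitions; the sketch only assumes that a gauge keeps a shadow atomic and that the guard undoes its own increment when dropped.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical stand-in for a gauge backed by a shadow atomic.
pub struct Gauge {
    shadow: AtomicI64,
}

impl Gauge {
    pub const fn new() -> Self {
        Self { shadow: AtomicI64::new(0) }
    }

    /// Reads the current value from the shadow atomic.
    pub fn get(&self) -> i64 {
        self.shadow.load(Ordering::Relaxed)
    }

    /// Adds `delta` immediately and returns a guard that subtracts it on drop.
    pub fn guard(&self, delta: i64) -> GaugeGuard<'_> {
        self.shadow.fetch_add(delta, Ordering::Relaxed);
        GaugeGuard { gauge: self, delta }
    }
}

/// Hypothetical RAII guard: the increment lives exactly as long as the guard.
pub struct GaugeGuard<'a> {
    gauge: &'a Gauge,
    delta: i64,
}

impl Drop for GaugeGuard<'_> {
    fn drop(&mut self) {
        self.gauge.shadow.fetch_sub(self.delta, Ordering::Relaxed);
    }
}

// Illustrative static; the name is an assumption made for this sketch.
static IN_FLIGHT: Gauge = Gauge::new();

fn main() {
    {
        let _guard = IN_FLIGHT.guard(1);
        println!("inside scope: {}", IN_FLIGHT.get());
    }
    println!("after scope: {}", IN_FLIGHT.get());
}
```

Tying the decrement to `Drop` means early returns and panics that unwind still restore the gauge, which is the usual motivation for this pattern.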
Move the inventory binary, build.rs (linker flags), and scripts/ from quickwit-metrics into a dedicated quickwit-metrics-inventory crate. Re-export `metrics` and `inventory` types via `$crate::__metrics::` and `$crate::__inventory::` so downstream crates only need `quickwit-metrics` in their Cargo.toml. Made-with: Cursor
Replace name/subsystem/module_path fields with a &'static Metadata reference (provides module_path, target/subsystem, and level) and add static_labels for compile-time label name/value pairs. Update inventory output to group metrics by module path, sorted by key name. Made-with: Cursor
Export Quickwit metrics through the existing OpenTelemetry OTLP exporter path when enabled, while preserving Prometheus and DogStatsD routing. Group the telemetry providers and env-filter reload callback into TelemetryHandle so metrics, traces, and logs are initialized and shut down together.
XOR is self-inverse, so duplicate labels cancel each other out (a ^ a == 0). Wrapping addition (mod 2^64) is still commutative and associative, which preserves order-independence and incremental composability, and a repeated label no longer cancels to zero. Distinct label sets can still collide, as with any 64-bit hash, but no longer through this cancellation. Co-authored-by: Cursor <cursoragent@cursor.com>
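The difference between the two combiners can be checked with a short sketch. The per-label hash below uses the standard library's `DefaultHasher` purely for illustration; the function names are hypothetical and the crate's real hashing scheme may differ.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hashes a single name/value pair (illustrative choice of hasher).
fn label_hash(name: &str, value: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    name.hash(&mut hasher);
    value.hash(&mut hasher);
    hasher.finish()
}

/// Combines per-label hashes with XOR: order-independent, but self-inverse.
fn combine_xor(labels: &[(&str, &str)]) -> u64 {
    labels.iter().fold(0u64, |acc, (n, v)| acc ^ label_hash(n, v))
}

/// Combines per-label hashes with wrapping addition: still order-independent,
/// and a repeated label contributes twice instead of cancelling.
fn combine_add(labels: &[(&str, &str)]) -> u64 {
    labels
        .iter()
        .fold(0u64, |acc, (n, v)| acc.wrapping_add(label_hash(n, v)))
}

fn main() {
    let duplicated = [("region", "us"), ("region", "us")];
    println!("xor of duplicated labels: {}", combine_xor(&duplicated));
    println!("add of duplicated labels: {}", combine_add(&duplicated));

    let ab = [("region", "us"), ("status", "ok")];
    let ba = [("status", "ok"), ("region", "us")];
    println!("add is order-independent: {}", combine_add(&ab) == combine_add(&ba));
}
```

With XOR, the duplicated set hashes to the same value as the empty set. With wrapping addition it hashes to twice the single-label hash (mod 2^64).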
The EXIT trap now also restores Cargo.lock, which gets modified when cargo resolves the patched Cargo.toml dependencies. Co-authored-by: Cursor <cursoragent@cursor.com>
…uickwit-oss/quickwit into migrate-quickwit-metrics-to-metrics-rs Co-authored-by: Cursor <cursoragent@cursor.com> # Conflicts: # quickwit/quickwit-indexing/src/actors/indexer.rs # quickwit/quickwit-metrics/examples/http_service.rs # quickwit/quickwit-search/src/leaf.rs # quickwit/quickwit-search/src/list_terms.rs # quickwit/quickwit-search/src/scroll_context.rs # quickwit/quickwit-search/src/search_permit_provider.rs
Rename LabelValues::hash → __hash and to_labels → __to_labels to match the __with_values convention for internal API. Update label_values! doc to recommend inline "key" => value for single-use labels and clarify that the labels: macro arm borrows internally. Co-authored-by: Cursor <cursoragent@cursor.com>
- Rename `Labels<N>` to `LabelNames<N>` (label-name template) - Rename `LabelValues<N>` to `Labels<N>` (concrete name+value pairs) - Add `labels!` macro for const-constructible all-static label pairs - Remove Counter fields from CountingUdpSocket, use statics directly - Inline get_actor_inboxes_count_gauge_guard into its single call site Co-authored-by: Cursor <cursoragent@cursor.com>
… for repeated keys - Add ACTOR_NAME and COMPONENT LabelNames constants in quickwit-indexing metrics - Use label_values!(ACTOR_NAME, [...]) and label_values!(COMPONENT, [...]) instead of repeating "actor_name" and "component" string literals - Replace all &quickwit_common::metrics::IN_FLIGHT_* qualified paths with direct use imports across quickwit-indexing, quickwit-ingest, and quickwit-serve Co-authored-by: Cursor <cursoragent@cursor.com>
…onstants - Replace all `metrics::INGEST_RESULT_*` qualified paths in router.rs with direct imports - Move VALIDITY LabelNames from ingest_v2/metrics.rs to crate-level metrics.rs for shared use across ingest_api_service and ingester - Replace raw "validity" => "value" with label_values!(VALIDITY, [...]) in ingest_api_service.rs - Use labels! macro in with_lock_metrics! macro for operation/type labels - Import IN_FLIGHT_WAL directly in ingest_v2/metrics.rs Co-authored-by: Cursor <cursoragent@cursor.com>
…private - Add label_names! macro to replace LabelNames::new([...]) - Make LabelNames::new private (__new, doc(hidden)) - Change label_values! syntax from (NAMES, [v1, v2]) to (names: NAMES, v1, v2) - Update all call sites across the workspace - Update docs and examples Co-authored-by: Cursor <cursoragent@cursor.com>
Replace negative increment pattern with explicit decrement call in source batch clearing. Co-authored-by: Cursor <cursoragent@cursor.com>
…ncher The labels: arm in counter!, gauge!, and histogram! now accepts multiple Labels<N> expressions (e.g. `labels: region_labels, status_labels`). A recursive __bind_labels! macro binds each expression once, folds hash and count, and chains iterators — zero allocation on the hot path. Also adds Labels::iter() returning (&str, &str) pairs, and tests verifying two/three-label composition and hash equivalence with single Labels. Co-authored-by: Cursor <cursoragent@cursor.com>
pub cancel_cpu_queue: IntCounter,
pub cancel_cpu: IntCounter,
pub success: IntCounter,
pub cancel_before_warmup: MaybeRegisteredCounter,
@shuheiktgw here MaybeRegisteredCounter replaces IntCounter. Could you double-check it?
Yes, this is intentional! (I refactored that part a bit, so MaybeRegisteredCounter no longer exists and has been replaced by ScopedCounter.) SplitSearchOutcomeCounters currently records metrics locally in some cases, so we need to support that use case. Previously, we simply returned a non-registered counter, but we can’t keep doing that now that we’ve started using static metrics.
The labels: argument now requires square brackets to visually distinguish the label list from other macro parameters: counter!(parent: FOO, labels: [labels_a, labels_b]) Also migrates sketch_processor.rs to use label composition. Co-authored-by: Cursor <cursoragent@cursor.com>
label_values!(ROUTE => method, path) instead of the previous: label_values!(names: ROUTE, method, path) The => visually connects the LabelNames template to its values, aligning with the existing "key" => value pattern in inline labels. Co-authored-by: Cursor <cursoragent@cursor.com>
Collapse unnecessarily multi-line label_names!, labels!, label_values!, counter!, gauge!, and histogram! invocations onto single or fewer lines where they fit within ~100 characters. Co-authored-by: Cursor <cursoragent@cursor.com>
The recursive macro is no longer needed now that the labels: arm uses [$($labels:expr),+]. Hash, count, and iterator are folded inline via simple $(...)+ repetition, removing ~60 lines of macro machinery. Co-authored-by: Cursor <cursoragent@cursor.com>
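A minimal sketch of folding several label sets with plain `$(...)+` repetition might look like the following. Everything here is hypothetical: the simplified `Labels<N>` type, the toy byte-sum hash, and the `fold_labels!` macro name are illustrations of the technique, not the crate's actual code. The sketch also assumes each expression is a cheap place expression such as a variable, since it is expanded more than once.

```rust
/// Hypothetical simplified `Labels<N>`: fixed-size pairs plus a precomputed hash.
pub struct Labels<const N: usize> {
    pairs: [(&'static str, &'static str); N],
    hash: u64,
}

impl<const N: usize> Labels<N> {
    pub fn new(pairs: [(&'static str, &'static str); N]) -> Self {
        // Toy additive hash (sum of bytes), chosen only so that composing
        // label sets by wrapping addition matches a single combined set.
        let mut hash = 0u64;
        for (name, value) in pairs.iter() {
            for byte in name.bytes().chain(value.bytes()) {
                hash = hash.wrapping_add(byte as u64);
            }
        }
        Self { pairs, hash }
    }

    pub fn hash(&self) -> u64 {
        self.hash
    }

    pub fn len(&self) -> usize {
        N
    }

    pub fn iter(&self) -> impl Iterator<Item = (&'static str, &'static str)> + '_ {
        self.pairs.iter().copied()
    }
}

/// Folds hash, count, and a chained iterator over all label sets inline.
macro_rules! fold_labels {
    (labels: [$($labels:expr),+]) => {{
        let mut hash = 0u64;
        let mut count = 0usize;
        $(
            hash = hash.wrapping_add($labels.hash());
            count += $labels.len();
        )+
        let iter = std::iter::empty::<(&'static str, &'static str)>()
            $(.chain($labels.iter()))+;
        (hash, count, iter)
    }};
}

fn main() {
    let region = Labels::new([("region", "us")]);
    let status = Labels::new([("status", "ok")]);
    let (hash, count, iter) = fold_labels!(labels: [region, status]);

    let combined = Labels::new([("region", "us"), ("status", "ok")]);
    println!("{count} labels, matches single Labels: {}", hash == combined.hash());
    for (name, value) in iter {
        println!("{name}={value}");
    }
}
```

The chained iterator borrows from the label sets and yields pairs lazily, so nothing is collected into a heap-allocated container.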
Switch PartialEq/Hash impls on Counter, Gauge, and Histogram from cache-key hash comparison to Arc::as_ptr() identity. This eliminates any collision risk and is semantically correct since the global DashMap guarantees one Arc per unique name+labels combination. Add Counter::local() and Gauge::local() for detached noop accumulators with independent shadow atomics. Rename get_hash() to __hash() and mark it #[doc(hidden)]. Co-authored-by: Cursor <cursoragent@cursor.com>
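Pointer-identity equality can be sketched with the standard library alone. The `Counter` below is a hypothetical stand-in that wraps an `Arc<AtomicU64>`; the real handle holds more state, and only the `Arc::ptr_eq` / `Arc::as_ptr` technique is the point here.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical counter handle: clones share one allocation.
#[derive(Clone)]
pub struct Counter {
    inner: Arc<AtomicU64>,
}

impl Counter {
    /// Detached accumulator with its own independent atomic.
    pub fn local() -> Self {
        Self { inner: Arc::new(AtomicU64::new(0)) }
    }

    pub fn increment(&self, n: u64) {
        self.inner.fetch_add(n, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.inner.load(Ordering::Relaxed)
    }
}

// Two handles are equal only if they point at the same allocation.
impl PartialEq for Counter {
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}

impl Eq for Counter {}

// Hash the pointer itself so that `Hash` agrees with `PartialEq`.
impl Hash for Counter {
    fn hash<H: Hasher>(&self, state: &mut H) {
        Arc::as_ptr(&self.inner).hash(state);
    }
}

fn main() {
    let a = Counter::local();
    let a_clone = a.clone();
    let b = Counter::local();

    a_clone.increment(3);
    println!("a == a_clone: {}, a == b: {}", a == a_clone, a == b);
    println!("a = {}, b = {}", a.get(), b.get());

    let set: HashSet<Counter> = [a, a_clone, b].into_iter().collect();
    println!("distinct handles: {}", set.len());
}
```

Because equality never looks at the stored value, two detached counters with the same count still compare as different, which is what makes identity safe as a map key.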
Add a literal arm to labels! that uses SharedString::const_str() for const-compatible label construction. Use LabelNames constants (OUTCOME, ACTION, COMPONENT_NAME, COMPONENT_CAPACITY_POLICY) in quickwit-storage to eliminate repeated label key strings. Co-authored-by: Cursor <cursoragent@cursor.com>
Make `metrics` a `pub mod` instead of re-exporting individual symbols from the crate root. Internal consumers now import directly from `crate::metrics::`, external consumers from `quickwit_storage::metrics::`. Co-authored-by: Cursor <cursoragent@cursor.com>
Remove #![allow(missing_docs)], add proper rustdoc to CacheMetrics, SingleCacheMetrics, their methods, and the four public cache statics. Tighten field visibility to pub(crate) where only internal access is needed. Co-authored-by: Cursor <cursoragent@cursor.com>
Counter now exposes Counter::local(). I think LocalCounter and ScopedCounter can be removed now.
self.underlying.with_label_values(&label_values)
}
}
static PROMETHEUS_HANDLE: OnceLock<PrometheusHandle> = OnceLock::new();
I feel we could organize the code a bit better and make it easier to explore. One of the benefits of this PR is that it allows multiple exporters to be supported through a fanout bridge. Would it make sense to have a dedicated module, e.g. quickwit-metrics-exporters, where we group all of them? What do you think?
prometheus::register(collector).expect("failed to register counter vec");

IntGaugeVec { underlying }
pub fn register_info(name: &'static str, help: &'static str, kvs: BTreeMap<&'static str, String>) {
I'm trying to understand the goal of this piece of code here and it's not clear to me. @shuheiktgw could you clarify it?
The diff is a bit misleading, but the register_info function already existed before this PR, so I left it as-is. I believe it was originally added to provide a “fake” counter that exposes Quickwit build information, such as the commit hash.
Looking at it now, it seems the metric is not being registered correctly, so I need to fix that. For backward compatibility, though, I believe we still need to keep supporting it. What do you think?
https://github.com/quickwit-oss/quickwit/blob/main/quickwit/quickwit-common/src/metrics.rs#L78-L86
Summary

Migrates Quickwit metrics to a metrics-rs based infrastructure.

Prometheus metrics are now exported through `PrometheusRecorder`, while OTLP metrics are exported through the OpenTelemetry metrics recorder. Existing metric definitions and call sites have been moved to typed `quickwit-metrics` handles/macros where applicable.

Quickwit-specific metric logic remains in `quickwit-common/src/metrics.rs`, while the generic metrics declaration/registration/cache layer is split into the new `quickwit-metrics` crate.

Review Guide

The diff is large, but most of it is mechanical: existing metrics were converted to static definitions using `quickwit-metrics`, and call sites were updated accordingly. The core changes are concentrated in the following areas.

`quickwit/quickwit-cli/src/logger.rs`

Metrics initialization was updated for the metrics-rs migration.

- `PrometheusRecorder` for Prometheus export.
- `OpenTelemetryRecorder` for OTLP metrics export.

`quickwit/quickwit-cli/src/main.rs`

metrics-rs requires the global recorder to be installed before metrics are emitted, so telemetry/metrics initialization was moved earlier in the startup flow.

`quickwit/quickwit-common/src/metrics.rs`

This file now keeps only Quickwit-specific metrics logic. The generic Prometheus metric wrappers/factories were removed or moved to `quickwit-metrics`.

`quickwit/quickwit-metrics`

Adds the metrics-rs module prepared by @Mallets.

Main features:

- `counter!`, `gauge!`, and `histogram!` macros.
- `inventory`-based metric metadata enumeration.

The gRPC metrics have been renamed because metric names now need to be defined statically. The service name can no longer be embedded dynamically in the metric name, so it has been moved to a `service` label instead.

The affected metrics are:

- `quickwit_<service>_grpc_requests_total` -> `quickwit_grpc_requests_total{service="<service>"}`
- `quickwit_<service>_grpc_requests_in_flight` -> `quickwit_grpc_requests_in_flight{service="<service>"}`
- `quickwit_<service>_grpc_request_duration_seconds` -> `quickwit_grpc_request_duration_seconds{service="<service>"}`

For example, `quickwit_ingest_grpc_requests_total{kind="server"}` should now be queried as `quickwit_grpc_requests_total{service="ingest", kind="server"}`.
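For anyone updating dashboards or alert rules, the rename can be expressed as a small helper. This is a sketch only: `migrate_grpc_metric_name` is a hypothetical function written for illustration and is not part of Quickwit. It handles the three base names listed above and does not account for derived histogram series such as `_bucket`, `_sum`, or `_count`.

```rust
/// Rewrites a legacy per-service gRPC metric name into the new static name
/// plus the value to use for the `service` label. Returns `None` for names
/// that do not match one of the three renamed metrics.
fn migrate_grpc_metric_name(old: &str) -> Option<(String, String)> {
    const SUFFIXES: [&str; 3] = [
        "_grpc_requests_total",
        "_grpc_requests_in_flight",
        "_grpc_request_duration_seconds",
    ];
    let rest = old.strip_prefix("quickwit_")?;
    for suffix in SUFFIXES {
        if let Some(service) = rest.strip_suffix(suffix) {
            // An empty service means the name is not in the legacy format.
            if !service.is_empty() {
                return Some((format!("quickwit{suffix}"), service.to_string()));
            }
        }
    }
    None
}

fn main() {
    let legacy = "quickwit_ingest_grpc_requests_total";
    match migrate_grpc_metric_name(legacy) {
        Some((name, service)) => println!("{legacy} -> {name}{{service=\"{service}\"}}"),
        None => println!("{legacy} is not a legacy gRPC metric name"),
    }
}
```

Existing label matchers such as `kind="server"` carry over unchanged; only the `service` matcher needs to be added to each query.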