From 48e196ebe9a58c1d1f004c04818c21b1274f5990 Mon Sep 17 00:00:00 2001 From: Nastasha Solomon Date: Wed, 27 May 2026 17:16:07 -0400 Subject: [PATCH 1/7] first draft --- .../alerting/alerts/inspect-rule-queries.md | 234 ++++++++++++++++++ explore-analyze/toc.yml | 1 + .../create-manage-rules.md | 15 ++ .../triage-threshold-breaches.md | 19 ++ 4 files changed, 269 insertions(+) create mode 100644 explore-analyze/alerting/alerts/inspect-rule-queries.md diff --git a/explore-analyze/alerting/alerts/inspect-rule-queries.md b/explore-analyze/alerting/alerts/inspect-rule-queries.md new file mode 100644 index 0000000000..8d35bb0207 --- /dev/null +++ b/explore-analyze/alerting/alerts/inspect-rule-queries.md @@ -0,0 +1,234 @@ +--- +navigation_title: Diagnose rule behavior +description: Use the rule query inspector to view the Elasticsearch request behind a rule and diagnose why an alert did or didn't fire. +applies_to: + stack: ga 9.5 + serverless: ga +products: + - id: kibana +--- + +# Diagnose rule behavior with the rule query inspector [inspect-rule-queries] + +The rule query inspector lets you view the {{es}} request that a rule sends when it evaluates your data. Use it to understand the query structure, confirm the rule is targeting the right data, and diagnose why an alert did or didn't fire. + +::::{note} +:applies_to: {"stack": "ga 9.5", "serverless": "ga"} +Currently, the rule query inspector is only available for **custom threshold rules**. +:::: + +## Access the inspector [inspect-access] + +The inspector is available from two places, each showing a different query: + +**From the rule details page (current rule parameters)** +: Open **{{stack-manage-app}}** > **{{rules-ui}}**, find your rule, and click its name to open the rule details page. Click **Rule query inspector**. The inspector builds the query from the rule's _current_ parameters. Use this view to verify that the rule is configured correctly and would match the data you expect. 
+ +**From an alert details page (historical parameters)** +: Go to the **Alerts** page, then open an individual alert. Click **Rule query inspector**. The inspector uses the rule parameters _as they existed when that specific alert fired_, including the exact evaluation time range. Use this view to understand why a particular alert was or wasn't triggered. + +The key difference: the rule details page reflects the rule as it is _now_, while the alert details page reflects the rule as it was _then_. If you've edited the rule since an alert fired, the two inspectors will show different queries. + +## Anatomy of the query [inspect-query-anatomy] + +The following sections describe the query structure for **custom threshold rules**. As support for additional rule types is added, this reference will expand. + +The inspector displays the full {{es}} request. Each part of the query maps to a setting in your rule configuration. + +### Index and time range [inspect-anatomy-index-time] + +The top-level index and `range` filter reflect your rule's data source and time window: + +```json +{ + "index": [""], + "body": { + "query": { + "bool": { + "filter": [ + { + "range": { + "@timestamp": { <1> + "gte": "...", <2> + "lte": "..." <3> + } + } + } + ] + } + } + } +} +``` + +1. The time field from your data view. +2. The start of the evaluation window (`now` minus your rule's **time window** setting). +3. The end of the evaluation window. From an alert details page, this matches the exact moment the alert was evaluated, not the current time. + +If the time range looks unexpected from an alert details page, this confirms the exact window {{es}} searched when the alert fired. This can help explain alerts that seem outdated or cover an unexpected period. 
+ +### Query filter [inspect-anatomy-query-filter] + +If you set a **query filter** on the rule, it appears as an additional clause in the `bool` filter: + +```json +{ + "query": { + "bool": { + "filter": [ + { "range": { "@timestamp": { ... } } }, + { "query_string": { "query": "host.name: host-1" } } <1> + ] + } + } +} +``` + +1. The KQL query filter you set on the rule, translated to a `query_string` or `term` clause. If this filter excludes more data than expected, the rule won't find the documents you intended. + +If the filter is missing or different from what you set, double-check the rule configuration. + +### Aggregations [inspect-anatomy-aggregations] + +Each criterion you defined in the rule becomes an aggregation in the query. A rule with two criteria (for example, Aggregation A and Aggregation B) produces two sub-aggregations: + +```json +{ + "aggs": { + "A": { <1> + "avg": { "field": "system.cpu.user.pct" } + }, + "B": { + "avg": { "field": "system.cpu.system.pct" } + } + } +} +``` + +1. The letter label matches the criterion label shown in the rule configuration (**A**, **B**, and so on). + +| Rule criterion | Aggregation in query | +| --- | --- | +| **Average** of a field | `avg` | +| **Max** of a field | `max` | +| **Min** of a field | `min` | +| **Sum** of a field | `sum` | +| **Count** (all docs) | `value_count` or `filter` + `value_count` | +| **Cardinality** of a field | `cardinality` | +| **95th percentile** of a field | `percentiles` with `{ "percents": [95] }` | +| **Rate** of a field | Two `max` aggregations plus a bucket script | + +If you set a **KQL filter** on a criterion ({applies_to}`stack: ga 9.4+`), it appears as a `filter` aggregation wrapping the metric aggregation. 
+ +### Group-by fields [inspect-anatomy-group-by] + +If your rule uses **Group alerts by**, the aggregations are wrapped in a `composite` aggregation that partitions results by those fields: + +```json +{ + "aggs": { + "groupBy": { + "composite": { + "sources": [ + { "host.name": { "terms": { "field": "host.name" } } } <1> + ], + "size": 10000 + }, + "aggs": { + "A": { "avg": { "field": "system.cpu.user.pct" } } + } + } + } +} +``` + +1. One entry per **Group alerts by** field. Multiple group-by fields produce multiple `sources`. + +Without group-by, the aggregations run over all matched documents and return a single value. + +## Reading the response [inspect-response] + +The inspector also shows the Elasticsearch response alongside the request. Match each aggregation bucket back to your rule configuration to understand what value was computed. + +### No group-by: single-value response [inspect-response-no-group] + +When there are no group-by fields, the response contains a single set of aggregation values under `aggregations`: + +```json +{ + "aggregations": { + "A": { "value": 0.82 }, <1> + "B": { "value": 0.15 } + } +} +``` + +1. Aggregation `A` returned `0.82`. If your rule equation is `(A + B) / C * 100` with threshold `IS ABOVE 95`, you'd compute the equation value with these numbers to confirm whether the threshold was met. + +If the response value is below the threshold and no alert fired, this confirms the rule evaluated correctly. If you _expected_ an alert and the value is below the threshold, review your aggregations and KQL filters. 
+ +### With group-by: bucketed response [inspect-response-group] + +When group-by fields are used, the response returns one bucket per group under `aggregations.groupBy.buckets`: + +```json +{ + "aggregations": { + "groupBy": { + "buckets": [ + { + "key": { "host.name": "host-1" }, + "doc_count": 342, + "A": { "value": 0.97 } <1> + }, + { + "key": { "host.name": "host-2" }, + "doc_count": 58, + "A": { "value": 0.42 } <2> + } + ] + } + } +} +``` + +1. `host-1` had a value of `0.97`. If the threshold is `IS ABOVE 0.95`, this group breached it and an alert should have fired for `host-1`. +2. `host-2` had a value of `0.42` — below the threshold, so no alert fired for this group. + +If a group you expected to appear is missing from the buckets, it had no matching documents during the evaluation window. This can happen when `doc_count` is 0 or when the query filter excluded all documents for that group. + +### What a "no data" response looks like [inspect-response-no-data] + +If {{es}} returned no documents, the aggregation values will be `null` or the buckets array will be empty: + +```json +{ + "aggregations": { + "A": { "value": null } + } +} +``` + +A `null` value means no data matched the query during the evaluation window. If you have **no data** alerts configured, this is the state that triggers them. Check the time range and query filter to confirm no documents were genuinely present, or investigate whether an index or data view configuration issue is preventing data from being found. + +## Common troubleshooting scenarios [inspect-troubleshoot] + +:::{dropdown} Alert fired but I don't know why +Open the inspector from the alert details page. Review the time range to confirm it matches the evaluation period. Find the aggregation bucket for your group and check the value against the threshold. If the value exceeds the threshold, the alert fired correctly. 
+::: + +:::{dropdown} Alert didn't fire when I expected it to +Open the inspector from the rule details page and confirm the query targets the right index pattern and time range. Check the query filter for unintended restrictions. If the aggregation values in the response are below the threshold, the rule evaluated correctly but your data didn't breach the threshold during that window. +::: + +:::{dropdown} Rule looks correct now but the alert used different parameters +If you've modified the rule since the alert fired, open the inspector from the _alert details page_ rather than the rule details page. The alert inspector uses the parameters that were active at the time the alert fired, so the query will reflect the older configuration. +::: + +:::{dropdown} Empty or null aggregation values +The query matched no documents. Check whether the index pattern in the data view is correct, whether your time range is appropriate, and whether any query filter is too restrictive. Also verify that the data stream or index has data in the expected time period by running the same query in [Discover](/explore-analyze/discover.md) or [Dev Tools](/explore-analyze/query-filter/tools/console.md). +::: + +:::{dropdown} Unexpected group missing from results +If a group you expected (such as a specific host) doesn't appear in the buckets, no documents for that group matched the query during the evaluation window. This can happen when the group was inactive, when a filter excluded its documents, or when the field used for grouping has a different value in the actual documents than you expected. 
+::: diff --git a/explore-analyze/toc.yml b/explore-analyze/toc.yml index 2a57f93a92..f52005e97c 100644 --- a/explore-analyze/toc.yml +++ b/explore-analyze/toc.yml @@ -394,6 +394,7 @@ toc: - file: alerting/alerts/geo-alerting.md - file: alerting/alerts/rule-action-variables.md - file: alerting/alerts/notifications-domain-allowlist.md + - file: alerting/alerts/inspect-rule-queries.md - file: alerting/alerts/alerting-troubleshooting.md children: - file: alerting/alerts/alerting-common-issues.md diff --git a/solutions/observability/incident-management/create-manage-rules.md b/solutions/observability/incident-management/create-manage-rules.md index 8a29fed681..1a6725fb35 100644 --- a/solutions/observability/incident-management/create-manage-rules.md +++ b/solutions/observability/incident-management/create-manage-rules.md @@ -93,6 +93,21 @@ A rule can have one of the following responses: `warning` : The rule ran with some non-critical errors. +### Inspect the rule query [observability-create-manage-rules-inspect-query] + +```{applies_to} +stack: ga 9.5 +serverless: ga +``` + +::::{note} +:applies_to: {"stack": "ga 9.5", "serverless": "ga"} +Currently, the rule query inspector is only available for **custom threshold rules**. +:::: + +From the rule details page, click **Rule query inspector** to view the Elasticsearch request the rule sends during evaluation. The inspector builds the query from the rule's current parameters, inclduing the data view, query filter, time window, and aggregations you've configured. Use it to confirm the rule is targeting the right data before an alert fires. + +For an explanation of the query structure and how to read the response, refer to [Diagnose rule behavior with the rule query inspector](/explore-analyze/alerting/alerts/inspect-rule-queries.md). 
## Snooze and disable rules [observability-create-manage-rules-snooze-and-disable-rules] diff --git a/solutions/observability/incident-management/triage-threshold-breaches.md b/solutions/observability/incident-management/triage-threshold-breaches.md index a3d6727799..8548734843 100644 --- a/solutions/observability/incident-management/triage-threshold-breaches.md +++ b/solutions/observability/incident-management/triage-threshold-breaches.md @@ -51,6 +51,25 @@ Explore charts on the page to learn more about the threshold breach: Analyze these charts to better understand when the breach started, it’s current state, and how the issue is trending. +## Inspect the query behind an alert [triage-threshold-inspect-query] + +```{applies_to} +stack: ga 9.5 +serverless: ga +``` + +To understand exactly what {{es}} evaluated when the alert fired, use the rule query inspector. From the alert details page, click **Rule query inspector**. + +Unlike the inspector on the rule details page, which reflects the rule's _current_ parameters, the inspector on an alert details page uses the parameters that were active at the time the alert fired, including the exact evaluation time range stored on the alert document. This makes it useful for investigating historical alerts, especially if the rule has been edited since. + +Use the inspector to: + +- Confirm the time range that was evaluated when the alert fired. +- Review the aggregation values that were returned and compare them to the threshold. +- Identify whether a missing group, a too-restrictive filter, or unexpected data caused the alert to behave differently than expected. + +For an explanation of the query structure, aggregation types, and how to read the response, refer to [the rule query inspector](/explore-analyze/alerting/alerts/inspect-rule-queries.md). + After investigating the alert, you may want to: * Click **Snooze the rule** to snooze notifications for a specific time period or indefinitely. 
From f3e1d5aab51261ace69a9c39d6e007d7ddf619d8 Mon Sep 17 00:00:00 2001 From: Nastasha Solomon <79124755+nastasha-solomon@users.noreply.github.com> Date: Wed, 27 May 2026 18:10:02 -0400 Subject: [PATCH 2/7] Update solutions/observability/incident-management/create-manage-rules.md Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --- .../observability/incident-management/create-manage-rules.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/observability/incident-management/create-manage-rules.md b/solutions/observability/incident-management/create-manage-rules.md index 1a6725fb35..64af12ec03 100644 --- a/solutions/observability/incident-management/create-manage-rules.md +++ b/solutions/observability/incident-management/create-manage-rules.md @@ -105,7 +105,7 @@ serverless: ga Currently, the rule query inspector is only available for **custom threshold rules**. :::: -From the rule details page, click **Rule query inspector** to view the Elasticsearch request the rule sends during evaluation. The inspector builds the query from the rule's current parameters, inclduing the data view, query filter, time window, and aggregations you've configured. Use it to confirm the rule is targeting the right data before an alert fires. +From the rule details page, click **Rule query inspector** to view the Elasticsearch request the rule sends during evaluation. The inspector builds the query from the rule's current parameters, including the data view, query filter, time window, and aggregations you've configured. Use it to confirm the rule is targeting the right data before an alert fires. For an explanation of the query structure and how to read the response, refer to [Diagnose rule behavior with the rule query inspector](/explore-analyze/alerting/alerts/inspect-rule-queries.md). 
From 5b03b5678078433bae0675cd008d18cf256d64ae Mon Sep 17 00:00:00 2001 From: Nastasha Solomon <79124755+nastasha-solomon@users.noreply.github.com> Date: Wed, 27 May 2026 18:10:10 -0400 Subject: [PATCH 3/7] Update solutions/observability/incident-management/triage-threshold-breaches.md Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --- .../incident-management/triage-threshold-breaches.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/solutions/observability/incident-management/triage-threshold-breaches.md b/solutions/observability/incident-management/triage-threshold-breaches.md index 8548734843..59456cb406 100644 --- a/solutions/observability/incident-management/triage-threshold-breaches.md +++ b/solutions/observability/incident-management/triage-threshold-breaches.md @@ -49,7 +49,7 @@ Explore charts on the page to learn more about the threshold breach: ::: -Analyze these charts to better understand when the breach started, it’s current state, and how the issue is trending. +Analyze these charts to better understand when the breach started, its current state, and how the issue is trending. 
## Inspect the query behind an alert [triage-threshold-inspect-query] From 68c47df35b5bc77fa87e191dfb8556f2823f19ae Mon Sep 17 00:00:00 2001 From: Nastasha Solomon Date: Thu, 4 Jun 2026 00:45:44 -0400 Subject: [PATCH 4/7] bena's feedback --- .../alerting/alerts/inspect-rule-queries.md | 190 +----------------- 1 file changed, 9 insertions(+), 181 deletions(-) diff --git a/explore-analyze/alerting/alerts/inspect-rule-queries.md b/explore-analyze/alerting/alerts/inspect-rule-queries.md index 8d35bb0207..4a6ac92c6c 100644 --- a/explore-analyze/alerting/alerts/inspect-rule-queries.md +++ b/explore-analyze/alerting/alerts/inspect-rule-queries.md @@ -29,187 +29,15 @@ The inspector is available from two places, each showing a different query: The key difference: the rule details page reflects the rule as it is _now_, while the alert details page reflects the rule as it was _then_. If you've edited the rule since an alert fired, the two inspectors will show different queries. -## Anatomy of the query [inspect-query-anatomy] - -The following sections describe the query structure for **custom threshold rules**. As support for additional rule types is added, this reference will expand. - -The inspector displays the full {{es}} request. Each part of the query maps to a setting in your rule configuration. - -### Index and time range [inspect-anatomy-index-time] - -The top-level index and `range` filter reflect your rule's data source and time window: - -```json -{ - "index": [""], - "body": { - "query": { - "bool": { - "filter": [ - { - "range": { - "@timestamp": { <1> - "gte": "...", <2> - "lte": "..." <3> - } - } - } - ] - } - } - } -} -``` - -1. The time field from your data view. -2. The start of the evaluation window (`now` minus your rule's **time window** setting). -3. The end of the evaluation window. From an alert details page, this matches the exact moment the alert was evaluated, not the current time. 
- -If the time range looks unexpected from an alert details page, this confirms the exact window {{es}} searched when the alert fired. This can help explain alerts that seem outdated or cover an unexpected period. - -### Query filter [inspect-anatomy-query-filter] - -If you set a **query filter** on the rule, it appears as an additional clause in the `bool` filter: - -```json -{ - "query": { - "bool": { - "filter": [ - { "range": { "@timestamp": { ... } } }, - { "query_string": { "query": "host.name: host-1" } } <1> - ] - } - } -} -``` - -1. The KQL query filter you set on the rule, translated to a `query_string` or `term` clause. If this filter excludes more data than expected, the rule won't find the documents you intended. - -If the filter is missing or different from what you set, double-check the rule configuration. - -### Aggregations [inspect-anatomy-aggregations] - -Each criterion you defined in the rule becomes an aggregation in the query. A rule with two criteria (for example, Aggregation A and Aggregation B) produces two sub-aggregations: - -```json -{ - "aggs": { - "A": { <1> - "avg": { "field": "system.cpu.user.pct" } - }, - "B": { - "avg": { "field": "system.cpu.system.pct" } - } - } -} -``` - -1. The letter label matches the criterion label shown in the rule configuration (**A**, **B**, and so on). - -| Rule criterion | Aggregation in query | -| --- | --- | -| **Average** of a field | `avg` | -| **Max** of a field | `max` | -| **Min** of a field | `min` | -| **Sum** of a field | `sum` | -| **Count** (all docs) | `value_count` or `filter` + `value_count` | -| **Cardinality** of a field | `cardinality` | -| **95th percentile** of a field | `percentiles` with `{ "percents": [95] }` | -| **Rate** of a field | Two `max` aggregations plus a bucket script | - -If you set a **KQL filter** on a criterion ({applies_to}`stack: ga 9.4+`), it appears as a `filter` aggregation wrapping the metric aggregation. 
- -### Group-by fields [inspect-anatomy-group-by] - -If your rule uses **Group alerts by**, the aggregations are wrapped in a `composite` aggregation that partitions results by those fields: - -```json -{ - "aggs": { - "groupBy": { - "composite": { - "sources": [ - { "host.name": { "terms": { "field": "host.name" } } } <1> - ], - "size": 10000 - }, - "aggs": { - "A": { "avg": { "field": "system.cpu.user.pct" } } - } - } - } -} -``` - -1. One entry per **Group alerts by** field. Multiple group-by fields produce multiple `sources`. - -Without group-by, the aggregations run over all matched documents and return a single value. - -## Reading the response [inspect-response] - -The inspector also shows the Elasticsearch response alongside the request. Match each aggregation bucket back to your rule configuration to understand what value was computed. - -### No group-by: single-value response [inspect-response-no-group] - -When there are no group-by fields, the response contains a single set of aggregation values under `aggregations`: - -```json -{ - "aggregations": { - "A": { "value": 0.82 }, <1> - "B": { "value": 0.15 } - } -} -``` - -1. Aggregation `A` returned `0.82`. If your rule equation is `(A + B) / C * 100` with threshold `IS ABOVE 95`, you'd compute the equation value with these numbers to confirm whether the threshold was met. - -If the response value is below the threshold and no alert fired, this confirms the rule evaluated correctly. If you _expected_ an alert and the value is below the threshold, review your aggregations and KQL filters. 
- -### With group-by: bucketed response [inspect-response-group] - -When group-by fields are used, the response returns one bucket per group under `aggregations.groupBy.buckets`: - -```json -{ - "aggregations": { - "groupBy": { - "buckets": [ - { - "key": { "host.name": "host-1" }, - "doc_count": 342, - "A": { "value": 0.97 } <1> - }, - { - "key": { "host.name": "host-2" }, - "doc_count": 58, - "A": { "value": 0.42 } <2> - } - ] - } - } -} -``` - -1. `host-1` had a value of `0.97`. If the threshold is `IS ABOVE 0.95`, this group breached it and an alert should have fired for `host-1`. -2. `host-2` had a value of `0.42` — below the threshold, so no alert fired for this group. - -If a group you expected to appear is missing from the buckets, it had no matching documents during the evaluation window. This can happen when `doc_count` is 0 or when the query filter excluded all documents for that group. - -### What a "no data" response looks like [inspect-response-no-data] - -If {{es}} returned no documents, the aggregation values will be `null` or the buckets array will be empty: - -```json -{ - "aggregations": { - "A": { "value": null } - } -} -``` - -A `null` value means no data matched the query during the evaluation window. If you have **no data** alerts configured, this is the state that triggers them. Check the time range and query filter to confirm no documents were genuinely present, or investigate whether an index or data view configuration issue is preventing data from being found. +## What the inspector shows [inspect-tabs] + +The inspector has two tabs: + +**Request** +: Shows the full {{es}} query that the rule sends when it evaluates your data. Use it to verify the index pattern, time range, query filter, and aggregations match what you configured in the rule. + +**Response** +: Shows the raw {{es}} response. 
Use it to confirm whether data was found, whether the groups you expect are present, and what values the rule was working with when it made its alerting decision. ## Common troubleshooting scenarios [inspect-troubleshoot] From bd245d6288fd716f7a59d49bc81addea241004d1 Mon Sep 17 00:00:00 2001 From: Nastasha Solomon Date: Thu, 4 Jun 2026 13:02:28 -0400 Subject: [PATCH 5/7] updated structure --- .../alerting/alerts/inspect-rule-queries.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/explore-analyze/alerting/alerts/inspect-rule-queries.md b/explore-analyze/alerting/alerts/inspect-rule-queries.md index 4a6ac92c6c..b9a41e69d1 100644 --- a/explore-analyze/alerting/alerts/inspect-rule-queries.md +++ b/explore-analyze/alerting/alerts/inspect-rule-queries.md @@ -19,19 +19,17 @@ Currently, the rule query inspector is only available for **custom threshold rul ## Access the inspector [inspect-access] -The inspector is available from two places, each showing a different query: +The inspector is available from two places, each showing a different query: the rule details page (shows current rule parameters) and the alert details page (shows historical parameters) -**From the rule details page (current rule parameters)** -: Open **{{stack-manage-app}}** > **{{rules-ui}}**, find your rule, and click its name to open the rule details page. Click **Rule query inspector**. The inspector builds the query from the rule's _current_ parameters. Use this view to verify that the rule is configured correctly and would match the data you expect. +* **Rule details page**: Open **{{stack-manage-app}}** > **{{rules-ui}}**, find your rule, and click its name to open the rule details page. Click **Rule query inspector**. The inspector builds the query from the rule's _current_ parameters. Use this view to verify that the rule is configured correctly and would match the data you expect. 
-**From an alert details page (historical parameters)** -: Go to the **Alerts** page, then open an individual alert. Click **Rule query inspector**. The inspector uses the rule parameters _as they existed when that specific alert fired_, including the exact evaluation time range. Use this view to understand why a particular alert was or wasn't triggered. +* **Alert details page**: Go to the **Alerts** page, then open an individual alert. Click **Rule query inspector**. The inspector uses the rule parameters _as they existed when that specific alert was generated_, including the exact evaluation time range. Use this view to understand why a particular alert was or wasn't triggered. -The key difference: the rule details page reflects the rule as it is _now_, while the alert details page reflects the rule as it was _then_. If you've edited the rule since an alert fired, the two inspectors will show different queries. +The key difference is that the rule details page reflects the rule as it is _now_, while the alert details page reflects the rule as it was _then_. If you've edited the rule since an alert fired, the two inspectors will show different queries. ## What the inspector shows [inspect-tabs] -The inspector has two tabs: +The inspector has a **Request** and **Response** tab. **Request** : Shows the full {{es}} query that the rule sends when it evaluates your data. Use it to verify the index pattern, time range, query filter, and aggregations match what you configured in the rule. 
From f637384f0404788bfe7c297b907789c1e4c746eb Mon Sep 17 00:00:00 2001 From: Nastasha Solomon Date: Thu, 4 Jun 2026 15:53:56 -0400 Subject: [PATCH 6/7] Final changes --- .../alerts/alerting-troubleshooting.md | 1 + .../alerting/alerts/inspect-rule-queries.md | 60 +++++++++++++------ .../create-manage-rules.md | 4 +- .../triage-threshold-breaches.md | 8 +-- 4 files changed, 48 insertions(+), 25 deletions(-) diff --git a/explore-analyze/alerting/alerts/alerting-troubleshooting.md b/explore-analyze/alerting/alerts/alerting-troubleshooting.md index bfff300e4f..82335d629b 100644 --- a/explore-analyze/alerting/alerts/alerting-troubleshooting.md +++ b/explore-analyze/alerting/alerts/alerting-troubleshooting.md @@ -29,6 +29,7 @@ The following debugging tools are available: * {{kib}} versions 7.10 and above have a [Test connector](testing-connectors.md) UI. * {{kib}} versions 7.11 and above include improved Webhook error messages, better overall debug logging for actions and connectors, and Task Manager [diagnostics endpoints](../../../troubleshoot/kibana/task-manager.md#task-manager-diagnosing-root-cause). +* The [rule query inspector](inspect-rule-queries.md) lets you view the {{es}} request a rule sends during evaluation, confirm the rule is targeting the right data, and investigate rule behavior. ## Using rules and connectors list for the current state and finding issues [alerting-managment-detail] diff --git a/explore-analyze/alerting/alerts/inspect-rule-queries.md b/explore-analyze/alerting/alerts/inspect-rule-queries.md index b9a41e69d1..e7c22ba389 100644 --- a/explore-analyze/alerting/alerts/inspect-rule-queries.md +++ b/explore-analyze/alerting/alerts/inspect-rule-queries.md @@ -1,6 +1,6 @@ --- -navigation_title: Diagnose rule behavior -description: Use the rule query inspector to view the Elasticsearch request behind a rule and diagnose why an alert did or didn't fire. 
+navigation_title: Inspect rule queries
+description: Use the rule query inspector to view the Elasticsearch request behind a rule, confirm it targets the right data, and diagnose why an alert was or wasn't generated.
 applies_to:
   stack: ga 9.5
   serverless: ga
@@ -8,9 +8,9 @@ products:
   - id: kibana
 ---
 
-# Diagnose rule behavior with the rule query inspector [inspect-rule-queries]
+# Inspect rule queries [inspect-rule-queries]
 
-The rule query inspector lets you view the {{es}} request that a rule sends when it evaluates your data. Use it to understand the query structure, confirm the rule is targeting the right data, and diagnose why an alert did or didn't fire.
+The rule query inspector lets you view the {{es}} request that a rule sends when it evaluates your data. Use it to understand the query structure, confirm the rule is targeting the right data, and diagnose why an alert was or wasn't generated.
 
 ::::{note}
 :applies_to: {"stack": "ga 9.5", "serverless": "ga"}
@@ -19,42 +19,64 @@ Currently, the rule query inspector is only available for **custom threshold rul
 
 ## Access the inspector [inspect-access]
 
-The inspector is available from two places, each showing a different query: the rule details page (shows current rule parameters) and the alert details page (shows historical parameters)
+The inspector is available from two places, each showing a different query:
 
 * **Rule details page**: Open **{{stack-manage-app}}** > **{{rules-ui}}**, find your rule, and click its name to open the rule details page. Click **Rule query inspector**. The inspector builds the query from the rule's _current_ parameters. Use this view to verify that the rule is configured correctly and would match the data you expect.
 
 * **Alert details page**: Go to the **Alerts** page, then open an individual alert. Click **Rule query inspector**. The inspector uses the rule parameters _as they existed when that specific alert was generated_, including the exact evaluation time range. Use this view to understand why a particular alert was or wasn't triggered.
 
-The key difference is that the rule details page reflects the rule as it is _now_, while the alert details page reflects the rule as it was _then_. If you've edited the rule since an alert fired, the two inspectors will show different queries.
+The key difference is that the rule details page reflects the rule as it is _now_, while the alert details page reflects the rule as it was _then_. If you've edited the rule since an alert was generated, the two inspectors will show different queries.
 
 ## What the inspector shows [inspect-tabs]
 
-The inspector has a **Request** and **Response** tab.
+The inspector displays the {{es}} query the rule sent, the raw response it received, and how long the query took to execute.
 
-**Request**
-: Shows the full {{es}} query that the rule sends when it evaluates your data. Use it to verify the index pattern, time range, query filter, and aggregations match what you configured in the rule.
+| Element | Description |
+| --- | --- |
+| **Criterion dropdown** | Appears when a rule has multiple criteria. Each entry is labeled with its criterion number and metric (for example, `Criterion 1: avg(system.cpu.total.norm.pct)`). Selecting a criterion updates both the **Request** and **Response** tabs to show the query and results for that specific condition. |
+| **Request** | Shows the full {{es}} query that the rule sends when it evaluates your data. Use it to verify the index pattern, time range, query filter, and aggregations match what you configured in the rule. |
+| **Response** | Shows the raw {{es}} response. Use it to confirm whether data was found, whether the groups you expect are present, and what values the rule was working with when it made its alerting decision. |
+| **Request time** | Shows how long {{es}} took to execute the query. This measures the query portion of rule execution only. It doesn't include time spent waiting in the task queue or processing actions after the query returns. Use it to identify whether the query itself is the bottleneck when a rule is slow. |
 
-**Response**
-: Shows the raw {{es}} response. Use it to confirm whether data was found, whether the groups you expect are present, and what values the rule was working with when it made its alerting decision.
+## Factors that affect request time [inspect-request-time-factors]
 
-## Common troubleshooting scenarios [inspect-troubleshoot]
+The request time can be affected by the following factors. When optimizing for performance, verify that any changes don't affect the rule's detection logic, for example, a shorter time window or tighter filter may prevent the rule from catching the conditions it was designed to detect.
 
-:::{dropdown} Alert fired but I don't know why
-Open the inspector from the alert details page. Review the time range to confirm it matches the evaluation period. Find the aggregation bucket for your group and check the value against the threshold. If the value exceeds the threshold, the alert fired correctly.
+| Factor | Why it increases execution time | How to reduce it |
+| --- | --- | --- |
+| **Index size** | Rules that search indices with more documents take longer to execute. | Add a KQL filter to narrow the documents the rule searches. |
+| **Query complexity** | Metric aggregations such as average, rate, or percentile are heavier than a simple count. | Simplify criteria or swap a complex aggregation for a lighter one where possible. |
+| **Number of criteria** | Each criterion is a separate {{es}} query. | Reduce the number of criteria in the rule. |
+| **Group-by cardinality** | Grouping by a high-cardinality field (such as `host.name` with thousands of hosts) significantly increases query cost. | Choose a lower-cardinality field, or apply a KQL filter to narrow the population before grouping. |
+| **Shard count and cluster load** | Query time increases when {{es}} is under heavy load. | This is outside the rule configuration. If high cluster load is consistent, consider reviewing your cluster sizing or spreading rule evaluation across off-peak hours. |
+| **Time window size** | A longer window means {{es}} must scan more data. | Shorten the time window in the rule configuration. |
+
+## Using the inspector [inspect-troubleshoot]
+
+Expand the following to learn how the inspector can help.
+
+:::{dropdown} Confirm why an alert was generated
+Open the inspector from the alert details page. Review the time range to confirm it matches the evaluation period. Find the aggregation bucket for your group and check the value against the threshold. If the value exceeds the threshold, the alert was generated correctly.
 :::
 
-:::{dropdown} Alert didn't fire when I expected it to
+:::{dropdown} Investigate why an alert wasn't generated
 Open the inspector from the rule details page and confirm the query targets the right index pattern and time range. Check the query filter for unintended restrictions. If the aggregation values in the response are below the threshold, the rule evaluated correctly but your data didn't breach the threshold during that window.
 :::
 
-:::{dropdown} Rule looks correct now but the alert used different parameters
-If you've modified the rule since the alert fired, open the inspector from the _alert details page_ rather than the rule details page. The alert inspector uses the parameters that were active at the time the alert fired, so the query will reflect the older configuration.
+:::{dropdown} Compare the current rule configuration to a historical alert
+If you've modified the rule since the alert was generated, open the inspector from the _alert details page_ rather than the rule details page. The alert inspector uses the parameters that were active at the time the alert was generated, so the query will reflect the older configuration.
 :::
 
-:::{dropdown} Empty or null aggregation values
+:::{dropdown} Identify why the response shows no data
 The query matched no documents. Check whether the index pattern in the data view is correct, whether your time range is appropriate, and whether any query filter is too restrictive. Also verify that the data stream or index has data in the expected time period by running the same query in [Discover](/explore-analyze/discover.md) or [Dev Tools](/explore-analyze/query-filter/tools/console.md).
 :::
 
-:::{dropdown} Unexpected group missing from results
+:::{dropdown} Find out why a group is missing from the results
 If a group you expected (such as a specific host) doesn't appear in the buckets, no documents for that group matched the query during the evaluation window. This can happen when the group was inactive, when a filter excluded its documents, or when the field used for grouping has a different value in the actual documents than you expected.
 :::
+
+:::{dropdown} Diagnose a slow or timing-out rule
+Check the request time in the inspector. A high request time means the Elasticsearch query is likely the cause. To reduce it, simplify the query by reducing the number of criteria, shortening the time window, adding a tighter KQL filter, or reducing group-by cardinality. If the rule has multiple criteria, use the dropdown to compare request times across criteria and identify which condition is the most expensive.
+
+If the request time is near zero, the query isn't the bottleneck and the timeout is likely caused by something else, such as task manager queue pressure or scheduling overhead. For broader investigation, including how to identify long-running rules using the event log and how to adjust timeout settings, refer to [Rules take a long time to run](alerting-common-issues.md#rules-long-run-time).
+:::
diff --git a/solutions/observability/incident-management/create-manage-rules.md b/solutions/observability/incident-management/create-manage-rules.md
index 64af12ec03..65a6733cae 100644
--- a/solutions/observability/incident-management/create-manage-rules.md
+++ b/solutions/observability/incident-management/create-manage-rules.md
@@ -105,9 +105,9 @@ serverless: ga
 Currently, the rule query inspector is only available for **custom threshold rules**.
 ::::
 
-From the rule details page, click **Rule query inspector** to view the Elasticsearch request the rule sends during evaluation. The inspector builds the query from the rule's current parameters, including the data view, query filter, time window, and aggregations you've configured. Use it to confirm the rule is targeting the right data before an alert fires.
+From the rule details page, click **Rule query inspector** to view the Elasticsearch request the rule sends during evaluation. The inspector builds the query from the rule's current parameters, including the data view, query filter, time window, and aggregations you've configured. Use it to confirm the rule is targeting the right data before an alert is generated.
 
-For an explanation of the query structure and how to read the response, refer to [Diagnose rule behavior with the rule query inspector](/explore-analyze/alerting/alerts/inspect-rule-queries.md).
+For troubleshooting guidance and an explanation of what the inspector shows, refer to [Diagnose rule behavior with the rule query inspector](/explore-analyze/alerting/alerts/inspect-rule-queries.md).
 
 ## Snooze and disable rules [observability-create-manage-rules-snooze-and-disable-rules]
 
diff --git a/solutions/observability/incident-management/triage-threshold-breaches.md b/solutions/observability/incident-management/triage-threshold-breaches.md
index 59456cb406..32d14ff029 100644
--- a/solutions/observability/incident-management/triage-threshold-breaches.md
+++ b/solutions/observability/incident-management/triage-threshold-breaches.md
@@ -58,17 +58,17 @@ stack: ga 9.5
 serverless: ga
 ```
 
-To understand exactly what {{es}} evaluated when the alert fired, use the rule query inspector. From the alert details page, click **Rule query inspector**.
+To understand exactly what {{es}} evaluated when the alert was generated, use the rule query inspector. From the alert details page, click **Rule query inspector**.
 
-Unlike the inspector on the rule details page, which reflects the rule's _current_ parameters, the inspector on an alert details page uses the parameters that were active at the time the alert fired, including the exact evaluation time range stored on the alert document. This makes it useful for investigating historical alerts, especially if the rule has been edited since.
+Unlike the inspector on the rule details page, which reflects the rule's _current_ parameters, the inspector on an alert details page uses the parameters that were active at the time the alert was generated, including the exact evaluation time range stored on the alert document. This makes it useful for investigating historical alerts, especially if the rule has been edited since.
 
 Use the inspector to:
 
-- Confirm the time range that was evaluated when the alert fired.
+- Confirm the time range that was evaluated when the alert was generated.
 - Review the aggregation values that were returned and compare them to the threshold.
 - Identify whether a missing group, a too-restrictive filter, or unexpected data caused the alert to behave differently than expected.
 
-For an explanation of the query structure, aggregation types, and how to read the response, refer to [the rule query inspector](/explore-analyze/alerting/alerts/inspect-rule-queries.md).
+For more information, refer to [Inspect rule queries](/explore-analyze/alerting/alerts/inspect-rule-queries.md).
 
 After investigating the alert, you may want to:
 
From c6319fe6da1c6899bd6a6a1cdb74be4d8073daa1 Mon Sep 17 00:00:00 2001
From: Nastasha Solomon
Date: Thu, 4 Jun 2026 15:59:36 -0400
Subject: [PATCH 7/7] update ref

---
 .../observability/incident-management/create-manage-rules.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/solutions/observability/incident-management/create-manage-rules.md b/solutions/observability/incident-management/create-manage-rules.md
index 65a6733cae..0c10640b58 100644
--- a/solutions/observability/incident-management/create-manage-rules.md
+++ b/solutions/observability/incident-management/create-manage-rules.md
@@ -107,7 +107,7 @@ Currently, the rule query inspector is only available for **custom threshold rul
 
 From the rule details page, click **Rule query inspector** to view the Elasticsearch request the rule sends during evaluation. The inspector builds the query from the rule's current parameters, including the data view, query filter, time window, and aggregations you've configured. Use it to confirm the rule is targeting the right data before an alert is generated.
 
-For troubleshooting guidance and an explanation of what the inspector shows, refer to [Diagnose rule behavior with the rule query inspector](/explore-analyze/alerting/alerts/inspect-rule-queries.md).
+For troubleshooting guidance and an explanation of what the inspector shows, refer to [Inspect rule queries](/explore-analyze/alerting/alerts/inspect-rule-queries.md).
 
 ## Snooze and disable rules [observability-create-manage-rules-snooze-and-disable-rules]