Document the Rule Query Inspector for threshold rules #6746
Open: nastasha-solomon wants to merge 12 commits into main from issue-6555
48e196e first draft (nastasha-solomon)
878044c Merge branch 'main' into issue-6555 (nastasha-solomon)
f3e1d5a Update solutions/observability/incident-management/create-manage-rule… (nastasha-solomon)
5b03b56 Update solutions/observability/incident-management/triage-threshold-b… (nastasha-solomon)
2ec5d1e Merge branch 'main' into issue-6555 (nastasha-solomon)
68c47df bena's feedback (nastasha-solomon)
8128df7 Merge branch 'main' into issue-6555 (nastasha-solomon)
bd245d6 updated structure (nastasha-solomon)
f637384 Final changes (nastasha-solomon)
6861d90 Merge branch 'main' into issue-6555 (nastasha-solomon)
c6319fe update ref (nastasha-solomon)
4216a8c Merge branch 'issue-6555' of https://github.com/elastic/docs-content … (nastasha-solomon)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| --- | ||
| navigation_title: Inspect rule queries | ||
| description: Use the rule query inspector to view the Elasticsearch request behind a rule, confirm it targets the right data, and diagnose why an alert was or wasn't generated. | ||
| applies_to: | ||
| stack: ga 9.5 | ||
| serverless: ga | ||
| products: | ||
| - id: kibana | ||
| --- | ||
|
|
||
| # Inspect rule queries [inspect-rule-queries] | ||
|
|
||
| The rule query inspector lets you view the {{es}} request that a rule sends when it evaluates your data. Use it to understand the query structure, confirm the rule is targeting the right data, and diagnose why an alert was or wasn't generated. | ||
|
|
||
| ::::{note} | ||
| :applies_to: {"stack": "ga 9.5", "serverless": "ga"} | ||
| Currently, the rule query inspector is only available for **custom threshold rules**. | ||
| :::: | ||
|
|
||
| ## Access the inspector [inspect-access] | ||
|
|
||
| The inspector is available from two places, each showing a different query: | ||
|
|
||
| * **Rule details page**: Open **{{stack-manage-app}}** > **{{rules-ui}}**, find your rule, and click its name to open the rule details page. Click **Rule query inspector**. The inspector builds the query from the rule's _current_ parameters. Use this view to verify that the rule is configured correctly and would match the data you expect. | ||
|
|
||
| * **Alert details page**: Go to the **Alerts** page, then open an individual alert. Click **Rule query inspector**. The inspector uses the rule parameters _as they existed when that specific alert was generated_, including the exact evaluation time range. Use this view to understand why a particular alert was triggered. | ||
|
|
||
| The key difference is that the rule details page reflects the rule as it is _now_, while the alert details page reflects the rule as it was _then_. If you've edited the rule since an alert was generated, the two inspectors will show different queries. | ||
|
|
||
| ## What the inspector shows [inspect-tabs] | ||
|
|
||
| The inspector displays the {{es}} query the rule sent, the raw response it received, and how long the query took to execute. | ||
|
|
||
| | Element | Description | | ||
| | --- | --- | | ||
| | **Criterion dropdown** | Appears when a rule has multiple criteria. Each entry is labeled with its criterion number and metric (for example, `Criterion 1: avg(system.cpu.total.norm.pct)`). Selecting a criterion updates both the **Request** and **Response** tabs to show the query and results for that specific condition. | | ||
| | **Request** | Shows the full {{es}} query that the rule sends when it evaluates your data. Use it to verify the index pattern, time range, query filter, and aggregations match what you configured in the rule. | | ||
| | **Response** | Shows the raw {{es}} response. Use it to confirm whether data was found, whether the groups you expect are present, and what values the rule was working with when it made its alerting decision. | | ||
| | **Request time** | Shows how long {{es}} took to execute the query. This measures the query portion of rule execution only. It doesn't include time spent waiting in the task queue or processing actions after the query returns. Use it to identify whether the query itself is the bottleneck when a rule is slow. | | ||
|
|
||
| ## Factors that affect request time [inspect-request-time-factors] | ||
|
|
||
| The request time can be affected by the following factors. When optimizing for performance, verify that any changes don't affect the rule's detection logic. For example, a shorter time window or tighter filter may prevent the rule from catching the conditions it was designed to detect. | ||
|
|
||
| | Factor | Why it increases request time | How to reduce it | | ||
| | --- | --- | --- | | ||
| | **Index size** | Rules that search indices with more documents take longer to execute. | Add a KQL filter to narrow the documents the rule searches. | | ||
| | **Query complexity** | Metric aggregations such as average, rate, or percentile are heavier than a simple count. | Simplify criteria or swap a complex aggregation for a lighter one where possible. | | ||
| | **Number of criteria** | Each criterion is a separate {{es}} query. | Reduce the number of criteria in the rule. | | ||
| | **Group-by cardinality** | Grouping by a high-cardinality field (such as `host.name` with thousands of hosts) significantly increases query cost. | Choose a lower-cardinality field, or apply a KQL filter to narrow the population before grouping. | | ||
| | **Shard count and cluster load** | Query time increases when the query has to search many shards or when {{es}} is under heavy load. | This is outside the rule configuration. If cluster load is consistently high, consider reviewing your cluster sizing or spreading rule evaluation across off-peak hours. | | ||
| | **Time window size** | A longer window means {{es}} must scan more data. | Shorten the time window in the rule configuration. | | ||
|
|
||
| ## Using the inspector [inspect-troubleshoot] | ||
|
|
||
| Expand the following sections to learn how the inspector can help with common troubleshooting tasks. | ||
|
|
||
| :::{dropdown} Confirm why an alert was generated | ||
| Open the inspector from the alert details page. Review the time range to confirm it matches the evaluation period. Find the aggregation bucket for your group and check the value against the threshold. If the value satisfies the rule's threshold condition, the alert was generated correctly. | ||
| ::: | ||
|
|
||
| :::{dropdown} Investigate why an alert wasn't generated | ||
| Open the inspector from the rule details page and confirm the query targets the right index pattern and time range. Check the query filter for unintended restrictions. If the aggregation values in the response don't satisfy the threshold condition, the rule evaluated correctly but your data didn't breach the threshold during that window. | ||
| ::: | ||
|
|
||
| :::{dropdown} Compare the current rule configuration to a historical alert | ||
| If you've modified the rule since the alert was generated, open the inspector from the _alert details page_ rather than the rule details page. The alert inspector uses the parameters that were active at the time the alert was generated, so the query will reflect the older configuration. | ||
| ::: | ||
|
|
||
| :::{dropdown} Identify why the response shows no data | ||
| The query matched no documents. Check whether the index pattern in the data view is correct, whether your time range is appropriate, and whether any query filter is too restrictive. Also verify that the data stream or index has data in the expected time period by running the same query in [Discover](/explore-analyze/discover.md) or [Dev Tools](/explore-analyze/query-filter/tools/console.md). | ||
| ::: | ||
|
|
||
| :::{dropdown} Find out why a group is missing from the results | ||
| If a group you expected (such as a specific host) doesn't appear in the buckets, no documents for that group matched the query during the evaluation window. This can happen when the group was inactive, when a filter excluded its documents, or when the field used for grouping has a different value in the actual documents than you expected. | ||
| ::: | ||
|
|
||
| :::{dropdown} Diagnose a slow or timing-out rule | ||
| Check the request time in the inspector. A high request time means the {{es}} query is likely the cause. To reduce it, simplify the query by reducing the number of criteria, shortening the time window, adding a tighter KQL filter, or reducing group-by cardinality. If the rule has multiple criteria, use the **Criterion dropdown** to compare request times across criteria and identify which condition is the most expensive. | ||
|
|
||
| If the request time is near zero, the query isn't the bottleneck and the timeout is likely caused by something else, such as task manager queue pressure or scheduling overhead. For broader investigation, including how to identify long-running rules using the event log and how to adjust timeout settings, refer to [Rules take a long time to run](alerting-common-issues.md#rules-long-run-time). | ||
| ::: | ||
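
Review note (illustrative only, not part of the diff): the troubleshooting dropdowns describe checking bucket values against the threshold, spotting missing groups, and recognizing an empty response. A minimal Python sketch of those checks, run against JSON copied from the **Response** tab, could look like the following. The aggregation names `groups` and `metric`, the sample payload, and the `triage` helper are all hypothetical; the actual names in a custom threshold rule's response may differ and should be copied from the inspector. The `took` field is the standard {{es}} search response field for query time in milliseconds; whether the inspector's **Request time** maps directly to it is an assumption.

```python
import json
from typing import Any, Callable, Iterable

# Hypothetical payload shaped like a generic Elasticsearch search response
# with a bucket aggregation. "groups" and "metric" are placeholder names.
SAMPLE_RESPONSE = """
{
  "took": 42,
  "aggregations": {
    "groups": {
      "buckets": [
        {"key": "host-a", "doc_count": 700, "metric": {"value": 0.93}},
        {"key": "host-b", "doc_count": 500, "metric": {"value": 0.41}}
      ]
    }
  }
}
"""

# Supported threshold comparisons, keyed by their symbol.
COMPARATORS: dict[str, Callable[[float, float], bool]] = {
    ">": lambda value, threshold: value > threshold,
    ">=": lambda value, threshold: value >= threshold,
    "<": lambda value, threshold: value < threshold,
    "<=": lambda value, threshold: value <= threshold,
}


def triage(
    response: dict[str, Any],
    threshold: float,
    comparator: str = ">",
    expected_groups: Iterable[str] = (),
    agg_name: str = "groups",      # assumption: name of the grouping aggregation
    metric_name: str = "metric",   # assumption: name of the metric sub-aggregation
) -> dict[str, Any]:
    """Summarize a pasted response: breaching groups, missing groups, no data."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    compare = COMPARATORS[comparator]
    # Map each group key to its aggregated metric value.
    values = {b["key"]: b[metric_name]["value"] for b in buckets}
    breaching = sorted(
        key for key, value in values.items()
        if value is not None and compare(value, threshold)
    )
    missing = sorted(group for group in expected_groups if group not in values)
    return {
        "no_data": len(buckets) == 0,
        "took_ms": response.get("took"),
        "breaching": breaching,
        "missing": missing,
    }


report = triage(
    json.loads(SAMPLE_RESPONSE),
    threshold=0.9,
    expected_groups=["host-a", "host-c"],
)
print(report)
```

With the sample payload, `host-a` is the only group above the 0.9 threshold and `host-c` is reported as missing, which corresponds to the "Confirm why an alert was generated" and "Find out why a group is missing from the results" dropdowns. An empty `buckets` list sets `no_data`, matching the "Identify why the response shows no data" case.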