
Fleet Monitoring Docs#1794

Draft
johannesfrey wants to merge 1 commit into main from observability-guides-2



@johannesfrey
Contributor

WIP

Content Description

Preview Link

Internal Reference

Closes DOC-

AI review: mention @claude in a comment to request a review or changes. See CONTRIBUTING.md for available commands.

@netlify /docs

@netlify

netlify bot commented Mar 3, 2026

Deploy Preview for vcluster-docs-site ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 7101e08 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vcluster-docs-site/deploys/69a7416c2c13c30008085157 |
| 😎 Deploy Preview | https://deploy-preview-1794--vcluster-docs-site.netlify.app/docs |


architecture. These can be adapted to your actual use cases.
:::

This guide explains how to configure a monitoring architecture with Prometheus

⚠️ [vale] reported by reviewdog 🐶
[Google.OxfordComma] Use the Oxford comma in 'This guide explains how to configure a monitoring architecture with Prometheus to collect workload metrics from across multiple virtual clusters and aggregate by cluster, project and'.

- A Prometheus Operator (to scrape each virtual cluster's own metrics via `ServiceMonitors`) and a Prometheus Agent (remote_writer) per cluster.
- A Prometheus Agent (remote_writer) per virtual cluster with private nodes (Private Nodes Tenancy Model).

## Configuration Prerequisites

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'Configuration Prerequisites' should use sentence-style capitalization.

## Configuration Prerequisites

The central Prometheus must be reachable and configured as a remote write receiver.
E.g. following helm values would suffice for that:

⚠️ [vale] reported by reviewdog 🐶
[Loft.capitalize-helm-project] 'Helm' should be capitalized when referring to the project.

🚫 [vale] reported by reviewdog 🐶
[Vale.Terms] Use 'Helm' instead of 'helm'.
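A minimal sketch of such values, assuming the central Prometheus is installed with the kube-prometheus-stack Helm chart (the chart is an assumption; a Prometheus started directly can use the `--web.enable-remote-write-receiver` flag instead):

```yaml
# Assumption: the central Prometheus is deployed via the kube-prometheus-stack chart.
prometheus:
  prometheusSpec:
    # Enables the /api/v1/write endpoint so remote agents can push samples.
    enableRemoteWriteReceiver: true
```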

**2. Configure Helm values:**

Save the following as `prometheus-virtualcluster-values.yaml` and set the name of the virtual
cluster. This is necessary in order to be able to aggregate any workload

⚠️ [vale] reported by reviewdog 🐶
[Google.WordList] Use 'to' instead of 'in order to'.
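Whatever chart renders it, the values file ultimately has to produce a Prometheus configuration with two parts: an external label naming the virtual cluster and a remote write target. A minimal sketch of that rendered configuration follows; the label name `cluster`, its value, and the receiver URL are hypothetical placeholders, not values taken from this guide.

```yaml
global:
  scrape_interval: 30s
  # Attached to every sample this agent remote-writes. Hypothetical label
  # name and value: use one naming scheme across the whole fleet.
  external_labels:
    cluster: my-vcluster
remote_write:
  # Hypothetical address of the central Prometheus; /api/v1/write is the
  # remote write receiver path.
  - url: http://prometheus-central.monitoring.svc:9090/api/v1/write
# scrape_configs omitted for brevity.
```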


The vCluster Platform agent emits a set of custom metrics carrying information
about virtual clusters as labels. These metrics always return `1` and can
therefore be joined via PromQL in order to make those labels available for

⚠️ [vale] reported by reviewdog 🐶
[Google.WordList] Use 'to' instead of 'in order to'.
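As a sketch of such a join, the recording rule below aggregates container CPU usage by project. It assumes workload series carry a per-virtual-cluster `cluster` label and that a hypothetical info metric named `vcluster_info` has exactly one series per `cluster` value with a `project` label; substitute the metric and label names the agent actually emits.

```yaml
groups:
  - name: vcluster-info-join
    rules:
      # `vcluster_info`, `cluster`, and `project` are hypothetical names.
      # group_left copies `project` from the info series onto each
      # per-cluster CPU series before summing by project.
      - record: project:container_cpu_usage_seconds:rate5m
        expr: |
          sum by (project) (
            sum by (cluster) (rate(container_cpu_usage_seconds_total[5m]))
            * on (cluster) group_left (project)
            vcluster_info
          )
```

Because the info metric's value is always `1`, the multiplication only copies labels and leaves the CPU values unchanged.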


The following labels are attached:

- `kind`: `VirtualClusterInstance`

⚠️ [vale] reported by reviewdog 🐶
[Loft.kubernetes-api-kinds] Kubernetes/Platform API kinds like 'VirtualClusterInstance' should not use backticks. Write them as plain text (e.g., StatefulSet not `StatefulSet`).
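Since every series of such an info metric has the value `1`, counting series also counts objects. A sketch that counts virtual cluster instances per project, using a hypothetical metric name `vcluster_info` and an assumed `project` label:

```yaml
groups:
  - name: vcluster-inventory
    rules:
      # One series per VirtualClusterInstance is assumed, so count() yields
      # the number of instances in each project.
      - record: project:virtual_cluster_instances:count
        expr: count by (project) (vcluster_info{kind="VirtualClusterInstance"})
```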


### Latency

#### kube-apiserver request latency (p99, by verb) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver request latency (p99, by verb) (virtual cluster only)' should use sentence-style capitalization.

type (GET, LIST, PUT, POST, PATCH, DELETE, WATCH). The p99 captures outliers
that averages hide. WATCH is expected to show 60s (long-poll).
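One way to compute this panel, sketched as a Prometheus recording rule. It assumes each agent adds a `cluster` external label (a hypothetical name) so the central Prometheus can split results per virtual cluster; `le` must survive the inner `sum` for `histogram_quantile` to work.

```yaml
groups:
  - name: vcluster-apiserver-latency-p99
    rules:
      # p99 request latency per virtual cluster and verb over 5m windows.
      - record: cluster_verb:apiserver_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (cluster, verb, le) (
              rate(apiserver_request_duration_seconds_bucket[5m])
            )
          )
```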

#### kube-apiserver request latency (p95, non-WATCH) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver request latency (p95, non-WATCH) (virtual cluster only)' should use sentence-style capitalization.

**Why:** Excludes long-running connections to focus on latency for synchronous
API calls.
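A sketch of the same histogram calculation at p95 with WATCH filtered out, assuming a hypothetical `cluster` label identifies each virtual cluster:

```yaml
groups:
  - name: vcluster-apiserver-latency-p95
    rules:
      # CONNECT (exec, attach, port-forward) is also long-running; switch the
      # matcher to verb!~"WATCH|CONNECT" if those requests skew the result.
      - record: cluster:apiserver_request_duration_seconds:p95_5m_nonwatch
        expr: |
          histogram_quantile(0.95,
            sum by (cluster, le) (
              rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m])
            )
          )
```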

#### etcd backend latency (p99, by operation) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'etcd backend latency (p99, by operation) (virtual cluster only)' should use sentence-style capitalization.
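The Kubernetes API server records the latency of its storage calls in the `etcd_request_duration_seconds` histogram, which carries an `operation` label. A recording-rule sketch, assuming your virtual cluster's API server exposes it and the per-virtual-cluster label is called `cluster` (a hypothetical name):

```yaml
groups:
  - name: vcluster-etcd-latency
    rules:
      # p99 storage latency per virtual cluster and operation (get, list, ...).
      - record: cluster_operation:etcd_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(0.99,
            sum by (cluster, operation, le) (
              rate(etcd_request_duration_seconds_bucket[5m])
            )
          )
```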

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

@johannesfrey changed the title from WIP to Fleet Monitoring Docs on Mar 3, 2026
@johannesfrey force-pushed the observability-guides-2 branch from fc72f2a to 7101e08 on March 3, 2026 20:15

### Traffic

#### kube-apiserver request rate (by verb) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver request rate (by verb) (virtual cluster only)' should use sentence-style capitalization.


**Why:** The most fundamental measure of cluster workload. Shows how many requests per second the API server handles, broken down by verb.
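A sketch of this panel as a recording rule over the API server's `apiserver_request_total` counter, assuming a hypothetical `cluster` label identifies each virtual cluster:

```yaml
groups:
  - name: vcluster-apiserver-traffic-verb
    rules:
      # Requests per second per virtual cluster, split by verb.
      - record: cluster_verb:apiserver_request:rate5m
        expr: sum by (cluster, verb) (rate(apiserver_request_total[5m]))
```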

#### kube-apiserver request rate (by resource) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver request rate (by resource) (virtual cluster only)' should use sentence-style capitalization.
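Grouping `apiserver_request_total` by `resource` rather than `verb` gives this panel. A sketch, with `cluster` as an assumed label name for the virtual cluster:

```yaml
groups:
  - name: vcluster-apiserver-traffic-resource
    rules:
      # Requests per second per virtual cluster, split by API resource.
      - record: cluster_resource:apiserver_request:rate5m
        expr: sum by (cluster, resource) (rate(apiserver_request_total[5m]))
```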


**Why:** Measures platform-level traffic through the gateway, split by Kubernetes API proxy, auth, and UI.

#### REST client outbound request rate (by code) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'REST client outbound request rate (by code) (virtual cluster only)' should use sentence-style capitalization.
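Outbound calls made by client-go based components are counted in `rest_client_requests_total`, labeled by `code`, `method`, and `host`. A recording-rule sketch, assuming a hypothetical `cluster` label per virtual cluster:

```yaml
groups:
  - name: vcluster-rest-client-traffic
    rules:
      # Outbound requests per second, split by HTTP status code.
      - record: cluster_code:rest_client_requests:rate5m
        expr: sum by (cluster, code) (rate(rest_client_requests_total[5m]))
```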


### Errors

#### kube-apiserver error rate (4xx/5xx, by code) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver error rate (4xx/5xx, by code) (virtual cluster only)' should use sentence-style capitalization.


**Why:** HTTP-level error rates.
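A sketch that keeps only 4xx and 5xx responses from `apiserver_request_total`, with `cluster` as an assumed per-virtual-cluster label:

```yaml
groups:
  - name: vcluster-apiserver-errors
    rules:
      # Error responses per second, split by status code.
      - record: cluster_code:apiserver_request_errors:rate5m
        expr: sum by (cluster, code) (rate(apiserver_request_total{code=~"[45].."}[5m]))
```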

#### kube-apiserver error ratio (errors / total) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver error ratio (errors / total) (virtual cluster only)' should use sentence-style capitalization.
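A sketch of the ratio that treats only 5xx responses as errors (an assumption; widen the matcher to `[45]..` to include client errors) and assumes a hypothetical `cluster` label:

```yaml
groups:
  - name: vcluster-apiserver-error-ratio
    rules:
      # Fraction of requests that returned 5xx, per virtual cluster.
      - record: cluster:apiserver_request_errors:ratio_rate5m
        expr: |
          sum by (cluster) (rate(apiserver_request_total{code=~"5.."}[5m]))
          /
          sum by (cluster) (rate(apiserver_request_total[5m]))
```

A virtual cluster that has never returned a 5xx has no numerator series, so this rule produces no result for it rather than `0`.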


**Why:** Container runtime failures (image pulls, container create/start failures).

#### REST client error rate (outbound 5xx) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'REST client error rate (outbound 5xx) (virtual cluster only)' should use sentence-style capitalization.
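A sketch restricted to 5xx responses and grouped by target `host`, so a failing upstream stands out; the `cluster` label name is an assumption:

```yaml
groups:
  - name: vcluster-rest-client-errors
    rules:
      # Outbound 5xx responses per second, split by destination host.
      - record: cluster_host:rest_client_requests_5xx:rate5m
        expr: sum by (cluster, host) (rate(rest_client_requests_total{code=~"5.."}[5m]))
```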


**Why:** Shows which pods are being throttled by cgroup CPU limits.

#### kube-apiserver inflight requests (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver inflight requests (virtual cluster only)' should use sentence-style capitalization.


**Why:** Shows current request concurrency for mutating vs. read-only requests. When this approaches the flow-control limits, requests start queuing.
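The API server exposes this as the `apiserver_current_inflight_requests` gauge, split by `request_kind` (`mutating` or `readOnly`). A sketch, with `cluster` as a hypothetical label name:

```yaml
groups:
  - name: vcluster-apiserver-inflight
    rules:
      # Summed across API server replicas, if a virtual cluster runs several.
      - record: cluster_request_kind:apiserver_current_inflight_requests:sum
        expr: sum by (cluster, request_kind) (apiserver_current_inflight_requests)
```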

#### kube-apiserver flow-control queue depth (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'kube-apiserver flow-control queue depth (virtual cluster only)' should use sentence-style capitalization.
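With API Priority and Fairness enabled, queued requests are reported by `apiserver_flowcontrol_current_inqueue_requests`, labeled by `priority_level` and `flow_schema`. A sketch per priority level, assuming a hypothetical `cluster` label:

```yaml
groups:
  - name: vcluster-apiserver-flowcontrol
    rules:
      # Requests currently waiting in flow-control queues, per priority level.
      - record: cluster_priority_level:apiserver_flowcontrol_current_inqueue_requests:sum
        expr: sum by (cluster, priority_level) (apiserver_flowcontrol_current_inqueue_requests)
```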


**Why:** Controller work queues. A growing depth means controllers can't keep up with the event rate.

#### WATCH connection count (long-running requests) (virtual cluster only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'WATCH connection count (long-running requests) (virtual cluster only)' should use sentence-style capitalization.
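Open WATCH connections appear in the `apiserver_longrunning_requests` gauge. A sketch grouped by resource, assuming a hypothetical `cluster` label; the metric may not exist on older Kubernetes versions, so verify the name against your API server's `/metrics` output.

```yaml
groups:
  - name: vcluster-apiserver-watches
    rules:
      # Currently open WATCH requests per virtual cluster and resource.
      - record: cluster_resource:apiserver_longrunning_watch_requests:sum
        expr: sum by (cluster, resource) (apiserver_longrunning_requests{verb="WATCH"})
```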


**Why:** Overall node memory pressure.

#### Filesystem usage (by PVC / volume) (vCluster Platform only)

⚠️ [vale] reported by reviewdog 🐶
[Google.Headings] 'Filesystem usage (by PVC / volume) (vCluster Platform only)' should use sentence-style capitalization.
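The kubelet reports volume usage through `kubelet_volume_stats_used_bytes` and `kubelet_volume_stats_capacity_bytes`, both labeled by `namespace` and `persistentvolumeclaim`. A sketch of the usage ratio per claim:

```yaml
groups:
  - name: platform-volume-usage
    rules:
      # Fraction of each PVC's capacity in use (0 to 1).
      - record: namespace_persistentvolumeclaim:kubelet_volume_stats_used:ratio
        expr: |
          sum by (namespace, persistentvolumeclaim) (kubelet_volume_stats_used_bytes)
          /
          sum by (namespace, persistentvolumeclaim) (kubelet_volume_stats_capacity_bytes)
```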
