feat: add CSE timing regression tests for all Linux VHDs (Ubuntu 22.04/24.04, Azure Linux V3) #8284
Pull request overview
This PR addresses a CSE bootstrap latency regression on Ubuntu golden-image VHDs by avoiding first-boot apt initialization/lock contention, and adds E2E timing checks to detect future regressions.
Changes:
- Switch cached-package installation and walinuxagent hold/unhold operations to dpkg-based fast paths (with apt fallbacks).
- Export FULL_INSTALL_REQUIRED so it’s available in subshell contexts.
- Add E2E CSE timing extraction + two Ubuntu 22.04 performance scenarios with thresholds.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| parts/linux/cloud-init/artifacts/ubuntu/cse_helpers_ubuntu.sh | Use dpkg --set-selections for walinuxagent hold/unhold and dpkg -i for cached .deb installs; add wait_for_dpkg_lock. |
| parts/linux/cloud-init/artifacts/cse_main.sh | Export FULL_INSTALL_REQUIRED after determining install mode. |
| e2e/cse_timing.go | New helper to extract CSE task durations from Guest Agent event JSON and validate against thresholds. |
| e2e/scenario_cse_perf_test.go | New Ubuntu 22.04 cached/full-install performance scenarios enforcing timing thresholds. |
| .github/workflows/pr-lint.yaml | Bump actions/github-script from v8 to v9. |
…k contention

Replace apt-get/apt-mark with dpkg equivalents for cached (golden image) VHDs:
- installDebPackageFromFile: use 'dpkg -i' instead of 'apt-get -f install' when FULL_INSTALL_REQUIRED=false. The first apt-get call after VHD boot triggers expensive apt initialization (~30-50s) and holds dpkg locks. dpkg -i bypasses apt entirely and completes in <100ms for pre-cached packages.
- aptmarkWALinuxAgent: use 'dpkg --set-selections' instead of 'apt-mark hold/unhold'. apt-mark internally initializes the apt cache even for simple hold operations, causing 10-30s delays on first invocation.

Both functions include fallback to the original apt-based commands if dpkg fails.

Add CSE timing regression detection framework:
- e2e/cse_timing.go: Extracts CSE task timings from VM event JSON files
- e2e/scenario_cse_perf_test.go: Two perf scenarios (cached + full install paths)

E2E test results with fix (Ubuntu 22.04 golden image):
- aptmarkWALinuxAgent: 21.7s → 0.06s (360x faster)
- kubelet deb install: 47.6s → 0.04s (1190x faster)
- Total CSE time: 19.65s (down from 47s+ in production)

Root cause: PR #8105 removed installContainerRuntime's natural ~10s delay that masked the first-apt-call initialization cost. This fix addresses the underlying issue instead of re-introducing the delay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix nil map panic in Test_Ubuntu2204_CSE_FullInstallPerformance by initializing vmss.Tags before setting SkipBinaryCleanup tag
- Log parse errors in ExtractCSETimings instead of silently skipping, with warning counts for visibility into format drift
- Fail test when no CSE tasks were parsed (empty report would silently pass all thresholds, making regression detection ineffective)
- Fail test when AKS.CSE.cse_start task is missing (critical for total CSE duration validation)
- Return error from ExtractCSETimings when zero tasks are parsed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esholds

Refactor ValidateCSETimings to run each threshold check as a named sub-test (t.Run). Since the E2E pipeline already uses gotestsum to generate JUnit XML published via PublishTestResults@2, ADO Test Analytics will now track each CSE task individually:
- Test_Ubuntu2204_CSE_CachedPerformance/TotalCSEDuration
- Test_Ubuntu2204_CSE_CachedPerformance/Task_installDebPackageFromFile
- Test_Ubuntu2204_CSE_CachedPerformance/Task_aptmarkWALinuxAgent
- ...

This enables per-task pass/fail trends and regression detection in ADO dashboards without any pipeline changes.

Also tighten thresholds to ~2-3x observed healthy values:
- Cached: TotalCSE 60s→35s, installDeb 25s→10s, aptmark 10s→5s
- Full: TotalCSE 120s→90s, installDeps 90s→60s

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…p99)

Thresholds derived from GuestAgentGenericLogs table (FA/azcore cluster), ~35K samples per task over 30 minutes on Ubuntu 22.04.

Cached path (set at ~p95):
- installDebPackageFromFile: 22s (prod p95=21.55s)
- aptmarkWALinuxAgent: 24s (prod p90=23.32s, bimodal)
- configureKubeletAndKubectl: 27s (prod p95=26.06s)
- ensureContainerd: 3s (prod p95=1.99s)
- ensureKubelet: 10s (prod p95=6.20s)

Full install path (set at ~p99):
- installDebPackageFromFile: 45s (prod p99=42.88s)
- aptmarkWALinuxAgent: 60s (prod p99=58.07s)
- configureKubeletAndKubectl: 45s (prod p99=44.39s)
- extractKubeBinaries: 16s (prod p99=15.21s)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…esholds

Add DefaultTaskThreshold to CSETimingThresholds: any task exceeding this duration that has no specific threshold automatically becomes a sub-test in ADO Pipeline Analytics. This ensures newly added CSE tasks are tracked without code changes.

Expanded specific thresholds from 5 → 18 tasks (cached) and 8 → 21 (full install), covering all high-frequency operations observed in production:
- Kubelet install variants (FromPkg, FromURL, extractKubeBinaries)
- Credential provider variants (FromUrl, FromPkg, FromPMC, download)
- Node config (configureNodeExporter, retrycmd_nslookup, ensureSnapshotUpdate)
- Container runtime (installContainerRuntime, installStandaloneContainerd)

All thresholds from Ubuntu 22.04 prod telemetry (GuestAgentGenericLogs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
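A minimal sketch of how a catch-all limit like DefaultTaskThreshold could be applied, assuming a simple map of task durations. The thresholds struct, violations, and demoViolations are hypothetical names for illustration, not the code in e2e/cse_timing.go; the 3s and 45s limits are the cached-path values quoted in this PR, and the two "new" task names are invented.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// thresholds is a hypothetical stand-in for CSETimingThresholds.
type thresholds struct {
	Specific map[string]time.Duration // per-task limits keyed by task name
	Default  time.Duration            // catch-all for tasks with no specific limit
}

// violations returns, in sorted order, the tasks whose duration exceeds
// their specific limit, or the default limit when no specific one exists.
func violations(tasks map[string]time.Duration, th thresholds) []string {
	var out []string
	for name, d := range tasks {
		limit, ok := th.Specific[name]
		if !ok {
			limit = th.Default // newly added tasks fall through to the catch-all
		}
		if d > limit {
			out = append(out, name)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

// demoViolations runs the check on made-up durations.
func demoViolations() []string {
	th := thresholds{
		Specific: map[string]time.Duration{"ensureContainerd": 3 * time.Second},
		Default:  45 * time.Second,
	}
	tasks := map[string]time.Duration{
		"ensureContainerd": 2 * time.Second,  // under its specific limit
		"brandNewTask":     50 * time.Second, // no specific limit, over the default
		"anotherNewTask":   1 * time.Second,  // no specific limit, under the default
	}
	return violations(tasks, th)
}

func main() {
	fmt.Println(demoViolations()) // prints [brandNewTask]
}
```

In the test itself each returned name would presumably become its own t.Run sub-test rather than an entry in a slice.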
Add OS-specific threshold sets derived from production telemetry (GuestAgentGenericLogs, FA/azcore cluster):
- Ubuntu 24.04: similar to 22.04 but less apt lock contention (aptmarkWALinuxAgent p50=4.45s vs 0.49s bimodal on 22.04)
- Azure Linux V3: RPM-based, no apt/deb tasks. Higher baseline for configureKubeletAndKubectl (p50=4.56s) and installKubeletKubectlFromPkg (p50=29.03s). Includes ensureNoDupOnPromiscuBridge and enableLocalDNS.

Each OS gets cached + full install test scenarios (6 total now):
- Test_Ubuntu2204_CSE_CachedPerformance / FullInstallPerformance
- Test_Ubuntu2404_CSE_CachedPerformance / FullInstallPerformance
- Test_AzureLinuxV3_CSE_CachedPerformance / FullInstallPerformance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused ctx param from LogReport (fixes golangci-lint unparam)
- Use Fatalf for missing cse_start (prevents 0-duration silently passing)
- Add newline delimiter between JSON files in find -exec to prevent concatenation
- Make subtest names unique by including task short name to avoid collisions
- Assert all configured threshold suffixes matched at least one task
- Fix DefaultTaskThreshold comment to match actual 45s value

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The E2E framework wraps *testing.T in toolkit.testLogger, so the direct type assertion s.T.(*testing.T) fails. Add UnwrapTestingT() helper to recursively extract the underlying *testing.T.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… comments

- Bump installStandaloneContainerd threshold from 2s to 5s across all threshold sets (E2E saw 3.27s; prod p99 is only 0.46s but E2E infra variance is higher)
- Fix map iteration nondeterminism: sort suffixes by length descending for deterministic longest-match-first behavior
- Promote unmatched threshold suffix warning to Errorf so missing tasks fail the test
- Remove no-op empty BootstrapConfigMutator closures and tag-only VMConfigMutator

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
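The longest-match-first fix described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: sortedSuffixes and matchThreshold are hypothetical names, limits are plain seconds rather than time.Duration, and the 30 and 10 values are made up.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedSuffixes returns the map keys ordered longest first, with ties
// broken alphabetically, so the order never depends on map iteration.
func sortedSuffixes(limits map[string]float64) []string {
	keys := make([]string, 0, len(limits))
	for k := range limits {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if len(keys[i]) != len(keys[j]) {
			return len(keys[i]) > len(keys[j])
		}
		return keys[i] < keys[j]
	})
	return keys
}

// matchThreshold returns the limit (in seconds) of the longest configured
// suffix that the task name ends with, and false when nothing matches.
func matchThreshold(taskName string, limits map[string]float64) (float64, bool) {
	for _, suffix := range sortedSuffixes(limits) {
		if strings.HasSuffix(taskName, suffix) {
			return limits[suffix], true
		}
	}
	return 0, false
}

func main() {
	// Both suffixes match the task name; the longer, more specific one wins.
	limits := map[string]float64{
		"installKubeletKubectlFromPkg": 30,
		"FromPkg":                      10,
	}
	limit, ok := matchThreshold("AKS.CSE.installKubeletKubectlFromPkg", limits)
	fmt.Println(limit, ok) // prints 30 true
}
```

Ranging over the map directly could pick either suffix depending on the run, which is the nondeterminism this commit addresses.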
- Add detailed comments explaining OperationId is a timestamp (not a GUID), referencing logs_to_events() in cse_helpers.sh
- Add comment on cseEventsDir explaining path matches EVENTS_LOGGING_DIR in both cse_helpers.sh and cse_start.sh (not per-handler subdirectories)
- Switch from line-based JSON parsing to json.Decoder for robustness against multi-line or pretty-printed JSON files
- Threshold suffixes already sorted by length (longest first) for deterministic matching (addressed in prior commit)
- Unmatched suffixes already use Errorf to fail the test (addressed in prior commit)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
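To illustrate why json.Decoder is more robust than line-based parsing here, the sketch below reads a stream of concatenated JSON objects, one of them pretty-printed. The event struct and its field names are assumptions about the event schema, decodeEvents and decodeString are hypothetical helpers, and the sample values are invented.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// event holds the fields this sketch reads. The JSON field names are
// assumptions, not a confirmed schema.
type event struct {
	TaskName    string `json:"TaskName"`
	Timestamp   string `json:"Timestamp"`
	OperationId string `json:"OperationId"`
}

// decodeEvents reads consecutive JSON values from r, regardless of how
// they are split across lines, and errors out if none were parsed.
func decodeEvents(r io.Reader) ([]event, error) {
	dec := json.NewDecoder(r)
	var events []event
	for {
		var e event
		err := dec.Decode(&e)
		if errors.Is(err, io.EOF) {
			break // clean end of input
		}
		if err != nil {
			return events, fmt.Errorf("decoding event %d: %w", len(events)+1, err)
		}
		events = append(events, e)
	}
	if len(events) == 0 {
		return nil, errors.New("no CSE events parsed")
	}
	return events, nil
}

// decodeString is a convenience wrapper for in-memory input.
func decodeString(s string) ([]event, error) {
	return decodeEvents(strings.NewReader(s))
}

func main() {
	input := `{
  "TaskName": "AKS.CSE.ensureContainerd",
  "Timestamp": "2024-05-01 12:00:00.000",
  "OperationId": "2024-05-01 12:00:01.500"
}
{"TaskName":"AKS.CSE.ensureKubelet","Timestamp":"2024-05-01 12:00:02.000","OperationId":"2024-05-01 12:00:05.000"}`
	events, err := decodeString(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(events), events[0].TaskName) // prints 2 AKS.CSE.ensureContainerd
}
```

A line-based parser would fail on the first, multi-line object; the decoder treats both objects the same way.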
| // Extract short name: "AKS.CSE.foo.bar" → "bar", or use full name if no dots | ||
| shortName := task.TaskName | ||
| if idx := strings.LastIndex(shortName, "."); idx >= 0 { | ||
| shortName = shortName[idx+1:] | ||
| } | ||
| defaultThreshold := thresholds.DefaultTaskThreshold | ||
| tRunner.Run(fmt.Sprintf("Task_%s", shortName), func(t *testing.T) { |
Using only the last dot-segment as the sub-test name can cause collisions when different tasks share the same suffix (e.g., multiple ...setup tasks). Go’s test runner will auto-disambiguate duplicate names, but ADO analytics can become harder to interpret. Consider incorporating a deterministic, sanitized full task name (or a longer stable prefix) into Task_... to keep tracking unambiguous.
| // Extract short name: "AKS.CSE.foo.bar" → "bar", or use full name if no dots | |
| shortName := task.TaskName | |
| if idx := strings.LastIndex(shortName, "."); idx >= 0 { | |
| shortName = shortName[idx+1:] | |
| } | |
| defaultThreshold := thresholds.DefaultTaskThreshold | |
| tRunner.Run(fmt.Sprintf("Task_%s", shortName), func(t *testing.T) { | |
| // Use a deterministic sanitized full task name to avoid collisions between | |
| // tasks that share the same suffix, e.g. multiple "...setup" tasks. | |
| sanitizedTaskName := strings.NewReplacer(".", "_", "/", "_", " ", "_", ":", "_", "-", "_").Replace(task.TaskName) | |
| defaultThreshold := thresholds.DefaultTaskThreshold | |
| tRunner.Run(fmt.Sprintf("Task_%s", sanitizedTaskName), func(t *testing.T) { |
| // Use k8s 1.34.4 because that's what has cached deb packages on the VHD. | ||
| // The default 1.30 only has tarballs, not .deb files, so it would never | ||
| // exercise the installDebPackageFromFile code path. | ||
| nbc.ContainerService.Properties.OrchestratorProfile.OrchestratorVersion = "1.34.4" | ||
| nbc.AgentPoolProfile.KubernetesConfig.CustomKubeProxyImage = "mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.34.4" |
Hard-coding Kubernetes and kube-proxy versions in E2E perf tests can create unrelated failures as supported versions/images evolve (e.g., image tag deprecation, future policy changes, or test environment defaults). To reduce operational toil, consider centralizing these versions in shared E2E config/constants (or deriving the kube-proxy tag from the orchestrator version) so updates happen in one place.
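One way to follow this suggestion is to derive the kube-proxy tag from the orchestrator version, so only a single constant needs updating. This is a sketch: kubeProxyRepo, kubeProxyImage, and perfTestK8sVersion are hypothetical names, and the repository path and version are the ones in the snippet under review.

```go
package main

import "fmt"

// kubeProxyRepo is the image repository used in the snippet under review.
const kubeProxyRepo = "mcr.microsoft.com/oss/kubernetes/kube-proxy"

// perfTestK8sVersion is a hypothetical shared constant for the perf scenarios.
const perfTestK8sVersion = "1.34.4"

// kubeProxyImage builds the kube-proxy image reference for a Kubernetes
// version given without the leading "v" (for example "1.34.4").
func kubeProxyImage(orchestratorVersion string) string {
	return fmt.Sprintf("%s:v%s", kubeProxyRepo, orchestratorVersion)
}

func main() {
	fmt.Println(kubeProxyImage(perfTestK8sVersion))
	// prints mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.34.4
}
```

Both OrchestratorVersion and CustomKubeProxyImage could then be set from perfTestK8sVersion, which keeps the two values from drifting apart.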
| // parseCSETimestamp parses the timestamp format used by logs_to_events: "YYYY-MM-DD HH:MM:SS.mmm" | ||
| func parseCSETimestamp(s string) (time.Time, error) { | ||
| layouts := []string{ | ||
| "2006-01-02 15:04:05.000", | ||
| "2006-01-02 15:04:05", | ||
| } | ||
| for _, layout := range layouts { | ||
| if t, err := time.Parse(layout, s); err == nil { | ||
| return t, nil | ||
| } | ||
| } | ||
| return time.Time{}, fmt.Errorf("cannot parse CSE timestamp %q", s) | ||
| } |
This introduces new parsing behavior that is easy to unit test (valid timestamps with/without milliseconds, invalid formats, and boundary values). Adding a small Go unit test for parseCSETimestamp (and optionally for the suffix-matching/threshold selection behavior) would help prevent regressions without requiring a full E2E run.
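A table-driven check along these lines might look like the sketch below. parseCSETimestamp is copied from the snippet under review; tsCase, cases, and failedCases are hypothetical names and the sample timestamps are invented. It is written as a standalone program; in a real _test.go file the loop body would sit inside t.Run with t.Errorf instead of collecting strings.

```go
package main

import (
	"fmt"
	"time"
)

// parseCSETimestamp is copied from the snippet under review.
func parseCSETimestamp(s string) (time.Time, error) {
	layouts := []string{
		"2006-01-02 15:04:05.000",
		"2006-01-02 15:04:05",
	}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t, nil
		}
	}
	return time.Time{}, fmt.Errorf("cannot parse CSE timestamp %q", s)
}

// tsCase is one row of the table.
type tsCase struct {
	in        string
	wantErr   bool
	wantNanos int // expected sub-second part when parsing succeeds
}

var cases = []tsCase{
	{"2024-05-01 12:00:00.123", false, 123000000}, // with milliseconds
	{"2024-05-01 12:00:00", false, 0},             // without milliseconds
	{"2024-05-01T12:00:00Z", true, 0},             // RFC 3339 style is rejected
	{"", true, 0},                                 // empty input is rejected
}

// failedCases returns a description of every row that did not behave as expected.
func failedCases() []string {
	var failed []string
	for _, c := range cases {
		got, err := parseCSETimestamp(c.in)
		if (err != nil) != c.wantErr {
			failed = append(failed, fmt.Sprintf("%q: unexpected error state: %v", c.in, err))
			continue
		}
		if err == nil && got.Nanosecond() != c.wantNanos {
			failed = append(failed, fmt.Sprintf("%q: got %dns", c.in, got.Nanosecond()))
		}
	}
	return failed
}

func main() {
	fmt.Println("failed cases:", len(failedCases()))
}
```

One behavior worth keeping in mind when adding rows: time.Parse accepts a fractional-second field after the seconds even when the layout omits it, so inputs with one or two fractional digits are accepted by the second layout rather than rejected.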
Summary
Adds CSE (Custom Script Extension) timing regression tests for all supported Linux VHD builds, with thresholds derived from production telemetry (GuestAgentGenericLogs, FA database on azcore cluster).

Each CSE task runs as a t.Run() sub-test, so ADO pipeline analytics can individually track pass/fail rates over time; no pipeline changes are needed (the existing gotestsum → JUnit → PublishTestResults@2 flow picks them up automatically).

Test Scenarios (6 total)
- Test_Ubuntu2204_CSE_CachedPerformance
- Test_Ubuntu2204_CSE_FullInstallPerformance
- Test_Ubuntu2404_CSE_CachedPerformance
- Test_Ubuntu2404_CSE_FullInstallPerformance
- Test_AzureLinuxV3_CSE_CachedPerformance
- Test_AzureLinuxV3_CSE_FullInstallPerformance

How It Works
- Reads /var/log/azure/Microsoft.Azure.Extensions.CustomScript/*/events/*.json files from the VM to extract per-task start/end timestamps
- Each task with a specific threshold runs as a t.Run() sub-test (e.g., aptmarkWALinuxAgent, installDebPackageFromFile), individually tracked in ADO Test Analytics
- Tasks without a specific threshold get a DefaultTaskThreshold check (45s cached / 60s full install), which automatically discovers new slow tasks

Threshold Methodology
All thresholds are derived from real production data queried from the GuestAgentGenericLogs Kusto table:
- extract("^Completed: ([a-zA-Z_]+)", 1, Context1) strips parameters from task names

Key OS Differences Discovered
- aptmarkWALinuxAgent
- installDebPackageFromFile
- installKubeletKubectlFromPkg
- configureKubeletAndKubectl
- ensureNoDupOnPromiscuBridge, enableLocalDNS

What Regressions This Catches
- … aptmarkWALinuxAgent to block on dpkg lock
- … installDebPackageFromFile / installKubeletKubectlFromPkg latency

Files Changed
- e2e/scenario_cse_perf_test.go: Test scenarios and OS-specific threshold definitions
- e2e/cse_timing.go: CSE timing extraction, validation logic, DefaultTaskThreshold catch-all
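The Kusto extract() expression quoted in the threshold methodology can be mirrored in Go to see what it keeps and what it strips. This is a sketch: taskNameFromContext is a hypothetical helper, and the sample strings are invented stand-ins for Context1 values, whose exact format is not shown in this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// completedRe is the same pattern as in the Kusto extract() call.
var completedRe = regexp.MustCompile(`^Completed: ([a-zA-Z_]+)`)

// taskNameFromContext returns the first capture group, i.e. the task name
// without any trailing parameters, and false when the pattern does not match.
func taskNameFromContext(context1 string) (string, bool) {
	m := completedRe.FindStringSubmatch(context1)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Hypothetical message: the task name followed by its arguments.
	name, ok := taskNameFromContext("Completed: installDebPackageFromFile /opt/example/kubelet.deb")
	fmt.Println(name, ok) // prints installDebPackageFromFile true
}
```

Because the character class excludes digits, dots, and spaces, the capture stops at the first such character, which is how parameters get dropped from the task name.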