fix(ci): harden Android NativeAOT instrumentation test scripts against transient failures #1247
agneszitte wants to merge 1 commit into master
Pull request overview
This PR reduces Android NativeAOT instrumentation test flakiness in CI by adding retry logic around known transient failure points and by avoiding downloading an unused Android system image.
Changes:
- Add a `sdkmanager --install` retry wrapper with back-off to handle transient install/unzip failures.
- Stop downloading the unused API-36 emulator system image; install only the API-36 platform package.
- Add retry logic for `adb install -r` during emulator startup.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `build/scripts/android-sdk-emu.inc.sh` | Adds `sdkmanager_install()` retry helper and skips downloading API-36 system image while still installing API-36 platform. |
| `build/scripts/android-test-run.sh` | Adds retry loop around `adb install -r` to tolerate transient emulator unresponsiveness. |
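The package selection described for `android-sdk-emu.inc.sh` can be illustrated with a small sketch. The helper name `sdk_packages_for` and its `yes`/`no` flag are hypothetical, and the API-34 package names are assumed by analogy with the API-36 names quoted in this PR; the real script may request additional packages.

```shell
# Hypothetical helper (not from the PR): list the SDK packages to request
# for one API level. Only a level that backs an emulator AVD needs the
# large system image; a build-only level needs just the platform package.
sdk_packages_for() {
  local api need_image
  api="$1"
  need_image="${2:-no}"
  echo "platforms;android-${api}"
  if [ "$need_image" = "yes" ]; then
    # Assumed naming, mirroring the API-36 image named in this PR.
    echo "system-images;android-${api};google_apis_playstore;x86_64"
  fi
}

sdk_packages_for 34 yes   # emulator AVD level: platform + system image
sdk_packages_for 36 no    # build-only level: platform package only
```

Each printed line is one package identifier, so the output could be passed on to whatever install wrapper the script uses.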
41e2828 to c659a93
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
…t transient failures
- Add `sdkmanager_install()` retry wrapper (3 attempts with back-off) for all `sdkmanager --install` calls (`android-sdk-emu.inc.sh`)
- Stop downloading unused API-36 system image (~1.5 GB saving); only install `platforms;android-36` needed for build-tools
- Add `adb install` retry loop (3 attempts) in `android-test-run.sh`
- Guard `adb shell` keyevent/settings calls with `|| true` in `android-uitest-wait-systemui.sh` to prevent `set -e` abort on transient exit 255 during emulator ANR dismissal
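The effect of the `|| true` guard under `set -e` can be reproduced without an emulator. In the sketch below, `(exit 255)` is only a stand-in for a transient `adb shell input keyevent` failure, and each case runs in a fresh `sh -c` process so that `set -e` cannot affect the calling shell.

```shell
# (exit 255) stands in for a transient `adb shell input keyevent` failure.

# Unguarded: with `set -e`, the shell exits at the failing command and
# never reaches the final echo.
unguarded_rc=0
sh -c 'set -e; (exit 255); echo "reached end"' || unguarded_rc=$?

# Guarded: `|| true` swallows the failure, so execution continues.
guarded_out=$(sh -c 'set -e; (exit 255) || true; echo "reached end"')

echo "unguarded exit code: $unguarded_rc"
echo "guarded output: $guarded_out"
```

The trade-off is that `|| true` also hides non-transient failures, which is why it suits only calls whose failure a later loop iteration can recover from.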
c659a93 to 3c1d368
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
    fi
    if (( attempt < max_attempts )); then
      echo "sdkmanager --install $* failed (attempt $attempt/$max_attempts), retrying in ${attempt}s..."
      sleep "$attempt"
Maybe 10x that value? 1~3s might not be enough to escape the same fate as the previous attempt(s).
In the grand scheme of things, 10~30s (1 min total) is really nothing if it lets us move forward.
The problem is that we don't understand why it's failing in the first place. It just up and dies, with nothing in the log files. This makes it slightly annoying to reason about.
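One way to combine both points in this thread is a retry helper whose base delay is configurable and which logs the exit code of each failed attempt. This is a minimal sketch, not the PR's `sdkmanager_install()`: the name `retry_backoff` and the variables `RETRY_MAX_ATTEMPTS` and `RETRY_BASE_DELAY` are hypothetical.

```shell
# Hypothetical helper: run a command, retrying with linear back-off.
# RETRY_MAX_ATTEMPTS and RETRY_BASE_DELAY are illustrative knobs.
retry_backoff() {
  local max_attempts base_delay attempt rc delay
  max_attempts="${RETRY_MAX_ATTEMPTS:-3}"
  base_delay="${RETRY_BASE_DELAY:-10}"
  attempt=1
  rc=1
  while [ "$attempt" -le "$max_attempts" ]; do
    rc=0
    # `|| rc=$?` keeps a failure from tripping `set -e` in the caller.
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      delay=$(( attempt * base_delay ))
      echo "'$*' failed with exit code $rc (attempt $attempt/$max_attempts), retrying in ${delay}s..." >&2
      sleep "$delay"
    fi
    attempt=$(( attempt + 1 ))
  done
  echo "'$*' failed after $max_attempts attempts (last exit code $rc)" >&2
  return "$rc"
}
```

With the defaults shown, a call such as `retry_backoff sdkmanager --install "platforms;android-36"` would sleep 10 s and then 20 s between its three attempts. Logging the exit code per attempt leaves at least some trace when the underlying failure is otherwise silent.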
Summary
Fixes transient CI flakiness in the Android+Skia+NativeAOT Instrumentation Test pipeline introduced by #1245.
Three flakiness modes were observed:
- `sdkmanager --install` dying at "33% Unzipping": it exits with code 1 on transient network/IO issues, and `set -euxo pipefail` aborts the entire pipeline immediately with no retry. First observed in CI build 204968.
- `adb shell input keyevent` returns exit 255 while dismissing an "Application Not Responding: system" dialog during emulator boot, and `set -e` kills the script. First observed in canary build 205041 (Attempt 1).
- `adb shell am instrument`: `adb install` can fail transiently right after emulator boot, again aborting due to `set -e`.
Changes
Fix 1: `sdkmanager_install()` retry wrapper (`android-sdk-emu.inc.sh`)
- Wraps `sdkmanager --install` calls in a function that retries up to 3 times with increasing back-off (1 s, 2 s).
Fix 2: Stop downloading unused API-36 system image (`android-sdk-emu.inc.sh`)
- `install_android_sdk 36` was downloading the full `system-images;android-36;google_apis_playstore;x86_64` (~1.5 GB), but the emulator AVD is always created with API-34.
- Now installs only `platforms;android-36` (needed for build-tools/apkanalyzer), saving ~1.5 GB of download and reducing the window for transient failures.
Fix 3: `adb install` retry (`android-test-run.sh`)
- Adds a retry loop (3 attempts) around `adb install -r`, since the emulator may be transiently unresponsive right after boot.
Fix 4: Guard `adb` keyevent calls against transient exit 255 (`android-uitest-wait-systemui.sh`)
- Adds `|| true` to all non-critical `adb shell input keyevent` and `adb shell` calls in the emulator boot-wait loop and post-boot setup.
- Under `set -e`, a transient exit 255 from `adb shell input keyevent KEYCODE_ENTER` (while dismissing an ANR dialog) was killing the entire script. The loop can now retry on the next iteration instead of aborting.
Related
Note: No related issue (CI maintenance).
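For reference, the `adb install -r` retry described in Fix 3 might look roughly like the sketch below. It is an assumption-laden illustration rather than the code in `android-test-run.sh`: the function name `install_apk_with_retry`, the `ADB` override, and `INSTALL_RETRY_DELAY` are all hypothetical, and the `adb wait-for-device` step between attempts is an addition not mentioned in the PR.

```shell
# Hypothetical sketch; ADB can be overridden (e.g. with a stub in tests).
ADB="${ADB:-adb}"

install_apk_with_retry() {
  local apk max_attempts attempt
  apk="$1"
  max_attempts="${2:-3}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Running the command as an `if` condition keeps `set -e` from aborting.
    if "$ADB" install -r "$apk"; then
      return 0
    fi
    echo "adb install failed (attempt $attempt/$max_attempts)" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      # Block until adb sees the device again, then pause briefly.
      "$ADB" wait-for-device || true
      sleep "${INSTALL_RETRY_DELAY:-5}"
    fi
    attempt=$(( attempt + 1 ))
  done
  return 1
}
```

A caller would use it as `install_apk_with_retry path/to/app.apk`, where the APK path is a placeholder; the function returns non-zero only after every attempt has failed, so a genuine install failure still stops the pipeline.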