Draft
Commits
22 commits
5a3a26c
Add docs-driven playground e2e coverage
whoiskatrin Apr 4, 2026
e7c5ebe
refine playground e2e: gitignore generated files, shared types, test …
whoiskatrin Apr 4, 2026
3ecc8fb
revert unused email demo testability changes (no email E2E tests exis…
whoiskatrin Apr 4, 2026
8158a18
feat(playground): replace manual E2E specs with AI-driven test runner
whoiskatrin Apr 4, 2026
5e292a5
fix(playground): add diagnostic logging + fix test.skip pattern
whoiskatrin Apr 4, 2026
4ad67b1
fix(playground): handle Workers AI auto-parsed JSON responses
whoiskatrin Apr 4, 2026
6f7aab6
perf(playground): 4x speedup — parallel workers + fewer retries
whoiskatrin Apr 4, 2026
b33581d
fix(playground): strict mode violations + malformed JSON handling
whoiskatrin Apr 4, 2026
2c19c4d
fix(playground): teach LLM correct event log format + validate actions
whoiskatrin Apr 4, 2026
6547e11
fix(playground): handle paragraph role misuse by LLM
whoiskatrin Apr 4, 2026
ed77f39
feat(playground): switch to Claude Opus via AI Gateway + executor res…
whoiskatrin Apr 5, 2026
11fa345
fix(playground): handle regex literals in JSON, multi-array responses…
whoiskatrin Apr 5, 2026
13ddca3
fix(playground): escape regex metacharacters, flexible whitespace, em…
whoiskatrin Apr 5, 2026
6999064
fix(playground): stabilize ai e2e executor fallbacks
whoiskatrin Apr 5, 2026
7a9004d
fix(playground): harden remaining ai e2e fallbacks
whoiskatrin Apr 5, 2026
dcabb2d
test(playground): exclude unstable ai runner scenarios
whoiskatrin Apr 5, 2026
b09b8b9
fix(playground): make ai runner route-aware
whoiskatrin Apr 5, 2026
50ff8ee
fix(playground): align ai e2e scenarios with current ui
whoiskatrin Apr 5, 2026
13d8a13
fix(workflows): stabilize generated ids and playground e2e
whoiskatrin Apr 5, 2026
5bf912a
test(playground): deflake chat rooms and logs
whoiskatrin Apr 5, 2026
a3732e9
test(playground): harden remaining ai runner flake
whoiskatrin Apr 5, 2026
0230dec
fix(playground): normalize stale ai runner assertions
whoiskatrin Apr 7, 2026
5 changes: 5 additions & 0 deletions .changeset/workflow-safe-generated-ids.md
@@ -0,0 +1,5 @@
---
"agents": patch
---

Generate workflow instance IDs with a Cloudflare-safe alphabet so `runWorkflow()` no longer produces invalid IDs containing `_`.
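
A minimal sketch of what generating IDs from a restricted alphabet can look like. The alphabet, the default length, and the name `generateSafeId` are assumptions for illustration; the actual implementation in `agents` is not shown in this diff.

```typescript
// Hypothetical alphabet: lowercase letters and digits only, so "_" can never appear.
const SAFE_ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789";

// Picks an index in [0, n). Math.random keeps the sketch dependency-free;
// a real implementation would more likely use a cryptographic random source.
type RandomIndex = (n: number) => number;
const defaultRandomIndex: RandomIndex = (n) => Math.floor(Math.random() * n);

// Builds an ID of `length` characters drawn only from SAFE_ALPHABET.
function generateSafeId(
  length = 21,
  randomIndex: RandomIndex = defaultRandomIndex
): string {
  let id = "";
  for (let i = 0; i < length; i++) {
    id += SAFE_ALPHABET.charAt(randomIndex(SAFE_ALPHABET.length));
  }
  return id;
}

const exampleId = generateSafeId();
console.log(exampleId, exampleId.includes("_"));
```

Passing the random source as a parameter keeps the function deterministic under test, for example `generateSafeId(3, () => 0)`.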
61 changes: 61 additions & 0 deletions .github/workflows/playground-e2e.yml
@@ -0,0 +1,61 @@
name: Playground E2E

on:
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  e2e:
    name: Playground browser tests
    runs-on: ubuntu-24.04
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 1

      - uses: actions/setup-node@v6
        with:
          node-version: 24
          cache: npm

      - run: npm ci

      - name: Get Playwright version
        id: playwright-version
        run: echo "version=$(jq -r '.packages["node_modules/playwright"].version' package-lock.json)" >> $GITHUB_OUTPUT

      - name: Cache Playwright browsers
        uses: actions/cache@v5
        id: playwright-cache
        with:
          path: ~/.cache/ms-playwright
          key: ${{ runner.os }}-playwright-${{ steps.playwright-version.outputs.version }}

      - name: Install Playwright browsers
        if: steps.playwright-cache.outputs.cache-hit != 'true'
        run: npx playwright install --with-deps chromium

      - name: Run playground e2e tests
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CF_AI_GATEWAY_TOKEN }}
          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CF_AI_GATEWAY_ACCOUNT_ID }}
          CLOUDFLARE_GATEWAY_ID: ${{ secrets.CF_AI_GATEWAY_NAME }}
        run: npm run test:playground:e2e

      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playground-playwright-report
          path: |
            playwright-report
            examples/playground/playwright-report
            test-results
            examples/playground/test-results
          if-no-files-found: ignore
20 changes: 19 additions & 1 deletion .github/workflows/pullrequest.yml
@@ -17,7 +17,7 @@ env:

 jobs:
   ci:
-    timeout-minutes: 20
+    timeout-minutes: 30
     runs-on: ubuntu-24.04
     steps:
       - uses: actions/checkout@v6
@@ -58,4 +58,22 @@ jobs:
         run: npx playwright install --with-deps chromium

       - run: CI=true npx nx run-many -t test
+
+      - name: Run playground E2E tests
+        env:
+          CLOUDFLARE_API_TOKEN: ${{ secrets.CF_AI_GATEWAY_TOKEN }}
+          CLOUDFLARE_ACCOUNT_ID: ${{ secrets.CF_AI_GATEWAY_ACCOUNT_ID }}
+          CLOUDFLARE_GATEWAY_ID: ${{ secrets.CF_AI_GATEWAY_NAME }}
+        run: npm run test:playground:e2e
+
+      - name: Upload Playwright report
+        if: failure()
+        uses: actions/upload-artifact@v4
+        with:
+          name: playground-playwright-report
+          path: |
+            examples/playground/playwright-report
+            examples/playground/test-results
+          if-no-files-found: ignore
+
       - run: npx pkg-pr-new publish --peerDeps ./packages/*
1 change: 1 addition & 0 deletions .gitignore
@@ -148,6 +148,7 @@ __screenshots__

# Playwright test artifacts
test-results/
playwright-report/

# Nx
.nx/cache
25 changes: 24 additions & 1 deletion examples/playground/README.md
@@ -99,7 +99,30 @@ playground/

## Testing

-See [testing.md](./testing.md) for a comprehensive guide on manually testing every feature.
See [testing.md](./testing.md) for the source-of-truth test plan. **All E2E tests are AI-driven** — the test runner parses `testing.md` into scenarios, then uses an LLM to translate each scenario's natural-language actions and assertions into Playwright commands at runtime.
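
Because the Playwright commands come from a model at runtime, the response has to be checked before anything is executed. The sketch below shows one way to do that. The `Action` shape and the names `toAction` and `parseActions` are hypothetical; the schema used by the real executor is not shown in this diff.

```typescript
// Hypothetical action shape, for illustration only.
type Action =
  | { type: "click"; role: string; name: string }
  | { type: "fill"; role: string; name: string; value: string }
  | { type: "expectText"; text: string };

const isString = (v: unknown): v is string => typeof v === "string";

// Narrows an unknown value to a known Action, or returns undefined.
function toAction(candidate: unknown): Action | undefined {
  if (typeof candidate !== "object" || candidate === null) return undefined;
  const { type, role, name, value, text } = candidate as Record<string, unknown>;
  if (type === "click" && isString(role) && isString(name)) {
    return { type: "click", role, name };
  }
  if (type === "fill" && isString(role) && isString(name) && isString(value)) {
    return { type: "fill", role, name, value };
  }
  if (type === "expectText" && isString(text)) {
    return { type: "expectText", text };
  }
  return undefined;
}

// Accepts either a JSON string or an already-parsed value, since some model
// clients return parsed JSON. Malformed input yields an empty list.
function parseActions(response: unknown): Action[] {
  let data: unknown = response;
  if (typeof response === "string") {
    try {
      data = JSON.parse(response);
    } catch {
      return [];
    }
  }
  if (!Array.isArray(data)) return [];
  return data.map((item) => toAction(item)).filter((a): a is Action => a !== undefined);
}

console.log(parseActions('[{"type":"click","role":"button","name":"Increment"},{"type":"hover"}]'));
```

Unknown action types are dropped rather than executed, so a malformed response leads to a failed scenario instead of an arbitrary browser command.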

```bash
# Run the browser suite locally
npm run test:e2e
```

**How it works:**

1. `e2e/parse-testing-md.ts` parses `testing.md` into structured scenario objects
2. `e2e/ai-runner.spec.ts` creates one Playwright `test()` per scenario
3. `e2e/ai-executor.ts` navigates to the route, takes an accessibility snapshot, sends the scenario + snapshot to a Workers AI LLM, and executes the returned actions
4. Scenarios flagged `deployed-only` are auto-skipped in local/CI environments
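
Steps 1, 2, and 4 above can be sketched as follows. The markdown layout (`## ` headings, `Route:` lines, `- Action:` and `- Expect:` bullets, a `[deployed-only]` tag), the `Scenario` fields, and the sample routes are all assumptions for illustration; the real `testing.md` format and parser are not shown in this diff.

```typescript
// Hypothetical scenario shape produced by the parser.
interface Scenario {
  title: string;
  route: string;
  actions: string[];
  assertions: string[];
  deployedOnly: boolean;
}

// Step 1: turn the markdown test plan into structured scenarios.
function parseScenarios(markdown: string): Scenario[] {
  const scenarios: Scenario[] = [];
  let current: Scenario | undefined;
  for (const rawLine of markdown.split("\n")) {
    const line = rawLine.trim();
    if (line.startsWith("## ")) {
      // A level-2 heading starts a new scenario.
      const heading = line.slice(3);
      current = {
        title: heading.replace("[deployed-only]", "").trim(),
        route: "/",
        actions: [],
        assertions: [],
        deployedOnly: heading.includes("[deployed-only]"),
      };
      scenarios.push(current);
    } else if (current && line.startsWith("Route:")) {
      current.route = line.slice("Route:".length).trim();
    } else if (current && line.startsWith("- Action:")) {
      current.actions.push(line.slice("- Action:".length).trim());
    } else if (current && line.startsWith("- Expect:")) {
      current.assertions.push(line.slice("- Expect:".length).trim());
    }
  }
  return scenarios;
}

// Step 4: drop scenarios that only make sense against a deployed environment.
function runnableLocally(scenarios: Scenario[]): Scenario[] {
  return scenarios.filter((s) => !s.deployedOnly);
}

const sampleDoc = [
  "## Counter increments",
  "Route: /example/counter",
  "- Action: Click the Increment button",
  "- Expect: The counter shows 1",
  "## Email routing [deployed-only]",
  "Route: /example/email",
  "- Action: Send a test email",
  "- Expect: The inbox lists the message",
].join("\n");

const parsedScenarios = parseScenarios(sampleDoc);
const localScenarios = runnableLocally(parsedScenarios);
console.log(parsedScenarios.length, localScenarios.map((s) => s.title));
```

For step 2, a spec file could loop over the resulting array and call Playwright's `test(title, fn)` once per scenario, so each scenario shows up as its own test in the report.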

**Required environment variables:**

- `CLOUDFLARE_API_TOKEN` — Cloudflare API token with Workers AI access
- `CLOUDFLARE_ACCOUNT_ID` — Cloudflare account ID

**Adding a new test:** Edit `testing.md` — no Playwright code needed. The AI runner will pick it up automatically.

The test command includes a smart dependency prepare step: it only rebuilds `agents`, `@cloudflare/ai-chat`, `@cloudflare/codemode`, and `@cloudflare/voice` when their source is newer than their built `dist/` output.
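
A rough sketch of the staleness check this describes, written as a pure function over modification times. The `PackageTimes` shape, the name `needsRebuild`, and the choice to compare the newest source file against the newest `dist/` file are assumptions; the actual prepare script is not shown in this diff.

```typescript
// Hypothetical input: modification times (ms since epoch) for one package.
interface PackageTimes {
  name: string;
  sourceMtimesMs: number[];
  distMtimesMs: number[];
}

// A package needs rebuilding when it has no build output yet, or when any
// source file was modified after the most recent build output.
function needsRebuild(pkg: PackageTimes): boolean {
  if (pkg.distMtimesMs.length === 0) return true;
  const newestSource = Math.max(...pkg.sourceMtimesMs);
  const newestDist = Math.max(...pkg.distMtimesMs);
  return newestSource > newestDist;
}

// Made-up timestamps for illustration.
const examplePackages: PackageTimes[] = [
  { name: "agents", sourceMtimesMs: [100, 250], distMtimesMs: [200] },
  { name: "@cloudflare/ai-chat", sourceMtimesMs: [100], distMtimesMs: [300] },
  { name: "@cloudflare/voice", sourceMtimesMs: [50], distMtimesMs: [] },
];

const toRebuild = examplePackages.filter((p) => needsRebuild(p)).map((p) => p.name);
console.log(toRebuild);
```

In a Node script the timestamps could come from `fs.statSync(path).mtimeMs` for each file under the package's source and `dist/` directories.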

GitHub Actions runs the playground browser suite on every pull request (blocking merge) and nightly.

## Configuration

3 changes: 3 additions & 0 deletions examples/playground/e2e/.gitignore
@@ -0,0 +1,3 @@
# Playwright test artifacts
test-results/
playwright-report/