IGNORE -- crash from spr #2494

Closed
jucor wants to merge 5 commits into edge from spr/edge/bdc830db

jucor (Collaborator) commented Mar 30, 2026

Summary

Per-discrepancy test infrastructure for TDD fixing of Python-Clojure differences.

Changes

  • Add per-discrepancy test markers and parametrized test infrastructure
  • Cold-start recorder: coordinate parallel runs with marker file, auto-pause math workers
  • Update journal with xpassed test breakdown across all datasets
  • Address Copilot review: remove unused import, fix script issues
  • Add naming convention documentation
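
The marker-file coordination mentioned for the cold-start recorder can be pictured with a small sketch. This is only an illustration under stated assumptions, not the recorder's actual code: the helper name `recorder_marker`, the marker filename, and the use of atomic exclusive file creation are all hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def recorder_marker(path):
    """Yield True if this process created the marker file, else False.

    Hypothetical helper: O_CREAT | O_EXCL makes creation atomic, so only
    one of several parallel runs can end up owning the marker.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another run already holds the marker; the caller should wait or skip.
        yield False
        return
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))  # record the owner for debugging
        yield True
    finally:
        os.remove(path)  # release the marker when the owning run finishes


# Small demonstration in a temporary directory (the filename is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, "recorder.marker")
    with recorder_marker(marker) as first_owns:
        with recorder_marker(marker) as second_owns:
            pass
    marker_left_behind = os.path.exists(marker)
```

In this sketch the first caller owns the marker and a nested second caller is told that it does not. How the real recorder reacts to an existing marker, and how it pauses math workers, is not described in this PR.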

Test plan

  • 223 passed, 4 skipped, 22 xfailed, 7 xpassed, 0 failures
    🤖 Generated with Claude Code

commit-id:bdc830db


Stack:


⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

jucor added 5 commits March 30, 2026 22:51
When spr force-pushes all stack branches at once, each branch triggers
a new E2E run while the previous one is still queued. The concurrency
group (keyed on branch ref) cancels the stale run, cutting the number
of concurrent E2E jobs roughly in half.

commit-id:1a8eb714
## Summary


Add `SKIP_GOLDEN=1` environment variable to disable golden snapshot regression tests.

During stacked PR development, golden snapshots become stale as computation changes cascade through the stack. Rather than re-recording snapshots at every rebase (which causes conflict cascades in jj/git), we skip them until the stack is merged into `edge`.

### Changes

- **`test_regression.py`**: Add `@_skip_golden` decorator to `test_conversation_regression` and `test_conversation_stages_individually` — the only two tests that compare against golden snapshots. Other dataset-using tests (Clojure comparison, smoke tests) are unaffected.
- **`python-ci.yml`**: Set `SKIP_GOLDEN=1` in CI so the stacked PRs don't fail on stale snapshots.

### Usage

```bash
SKIP_GOLDEN=1 pytest tests/          # skip golden snapshot tests
pytest tests/                         # run everything (default)
```

## Test plan

- [x] `SKIP_GOLDEN=1 pytest tests/test_regression.py -v`: 4 skipped, 5 passed
- [x] `pytest tests/test_regression.py -v`: all 9 collected (golden tests run normally)

commit-id:d39cf65d
## Summary


Replace `pip install` with `uv pip install` in the delphi Dockerfile for faster dependency installation. `uv pip` is a drop-in replacement — same `requirements.lock`, same `pyproject.toml`, same installed packages in `site-packages`.

### Changes

- **`delphi/Dockerfile`**:
  - Copy `uv` binary from `ghcr.io/astral-sh/uv:0.11.2` (pinned version, single static binary)
  - Place in `/opt/uv/` in builder to avoid leaking into production image via `COPY --from=builder /usr/local/bin`
  - Set `UV_SYSTEM_PYTHON=1` (install into system Python, not a venv)
  - Replace all `pip install` with `uv pip install` in builder and test stages
  - Update BuildKit cache mount targets from `/root/.cache/pip` to `/root/.cache/uv`
  - Test stage copies `uv` from builder (single source of truth)

### What's NOT changed

- **Makefile** — untouched, `make rebuild-delphi` works as before
- **docker-compose.yml / docker-compose.test.yml** — untouched
- **pyproject.toml / requirements.lock** — untouched, same format
- **pip-compile workflow** — untouched, still used for lock file generation
- **Final/production image** — no `uv` added, stays lean

### CI Benchmark (GitHub Actions, `ubuntu-latest`, `--no-cache`)

5 pip runs (Mar 27 stack push) vs 3 uv pip runs (this PR):

| Step | pip (n=5) | uv pip (n=3) | Speedup |
|------|-----------|--------------|---------|
| **Docker build** | **264s** (sd=6) | **169s** (sd=2) | **1.56x (-94s)** |
| Pytest run | 223s (sd=4) | 227s (sd=4) | ~same |

**~94 seconds saved per CI run** on the Docker build step. Pytest runtime is unchanged (same packages, same tests). With standard deviations of 6s and 2s against a 94s difference, the improvement is well outside run-to-run noise.

### Local Benchmark (Apple M1 Max, 64GB, Docker Desktop, `--no-cache`)

| Step | pip | uv pip | Speedup |
|------|-----|--------|---------|
| **Dependencies install** | **149.3s** | **80.4s** | **1.9x** |
| Dev deps install | 10.0s | 2.2s | **4.5x** |
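
The Speedup columns in both benchmark tables appear to be the ratio of pip time to uv pip time. The short check below recomputes them from the rounded figures shown above; the function name `speedup` is just for illustration.

```python
def speedup(baseline_s: float, new_s: float) -> float:
    """Ratio of the baseline duration to the new duration."""
    return baseline_s / new_s


# Durations in seconds, copied from the CI and local tables above.
ci_docker_build = round(speedup(264, 169), 2)
local_deps = round(speedup(149.3, 80.4), 1)
local_dev_deps = round(speedup(10.0, 2.2), 1)

print(f"CI Docker build:    {ci_docker_build}x")
print(f"Local dependencies: {local_deps}x")
print(f"Local dev deps:     {local_dev_deps}x")
```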

## Test plan

- [x] `docker compose -f docker-compose.test.yml build --no-cache delphi` succeeds locally
- [x] Built image has all expected packages (`pip show` diagnostic passes in build log)
- [x] CI passes (3 successful runs)
- [x] `make rebuild-delphi` works (Makefile untouched)

commit-id:0c448343
## Summary


Documentation-only PR: deep analysis of Python vs Clojure discrepancies and a TDD fix plan.

### Changes

- Deep analysis documents (`deep-analysis-for-julien/`) comparing Python and Clojure implementations statement-by-statement
- Consolidate CLAUDE.md documentation for the delphi project
- Discrepancy fix plan (`docs/PLAN_DISCREPANCY_FIXES.md`) with prioritized list of fixes

## Test plan

- [x] Documentation only — no code changes
🤖 Generated with [Claude Code](https://claude.com/claude-code)

commit-id:d2f65026
## Summary


Per-discrepancy test infrastructure for TDD fixing of Python-Clojure differences.

### Changes

- Add per-discrepancy test markers and parametrized test infrastructure
- Cold-start recorder: coordinate parallel runs with marker file, auto-pause math workers
- Update journal with xpassed test breakdown across all datasets
- Address Copilot review: remove unused import, fix script issues
- Add naming convention documentation
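
One way per-discrepancy parametrized tests can produce the xfailed and xpassed counts reported in the test plan is sketched below. The registry, the discrepancy ids, and the helper `discrepancy_params` are hypothetical; the actual markers and naming convention live in the PR's code and are not reproduced here.

```python
import pytest

# Hypothetical registry: discrepancy id -> whether the Python port already
# matches the Clojure reference for that discrepancy.
DISCREPANCIES = {
    "D1_example": True,
    "D2_example": False,
}


def discrepancy_params(registry):
    """Build one pytest.param per discrepancy, marking unfixed ones as xfail."""
    params = []
    for disc_id, fixed in sorted(registry.items()):
        marks = []
        if not fixed:
            # strict=False: an unexpected pass is reported as xpassed
            # instead of failing the run.
            marks.append(
                pytest.mark.xfail(reason=f"{disc_id} not fixed yet", strict=False)
            )
        params.append(pytest.param(disc_id, marks=marks, id=disc_id))
    return params


@pytest.mark.parametrize("disc_id", discrepancy_params(DISCREPANCIES))
def test_matches_clojure(disc_id):
    # Stand-in assertion; a real test would compare Python and Clojure outputs.
    assert DISCREPANCIES[disc_id], f"{disc_id}: Python output differs from Clojure"
```

With this layout, fixing a discrepancy first shows up as an xpassed test, and removing its xfail mark then turns it into an ordinary regression test.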

## Test plan

- [x] 223 passed, 4 skipped, 22 xfailed, 7 xpassed, 0 failures
🤖 Generated with [Claude Code](https://claude.com/claude-code)

commit-id:bdc830db
jucor force-pushed the spr/edge/bdc830db branch from 26c46cd to dec4237 on March 30, 2026 21:56
jucor (Collaborator, Author) commented Mar 30, 2026

This PR was created by a crashed `jj spr update`; please ignore it.

jucor closed this on Mar 30, 2026
jucor changed the title from "Per-discrepancy test infrastructure" to "IGNORE -- crash from spr" on Mar 30, 2026