4 changes: 4 additions & 0 deletions .github/workflows/cypress-tests.yml
Original file line number Diff line number Diff line change
@@ -9,6 +9,10 @@ on:
- stable
- 'jc/**'

concurrency:
  group: e2e-${{ github.ref }}
  cancel-in-progress: true

jobs:
  cypress-run:
    runs-on: ubuntu-latest
1 change: 1 addition & 0 deletions .github/workflows/python-ci.yml
@@ -93,6 +93,7 @@ jobs:
-e POSTGRES_HOST=postgres \
-e POSTGRES_PASSWORD=PdwPNS2mDN73Vfbc \
-e POSTGRES_DB=polis-test \
-e SKIP_GOLDEN=1 \
delphi \
bash -c " \
set -e; \
14 changes: 14 additions & 0 deletions .spr.yml
@@ -0,0 +1,14 @@
githubRepoOwner: compdemocracy
githubRepoName: polis
githubHost: github.com
githubRemote: origin
githubBranch: edge
requireChecks: true
requireApproval: true
defaultReviewers: []
mergeMethod: squash
mergeQueue: false
prTemplateType: stack
forceFetchTags: false
showPrTitlesInStack: false
branchPushIndividually: false
32 changes: 24 additions & 8 deletions delphi/CLAUDE.md
@@ -2,15 +2,9 @@

This document provides comprehensive guidance for working with the Delphi system, including database interactions, environment configuration, Docker services, and the distributed job queue system. It serves as both documentation and a practical reference for day-to-day operations.

## Documentation Directory
## Documentation

For a comprehensive list of all documentation files with descriptions, see:
[delphi/docs/DOCUMENTATION_DIRECTORY.md](docs/DOCUMENTATION_DIRECTORY.md)

## Current work todos are located in

delphi/docs/JOB_QUEUE_SCHEMA.md
delphi/docs/DISTRIBUTED_SYSTEM_ROADMAP.md
**Warning:** Many docs in `docs/` are outdated and should not be trusted. Always verify against the actual code. Start with `docs/PLAN_DISCREPANCY_FIXES.md` (canonical fix plan) and `docs/CLJ-PARITY-FIXES-JOURNAL.md` (session journal) for current Clojure parity work.

## Helpful terminology

@@ -368,3 +362,25 @@ The system uses AWS Auto Scaling Groups to manage capacity:
- Large Instance ASG: 1 instance by default, scales up to 3 based on demand

CPU utilization triggers scaling actions (scale down when below 60%, scale up when above 80%).


## Testing

Run the tests with `pytest`, pointing it at the `tests/` folder.

### Reference datasets

`real_data` contains several datasets of real conversations, exported from Polis, that can be used for testing and development. Those at the root of `real_data` are public.
`real_data/.local` contains private datasets that can only be used internally. The comparer handles the private datasets in addition to the public ones when given the `--include-local` flag.

### Regressions and golden snapshots

To detect regressions against the latest validated Python code, there are regression unit tests in `tests/` and a script, `scripts/regression_comparer.py`, that compares the output to "golden snapshots". The script is more verbose than the tests, which makes it useful for debugging.

Small numerical differences are acceptable; handling them is what the regression comparer library is for.
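
As an illustration of what tolerating numerical error means in practice, here is a minimal sketch of a tolerance-aware comparison between a golden snapshot and fresh output. It assumes snapshots are nested dicts and lists of numbers and strings. The function name `compare_nested` and the tolerance values are hypothetical and are not taken from `scripts/regression_comparer.py`.

```python
import math


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compare_nested(golden, actual, rel_tol=1e-6, abs_tol=1e-9, path="root"):
    """Return a list of mismatch descriptions; an empty list means a match.

    Hypothetical helper: tolerances are illustrative defaults, not project values.
    """
    mismatches = []
    if isinstance(golden, dict) and isinstance(actual, dict):
        for key in sorted(set(golden) | set(actual), key=str):
            if key not in golden or key not in actual:
                mismatches.append(f"{path}.{key}: missing on one side")
            else:
                mismatches.extend(
                    compare_nested(golden[key], actual[key], rel_tol, abs_tol, f"{path}.{key}")
                )
    elif isinstance(golden, (list, tuple)) and isinstance(actual, (list, tuple)):
        if len(golden) != len(actual):
            mismatches.append(f"{path}: length {len(golden)} != {len(actual)}")
        else:
            for i, (g, a) in enumerate(zip(golden, actual)):
                mismatches.extend(compare_nested(g, a, rel_tol, abs_tol, f"{path}[{i}]"))
    elif _is_number(golden) and _is_number(actual):
        # Numbers match if they agree within a relative or absolute tolerance.
        if not math.isclose(golden, actual, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append(f"{path}: {golden!r} != {actual!r}")
    elif golden != actual:
        mismatches.append(f"{path}: {golden!r} != {actual!r}")
    return mismatches
```

Returning the full list of mismatching paths, rather than stopping at the first difference, is a design choice meant to mirror the verbose, debugging-oriented role described above.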

### Legacy Clojure reference implementation and the move to scikit-learn

The math also has an older implementation in Clojure, in `polismath`. Until we can replace it, the tests in `tests/*legacy*` compare the two implementations. They run the Python code and compare parts of its output to the `math blob`, which is the JSON output of the Clojure implementation. The math blob is usually stored in the PostgreSQL database, but for simplicity a copy is kept alongside the golden (Python) snapshots used by the regression comparer, so these tests need neither Postgres nor Clojure.

Much of the current Python code was ported from Clojure using an AI agent (Sonnet 3.5 last year), including many home-made implementations of core algorithms. We are replacing those with standard implementations (such as sklearn for PCA and K-means). This work is ongoing, and it is made harder by the fact that the Python code does not quite produce the same output as the Clojure code. The typical process is therefore: check what the ported Python code does differently from the Clojure code, adjust the Python code to match the Clojure output, then replace it with a standard implementation. The standard implementation may again produce slightly different output, so we adjust its parameters until the output is similar.
29 changes: 19 additions & 10 deletions delphi/Dockerfile
@@ -7,6 +7,13 @@

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
ENV UV_SYSTEM_PYTHON=1

# Install uv for faster package installation (~2x faster than pip)
# Placed in /opt/uv (not /usr/local/bin) to avoid leaking into the final/prod image
# via the blanket COPY --from=builder /usr/local/bin in Stage 2.
COPY --from=ghcr.io/astral-sh/uv:0.11.2 /uv /opt/uv/uv
ENV PATH="/opt/uv:$PATH"

RUN apt-get update && \
apt-get install -y --no-install-recommends \
@@ -28,21 +35,21 @@
COPY pyproject.toml requirements.lock ./

# Install dependencies from lock file (cached layer - reused unless requirements.lock changes)
# BuildKit cache mount keeps pip cache between builds for faster rebuilds
# BuildKit cache mount keeps uv cache between builds for faster rebuilds
# If USE_CPU_TORCH is true, we install CPU-specific wheels and filter them out of the lockfile
RUN --mount=type=cache,target=/root/.cache/pip \
RUN --mount=type=cache,target=/root/.cache/uv \
if [ "$USE_CPU_TORCH" = "true" ]; then \
echo "USE_CPU_TORCH=true: Installing CPU-only PyTorch..." && \
pip install --index-url https://download.pytorch.org/whl/cpu \
uv pip install --index-url https://download.pytorch.org/whl/cpu \
torch==2.8.0 \
torchvision==0.23.0 \
torchaudio==2.8.0 && \
echo "Filtering standard torch packages from requirements.lock..." && \
grep -vE "^(torch|torchvision|torchaudio)==" requirements.lock > requirements.filtered.lock && \
pip install -r requirements.filtered.lock; \
uv pip install -r requirements.filtered.lock; \
else \
echo "USE_CPU_TORCH=false: Installing standard dependencies..." && \
pip install -r requirements.lock; \
uv pip install -r requirements.lock; \
fi

# ===== OPTIMIZATION: Copy source code LAST (busts cache on code changes) =====
@@ -54,8 +61,8 @@ RUN --mount=type=cache,target=/root/.cache/pip \

# Install the project package (without dependencies - they're already installed)
# This registers entry points and installs the package in development mode
RUN --mount=type=cache,target=/root/.cache/pip \
pip install --no-deps .
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --no-deps .

RUN echo "--- PyTorch Check (after pyproject.toml installation) ---" && \
pip show torch torchvision torchaudio && \
@@ -132,12 +139,14 @@ RUN apt-get update && \
&& apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Copy pyproject.toml to install dev dependencies
# Copy uv from builder for faster package installation
COPY --from=builder /opt/uv/uv /usr/local/bin/uv
ENV UV_SYSTEM_PYTHON=1
COPY pyproject.toml .

# Install dev dependencies (pytest, etc.) using caching
RUN --mount=type=cache,target=/root/.cache/pip \
pip install --no-cache-dir ".[dev]"
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install ".[dev]"

# Default command for test container (can be overridden)
CMD ["tail", "-f", "/dev/null"]
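
The lockfile filtering step in the builder stage (`grep -vE "^(torch|torchvision|torchaudio)==" requirements.lock`) can be checked in isolation. The sketch below reproduces the same line-based filter in Python, using the same pattern as the `grep` call. The function name `filter_torch_pins` and the sample package pins are hypothetical and are not taken from the repository's `requirements.lock`.

```python
import re

# Same pattern as the grep -vE call in the Dockerfile: a pin that starts the line.
TORCH_PIN = re.compile(r"^(torch|torchvision|torchaudio)==")


def filter_torch_pins(lock_text: str) -> str:
    """Drop lines pinning torch, torchvision or torchaudio; keep everything else."""
    kept = [line for line in lock_text.splitlines() if not TORCH_PIN.match(line)]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Illustrative lockfile content only.
    sample = (
        "numpy==2.0.0\n"
        "torch==2.8.0\n"
        "torchvision==0.23.0\n"
        "torchaudio==2.8.0\n"
        "torchmetrics==1.4.0\n"
    )
    print(filter_torch_pins(sample), end="")
```

Because the pattern requires `==` immediately after the package name, packages whose names merely begin with `torch` are kept. The filter is line-based, so if a lockfile spreads one requirement over several lines (for example with hash continuation lines), only the first line of a matching entry would be removed.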