92 commits
ad42a88
add: .idea files to gitignore
Nigma-Ks Jan 9, 2026
b862616
feat(dockerfile): add build PySymGym dockerfile
Nigma-Ks Jan 9, 2026
f7f9218
fix: make building on macos dockerfile
Nigma-Ks Jan 12, 2026
7ec2f7a
fix: add runstrat test and fix errors occured
Nigma-Ks Jan 13, 2026
ab1faa4
add func that runs test case in docker
Nigma-Ks Feb 10, 2026
715cb65
feat(run_runstrat): run ExecutionTreeContributedCoverage strategy and…
Nigma-Ks Feb 17, 2026
d35f886
remove runstrat test from docker
Nigma-Ks Feb 17, 2026
4186072
feat(frontend form)
Nigma-Ks Feb 18, 2026
e97dabd
add backend-frontend connection to upload onnx model
Nigma-Ks Feb 18, 2026
6604dcd
refactor net7 to net8, run runstrat ai strat
Nigma-Ks Feb 18, 2026
efad1b8
feat(frontend): form data uploads on submit
Nigma-Ks Feb 19, 2026
85dccff
new data on new submit
Nigma-Ks Feb 19, 2026
6c897f2
feat: function to send results on email
Nigma-Ks Feb 21, 2026
12192f0
feat add building container and loading dataset script
Nigma-Ks Feb 21, 2026
b9f3910
add email validation
Nigma-Ks Feb 21, 2026
12f6bfd
feat: comparison with baseline, field experiment name in form
Nigma-Ks Feb 21, 2026
32f0a55
refactor: runstrat and compstrat cmds to template methods
Nigma-Ks Feb 23, 2026
25dc258
fix model onnx uploads with model.onnx name
Nigma-Ks Feb 23, 2026
d685996
fix docker cmd
Nigma-Ks Feb 23, 2026
3cf7ef7
refactor: split main logic into separate services
Nigma-Ks Feb 23, 2026
d237399
refactor: move image name to build file and make it const
Nigma-Ks Feb 24, 2026
f409623
feat: unique directories for each launch
Nigma-Ks Feb 24, 2026
f9372a1
load dataset
Nigma-Ks Feb 25, 2026
3d8dd92
remove tmp dirs after submit handling
Nigma-Ks Feb 26, 2026
45c31e6
add parallel execution of pipeline using Celery
Nigma-Ks Feb 26, 2026
3194fe4
add tmp dir to gitignore
Nigma-Ks Feb 26, 2026
55cf00d
docker build no cache to update pysymgym
Nigma-Ks Feb 26, 2026
b154e18
refactor Methods class, move build and fetch functions to app_setup
Nigma-Ks Feb 28, 2026
2259ff3
feat add tests
Nigma-Ks Feb 28, 2026
a9883ee
add requirements file
Nigma-Ks Mar 2, 2026
7161f36
add test workflow: build project without docker and run some tests
Nigma-Ks Mar 5, 2026
7c39925
fix remove pytest args
Nigma-Ks Mar 5, 2026
7c0cc41
fix: resolve module imports in tests
Nigma-Ks Mar 5, 2026
48181b6
add building pysymgym workflow
Nigma-Ks Mar 5, 2026
7ddae70
fix: resolve backend module path
Nigma-Ks Mar 5, 2026
345af3d
fix misprint
Nigma-Ks Mar 5, 2026
00200e9
run all tests in docker.yml
Nigma-Ks Mar 5, 2026
9eeb486
add .env file for tests
Nigma-Ks Mar 5, 2026
bc45f70
refactor change docker test result dir to pytest tmp_path
Nigma-Ks Mar 5, 2026
0ef4977
docs: add detailed README with installation instructions
Nigma-Ks Mar 5, 2026
96159f4
refactor(tests): pytest tmp_path instead of TMP_DIR
Nigma-Ks Mar 5, 2026
8231805
refactor(Methods): change Methods methods names
Nigma-Ks Mar 5, 2026
785f136
remove workflow without docker
Nigma-Ks Mar 5, 2026
5233bbb
fetch_dataset function uploads dataset from docker image, ruff format
Nigma-Ks Mar 6, 2026
dc069c4
refactor: join paths
Nigma-Ks Mar 6, 2026
98b1320
refactor: os.sep instead of slash
Nigma-Ks Apr 8, 2026
f59f0ab
refactor: remove comma in item condition
Nigma-Ks Apr 8, 2026
ba3ab86
refactor: use dict instead of defaultdict
Nigma-Ks Apr 8, 2026
a76fc19
refactor: move duplicate strings into separate variables
Nigma-Ks Apr 8, 2026
368e280
refactor: use defaultdict properly
Nigma-Ks Apr 8, 2026
717ee95
refactor: move prefix to const
Nigma-Ks Apr 8, 2026
b98cd16
fix: remove dynamic created files
Nigma-Ks Apr 8, 2026
0455f48
add frontend linter, run ruff format
Nigma-Ks Apr 8, 2026
bea979f
feat: add prettier frontend formatter, add it to pre-commit
Nigma-Ks Apr 8, 2026
fc4eff1
add linting.yml, run format all
Nigma-Ks Apr 8, 2026
8aa3091
add linting.yml, run format all
Nigma-Ks Apr 8, 2026
9ccb45d
refactor: ruff to ruff-check
Nigma-Ks Apr 8, 2026
27f41df
fix: pre-commit prettier script runs write .
Nigma-Ks Apr 8, 2026
2dc1c2c
fix: no Methods.ts formatting
Nigma-Ks Apr 8, 2026
2060993
update requirements.txt
Nigma-Ks Apr 10, 2026
b36d2c1
feat(cancellation): add experiment cancellation functionality
Nigma-Ks May 8, 2026
8057335
feat(frontend pages): homepage, experiments, information, ranking
Nigma-Ks May 8, 2026
1c886a9
fix(cancellation, tmp files deletion): cancellation button disappears…
Nigma-Ks May 8, 2026
11dff44
feat(metrics_calc): add functions to calculate experiments metrics
Nigma-Ks May 8, 2026
5328367
fix(filepathes): reset recent filepathes changes
Nigma-Ks May 9, 2026
f947631
fix(celery task): ensure cleanup runs and improve logging on failure
Nigma-Ks May 9, 2026
6d18274
feat(metrics): add validation performance metrics
Nigma-Ks May 9, 2026
b34ff90
feat: metrics tests
Nigma-Ks May 9, 2026
821086a
feat: add tests for task, task cancellation, fix sender tests
Nigma-Ks May 9, 2026
99d7ad0
feat: add run experiment for publishing page and functionality
Nigma-Ks May 9, 2026
06820a6
fix(ModelRankingPage): route to existing page and update publish butt…
Nigma-Ks May 9, 2026
eb71801
feat(ranking table): save published experiments in database and displ…
Nigma-Ks May 10, 2026
13f35fb
docs(readme): add routing, ranking, database and MinIO setup instruct…
Nigma-Ks May 10, 2026
7460d99
fix(tests): set DB_URL
Nigma-Ks May 10, 2026
7e9e3fe
feat(ranking table): add sorting for all columns
Nigma-Ks May 10, 2026
2fa4ce0
fix: sorting ranking columns by its value
Nigma-Ks May 11, 2026
f950c68
feat(config): make Redis URL configurable for horizontal scaling
Nigma-Ks May 11, 2026
b2d1a1f
docs(scaling): add Redis URL config instruction to run workers on sep…
Nigma-Ks May 11, 2026
c34776b
feat(comparison): add comparison with another model functionality
Nigma-Ks May 11, 2026
38555c6
feat: cancel experiment via email link, add baseline run to ranking s…
Nigma-Ks May 11, 2026
7c331a0
remove publish page and comparison from experiment, add new experimen…
Nigma-Ks May 11, 2026
b8b1ad3
feat(ranking): multi-language, new succesfully launched methods metric
Nigma-Ks May 12, 2026
f028958
feat: add comparison functionality for experiments results
Nigma-Ks May 14, 2026
fe06768
feat: add model interface page with interface description
Nigma-Ks May 14, 2026
6082f98
feat(ModelInterfacePage): add shape description and reference to it f…
Nigma-Ks May 15, 2026
53d44e4
add runstrat zip results attachment, remove old tests
Nigma-Ks May 15, 2026
e394e73
feat(comparison): csv files from comparison results available to down…
Nigma-Ks May 15, 2026
a554805
feat: add graph_description file, send failure notification email, re…
Nigma-Ks May 16, 2026
ed26adc
test: add token and api tests, expand metrics, email and task tests
Nigma-Ks May 16, 2026
5243f20
fix(frontend): remov eslint-disable-line
Nigma-Ks May 23, 2026
eddefaf
docs: update README
Nigma-Ks May 24, 2026
45f51ad
docs (README): update features description
Nigma-Ks May 24, 2026
37 changes: 37 additions & 0 deletions .github/workflows/build_and_test.yml
@@ -0,0 +1,37 @@
name: Build project and run tests

on: [ push, pull_request ]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python 3.14
        uses: actions/setup-python@v5
        with:
          python-version: '3.14'
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Build dockerfile and update dataset
        run: |
          python -m backend.launch_service.app_setup

      - name: Create .env file for tests
        run: |
          cat > .env << EOF
          EMAIL=test@example.com
          APP_PASSWORD=test_password_123
          DB_URL=sqlite://
          EOF

      - name: Run Python tests
        run: |
          python -m pytest -v
41 changes: 41 additions & 0 deletions .github/workflows/linting.yml
@@ -0,0 +1,41 @@
name: Lint and Format Check

on: [ push, pull_request ]

jobs:
  lint-and-format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run ruff check
        uses: astral-sh/ruff-action@v3
        with:
          args: "check"

      - name: Run ruff format check
        uses: astral-sh/ruff-action@v3
        with:
          args: "format --check"

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
          cache-dependency-path: 'frontend/package-lock.json'

      - name: Install frontend dependencies
        run: |
          cd frontend
          npm ci

      - name: Run ESLint
        run: |
          cd frontend
          npm run lint

      - name: Run Prettier check
        run: |
          cd frontend
          npm run format:check
31 changes: 27 additions & 4 deletions .gitignore
@@ -135,7 +135,7 @@ celerybeat.pid
*.sage.py

# Environments
.env
backend/.env
.envrc
.venv
env/
@@ -171,7 +171,7 @@ cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# and can be added to the global gitignore or merged into this file_utils. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

@@ -184,14 +184,14 @@ cython_debug/
# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# and can be added to the global gitignore or merged into this file_utils. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
# PyPI configuration file_utils
.pypirc

# Cursor
@@ -205,3 +205,26 @@ cython_debug/
marimo/_static/
marimo/_lsp/
__marimo__/

#IDE
.idea/
.DS_Store

# runstrat artifacts and config
.env
backend/tmp

#Frontend
node_modules/
frontend/.gitignore
frontend/README.md

#dynamic created files
frontend/src/components/components/Methods.ts
backend/resources/dataset.json

# Pre-commit
.pre-commit-config.local

# Memory benchmark scripts, inputs and results
mem_benchmark/
31 changes: 31 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,31 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.2
    hooks:
      - id: ruff-check
        args: [ --fix ]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v9.38.0
    hooks:
      - id: eslint
        files: ^frontend/.*\.(js|jsx|ts|tsx)$
        args: [ '--fix', '--config', 'frontend/eslint.config.js' ]
        additional_dependencies:
          - eslint@9.38.0
          - '@typescript-eslint/eslint-plugin@8.58.1'
          - '@typescript-eslint/parser@8.58.1'
          - '@eslint/js@9.38.0'
          - 'eslint-plugin-prettier@5.2.1'
          - 'eslint-config-prettier@10.0.1'
          - globals@15.12.0

  - repo: local
    hooks:
      - id: prettier-frontend
        name: Prettier Frontend
        entry: bash -c 'cd frontend && npx prettier --write .'
        language: system
        files: ^frontend/.*\.(js|jsx|ts|tsx|json|css|md)$
        pass_filenames: false
211 changes: 210 additions & 1 deletion README.md
@@ -1,2 +1,211 @@
# PySymBench
Infrastructure for models comparison and evalustion

Infrastructure for **AI model comparison and evaluation in symbolic execution workflows**.

PySymBench is a **local web application** for running ONNX models as guiding strategies in symbolic execution. Experiments run inside Docker using [PySymGym](https://github.com/PySymGym/PySymGym) tools on a fixed test set; results are emailed back to the user and saved to a public leaderboard.

The platform is designed to cover three target languages — **C#**, **Java** and **C++** — but only **C#** is available right now; Java and C++ are in development.

## Features

- **Run Experiment** — name the experiment, upload an ONNX model, pick a test set (currently C# or the "All" mode that runs the model against every available test set), and provide an email. The experiment is launched in Docker; metrics and artifacts are emailed back when it finishes. While the task is in progress it can be cancelled via a one-click link in the confirmation email.
- **Model Ranking** — leaderboard of all completed experiments, split into tabs by language plus an **All Methods** tab. Rows are sorted by mean coverage (with ties broken by total tests, total errors, recency and runtime); every column is also sortable in the UI. Per-experiment metrics include mean/median coverage, total tests, errors, runtime and the share of methods that produced results.
- **All Methods mode** — a dedicated experiment mode that runs the model against every language's test set. Each per-language run produces its own leaderboard entry (with that language's metrics), and an additional aggregated entry covering all languages is shown in the **All Methods** tab.
- **Pairwise Comparison** — select any two experiments from a ranking tab and produce side-by-side comparison artifacts (PDFs), downloadable individually or as a single zip.
- **Model Interface docs** — page describing the ONNX input/output specification a model must satisfy to be runnable by the experiment pipeline (tensor names, shapes, graph encoding).
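
The default leaderboard ordering described under **Model Ranking** can be sketched as a composite sort key. This is an illustration only, not the project's implementation: the `Entry` fields are hypothetical names, and the direction of each tie-breaker (more tests first, fewer errors first, newer first, shorter runtime first) is an assumption, since the text above lists only the order of the criteria.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Entry:
    # Hypothetical leaderboard row; field names are illustrative only.
    name: str
    mean_coverage: float
    total_tests: int
    total_errors: int
    finished_at: datetime
    runtime_s: float


def ranking_key(e: Entry) -> tuple:
    # Tuples compare element by element, so earlier items dominate.
    # Negation turns "higher is better" fields into ascending order.
    return (
        -e.mean_coverage,            # primary: higher mean coverage first
        -e.total_tests,              # assumed: more tests first
        e.total_errors,              # assumed: fewer errors first
        -e.finished_at.timestamp(),  # assumed: more recent first
        e.runtime_s,                 # assumed: shorter runtime first
    )


def rank(entries: list[Entry]) -> list[Entry]:
    return sorted(entries, key=ranking_key)
```

Keeping the whole ordering in one key function means the backend and any offline analysis script could share the same tie-breaking rules.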

### Routes

The frontend is a multi-page React SPA using `react-router-dom`:

| Route | Page |
|---|---|
| `/` | Home — navigation hub |
| `/experiment` | Run Experiment form |
| `/ranking` | Model Ranking leaderboard + pairwise comparison |
| `/interface` | Model Interface specification |

### Backend API

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/upload` | Submit a new experiment (multipart: ONNX file, `email`, `language`, `experiment`) |
| `GET` | `/api/status/{task_uid}` | Celery task state |
| `POST` | `/api/cancel/{task_uid}` | Cancel a running experiment |
| `GET` | `/api/cancel/{task_uid}?token=...` | One-click cancellation link sent by email |
| `GET` | `/api/ranking?language=csharp\|java\|cpp\|all` | Leaderboard entries |
| `POST` | `/api/compare` | Start a pairwise comparison between two experiment IDs |
| `GET` | `/api/compare/{uid}/status` | Comparison task state and result file list |
| `GET` | `/api/compare/{uid}/file/{name}` | Stream a single comparison artifact |
| `GET` | `/api/compare/{uid}/files.zip` | Download all comparison PDFs as a zip |
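
For scripting against the endpoints in the table above, a small URL builder may help. The sketch below only constructs URLs and makes no network calls. The helper names are hypothetical, and it assumes the API is served at the README's default `BASE_URL` of `http://localhost:8000`.

```python
from urllib.parse import quote, urlencode

API_BASE = "http://localhost:8000"  # assumed: README default for BASE_URL
LANGUAGES = {"csharp", "java", "cpp", "all"}  # values listed for /api/ranking


def ranking_url(language: str, base: str = API_BASE) -> str:
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return f"{base}/api/ranking?{urlencode({'language': language})}"


def status_url(task_uid: str, base: str = API_BASE) -> str:
    # quote() keeps an unexpected character in the id from altering the path.
    return f"{base}/api/status/{quote(task_uid, safe='')}"


def compare_file_url(uid: str, name: str, base: str = API_BASE) -> str:
    return (
        f"{base}/api/compare/{quote(uid, safe='')}"
        f"/file/{quote(name, safe='')}"
    )
```

The resulting strings can be passed to any HTTP client; percent-encoding the path segments guards against artifact names that contain spaces or slashes.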

# Installation

The repository contains **both frontend and backend components**, and **both must be launched** for the application to work.

---

## Email Communication (Gmail)

To enable email delivery of results, add Gmail credentials to your `.env` file:

```
EMAIL=your_email@gmail.com
APP_PASSWORD=your_app_password
```

`EMAIL` — your Gmail address
`APP_PASSWORD` — your Gmail **App Password** (not your regular account password)
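
As a rough illustration of how these two variables are typically used, the sketch below assembles a results email and sends it through Gmail's SMTP endpoint with the standard library. It is not the project's sender: the function names and the subject format are hypothetical, and it assumes the `.env` values are already loaded into the process environment.

```python
import os
import smtplib
from email.message import EmailMessage


def build_result_email(
    sender: str, recipient: str, experiment: str, summary: str
) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"PySymBench results: {experiment}"  # hypothetical format
    msg.set_content(summary)
    return msg


def send_via_gmail(msg: EmailMessage) -> None:
    # Reads the same variables the README asks for in .env.
    sender = os.environ["EMAIL"]
    app_password = os.environ["APP_PASSWORD"]
    # Gmail accepts implicit-TLS SMTP on port 465.
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(sender, app_password)
        smtp.send_message(msg)
```

Separating message construction from sending keeps the first function testable without credentials or network access.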

---

## Database (PostgreSQL)

The ranking leaderboard stores experiment results in a PostgreSQL database. Add the connection URL to your `.env` file:

```
DB_URL=postgresql://user:password@localhost:5432/pysymbench
```

The required tables are created automatically on server startup. You can run a local PostgreSQL instance via Docker:

```
docker run --name postgres-pysymbench -e POSTGRES_USER=user -e POSTGRES_PASSWORD=password \
-e POSTGRES_DB=pysymbench -p 5432:5432 -d postgres
```
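
To check that `DB_URL` matches the `docker run` flags above without printing the password, the URL can be split with the standard library. This is a hypothetical helper, not part of the backend; it accepts `sqlite` as well because the test workflow sets `DB_URL=sqlite://`.

```python
from urllib.parse import urlsplit


def describe_db_url(db_url: str) -> dict:
    parts = urlsplit(db_url)
    if not parts.scheme.startswith(("postgresql", "sqlite")):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # The password is deliberately left out of the summary.
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "database": parts.path.lstrip("/") or None,
    }
```

For the example URL above, the `host`, `port`, `user`, and `database` fields should line up with `-p 5432:5432`, `POSTGRES_USER`, and `POSTGRES_DB` in the container command.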

---

## Object Storage (MinIO)

Experiments store their ONNX model and result artifacts in MinIO; the pairwise comparison feature also reads artifacts from there. MinIO must be reachable — if it is not configured or unavailable, the task fails and the user is notified by email. Add the following to your `.env` file:

```
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=your_access_key
MINIO_SECRET_KEY=your_secret_key
MINIO_SECURE=false
MINIO_BUCKET=pysymbench
```

You can run a local MinIO instance via Docker:

```
docker run --name minio -p 9000:9000 -p 9001:9001 \
-e MINIO_ROOT_USER=your_access_key -e MINIO_ROOT_PASSWORD=your_secret_key \
-d minio/minio server /data --console-address ":9001"
```
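
Because a missing MinIO setting makes the task fail, it can be useful to validate the variables once at startup. The sketch below is one possible way to do that with the standard library. `MinioSettings` and `load_minio_settings` are hypothetical names, and both the `false` default for `MINIO_SECURE` and the accepted truthy spellings are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = ("MINIO_ENDPOINT", "MINIO_ACCESS_KEY", "MINIO_SECRET_KEY", "MINIO_BUCKET")


@dataclass(frozen=True)
class MinioSettings:
    endpoint: str
    access_key: str
    secret_key: str
    secure: bool
    bucket: str


def load_minio_settings(env: Mapping[str, str] = os.environ) -> MinioSettings:
    # Report every missing variable at once instead of failing on the first.
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("missing MinIO settings: " + ", ".join(missing))
    secure_raw = env.get("MINIO_SECURE", "false")  # assumed default
    return MinioSettings(
        endpoint=env["MINIO_ENDPOINT"],
        access_key=env["MINIO_ACCESS_KEY"],
        secret_key=env["MINIO_SECRET_KEY"],
        secure=secure_raw.strip().lower() in {"1", "true", "yes"},
        bucket=env["MINIO_BUCKET"],
    )
```

Passing the environment as a parameter makes the loader easy to exercise in tests with a plain dictionary.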

---

## Redis (Celery Broker)

By default, the Celery broker is expected at `redis://localhost:6379`. To use a remote Redis instance (e.g., for running Celery workers on separate machines), set `REDIS_URL` in your `.env` file:

```
REDIS_URL=redis://<host>:6379
```

All services that connect to Redis — the FastAPI app and every Celery worker — must have the same `REDIS_URL`. Workers on remote machines need only the `backend/` code, Docker, and access to the shared Redis instance.
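
The fallback behaviour described above can be sketched as a single resolver that both the web app and the workers call, so they cannot drift apart. `resolve_redis_url` is a hypothetical helper; only the default value comes from the README.

```python
import os
from typing import Mapping

DEFAULT_REDIS_URL = "redis://localhost:6379"  # default stated in the README


def resolve_redis_url(env: Mapping[str, str] = os.environ) -> str:
    # An unset or blank REDIS_URL falls back to the local default.
    url = env.get("REDIS_URL", "").strip() or DEFAULT_REDIS_URL
    if not url.startswith(("redis://", "rediss://")):
        raise ValueError(f"REDIS_URL must be a redis:// URL, got {url!r}")
    return url
```

The returned string is the kind of value Celery accepts as its `broker` argument.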

---

## URLs for email links

Cancellation links sent by email are absolute, so the backend needs to know its own public URL and the URL of the frontend. Defaults match a local setup; override them in `.env` if the app is reachable elsewhere:

```
BASE_URL=http://localhost:8000 # base URL of the FastAPI app
FRONTEND_URL=http://localhost:5173 # base URL of the React frontend
```
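
To illustrate why the backend needs `BASE_URL`, the sketch below builds an absolute cancellation link of the form `GET /api/cancel/{task_uid}?token=...`. The token scheme shown here (an HMAC of the task id under a server-side secret) is an assumption made for the example; the README does not say how the real token is generated, and all function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import quote, urlencode


def make_token(task_uid: str, secret: bytes) -> str:
    # Assumed scheme: HMAC-SHA256 of the task id, hex-encoded.
    return hmac.new(secret, task_uid.encode(), hashlib.sha256).hexdigest()


def cancel_link(base_url: str, task_uid: str, secret: bytes) -> str:
    query = urlencode({"token": make_token(task_uid, secret)})
    path = f"/api/cancel/{quote(task_uid, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{query}"


def token_is_valid(task_uid: str, token: str, secret: bytes) -> bool:
    # compare_digest avoids leaking match position through timing.
    return hmac.compare_digest(make_token(task_uid, secret), token)
```

With a scheme like this the server would not need to store tokens, since it can recompute the expected value from the task id when the link is opened.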

---

## Backend Setup

1. Install **Python 3.14** and **Docker**, then install the project dependencies:

```
pip install -r requirements.txt
```

2. Run the application setup script (this builds a Docker container with the **PySymGym repository** and downloads the required dataset):

```
python -m backend.launch_service.app_setup
```

3. Start the **Celery broker (Redis)**:

```
docker run --name redis-for-celery -p 6379:6379 -d redis
```

4. Start the **Celery worker** and the **application server** (in separate terminals):

```
celery -A backend.utils.task worker --loglevel=info
uvicorn backend.main:app
```

---

## Frontend Setup

1. Install **Node.js** with **npm**.

2. Install frontend dependencies:

```
cd frontend
npm install
```

3. Start the frontend development server:

```
npm run dev
```

Or build for production:

```
npm run build
```

### Frontend technology stack

| Package | Purpose |
|---|---|
| `react-router-dom` | Client-side routing between pages |
| `antd` | UI component library (forms, tables, buttons, modals) |
| `tailwindcss` | Utility-first CSS framework |
| `vite` | Build tool and dev server |

---

## Development

### Python

```
ruff check . # Lint
ruff check . --fix # Auto-fix
ruff format . # Format
pytest -v # Run tests
```

### Frontend

```
cd frontend
npm run lint:fix # ESLint auto-fix
npm run format # Prettier format
npm run format:check # Check formatting without writing
```

## CI/CD

GitHub Actions runs on push/PR:
- **`linting.yml`** — ruff check + format, ESLint + Prettier
- **`build_and_test.yml`** — builds Docker image, runs `pytest -v`
Empty file added backend/__init__.py
Empty file.
Empty file added backend/config/__init__.py
Empty file.