56 commits
15487b1
Added cohort definition sets and cohort generation features
azimov May 4, 2026
073bcaa
Merge branch 'develop' into features/cohort-definition-set
azimov May 7, 2026
cabaa19
started benchmarking scripts
azimov May 13, 2026
02fc184
Removed and re-worked slop benchmark code
azimov May 14, 2026
4b89967
implementation of custom end era logic in ibis layer
azimov May 14, 2026
09748f2
more tests around custom era logic for parity with java
azimov May 14, 2026
471d75b
merge fatures/ibis-custom-eras' into features/cohort-definition-set
azimov May 14, 2026
5c99060
Fixed perfromance issues wwith ibis compilation parse trees and impro…
azimov May 15, 2026
b714dc0
fix(execution): CustomEra window partitions on (person_id, event_id) …
azimov May 15, 2026
8f7c4d2
fix(execution): CustomEra window partitions on (person_id, event_id) …
azimov May 15, 2026
3621bb2
feat(execution): add materialize flag to build_cohort for compile-onl…
azimov May 15, 2026
4a4da42
updated benchmarks to run well on databricks
azimov May 15, 2026
4d402f9
Added files needed by python from benchmark script
azimov May 15, 2026
ba78e1b
Attempt to fix test warnings
azimov May 15, 2026
9860604
remove chat slop
azimov May 15, 2026
dd20010
fixes and improved tests;
azimov May 16, 2026
1dfe42b
Merge branch 'features/ibis-custom-eras' into features/cohort-definit…
azimov May 16, 2026
3f5b21f
simplification of checksum process
azimov May 17, 2026
f151953
Error log
azimov May 17, 2026
95fcc6b
added fixes to checksums using raw insert/upsert instead of memtables
azimov May 17, 2026
3141f10
Error log
azimov May 17, 2026
fc48d8a
Async cohort generation and compilation with improved error handling …
azimov May 17, 2026
f27ddb0
updated concept set resolution to remove in memory python nonsense
azimov May 18, 2026
a837d7a
concept set optimizations
azimov May 18, 2026
6a825dd
concept set fixes
azimov May 18, 2026
072e5a9
concept set fixes
azimov May 18, 2026
e99f27f
concept set cohort table names
azimov May 18, 2026
d00c78e
concept set cache table creation up front
azimov May 18, 2026
817727d
concept set cache table creation up front
azimov May 18, 2026
8dcd3d2
concept set cache table creation up front
azimov May 18, 2026
ba4825b
concept set cache table creation
azimov May 18, 2026
e7cf487
Removed ill-faited concept set cache - need to refine design better
azimov May 18, 2026
bdcacbf
Refactor table usage for concept sets
azimov May 18, 2026
15802d6
Error log
azimov May 18, 2026
d60b35b
Removal of inefficient python memory usage and memtables
azimov May 18, 2026
440e6e9
more fixes from concept set changes
azimov May 18, 2026
1c3a3d7
More crazy codeset join fixes
azimov May 19, 2026
2604f9d
Removal of heavy recursion for large unions
azimov May 19, 2026
2078f00
Replaced concept set temp tables per cohort with per execution
azimov May 19, 2026
2af353c
Fixed bad recursion for excluded concept sets
azimov May 19, 2026
739eaac
Modification to inefficient joins in correlated criteria
azimov May 19, 2026
90a374f
Optimization to codeset resolution and Additional criteria
azimov May 20, 2026
8f194df
Fix concept casting issue on databricks
azimov May 20, 2026
3ff5a30
Fixed issue with use of source concepts
azimov May 20, 2026
8b132c8
Tests and fixes for cohorts that are not identical from phenotype lib…
azimov May 20, 2026
55f75b6
Added benchmark report markdown file
azimov May 21, 2026
ed2cec3
Benchmark script changes for local run on databricks
azimov Jun 1, 2026
4412902
git ignore
azimov Jun 1, 2026
f0dca61
git ignore
azimov Jun 1, 2026
e92f452
git ignore
azimov Jun 1, 2026
77da42f
git ignore
azimov Jun 1, 2026
f830cb5
removed benchmarking code (moved to stand alone repository)
azimov Jun 17, 2026
023b28d
Fixing tests and removing python 3.9 support
azimov Jun 17, 2026
1b2b1e3
Actual fix - make ibis a core dependency of the package and update re…
azimov Jun 17, 2026
b7a433a
Ruff fixes
azimov Jun 17, 2026
d99b9c7
Ruff fixes
azimov Jun 17, 2026
2 changes: 1 addition & 1 deletion .github/workflows/basic_tests.yml
@@ -11,7 +11,7 @@ jobs:
 runs-on: ubuntu-latest
 strategy:
 matrix:
-python-version: [ "3.9", "3.10", "3.11", "3.12", "3.13", "3.14" ]
+python-version: [ "3.10", "3.11", "3.12", "3.13", "3.14" ]
 
 steps:
 - name: Check out repository
11 changes: 10 additions & 1 deletion .gitignore
@@ -103,6 +103,7 @@ celerybeat.pid
 
 # Environments
 .env
+.Renviron
 .venv
 env/
 venv/
@@ -178,4 +179,12 @@ debug_app/user_overrides.json
 debug_app/test_results.json
 
 .test_baseline.json
-.test_final.json
+.test_final.json
+
+# Benchmark outputs (generated by running benchmarks)
+benchmark_output/
+circepy_benchmarks/
+renv.lock
+eunomia_data/
+renv/
+.Rprofile
29 changes: 29 additions & 0 deletions CLAUDE.md
@@ -32,6 +32,35 @@ git pre-commit run --all-files
 
 If pre-commit checks fail, fix the issues and re-run until they pass.
 
+## Ibis Execution Layer: NEVER use Python in-memory operations
+
+The datasets this software processes are large (often 100M+ rows). Operations that pull data into Python memory can exhaust available memory and crash the process. All data processing MUST remain as lazy ibis expressions executed on the database backend.
+
+### Forbidden patterns in production code (`circe/execution/` and `circe/cohort_definition_set/`):
+
+| Pattern | Example (NEVER do this) | Instead |
+|---|---|---|
+| `.execute()` | `table.execute()` loads the entire table into a pandas DataFrame in memory | Compose ibis expressions; let the backend execute the full query |
+| `.to_pandas()` | `table.to_pandas()` pulls the result set into Python | Use ibis expressions; only call `.execute()` for small scalars (e.g., `table.limit(1).count().execute()`) |
+| Python iteration over results | `for row in table.select(...).distinct().to_pandas().itertuples()` | Push aggregation/distinct into ibis; use window functions or joins |
+| `ibis.memtable()` with large DataFrames | Constructing a large `pd.DataFrame` and passing it to `ibis.memtable()` | Read directly from the database table (passed tables already exist in the backend) |
+| Loading files into Python | `pd.read_csv(...)`, reading Parquet into memory | Use ibis to read files: `ibis.read_csv()`, `ibis.read_parquet()` |
+
+### Existing violations in production code (DO NOT FIX; examples for reference):
+
+1. **`circe/cohort_definition_set/_checksum_store.py`**: uses `pandas`, `.execute()`, `pd.DataFrame()`, and row iteration; should use ibis expressions end-to-end
+2. **`circe/execution/engine/custom_era.py:86`**: `.execute().iloc[:, 0]` to pull concept IDs into a Python tuple
+3. **`circe/execution/engine/group_demographics.py:97`**: `.to_pandas().itertuples()` to iterate over distinct concept IDs
+4. **`circe/execution/ibis/operations.py:86`**: `.execute()` to check if rows exist (use `table.limit(1).count().execute()` instead)
+5. **`benchmarks/compare_cohort_outputs.py`**: full-table `.execute()`, pandas row iteration, and set comparison in memory
+
+### Allowed uses of `.execute()`:
+
+- **Tests only**: tests run against small in-memory DuckDB databases with tiny fixtures. Assertions on small result sets are fine.
+- **Scalar values**: getting a single count or checking existence, e.g. `table.count().execute()` or `table.limit(1).execute()` (returns at most 1 row)
+
+When writing new production code, if you find yourself reaching for `.execute()`, `.to_pandas()`, or Python iteration over ibis results, **stop**: the query can be rewritten as a lazy ibis expression.
+
 ## Git Workflow
 - Do not run `git commit` — the user will handle commits
 - Run pre-commit checks to validate code quality before marking tasks complete
28 changes: 17 additions & 11 deletions circe/api.py
@@ -9,8 +9,16 @@
 - cohort_print_friendly(): Generate Markdown from cohort expression
 """
 
-from typing import TYPE_CHECKING, Any, Literal, Optional
-
+from typing import TYPE_CHECKING, Any, Literal
+
+from .cohort_definition_set import ( # noqa: F401
+CohortDefinition,
+CohortDefinitionSet,
+CohortGenerationResult,
+async_generate_cohort_set,
+generate_cohort_set,
+summarise_generation_results,
+)
 from .cohortdefinition import (
 BuildExpressionQueryOptions,
 CohortExpression,
@@ -114,7 +122,7 @@ def cohort_expression_from_yaml(yaml_str: str) -> CohortExpression:
 
 def build_cohort_query(
 expression: CohortExpression,
-options: Optional[BuildExpressionQueryOptions] = None,
+options: BuildExpressionQueryOptions | None = None,
 ) -> str:
 """Generate SQL query from a cohort expression.
 
@@ -147,8 +155,8 @@ def build_cohort(
 *,
 backend: IbisBackendLike,
 cdm_schema: str,
-vocabulary_schema: Optional[str] = None,
-results_schema: Optional[str] = None,
+vocabulary_schema: str | None = None,
+results_schema: str | None = None,
 ) -> Table:
 """Build a cohort as a relational table expression.
 
@@ -199,8 +207,8 @@ def write_cohort(
 cdm_schema: str,
 cohort_table: str,
 cohort_id: int,
-vocabulary_schema: Optional[str] = None,
-results_schema: Optional[str] = None,
+vocabulary_schema: str | None = None,
+results_schema: str | None = None,
 if_exists: Literal["fail", "replace"] = "fail",
 ) -> None:
 """Build and write an OHDSI cohort table.
@@ -260,14 +268,12 @@ def write_cohort(
 
 def cohort_print_friendly(
 expression: CohortExpression,
-concept_sets: Optional[list[ConceptSet]] = None,
-title: Optional[str] = None,
+concept_sets: list[ConceptSet] | None = None,
+title: str | None = None,
 include_concept_sets: bool = False,
 ) -> str:
 """Generate human-readable Markdown from a cohort expression.
 
-This is equivalent to R CirceR's `cohortPrintFriendly()` function.
-
 Args:
 expression: CohortExpression instance
 concept_sets: Optional list of concept sets (uses expression.concept_sets if None)
256 changes: 0 additions & 256 deletions circe/chat.py

This file was deleted.
