
Global futures #3427

Open
2kai2kai2 wants to merge 3 commits into canary from kai/global-futures

@2kai2kai2 (Collaborator) commented Apr 29, 2026

Investigating an earlier bug (#3409) uncovered a second issue in addition to the bug described in that PR: futures are VM-scoped.
This PR fixes both by overhauling the futures system:

  • Futures are now global, registered in BexEngine
  • The flow is now:
      1. The VM creates an UnscheduledFuture, which it immediately yields to the engine for scheduling.
      2. The engine allocates and registers a new future and schedules its execution (unless it is an early-exit future, in which case it returns immediately).
      3. Once the future completes, the heap future is updated and all waiters are notified.
  • Cancellation is now a BAML panic rather than a separate engine error.
  • The future registry acts as the root for incomplete futures. A future is removed from the registry once it completes, at which point it can be garbage collected if no other thread or root haver references it.
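
To make the lifecycle in the list above concrete, here is a minimal sketch of a registry that roots only incomplete futures. It is an illustration under assumptions, not the PR's code: `Registry`, `schedule`, and `complete` are hypothetical names, the payload is simplified to an `i64`, and only `active_future_count()` mirrors a name from the release notes.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real future id type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FutureId(u64);

/// Simplified future states; the real enum has more terminal variants.
#[derive(Debug, Clone, PartialEq)]
pub enum FutureState {
    Pending,
    Ready(i64),
}

/// Toy registry that owns (and therefore roots) only incomplete futures.
#[derive(Default)]
pub struct Registry {
    next_id: u64,
    active: HashMap<FutureId, FutureState>,
}

impl Registry {
    /// The engine registers a future that the VM yielded for scheduling.
    pub fn schedule(&mut self) -> FutureId {
        let id = FutureId(self.next_id);
        self.next_id += 1;
        self.active.insert(id, FutureState::Pending);
        id
    }

    /// Completion removes the entry, so the registry no longer roots it.
    /// Returns `None` if the id is not (or no longer) registered.
    pub fn complete(&mut self, id: FutureId, value: i64) -> Option<FutureState> {
        self.active.remove(&id).map(|_| FutureState::Ready(value))
    }

    /// Number of futures that are still incomplete.
    pub fn active_future_count(&self) -> usize {
        self.active.len()
    }
}
```

In this sketch a second `complete` call for the same id returns `None`, which is the situation the race-condition note in the commit message has to handle.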

Summary by CodeRabbit

Release Notes

  • New Features

    • Added cancellation panic type to represent cancelled operations as catchable exceptions.
    • Added ability to track active async operations count via active_future_count().
  • Improvements

    • Enhanced future lifecycle management and cleanup for async operations.
    • Improved garbage collection and memory write-barrier handling.
    • Strengthened global variable immutability after initialization.

- New `FutureManager` handles the registry of all futures. It is behind a shared heap permit.
- Cancellations now throw a `baml.panics.Cancelled` panic.
- Futures are now split into `Future` and `UnscheduledFuture`. The VM only ever creates `UnscheduledFuture`, which is then immediately yielded to the engine. The engine then schedules it, creating and registering a `Future` object that is returned to the VM.
- Futures are now de-registered when they are resolved (regardless of what resolution they end up with).
- Race condition handling: if the VM awaits a pending future but a different thread completes that future before the engine handles the await, the engine will find that the future id is no longer present. In this case, we check whether the id was previously valid (true if the id is less than the current next id); if it was, we treat the future as resolved and the VM can fetch the value from the heap.
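
The "previously valid id" check in the last bullet can be sketched as follows. This is a simplified model, assuming ids are handed out from a monotonically increasing counter; `Ids`, `Lookup`, `issue`, and `resolve` are hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;

/// Outcome of looking up a future id in the toy registry.
#[derive(Debug, PartialEq, Eq)]
pub enum Lookup {
    /// The id is still registered: the caller must wait.
    StillPending,
    /// The id was issued earlier but is gone: it resolved in the meantime.
    AlreadyResolved,
    /// The id was never handed out: a genuine error.
    NeverIssued,
}

#[derive(Default)]
pub struct Ids {
    next_id: u64,
    active: HashSet<u64>,
}

impl Ids {
    /// Hand out the next id and mark it active.
    pub fn issue(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.active.insert(id);
        id
    }

    /// De-register an id; returns whether it was active.
    pub fn resolve(&mut self, id: u64) -> bool {
        self.active.remove(&id)
    }

    /// A missing id below `next_id` must have been valid at some point.
    pub fn lookup(&self, id: u64) -> Lookup {
        if self.active.contains(&id) {
            Lookup::StillPending
        } else if id < self.next_id {
            Lookup::AlreadyResolved
        } else {
            Lookup::NeverIssued
        }
    }
}
```

The check relies on ids never being reused; if ids could wrap or be recycled, the comparison against the next id would no longer be sufficient.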
vercel Bot commented Apr 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| beps | Ready | Preview, Comment | Apr 29, 2026 0:57am |
| promptfiddle | Ready | Preview, Comment | Apr 29, 2026 0:57am |

coderabbitai Bot (Contributor) commented Apr 29, 2026

📝 Walkthrough

This PR introduces comprehensive future management and cancellation handling. It replaces EngineError::Cancelled with a panic-based system, adds FutureManager for lifecycle tracking, introduces HeapPermit abstractions for flexible allocation access, implements GC write barriers, transitions globals from mutable to frozen post-initialization, and refactors VM future handling to use explicit scheduling with ID-based tracking.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Panic and Object Types | baml_language/crates/baml_builtins2/.../ns_panics/panics.baml, baml_language/crates/bex_vm_types/src/types.rs, baml_language/crates/bex_vm_types/src/lib.rs | Adds Cancelled panic variant; introduces UnscheduledFuture and FutureId types; redefines Future with explicit terminal states (Error, Cancelled, InternalError). |
| Heap Permit Abstraction | baml_language/crates/bex_heap/src/heap_guard.rs, baml_language/crates/bex_heap/src/tlab.rs, baml_language/crates/bex_heap/src/lib.rs | Introduces HeapPermit<T> trait and SharedHeapPermit<T> for flexible allocation access; adds TlabHolder trait; exposes specialized allocators for diverse object kinds. |
| GC and Heap Invariants | baml_language/crates/bex_heap/src/heap.rs, baml_language/crates/bex_heap/src/gc.rs, baml_language/crates/bex_heap/src/heap_debugger/real.rs, baml_language/crates/bex_heap/src/accessor.rs | Adds write-barrier methods for inter-generational reference tracking; updates GC tracing/fixup for new future types; extends heap debug invariant checking; handles UnscheduledFuture in conversions. |
| Future Lifecycle Management | baml_language/crates/bex_engine/src/future.rs, baml_language/crates/bex_engine/tests/future_cleanup.rs | New FutureManager module tracking async operations with atomic state transitions (Pending → Ready/Error/Cancelled/InternalError); manages per-future waiters and cancellation tokens. |
| Engine Cancellation and Futures | baml_language/crates/bex_engine/src/lib.rs, baml_language/crates/bex_engine/src/conversion.rs, baml_language/crates/bex_engine/tests/cancellation.rs | Removes EngineError::Cancelled variant; adds CANCELLED_PANIC_CLASS constant and is_cancelled_engine_error() predicate; integrates FutureManager for scheduling/awaiting; refactors value conversion to use HeapPermit; updates cancellation test assertions. |
| VM Core and Futures | baml_language/crates/bex_vm/src/vm.rs, baml_language/crates/bex_vm/src/errors.rs, baml_language/crates/bex_vm_types/src/indexable.rs | Replaces mutable GlobalPool with VmGlobals (owned during $init, frozen to Arc<[Value]> post-init); changes Await to carry FutureId instead of HeapPtr; adds VmPanic::Cancelled and VmInternalError::StoreGlobalAfterInit; moves write barriers to heap; implements TlabHolder. |
| VM Utilities and Support | baml_language/crates/bex_vm/src/debug.rs, baml_language/crates/bex_vm/src/package_baml/root.rs, baml_language/crates/bex_vm/src/package_baml/unstable.rs, baml_language/crates/bex_vm_types/src/bytecode.rs | Updates debug display for slice-based globals; refactors deep copy/equality for UnscheduledFuture; adds stringification for unscheduled futures; documents StoreGlobal post-init invariant. |
| Codegen and Integration | baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs, baml_language/crates/bex_project/src/bex.rs, baml_language/crates/bex_project/src/lib.rs | Generated code imports HeapPermit for scope; exports CANCELLED_PANIC_CLASS, is_cancelled_engine_error, and new is_cancelled_runtime_error() helper. |
| LSP and Testing | baml_language/crates/baml_lsp_server/src/playground_server.rs, baml_language/crates/baml_lsp_server/tests/exceptions.rs | Replaces manual cancellation matching with is_cancelled_runtime_error() and is_cancelled_engine_error() predicates; updates panic bytecode snapshots reflecting new dispatch layout. |
| FFI Bridges | baml_language/crates/bridge_cffi/src/ffi/functions.rs, baml_language/crates/bridge_nodejs/src/errors.rs, baml_language/crates/bridge_python/src/errors.rs, baml_language/crates/bridge_wasm/src/lib.rs | Replaces direct EngineError::Cancelled matching with is_cancelled_engine_error() predicate in error-to-language mappings. |
| Type and Utility Updates | baml_language/crates/sys_types/src/lib.rs, baml_language/crates/sys_ops/src/lib.rs, baml_language/crates/tools_onionskin/src/compiler.rs | Adds Clone derives to OpError and OpErrorKind; imports HeapPermit in tests; adds UnscheduledFuture formatting. |

Sequence Diagram(s)

sequenceDiagram
    participant VM as BexVm
    participant Eng as BexEngine
    participant FM as FutureManager
    participant Heap as BexHeap

    rect rgba(100, 200, 100, 0.5)
    Note over VM,Heap: Future Scheduling and Execution
    VM->>VM: DispatchFuture: allocate UnscheduledFuture
    VM->>Eng: ScheduleFuture(UnscheduledFuture ptr)
    Eng->>FM: new_future(SysOp, args)
    FM->>Heap: alloc Future::Pending(FutureId)
    FM-->>Eng: FutureId
    Eng-->>VM: yield with FutureId
    
    Note over Eng,FM: Async operation executes
    
    Eng->>FM: fulfill_future(FutureId, value)
    FM->>Heap: update Future::Ready(value)
    FM->>FM: resolve waiter
    
    VM->>Eng: Await(FutureId)
    Eng->>FM: future_ready(FutureId)
    FM-->>VM: awaitable
    VM->>VM: resume with value
    end
sequenceDiagram
    participant Init as $init Task
    participant Eng as BexEngine
    participant Heap as BexHeap
    participant VM1 as BexVm 1
    participant VM2 as BexVm 2

    rect rgba(100, 150, 200, 0.5)
    Note over Init,Eng: Global Initialization and Freezing
    Init->>Eng: run_init(VmGlobals::Owned)
    Eng->>Heap: allocate, write to globals
    Init-->>Eng: freeze()
    Eng->>Eng: Arc<[Value]> from owned
    Eng->>VM1: share VmGlobals::Shared(Arc)
    Eng->>VM2: share VmGlobals::Shared(Arc)
    
    VM1->>VM1: LoadGlobal: read from Arc
    VM2->>VM2: LoadGlobal: read from Arc
    
    VM1->>VM1: StoreGlobal: error (StoreGlobalAfterInit)
    end
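
The owned-then-frozen globals pattern in the diagram above can be sketched in a few lines. This is a simplified model and not the PR's implementation: the `Value` payload, the `StoreError` enum, and the method names `store`, `load`, and `freeze` are assumptions for illustration; only the `VmGlobals` name, the `Arc<[Value]>` representation, and the `StoreGlobalAfterInit` error come from the walkthrough.

```rust
use std::sync::Arc;

/// Hypothetical value type; the real VM value is richer.
#[derive(Debug, Clone, PartialEq)]
pub struct Value(pub i64);

#[derive(Debug, PartialEq)]
pub enum StoreError {
    StoreGlobalAfterInit,
}

/// Globals are mutable while owned (during init) and read-only once shared.
pub enum VmGlobals {
    Owned(Vec<Value>),
    Shared(Arc<[Value]>),
}

impl VmGlobals {
    /// Writes succeed only before freezing.
    pub fn store(&mut self, idx: usize, v: Value) -> Result<(), StoreError> {
        match self {
            VmGlobals::Owned(vals) => {
                vals[idx] = v;
                Ok(())
            }
            VmGlobals::Shared(_) => Err(StoreError::StoreGlobalAfterInit),
        }
    }

    /// Reads work in both states.
    pub fn load(&self, idx: usize) -> &Value {
        match self {
            VmGlobals::Owned(vals) => &vals[idx],
            VmGlobals::Shared(vals) => &vals[idx],
        }
    }

    /// Convert the owned vector into an immutable, cheaply clonable slice.
    pub fn freeze(self) -> Arc<[Value]> {
        match self {
            VmGlobals::Owned(vals) => Arc::from(vals),
            VmGlobals::Shared(vals) => vals,
        }
    }
}
```

Sharing an `Arc<[Value]>` means each VM only bumps a reference count instead of copying the globals, and the type system rules out mutation through the shared handle.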
sequenceDiagram
    participant VM as BexVm
    participant Eng as BexEngine
    participant FM as FutureManager
    participant User as Caller

    rect rgba(200, 100, 100, 0.5)
    Note over VM,User: Cancellation as Panic
    User->>Eng: cancel_task()
    Eng->>FM: cancel_future(FutureId)
    FM->>FM: update heap Future::Cancelled
    FM->>FM: fire CancellationToken
    
    VM->>Eng: Await(FutureId)
    Eng->>FM: future_ready(FutureId)
    FM-->>Eng: resolved (cancelled)
    Eng->>VM: UnhandledThrow(Cancelled panic)
    VM->>User: EngineError::UnhandledThrow(panic)
    end
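
Because cancellation now surfaces as an unhandled throw rather than a dedicated error variant, callers need a predicate to recognize it. The sketch below shows one plausible shape for such a predicate. The `EngineError` fields and the string value of the constant are assumptions (the value is guessed from the `baml.panics.Cancelled` panic name in the commit message); only the names `CANCELLED_PANIC_CLASS` and `is_cancelled_engine_error` appear in the walkthrough.

```rust
/// Assumed value, based on the panic name mentioned in the commit message.
pub const CANCELLED_PANIC_CLASS: &str = "baml.panics.Cancelled";

/// Simplified stand-in for the engine's error type.
#[derive(Debug)]
pub enum EngineError {
    /// An exception or panic escaped the VM; `class` names its type.
    UnhandledThrow { class: String },
    FutureNotFound { future_id: u64 },
}

/// True only for an unhandled throw whose class is the cancellation panic.
pub fn is_cancelled_engine_error(err: &EngineError) -> bool {
    matches!(
        err,
        EngineError::UnhandledThrow { class } if class.as_str() == CANCELLED_PANIC_CLASS
    )
}
```

Centralizing the check in one predicate is what lets the FFI bridges stop matching on a specific variant, as the walkthrough table describes.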

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • GC synchronization cleanup #3405: Introduces permit/proof system for heap access control; overlaps with HeapPermit<T> trait, SharedHeapPermit<T>, and generic holder-based allocation patterns throughout engine and codegen.
  • bex_engine: add CancellationToken support across all layers #3136: Implements cancellation-as-panic architecture; shares CANCELLED_PANIC_CLASS constant, is_cancelled_engine_error() predicate, and VM panic handling for Cancelled variant.
  • New garbage collector #3386: Refactors heap/GC coordination and write-barrier mechanisms; overlaps with write_barrier() methods, GC tracing for new object types, and heap invariant checking.

Poem

🐰 Futures now await their turn so fair,
With permits guiding allocations with care,
Globals freeze after init's flight,
While cancellation sings as a panic's right,
The heap keeps watch with barriers so bright!
~🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Global futures' directly aligns with the main architectural change: futures are now global and registered in BexEngine rather than VM-scoped. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
baml_language/crates/bex_vm/src/package_baml/root.rs (1)

205-210: ⚠️ Potential issue | 🟠 Major

Future::Error values are never equal in deep-equals.

This match now only handles Ready and Pending, so two errored futures with equivalent payloads always return false.

Proposed fix
                 (Object::Future(a_fut), Object::Future(b_fut)) => match (a_fut, b_fut) {
                     (Future::Ready(a_val), Future::Ready(b_val)) => {
                         deep_equals_recursive(vm, *a_val, *b_val, visited)
                     }
+                    (Future::Error(a_val), Future::Error(b_val)) => {
+                        deep_equals_recursive(vm, *a_val, *b_val, visited)
+                    }
                     (Future::Pending(a_id), Future::Pending(b_id)) => a_id == b_id,
                     _ => false,
                 },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm/src/package_baml/root.rs` around lines 205 - 210,
The deep-equals implementation in the match for (Object::Future(a_fut),
Object::Future(b_fut)) only compares Future::Ready and Future::Pending, so two
Future::Error cases will incorrectly return false; update that match in
deep_equals_recursive to also handle (Future::Error(a_err),
Future::Error(b_err)) by invoking deep_equals_recursive(vm, *a_err, *b_err,
visited) (and keep other cross-state cases returning false), ensuring errored
futures with equivalent payloads compare correctly.
baml_language/crates/bex_engine/src/lib.rs (1)

1255-1653: ⚠️ Potential issue | 🟠 Major

Add lib unit tests for the future/permit flow in run_event_loop and run_future.

Integration tests exist for this code (future_cleanup.rs, cancellation.rs), but per coding guidelines, unit tests are preferred. The existing test_concurrent_calls_safe in lib.rs is a placeholder and doesn't run. Add focused lib unit tests covering future cancellation, permit acquisition/release, and cleanup during completion to ensure the rewired scheduling and heap coordination is correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_engine/src/lib.rs` around lines 1255 - 1653, Add
focused unit tests that exercise run_event_loop and run_future to validate
future/permit lifecycle: create a synthetic OpFuture that resolves and one that
is cancelled/errs, drive run_event_loop to ScheduleFuture/Await paths by
invoking exec to schedule a future (using vm.unscheduled_future/new_future
behavior) and assert futures.acquire() permit is released after completion or
cancellation; test that run_future fulfills the future via fulfill_future when
Ok(value) and calls internal_error_future on Err, and that cancel_future is
invoked when cancellation token fires; reference run_event_loop, run_future,
futures.acquire/new_future/fulfill_future/cancel_future/internal_error_future,
convert_external_to_vm_value, and vm.unscheduled_future in your tests to target
the exact code paths.
🧹 Nitpick comments (1)
baml_language/crates/bex_vm/src/vm.rs (1)

3064-3159: Add a VM-level test for the new future state machine.

DispatchFuture and Await now encode the core Pending/Ready/Error/Cancelled/InternalError behavior directly in bex_vm, but the coverage called out for this change is engine-level. A small unit test here would make regressions in the VM semantics much cheaper to catch.

As per coding guidelines, "Prefer writing Rust unit tests over integration tests where possible".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm/src/vm.rs` around lines 3064 - 3159, Add unit
tests in the bex_vm crate that exercise the VM-level future state machine around
Instruction::DispatchFuture and Instruction::Await: create tests that (1)
dispatch a sys-op future and assert the VM returns VmExecState::ScheduleFuture
with a valid object index, (2) simulate a Future::Pending and verify Await
yields VmExecState::Await with the same future id and preserves instruction_ptr
in the current Frame::Bytecode, (3) simulate Future::Ready and ensure Await
pops/returns the contained value, (4) simulate Future::Error and assert Await
results in VmError::Thrown, (5) simulate Future::Cancelled and assert Await
results in a thrown Cancelled exception (via panic_to_exception_value), and (6)
simulate Future::InternalError and assert Await yields VmExecState::Await so the
engine can handle it; reference types/constructors like UnscheduledFuture,
Future::{Pending,Ready,Error,Cancelled,InternalError},
VmExecState::{ScheduleFuture,Await}, as_object_ptr, tlab.alloc and the
DispatchFuture/Await instruction handling to locate where to drive the VM state
and inject test futures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/bex_engine/src/future.rs`:
- Around line 305-309: FutureManagerInner.forward_roots currently forwards roots
for each active future but fails to invalidate its owned Tlab, allowing
allocations via new_future() to use a stale pre-GC cursor; after iterating
active_futures and calling future.forward_roots(roots) add a call to
self.tlab.invalidate() so the manager's Tlab is reset post-GC (i.e., update the
forward_roots method in FutureManagerInner to call self.tlab.invalidate() after
forwarding).
- Around line 188-214: The current complete_pending method only uses
debug_assert! to ensure the heap future is Pending, which is stripped in release
and allows stale completions to overwrite preserved errors; change the
debug-only assertion to an actual runtime check inside complete_pending: inspect
fut (the mutable heap state obtained via unsafe { entry.get_mut() }) and if it
is not bex_vm_types::Future::Pending(_), return an EngineError (create/use a
suitable variant, e.g. InvalidFutureState or FutureAlreadyCompleted) including
the future id and actual FutureType::of(fut) rather than proceeding to
overwrite; keep the subsequent assignment (*fut = new_state) and
entry.ready.set(result) only in the pending case so the original EngineError
cannot be lost.

In `@baml_language/crates/bex_engine/src/lib.rs`:
- Around line 998-1002: The cancel_function_call currently removes the
ActiveCall entry (active_calls.remove(&call_id)) which frees the CallId early;
instead look up the entry without removing it (e.g., get_mut or entry API), call
cancel() on the stored ActiveCall/CancelToken in-place, and return Ok(()); do
not delete the map entry here—leave removal to the existing ActiveCallGuard
cleanup on drop so the CallId remains reserved until the VM/execution actually
exits. Ensure you reference cancel_function_call, CallId, active_calls, and
ActiveCallGuard when making the change.

In `@baml_language/crates/bex_heap/src/heap.rs`:
- Around line 466-500: The write-barrier currently dirties cards for any
container older than the written ref and for Gen1 containers in the conservative
barrier, causing unnecessary remembered-set entries; limit marking to Gen2
containers only and in the precise barrier only mark when the written ref is
young (Generation::Gen0 or Generation::Gen1). Concretely: in write_barrier
(function write_barrier) after extracting container_gen and ref_gen via
generation_of, change the condition to only call mark_card_for_ptr when
container_gen == Generation::Gen2 && (ref_gen == Generation::Gen0 || ref_gen ==
Generation::Gen1); in conservative_write_barrier only call mark_card_for_ptr
when generation_of(container_ptr) == Generation::Gen2. Add unit tests in the
crate lib tests that assert Gen2 -> Gen0 and Gen2 -> Gen1 cause
mark_card_for_ptr to be invoked (or produce remembered-set entries) while Gen2
-> CompileTime (non-heap) does not; run cargo test --lib to verify.

In `@baml_language/crates/bex_vm/src/package_baml/root.rs`:
- Line 99: The UnscheduledFuture match arm currently reuses the inner future's
argument references causing a shallow copy; update the Object::UnscheduledFuture
branch to deep-copy the inner future and its args before allocating.
Specifically, create a new UnscheduledFuture instance where you deep-clone each
argument (e.g., map f.args through the repo's deep-copy helper or vm-based clone
routine) and any nested fields, then pass that fully-copied future into
vm.tlab.alloc(Object::UnscheduledFuture(new_future)) so the allocated object has
isolated copies of all nested mutable data.
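
The write-barrier rule proposed for heap.rs in the inline comments above can be expressed as a pair of pure predicates. This is a sketch of the reviewer's suggested condition only, not of the heap's actual code: the function names are hypothetical, and the `Generation` variants are taken from the names used in the comment.

```rust
/// Generations as named in the review comment; `CompileTime` is non-heap data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Generation {
    CompileTime,
    Gen0,
    Gen1,
    Gen2,
}

/// Precise barrier: mark a card only for an old (Gen2) container that now
/// points at a young (Gen0 or Gen1) object.
pub fn needs_card_mark(container: Generation, written: Generation) -> bool {
    container == Generation::Gen2
        && matches!(written, Generation::Gen0 | Generation::Gen1)
}

/// Conservative barrier: the written reference is unknown, so mark whenever
/// the container is in Gen2.
pub fn conservative_needs_card_mark(container: Generation) -> bool {
    container == Generation::Gen2
}
```

Keeping the decision in a pure function like this would also make the unit tests requested in that comment straightforward to write.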

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f3e284f1-fe81-4efe-9426-5aebe2b4a4bc

📥 Commits

Reviewing files that changed from the base of the PR and between 96b6edb and c4b2886.

⛔ Files ignored due to path filters (5)
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
📒 Files selected for processing (34)
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_panics/panics.baml
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_lsp_server/src/playground_server.rs
  • baml_language/crates/baml_tests/tests/exceptions.rs
  • baml_language/crates/bex_engine/src/conversion.rs
  • baml_language/crates/bex_engine/src/future.rs
  • baml_language/crates/bex_engine/src/lib.rs
  • baml_language/crates/bex_engine/tests/cancellation.rs
  • baml_language/crates/bex_engine/tests/future_cleanup.rs
  • baml_language/crates/bex_heap/src/accessor.rs
  • baml_language/crates/bex_heap/src/gc.rs
  • baml_language/crates/bex_heap/src/heap.rs
  • baml_language/crates/bex_heap/src/heap_debugger/real.rs
  • baml_language/crates/bex_heap/src/heap_guard.rs
  • baml_language/crates/bex_heap/src/lib.rs
  • baml_language/crates/bex_heap/src/tlab.rs
  • baml_language/crates/bex_project/src/bex.rs
  • baml_language/crates/bex_project/src/lib.rs
  • baml_language/crates/bex_vm/src/debug.rs
  • baml_language/crates/bex_vm/src/errors.rs
  • baml_language/crates/bex_vm/src/package_baml/root.rs
  • baml_language/crates/bex_vm/src/package_baml/unstable.rs
  • baml_language/crates/bex_vm/src/vm.rs
  • baml_language/crates/bex_vm_types/src/bytecode.rs
  • baml_language/crates/bex_vm_types/src/indexable.rs
  • baml_language/crates/bex_vm_types/src/lib.rs
  • baml_language/crates/bex_vm_types/src/types.rs
  • baml_language/crates/bridge_cffi/src/ffi/functions.rs
  • baml_language/crates/bridge_nodejs/src/errors.rs
  • baml_language/crates/bridge_python/src/errors.rs
  • baml_language/crates/bridge_wasm/src/lib.rs
  • baml_language/crates/sys_ops/src/lib.rs
  • baml_language/crates/sys_types/src/lib.rs
  • baml_language/crates/tools_onionskin/src/compiler.rs

Comment on lines +188 to +214
fn complete_pending(
    &mut self,
    id: FutureId,
    new_state: bex_vm_types::Future,
    result: Result<(), EngineError>,
) -> Result<FutureState, EngineError> {
    let mut entry = self
        .inner
        .active_futures
        .remove(&id)
        .ok_or(EngineError::FutureNotFound { future_id: id })?;
    // SAFETY: the `FutureManagerGuard` holds an exclusive heap permit.
    let fut = unsafe { entry.get_mut() }?;
    debug_assert!(
        matches!(fut, bex_vm_types::Future::Pending(_)),
        "complete_pending called with non-Pending heap state for {id:?} \
         (actual: {:?}); invariant violated — only fulfill/err/cancel may \
         route through this helper",
        FutureType::of(fut)
    );
    *fut = new_state;
    let set = entry.ready.set(result);
    debug_assert!(
        set.is_ok(),
        "Should not have been ready if the heap future was pending."
    );
    Ok(entry)

⚠️ Potential issue | 🔴 Critical

Enforce the Pending-only completion invariant in release builds.

After Line 197 removes the entry, the only guard against a stale second completion is the debug_assert! at Lines 201-207. In release builds that means a late fulfill_future/err_future/cancel_future can overwrite a leaked InternalError, drop the registry entry, and make future_ready() lose the original EngineError that this module is explicitly trying to preserve.

Suggested direction
 fn complete_pending(
     &mut self,
     id: FutureId,
     new_state: bex_vm_types::Future,
     result: Result<(), EngineError>,
 ) -> Result<FutureState, EngineError> {
-    let mut entry = self
-        .inner
-        .active_futures
-        .remove(&id)
-        .ok_or(EngineError::FutureNotFound { future_id: id })?;
+    let entry = self
+        .inner
+        .active_futures
+        .get_mut(&id)
+        .ok_or(EngineError::FutureNotFound { future_id: id })?;
     // SAFETY: the `FutureManagerGuard` holds an exclusive heap permit.
     let fut = unsafe { entry.get_mut() }?;
-    debug_assert!(
-        matches!(fut, bex_vm_types::Future::Pending(_)),
-        "complete_pending called with non-Pending heap state for {id:?} \
-         (actual: {:?}); invariant violated — only fulfill/err/cancel may \
-         route through this helper",
-        FutureType::of(fut)
-    );
+    if !matches!(fut, bex_vm_types::Future::Pending(_)) {
+        return Err(EngineError::FutureNotFound { future_id: id });
+    }
     *fut = new_state;
     let set = entry.ready.set(result);
     debug_assert!(
         set.is_ok(),
         "Should not have been ready if the heap future was pending."
     );
-    Ok(entry)
+    Ok(self
+        .inner
+        .active_futures
+        .remove(&id)
+        .expect("future entry must still exist after terminal transition"))
 }
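
The check-before-remove ordering in the suggestion above can be tried out in isolation with a toy map. This is a minimal sketch under assumptions: `State`, `CompleteError`, and the free-function form of `complete_pending` are hypothetical simplifications, and the dedicated `AlreadyCompleted` variant is one possible choice (the diff above reuses `FutureNotFound` instead).

```rust
use std::collections::HashMap;

/// Simplified future states for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Pending,
    Ready(i64),
    InternalError(String),
}

#[derive(Debug, PartialEq)]
pub enum CompleteError {
    NotFound(u64),
    /// Hypothetical variant reporting the state that was already recorded.
    AlreadyCompleted { id: u64, actual: State },
}

/// Runtime-checked completion: the entry is removed only after the
/// Pending check passes, so a stale completion cannot clobber it.
pub fn complete_pending(
    active: &mut HashMap<u64, State>,
    id: u64,
    new_state: State,
) -> Result<State, CompleteError> {
    let slot = active.get_mut(&id).ok_or(CompleteError::NotFound(id))?;
    if *slot != State::Pending {
        // Unlike debug_assert!, this check also runs in release builds.
        return Err(CompleteError::AlreadyCompleted { id, actual: slot.clone() });
    }
    *slot = new_state;
    Ok(active.remove(&id).expect("entry was just updated"))
}
```

With this ordering, a late completion of an entry that already holds an internal error leaves both the entry and its recorded error in place.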
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_engine/src/future.rs` around lines 188 - 214, The
current complete_pending method only uses debug_assert! to ensure the heap
future is Pending, which is stripped in release and allows stale completions to
overwrite preserved errors; change the debug-only assertion to an actual runtime
check inside complete_pending: inspect fut (the mutable heap state obtained via
unsafe { entry.get_mut() }) and if it is not bex_vm_types::Future::Pending(_),
return an EngineError (create/use a suitable variant, e.g. InvalidFutureState or
FutureAlreadyCompleted) including the future id and actual FutureType::of(fut)
rather than proceeding to overwrite; keep the subsequent assignment (*fut =
new_state) and entry.ready.set(result) only in the pending case so the original
EngineError cannot be lost.

Comment on lines +305 to +309
fn forward_roots(&mut self, roots: &HashMap<HeapPtr, HeapPtr>) {
    for future in self.active_futures.values_mut() {
        future.forward_roots(roots);
    }
}

⚠️ Potential issue | 🔴 Critical

Invalidate the manager TLAB after GC forwarding.

FutureManagerInner owns a Tlab and allocates in new_future(), but forward_roots() never calls self.tlab.invalidate(). BexVm::forward_roots() does this explicitly after GC; without the same reset here, the manager can keep allocating with a pre-GC cursor after semispace swap.

Suggested fix
 impl RootHaver for FutureManagerInner {
     fn collect_roots(&self, roots: &mut Vec<HeapPtr>) {
         // blocking is fine since we should only ever call this while holding exclusive heap access
         for future in self.active_futures.values() {
             future.collect_roots(roots);
         }
     }
     fn forward_roots(&mut self, roots: &HashMap<HeapPtr, HeapPtr>) {
+        self.tlab.invalidate();
         for future in self.active_futures.values_mut() {
             future.forward_roots(roots);
         }
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_engine/src/future.rs` around lines 305 - 309,
FutureManagerInner.forward_roots currently forwards roots for each active future
but fails to invalidate its owned Tlab, allowing allocations via new_future() to
use a stale pre-GC cursor; after iterating active_futures and calling
future.forward_roots(roots) add a call to self.tlab.invalidate() so the
manager's Tlab is reset post-GC (i.e., update the forward_roots method in
FutureManagerInner to call self.tlab.invalidate() after forwarding).
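
To make the failure mode concrete, here is a toy model of a bump-pointer TLAB over a two-space heap. It is only a sketch: `Heap`, `Tlab`, `refill`, and `collect` are hypothetical and do not mirror `bex_heap`. It shows why a cursor taken before a semispace swap must be discarded.

```rust
/// Number of slots handed to a TLAB per refill (arbitrary for the sketch).
const CHUNK: usize = 4;

/// Hypothetical two-space heap: `active_space` flips on every collection.
struct Heap {
    active_space: usize,
    next_free: usize,
}

/// Hypothetical thread-local allocation buffer: a bump cursor into one space.
struct Tlab {
    space: usize,
    cursor: usize,
    limit: usize,
}

impl Heap {
    /// Hand out a fresh chunk from the currently active space.
    fn refill(&mut self) -> Tlab {
        let start = self.next_free;
        self.next_free += CHUNK;
        Tlab { space: self.active_space, cursor: start, limit: start + CHUNK }
    }

    /// Simulate a collection: swap spaces and reset the bump pointer.
    fn collect(&mut self) {
        self.active_space ^= 1;
        self.next_free = 0;
    }
}

impl Tlab {
    /// An exhausted buffer; the first allocation forces a refill.
    fn empty() -> Self {
        Tlab { space: 0, cursor: 0, limit: 0 }
    }

    /// Mark the buffer exhausted so the next allocation refills
    /// from whatever space is active at that point.
    fn invalidate(&mut self) {
        self.cursor = self.limit;
    }

    /// Bump-allocate one slot and return `(space, index)`.
    fn alloc(&mut self, heap: &mut Heap) -> (usize, usize) {
        if self.cursor == self.limit {
            *self = heap.refill();
        }
        let slot = (self.space, self.cursor);
        self.cursor += 1;
        slot
    }
}
```

In this model, a buffer that is not invalidated after `collect` keeps returning slots in the old space, while an invalidated one refills from the new active space on its next allocation.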

Comment on lines 998 to 1002
 pub fn cancel_function_call(&self, call_id: CallId) -> Result<(), EngineError> {
     let mut active_calls = self.active_calls.lock().unwrap();
-    if let Some(cancel) = active_calls.remove(&call_id) {
-        cancel.cancel();
+    if let Some(call) = active_calls.remove(&call_id) {
+        call.cancel();
         Ok(())

⚠️ Potential issue | 🟠 Major

Keep the CallId reserved until the VM actually exits.

Removing the entry here lets a second call_function reuse the same call_id while the first call is still unwinding. That can mix events under one ID, and the old ActiveCallGuard can then delete the new entry on drop. Cancel the token in place and leave removal to the normal guard cleanup.

Suggested fix
-        let mut active_calls = self.active_calls.lock().unwrap();
-        if let Some(call) = active_calls.remove(&call_id) {
+        let active_calls = self.active_calls.lock().unwrap();
+        if let Some(call) = active_calls.get(&call_id) {
             call.cancel();
             Ok(())
         } else {
             Err(EngineError::FunctionCallNotFound { call_id })
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_engine/src/lib.rs` around lines 998 - 1002, The
cancel_function_call currently removes the ActiveCall entry
(active_calls.remove(&call_id)) which frees the CallId early; instead look up
the entry without removing it (e.g., get_mut or entry API), call cancel() on the
stored ActiveCall/CancelToken in-place, and return Ok(()); do not delete the map
entry here—leave removal to the existing ActiveCallGuard cleanup on drop so the
CallId remains reserved until the VM/execution actually exits. Ensure you
reference cancel_function_call, CallId, active_calls, and ActiveCallGuard when
making the change.
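
A small sketch of the cancel-in-place pattern, under stated assumptions: the cancel token is modeled as an `AtomicBool`, `register` is a hypothetical stand-in for the bookkeeping in `call_function`, and rejecting a duplicate ID is assumed here for illustration rather than taken from the engine.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

type CallId = u64;
type ActiveCalls = Arc<Mutex<HashMap<CallId, Arc<AtomicBool>>>>;

/// Removes the call's entry when the call actually finishes unwinding.
struct ActiveCallGuard {
    calls: ActiveCalls,
    id: CallId,
}

impl Drop for ActiveCallGuard {
    fn drop(&mut self) {
        self.calls.lock().unwrap().remove(&self.id);
    }
}

/// Hypothetical registration step: reserves `id` until the guard drops.
fn register(calls: &ActiveCalls, id: CallId) -> Option<(ActiveCallGuard, Arc<AtomicBool>)> {
    let mut map = calls.lock().unwrap();
    if map.contains_key(&id) {
        return None; // ID still reserved by a call that has not exited.
    }
    let token = Arc::new(AtomicBool::new(false));
    map.insert(id, Arc::clone(&token));
    Some((ActiveCallGuard { calls: Arc::clone(calls), id }, token))
}

/// Cancel in place: flip the token but leave the entry for the guard to remove.
fn cancel(calls: &ActiveCalls, id: CallId) -> bool {
    let map = calls.lock().unwrap();
    let Some(token) = map.get(&id) else {
        return false;
    };
    token.store(true, Ordering::SeqCst);
    true
}
```

Because only the guard removes the entry, a cancelled call keeps its ID reserved until it has finished unwinding, so a later call cannot register under the same ID while the old one is still running.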

Comment on lines +466 to +500
/// Write barrier for field/element/cell writes.
///
/// Called *before* the actual field write at each mutation site. If `container_ptr`
/// is in an older generation than the object being written (`written_value`), the
/// card containing `container_ptr` is marked dirty so partial GC can discover
/// the cross-generation reference.
///
/// This is a no-op when either side is not a heap object, or when the container
/// is in Gen0 (no card table for Gen0).
#[inline]
pub fn write_barrier(&self, container_ptr: HeapPtr, written_value: Value) {
if let Value::Object(ref_ptr) = written_value {
let container_gen = self.generation_of(container_ptr);
let ref_gen = self.generation_of(ref_ptr);
if container_gen > ref_gen {
self.mark_card_for_ptr(container_ptr);
}
}
}

/// Conservative write barrier for mutable accessor paths (builtin dispatch).
///
/// Unconditionally marks the card dirty if `container_ptr` is in an older
/// generation. Used by `as_array_mut` / `as_map_mut` where the actual written
/// value is not yet known (it's supplied by the callee trait method).
///
/// This over-marks (any mutable access to an older-gen object dirties the card),
/// but it is always safe and the cost is negligible since most objects are Gen0.
#[inline]
pub fn conservative_write_barrier(&self, container_ptr: HeapPtr) {
let container_gen = self.generation_of(container_ptr);
if container_gen > Generation::Gen0 {
self.mark_card_for_ptr(container_ptr);
}
}

⚠️ Potential issue | 🟡 Minor

Only dirty cards for Gen2 → young edges.

mark_card_for_ptr() only tracks remembered-set entries for Gen2, but these predicates still route Gen1 containers and Gen2 -> CompileTime writes through it. That adds avoidable hot-path scans, and the precise barrier will also dirty cards for references that minor GC never needs to revisit. Tighten both barriers to Generation::Gen2 containers, and in the precise barrier only mark when the written ref is young (Gen0/Gen1). Please also add a lib test covering Gen2 -> Gen0/Gen1 vs. Gen2 -> CompileTime while you're here.

♻️ Proposed fix
 pub fn write_barrier(&self, container_ptr: HeapPtr, written_value: Value) {
     if let Value::Object(ref_ptr) = written_value {
         let container_gen = self.generation_of(container_ptr);
         let ref_gen = self.generation_of(ref_ptr);
-        if container_gen > ref_gen {
+        if matches!(container_gen, Generation::Gen2) && ref_gen.is_young() {
             self.mark_card_for_ptr(container_ptr);
         }
     }
 }
 
 pub fn conservative_write_barrier(&self, container_ptr: HeapPtr) {
-    let container_gen = self.generation_of(container_ptr);
-    if container_gen > Generation::Gen0 {
+    if matches!(self.generation_of(container_ptr), Generation::Gen2) {
         self.mark_card_for_ptr(container_ptr);
     }
 }

As per coding guidelines "Prefer writing Rust unit tests over integration tests where possible" and "Always run cargo test --lib if you changed any Rust code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_heap/src/heap.rs` around lines 466 - 500, The
write-barrier currently dirties cards for any container older than the written
ref and for Gen1 containers in the conservative barrier, causing unnecessary
remembered-set entries; limit marking to Gen2 containers only and in the precise
barrier only mark when the written ref is young (Generation::Gen0 or
Generation::Gen1). Concretely: in write_barrier (function write_barrier) after
extracting container_gen and ref_gen via generation_of, change the condition to
only call mark_card_for_ptr when container_gen == Generation::Gen2 && (ref_gen
== Generation::Gen0 || ref_gen == Generation::Gen1); in
conservative_write_barrier only call mark_card_for_ptr when
generation_of(container_ptr) == Generation::Gen2. Add unit tests in the crate
lib tests that assert Gen2 -> Gen0 and Gen2 -> Gen1 cause mark_card_for_ptr to
be invoked (or produce remembered-set entries) while Gen2 -> CompileTime
(non-heap) does not; run cargo test --lib to verify.
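
The tightened predicates can be sketched in isolation as pure functions over a generation tag. This is an illustrative model only: the `Generation` enum below is a local stand-in, and `is_young`, `precise_barrier_marks`, and `conservative_barrier_marks` are hypothetical helpers, not existing `bex_heap` APIs.

```rust
/// Local stand-in for the heap's generation tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Generation {
    Gen0,
    Gen1,
    Gen2,
    CompileTime,
}

impl Generation {
    /// Young generations are the ones a minor GC moves.
    fn is_young(self) -> bool {
        matches!(self, Generation::Gen0 | Generation::Gen1)
    }
}

/// Precise barrier: dirty the card only for Gen2 -> young edges.
fn precise_barrier_marks(container: Generation, written: Generation) -> bool {
    container == Generation::Gen2 && written.is_young()
}

/// Conservative barrier: the written value is unknown, so any mutable
/// access to a Gen2 container dirties the card.
fn conservative_barrier_marks(container: Generation) -> bool {
    container == Generation::Gen2
}
```

A unit test along the lines requested above would assert that Gen2 -> Gen0 and Gen2 -> Gen1 mark the card while Gen2 -> CompileTime and any Gen1 container do not.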

Object::Variant(v) => vm.tlab.alloc(Object::Variant(v)),
Object::RustData(arc) => vm.tlab.alloc(Object::RustData(Arc::clone(&arc))),
Object::Future(f) => vm.tlab.alloc(Object::Future(f)),
Object::UnscheduledFuture(f) => vm.tlab.alloc(Object::UnscheduledFuture(f)),

⚠️ Potential issue | 🟠 Major

UnscheduledFuture deep copy is currently shallow on args.

Line 99 copies the wrapper object but reuses argument object references, which breaks deep-copy isolation for nested mutable data.

Proposed fix
-                Object::UnscheduledFuture(f) => vm.tlab.alloc(Object::UnscheduledFuture(f)),
+                Object::UnscheduledFuture(mut f) => {
+                    for arg in &mut f.args {
+                        *arg = deep_copy_value_recursive(vm, *arg, copied_objects);
+                    }
+                    vm.tlab.alloc(Object::UnscheduledFuture(f))
+                }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm/src/package_baml/root.rs` at line 99, The
UnscheduledFuture match arm currently reuses the inner future's argument
references causing a shallow copy; update the Object::UnscheduledFuture branch
to deep-copy the inner future and its args before allocating. Specifically,
create a new UnscheduledFuture instance where you deep-clone each argument
(e.g., map f.args through the repo's deep-copy helper or vm-based clone routine)
and any nested fields, then pass that fully-copied future into
vm.tlab.alloc(Object::UnscheduledFuture(new_future)) so the allocated object has
isolated copies of all nested mutable data.
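
As a rough sketch of what deep-copy isolation means for the arguments, here is a toy heap modeled as a `Vec<Object>` indexed by `usize`. The `Value` and `Object` types and the `deep_copy` function are hypothetical simplifications; the real `deep_copy_value_recursive` and its `copied_objects` map may differ in shape.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Int(i64),
    Object(usize), // index into the toy heap
}

#[derive(Clone, Debug, PartialEq)]
enum Object {
    Array(Vec<Value>),
    UnscheduledFuture { args: Vec<Value> },
}

/// Recursively copy `value`, memoizing already-copied objects so shared
/// and cyclic references map to a single copy.
fn deep_copy(
    heap: &mut Vec<Object>,
    value: Value,
    copied: &mut HashMap<usize, usize>,
) -> Value {
    let Value::Object(src) = value else {
        return value; // primitives are copied by value
    };
    if let Some(&dst) = copied.get(&src) {
        return Value::Object(dst);
    }
    // Reserve the destination slot before recursing so cycles terminate.
    let original = heap[src].clone();
    let dst = heap.len();
    heap.push(original.clone());
    copied.insert(src, dst);

    let mut children = match original {
        Object::Array(items) => items,
        Object::UnscheduledFuture { args } => args,
    };
    // The step the review asks for: the children are copied too,
    // not just the wrapper object.
    for child in &mut children {
        *child = deep_copy(heap, *child, copied);
    }
    match &mut heap[dst] {
        Object::Array(items) => *items = children,
        Object::UnscheduledFuture { args } => *args = children,
    }
    Value::Object(dst)
}
```

With this shape, mutating an array reachable from the original future's arguments leaves the copy's arguments untouched.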

@github-actions

Binary size checks passed

5 passed

| Artifact | Platform | Gzip | Baseline | Delta | Status |
| --- | --- | --- | --- | --- | --- |
| bridge_cffi | Linux | 6.1 MB | 5.7 MB | +386.1 KB (+6.8%) | OK |
| bridge_cffi-stripped | Linux | 6.0 MB | 5.7 MB | +351.5 KB (+6.2%) | OK |
| bridge_cffi | macOS | 5.0 MB | 4.6 MB | +364.0 KB (+7.9%) | OK |
| bridge_cffi-stripped | macOS | 5.0 MB | 4.7 MB | +295.6 KB (+6.3%) | OK |
| bridge_wasm | WASM | 3.3 MB | 3.2 MB | +54.7 KB (+1.7%) | OK |

Generated by cargo size-gate · workflow run
