Skip to content

baml_language: packaging and running#3399

Open
codeshaunted wants to merge 6 commits intocanaryfrom
avery/pkg
Open

baml_language: packaging and running#3399
codeshaunted wants to merge 6 commits intocanaryfrom
avery/pkg

Conversation

@codeshaunted
Copy link
Copy Markdown
Collaborator

@codeshaunted codeshaunted commented Apr 23, 2026

Summary by CodeRabbit

  • New Features

    • baml pack produces standalone executables embedding BAML programs; packaged binaries run via a bundled host.
    • New auto-CLI for typed entry points: flag parsing, inline/@file/STDIN JSON args, and per-target --help.
    • Selectable output formats (json / debug).
  • Behavior

    • help parameter is reserved and rejected.
    • baml.sys.exit(code) now yields the process exit code (clamped).
  • Tests

    • New unit and end-to-end tests covering packing, runtime, outputs, and exit-code behavior.

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
beps Ready Ready Preview, Comment Apr 23, 2026 7:57pm
promptfiddle Ready Ready Preview, Comment Apr 23, 2026 7:57pm

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Adds a packaging pipeline: new shared execution crate (baml_exec), a baml pack CLI and baml-pack-host binary, wiring for catchable baml.sys.exit(code) through VM/engine to process exit, and broad serde derives/impls across VM, bytecode, and runtime types to enable envelope serialization.

Changes

Cohort / File(s) Summary
Workspace manifest & feature updates
baml_language/Cargo.toml, baml_language/stow.toml
Add baml_exec and baml_pack_host to workspace; enable serde features for various workspace deps and update stow allowlists.
CLI: pack command & integration
baml_language/crates/baml_cli/Cargo.toml, baml_language/crates/baml_cli/src/commands.rs, .../src/lib.rs, .../src/main.rs, .../src/pack_command.rs, .../src/project_load.rs, .../src/run_command.rs
Add baml pack subcommand, implement packing logic, integrate baml_exec APIs, change ExitCode handling (deferred exit), centralize diagnostics checking, and refactor run command to use shared exec/dispatch.
Pack host binary
baml_language/crates/baml_pack_host/Cargo.toml, .../build.rs, .../src/main.rs
New baml-pack-host binary that reads embedded PackEnvelope, initializes engine, dispatches targets, prints help, and converts DispatchResult → process outcomes.
Shared execution crate (baml_exec)
baml_language/crates/baml_exec/Cargo.toml, .../src/lib.rs, .../src/auto_cli.rs, .../src/dispatch.rs, .../src/envelope.rs, .../src/json_coerce.rs, .../src/output.rs
New crate exposing auto-CLI parsing, signature→args builders, JSON coercion, dispatch helpers, PackEnvelope wire type, output formatting, and re-exports for CLI/host reuse.
Packaging tests & helpers
baml_language/crates/baml_cli/tests/common/mod.rs, baml_language/crates/baml_cli/tests/pack_e2e.rs
Add test helper to build artifacts once, project fixture writer, and E2E tests validating packing, packed runtime execution, output-format semantics, and exit-code propagation.
Builtins / Exit panic & syscall
baml_language/crates/baml_builtins2/.../ns_panics/panics.baml, .../ns_sys/sys.baml, baml_language/crates/baml_builtins2_codegen/.../codegen.rs, .../codegen_io.rs
Introduce Exit { code } panic class and baml.sys.exit(code) builtin; mark baml.sys.exit fallible in codegen and add serde derives to generated SysOp.
VM & engine panic propagation
baml_language/crates/bex_vm/src/errors.rs, .../src/package_baml/sys.rs, .../src/vm.rs, baml_language/crates/bex_engine/src/lib.rs
Add VmPanic::Exit, map it to baml.panics.Exit, implement sys.exit to panic, and propagate uncaught Exit into EngineError::Exit { code }.
Serde across types & bytecode
baml_language/crates/baml_base/Cargo.toml, .../src/attr.rs, .../src/core_types.rs, baml_language/crates/baml_type/Cargo.toml, .../src/lib.rs, baml_language/crates/bex_vm_types/Cargo.toml, .../src/bytecode.rs, .../src/heap_ptr.rs, .../src/indexable.rs, .../src/types.rs
Add serde derives and custom (de)serializers across core attr/types, type representations, bytecode, heap/index/pool wrappers, and runtime object/value types; skip or adapt non-serializable internals (e.g., raw pointers).
Codegen/IO adjustments & snapshots
baml_language/crates/bex_vm_types/..., baml_language/crates/baml_tests/tests/exceptions.rs
Update generated types to be serde-compatible; update bytecode snapshot(s) to reflect panic/exit dispatch changes.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as baml-cli
    participant Compiler
    participant Packager
    participant Host as baml-pack-host
    participant Engine

    User->>CLI: baml pack --target main --output-format json
    CLI->>Compiler: compile project → Program + metadata
    CLI->>Packager: create PackEnvelope(program,target,format)
    Packager->>Packager: serialize envelope (bitcode / serde)
    Packager->>Packager: embed envelope into host binary
    Packager-->>User: write packed executable

    User->>Host: ./packed_binary arg1 arg2
    Host->>Host: extract PackEnvelope from binary section
    Host->>Engine: initialize with Program + argv
    Host->>Host: parse CLI args (auto_cli / json_coerce)
    Host->>Engine: dispatch_target(args)
    Engine-->>Host: Ok | Exit(code) | TargetError
    Host->>Host: write_output (json/debug) or std::process::exit(code)
    Host-->>User: stdout + exit code
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I stitched a program snug and neat,
Into one host, a single treat.
Serde sewed the bytes so fine,
Exit codes hop home on time,
Hooray — a packed rabbit feat! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'baml_language: packaging and running' directly and comprehensively reflects the core changes: it introduces packaging functionality (baml pack command, PackEnvelope, baml-pack-host binary) and execution/running improvements (baml run refactoring, dispatch helpers, exit code handling).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch avery/pkg

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
baml_language/crates/bex_vm_types/src/indexable.rs (1)

274-279: ⚠️ Potential issue | 🟡 Minor

heap_debug changes the wire format of ObjectIndex.

With heap_debug enabled, ObjectIndex serializes as the struct { raw, epoch }; without it, ObjectIndex = Index<ObjectKind> serializes as a bare usize (see lines 38–48). A pack produced by a heap_debug-enabled builder cannot be loaded by a non-debug host (and vice-versa) without a cryptic serde error.

If packs are always produced/consumed with the same feature set this is fine, but consider either gating heap_debug off for pack-producing builds or normalizing the serialized representation to just the raw index under both configs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/indexable.rs` around lines 274 - 279,
The heap_debug feature changes ObjectIndex's wire format (ObjectIndex struct
with raw and epoch vs Index<ObjectKind> as usize), causing incompatible serde
output; fix by normalizing the serialized representation: keep the in-memory
ObjectIndex struct (raw, epoch) under #[cfg(feature = "heap_debug")] but
implement custom Serialize/Deserialize for ObjectIndex that
serializes/deserializes only the raw usize (raw) so it matches the non-debug
Index<ObjectKind> representation, or alternatively disable/gate heap_debug for
pack-producing builds so packs are always produced with the same feature set.
baml_language/crates/bex_vm_types/src/types.rs (1)

548-560: ⚠️ Potential issue | 🟡 Minor

Defensive serialization guard needed for HeapPtr in Value::Object.

The lossy HeapPtr serde implementation (serializes to unit, deserializes as null) is intentional and safe in current code—packed output correctly uses ConstValue::Object(ObjectIndex) instead of Value::Object(HeapPtr). However, a future refactor could accidentally serialize a live Value containing an active pointer. Add a debug_assert! in HeapPtr::serialize() to detect this at the choke point, rather than silently losing data:

Example guard
impl Serialize for HeapPtr {
    fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
        debug_assert!(self.ptr.is_null(), "Attempting to serialize a non-null HeapPtr; use ConstValue instead");
        serializer.serialize_unit()
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/types.rs` around lines 548 - 560, Add a
defensive debug assertion in the HeapPtr serialization choke point: inside the
impl Serialize for HeapPtr (the HeapPtr::serialize method) assert that the raw
pointer is null (e.g., debug_assert!(self.ptr.is_null(), "...")) so any
accidental attempt to serialize a live Value::Object is caught during debug
runs; keep the existing unit serialization but include a message suggesting to
use ConstValue::Object(ObjectIndex) instead of serializing live HeapPtr.
🧹 Nitpick comments (6)
baml_language/crates/bex_vm_types/src/bytecode.rs (1)

840-876: Document the post-deserialize resolve_constants requirement.

resolved_constants is #[serde(skip)], so after Bytecode::deserialize the field is empty and any LoadConst would index out-of-bounds. It's reconstructed from constants in bex_heap/src/heap.rs:250, but that contract isn't obvious from this file. A one-line doc note on the field (or on Bytecode) would prevent new loader code paths from skipping the step.

📝 Proposed doc tweak
     /// Resolved constants (runtime, populated at load time).
-    /// Contains `HeapPtr` for object references. Used by `LoadConst`.
+    /// Contains `HeapPtr` for object references. Used by `LoadConst`.
+    ///
+    /// Skipped during (de)serialization; callers loading a packed `Bytecode`
+    /// MUST invoke [`Bytecode::resolve_constants`] before executing, otherwise
+    /// `LoadConst` will panic with an out-of-bounds index.
     #[serde(skip)]
     pub resolved_constants: Vec<crate::Value>,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/bytecode.rs` around lines 840 - 876,
Bytecode's resolved_constants field is skipped by serde and must be
reconstructed after deserialization (otherwise LoadConst will index OOB); add a
one-line doc note on the resolved_constants field (or on the Bytecode struct)
stating that deserializing Bytecode leaves resolved_constants empty and callers
must call the runtime reconstruction function (e.g. resolve_constants) to
populate it from constants before executing or loading instructions like
LoadConst.
baml_language/crates/bex_vm_types/src/types.rs (2)

248-287: Simpler FunctionKind Serialize avoids the unreachable! arm.

The current impl matches self twice (once to intercept Native, once inside the else branch, with an unreachable! for Native that can never fire). You can express the same intent in a single match by building FunctionKindRef directly:

♻️ Proposed simplification
 impl Serialize for FunctionKind {
     fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
-        // Native pointers are runtime-only; serialize as NativeUnresolved.
-        match self {
-            Self::Native(_) => Self::NativeUnresolved.serialize(serializer),
-            _ => {
-                #[derive(Serialize)]
-                enum FunctionKindRef<'a> {
-                    Bytecode,
-                    SysOp(&'a SysOp),
-                    NativeUnresolved,
-                }
-                match self {
-                    Self::Bytecode => FunctionKindRef::Bytecode.serialize(serializer),
-                    Self::SysOp(op) => FunctionKindRef::SysOp(op).serialize(serializer),
-                    Self::NativeUnresolved => {
-                        FunctionKindRef::NativeUnresolved.serialize(serializer)
-                    }
-                    Self::Native(_) => unreachable!(),
-                }
-            }
-        }
+        // Native pointers are runtime-only; serialize as NativeUnresolved.
+        #[derive(Serialize)]
+        enum FunctionKindRef<'a> {
+            Bytecode,
+            SysOp(&'a SysOp),
+            NativeUnresolved,
+        }
+        let proxy = match self {
+            Self::Bytecode => FunctionKindRef::Bytecode,
+            Self::SysOp(op) => FunctionKindRef::SysOp(op),
+            Self::NativeUnresolved | Self::Native(_) => FunctionKindRef::NativeUnresolved,
+        };
+        proxy.serialize(serializer)
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/types.rs` around lines 248 - 287, The
Serialize impl for FunctionKind currently matches self twice and contains an
unreachable! arm; replace it by matching self only once inside impl Serialize
for FunctionKind: define the inner serde-annotated enum FunctionKindRef<'a> {
Bytecode, SysOp(&'a SysOp), NativeUnresolved } and then match on self once to
construct the appropriate FunctionKindRef (map Self::Native(_) to
FunctionKindRef::NativeUnresolved) and call serialize on that; update the impl
Serialize for FunctionKind to use this single-match approach and remove the
unreachable! path.

775-825: Consider validating Object variants upstream of serialization.

Returning serde::ser::Error::custom from Object::serialize surfaces the failure correctly, but for some formats (binary, streaming) it happens mid-stream after other sibling fields have already been written — producing a partial/corrupt output file that a user then has to clean up. Since RustData/Collector/Sentinel are statically known to be runtime-only, a pack-writer precondition check (walk the ObjectPool once, fail fast with a pack-level error naming the offending object/index) would give a cleaner UX than a mid-stream serde error.

Nice touch that the match self exhaustiveness on Serialize means any future Object variant will force a compile error here, so the proxy stays in sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/types.rs` around lines 775 - 825, Add
an upfront validation pass that walks the collection being written (e.g., the
ObjectPool or whatever container feeds Object::serialize) before any serde
serialization begins to detect runtime-only variants (RustData, Collector, and
#[cfg(feature = "heap_debug")] Sentinel) and fail with a clear pack-level error
naming the offending object/index; specifically, implement a function (e.g.,
validate_objects_before_pack or a method on ObjectPool) that iterates objects
and returns a Result<(), PackError> if any invalid Object variants are found,
call this validation from the pack-writer entry point before invoking
Object::serialize, and return the pack-level error instead of allowing serde to
emit a mid-stream serde::ser::Error from Object::serialize.
baml_language/crates/baml_cli/src/project_load.rs (1)

40-79: LGTM, with a small style nit.

Logic is correct — warnings are returned cleanly; errors are rendered to stderr with full source context and propagate a caller-specified bail message. The #[cfg(test)] mod tests pair covers both branches.

Optional: errors.iter().copied().cloned().collect::<Vec<_>>() re-copies the references just to re-clone into owned — you can build the owned list once up front:

♻️ Proposed simplification
-    let errors: Vec<_> = diagnostics
-        .iter()
-        .filter(|d| d.severity == Severity::Error)
-        .collect();
-    let warnings: Vec<Diagnostic> = diagnostics
+    let errors: Vec<Diagnostic> = diagnostics
+        .iter()
+        .filter(|d| d.severity == Severity::Error)
+        .cloned()
+        .collect();
+    let warnings: Vec<Diagnostic> = diagnostics
         .iter()
         .filter(|d| d.severity == Severity::Warning)
         .cloned()
         .collect();
@@
-    let rendered = render::render_diagnostics(
-        &errors.iter().copied().cloned().collect::<Vec<_>>(),
-        &sources,
+    let rendered = render::render_diagnostics(
+        &errors,
+        &sources,
         &file_paths,
         &render::RenderConfig::cli(),
     );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/project_load.rs` around lines 40 - 79, The
code currently builds `errors` as references then later does
`errors.iter().copied().cloned().collect::<Vec<_>>()`, causing redundant
copying/cloning; modify `check_project_diagnostics` to produce an owned
Vec<Diagnostic> directly (e.g., replace the `errors: Vec<_> =
diagnostics.iter().filter(|d| d.severity == Severity::Error).collect();` with a
cloned/owned collection) and pass that owned Vec to
`render::render_diagnostics`; keep the existing `warnings` handling and the rest
of the logic (variables/functions to locate: `check_project_diagnostics`,
`diagnostics`, `errors`, `warnings`, and the call to
`render::render_diagnostics`).
baml_language/crates/baml_cli/src/commands.rs (1)

1-3: Update the command list comment to include Pack.

The dispatcher now wires baml pack, but the file header still describes the old command set.

Documentation cleanup
-// Wires up the BAML CLI subcommands: Run, Describe, Generate, Grep, Test,
-// Format, and LanguageServer. `baml run` is the top-level entry for
-// standalone execution.
+// Wires up the BAML CLI subcommands: Run, Pack, Describe, Generate, Grep,
+// Test, Format, and LanguageServer. `baml run` and `baml pack` are the
+// top-level entries for standalone execution and packaging.

Also applies to: 75-76

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/commands.rs` around lines 1 - 3, The file
header comment in commands.rs is outdated: it lists the CLI subcommands as "Run,
Describe, Generate, Grep, Test, Format, and LanguageServer" but the dispatcher
now also wires a Pack command; update that top-of-file comment (and the similar
comment around lines 75-76) to include "Pack" so the documented command list
matches the actual dispatcher wiring (look for references to the dispatcher and
the Pack subcommand names in this file to ensure consistency).
baml_language/crates/bex_vm_types/Cargo.toml (1)

30-38: Remove the now-redundant dev-dependency entry.

serde is now a normal dependency, so the identical [dev-dependencies] entry is no longer needed unless tests intentionally require different features.

Cleanup diff
 [dev-dependencies]
 baml_compiler2_emit = { workspace = true }
 baml_tests = { workspace = true }
 anyhow = { workspace = true }
 divan = { workspace = true }
 indexmap = { workspace = true }
-serde = { workspace = true }
 serde_json = { workspace = true }
 tokio = { workspace = true, features = [ "rt", "macros" ] }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/Cargo.toml` around lines 30 - 38, Remove
the duplicate dev-dependency entry for serde: open the Cargo.toml and delete the
serde = { workspace = true } line under the [dev-dependencies] section so serde
remains only as a normal dependency; ensure no test-specific features require a
separate dev-dependency entry and run cargo check to confirm the manifest is
valid (look for the serde entry and the [dev-dependencies] header to locate the
line).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_cli/src/pack_command.rs`:
- Around line 221-226: The error text shown when engine.function_exists("main")
is false for a standalone .baml target incorrectly suggests using `--function
<name>` (which is mutually exclusive with a positional .baml target); update the
remediation message in the branch that checks engine.function_exists("main") to
instruct the user to either supply a non-standalone package target when using
`--function <name>` or add a `main` function to the standalone file — reference
the existing check engine.function_exists("main") and change the message to
explicitly say "Add a `main` function to hello.baml, or pack a package and use
`--function <name>`" (or similar wording) so the suggested invocation is
actually possible.
- Around line 293-313: The write_executable function currently treats any
non-Linux/non-Windows OS as Mach-O; change the OS dispatch to explicitly check
for macOS (e.g., target == "macos" or target.contains("darwin")) and only run
the libsui::Macho code when macOS is detected, and for all other targets return
an explicit error (use anyhow::bail or return Err with context) indicating the
OS is unsupported for embedding PACK_SECTION_NAME; keep the existing paths for
libsui::Elf (Linux) and libsui::PortableExecutable (Windows) and ensure the
error message names write_executable and the unsupported target string for
easier debugging.

In `@baml_language/crates/baml_cli/tests/common/mod.rs`:
- Around line 57-73: The build step in ensure_built() always runs plain `cargo
build` so it can produce debug binaries while profile() may return "release";
modify the Command construction in ensure_built() so it builds the same profile
that profile() returns — e.g., when profile() == "release" add the "--release"
flag (or otherwise pass the appropriate cargo flag for the requested profile) to
the args for Command::new(&cargo). Ensure BUILT.get_or_init closure uses the
adjusted args so baml_cli and baml_pack_host are built into the same
target/<profile> directory you later look up.

In `@baml_language/crates/baml_cli/tests/pack_e2e.rs`:
- Around line 167-184: The test pack_e2e_output_format_debug_baked_in currently
only asserts stdout.contains("hello"), which passes for both debug and JSON
outputs; update the assertion to distinguish debug formatting by asserting exact
expected debug output (for example assert_eq!(stdout, "hello\n") or
assert_eq!(stdout.trim_end(), "hello") and assert!(!stdout.starts_with('\"'))),
or else change the fixture passed to pack_project to a value that renders
differently in debug vs JSON and assert the exact debug rendering; locate the
pack_e2e_output_format_debug_baked_in function and adjust the assertions around
the run(&bin, &[]) / stdout handling (and/or change the source string passed
into pack_project) accordingly.

In `@baml_language/crates/baml_exec/src/dispatch.rs`:
- Around line 86-100: dispatch_target fails to enforce the reserved --help
parameter because it never calls validate_help_param; update dispatch_target to
call validate_help_param with the resolved func_info (e.g., pass
&func_info.param_names and target_name or the function/method signature) before
calling build_args_from_signature so any function signature that collides with
the reserved "help" param is rejected at the dispatch boundary; ensure the call
occurs after locating func_info and before building args so validate_help_param
can short-circuit invalid targets.

In `@baml_language/crates/baml_exec/src/output.rs`:
- Around line 58-68: The BexExternalValue::Instance arm in output.rs currently
unconditionally inserts "__type" into the JSON object, overwriting any user
field with that name; modify the Instance handling (where you build `map` using
`external_to_json`) to only insert the type when the map does not already
contain the key "__type" (e.g., check `map.contains_key("__type")` before
calling `map.insert("__type", ...)`) so user-provided "__type" fields are
preserved.
- Around line 126-136: The help example generator example_value currently falls
back to "..." which is not valid JSON for complex types; update the match in
example_value to return valid JSON placeholders for structured types used by
auto_cli::parse_cli_value — e.g., add arms for Ty::Class {..} => "{}", Ty::Map
{..} => "{}", Ty::List {..} => "[]", and Ty::Union(..) => "{}" (keep existing
literals for Ty::String, Ty::Int, Ty::Float, Ty::Bool, Ty::Null, Ty::Enum) so
generated CLI examples are parseable.

In `@baml_language/crates/baml_pack_host/Cargo.toml`:
- Line 27: The Tokio dependency in Cargo.toml currently only enables the
single-threaded "rt" feature but main.rs calls tokio::runtime::Runtime::new(),
which requires the multi-threaded runtime; update the tokio entry to also
include the "rt-multi-thread" feature so the runtime creation compiles (i.e.,
modify the tokio dependency features in Cargo.toml to include "rt-multi-thread"
along with "rt").

In `@baml_language/crates/bex_vm_types/src/heap_ptr.rs`:
- Around line 153-164: The Serialize implementation for HeapPtr currently always
emits unit and Deserialize always returns HeapPtr::null(), silently discarding
non-null pointers; update the serialize method (impl Serialize for
HeapPtr::serialize) to assert and guard the invariant by calling
debug_assert!(self.ptr.is_null()) and, to prevent silent data loss in release
builds, return a serialization error (via S::Error::custom) if self.ptr is not
null; keep Deserialize as-is (returning HeapPtr::null()) but document the
invariant on HeapPtr and Program packing so future changes don't violate it.

In `@baml_language/stow.toml`:
- Around line 88-89: Update the explanatory reason string for the stow rule to
reflect the current allowed_crates list: change the reason that currently says
"Only compiler_emit and baml_project crates should depend on bex_vm_types (pure
data layer)" so it mentions the newly allowed crates (`baml_cli`, `baml_exec`,
`baml_pack_host`) along with the compiler emit and `baml_project` crates; edit
the reason adjacent to allowed_crates in the stow.toml block so the text
accurately describes that these specific crates may depend on `bex_vm_types`.

---

Outside diff comments:
In `@baml_language/crates/bex_vm_types/src/indexable.rs`:
- Around line 274-279: The heap_debug feature changes ObjectIndex's wire format
(ObjectIndex struct with raw and epoch vs Index<ObjectKind> as usize), causing
incompatible serde output; fix by normalizing the serialized representation:
keep the in-memory ObjectIndex struct (raw, epoch) under #[cfg(feature =
"heap_debug")] but implement custom Serialize/Deserialize for ObjectIndex that
serializes/deserializes only the raw usize (raw) so it matches the non-debug
Index<ObjectKind> representation, or alternatively disable/gate heap_debug for
pack-producing builds so packs are always produced with the same feature set.

In `@baml_language/crates/bex_vm_types/src/types.rs`:
- Around line 548-560: Add a defensive debug assertion in the HeapPtr
serialization choke point: inside the impl Serialize for HeapPtr (the
HeapPtr::serialize method) assert that the raw pointer is null (e.g.,
debug_assert!(self.ptr.is_null(), "...")) so any accidental attempt to serialize
a live Value::Object is caught during debug runs; keep the existing unit
serialization but include a message suggesting to use
ConstValue::Object(ObjectIndex) instead of serializing live HeapPtr.

---

Nitpick comments:
In `@baml_language/crates/baml_cli/src/commands.rs`:
- Around line 1-3: The file header comment in commands.rs is outdated: it lists
the CLI subcommands as "Run, Describe, Generate, Grep, Test, Format, and
LanguageServer" but the dispatcher now also wires a Pack command; update that
top-of-file comment (and the similar comment around lines 75-76) to include
"Pack" so the documented command list matches the actual dispatcher wiring (look
for references to the dispatcher and the Pack subcommand names in this file to
ensure consistency).

In `@baml_language/crates/baml_cli/src/project_load.rs`:
- Around line 40-79: The code currently builds `errors` as references then later
does `errors.iter().copied().cloned().collect::<Vec<_>>()`, causing redundant
copying/cloning; modify `check_project_diagnostics` to produce an owned
Vec<Diagnostic> directly (e.g., replace the `errors: Vec<_> =
diagnostics.iter().filter(|d| d.severity == Severity::Error).collect();` with a
cloned/owned collection) and pass that owned Vec to
`render::render_diagnostics`; keep the existing `warnings` handling and the rest
of the logic (variables/functions to locate: `check_project_diagnostics`,
`diagnostics`, `errors`, `warnings`, and the call to
`render::render_diagnostics`).

In `@baml_language/crates/bex_vm_types/Cargo.toml`:
- Around line 30-38: Remove the duplicate dev-dependency entry for serde: open
the Cargo.toml and delete the serde = { workspace = true } line under the
[dev-dependencies] section so serde remains only as a normal dependency; ensure
no test-specific features require a separate dev-dependency entry and run cargo
check to confirm the manifest is valid (look for the serde entry and the
[dev-dependencies] header to locate the line).

In `@baml_language/crates/bex_vm_types/src/bytecode.rs`:
- Around line 840-876: Bytecode's resolved_constants field is skipped by serde
and must be reconstructed after deserialization (otherwise LoadConst will index
OOB); add a one-line doc note on the resolved_constants field (or on the
Bytecode struct) stating that deserializing Bytecode leaves resolved_constants
empty and callers must call the runtime reconstruction function (e.g.
resolve_constants) to populate it from constants before executing or loading
instructions like LoadConst.

In `@baml_language/crates/bex_vm_types/src/types.rs`:
- Around line 248-287: The Serialize impl for FunctionKind currently matches
self twice and contains an unreachable! arm; replace it by matching self only
once inside impl Serialize for FunctionKind: define the inner serde-annotated
enum FunctionKindRef<'a> { Bytecode, SysOp(&'a SysOp), NativeUnresolved } and
then match on self once to construct the appropriate FunctionKindRef (map
Self::Native(_) to FunctionKindRef::NativeUnresolved) and call serialize on
that; update the impl Serialize for FunctionKind to use this single-match
approach and remove the unreachable! path.
- Around line 775-825: Add an upfront validation pass that walks the collection
being written (e.g., the ObjectPool or whatever container feeds
Object::serialize) before any serde serialization begins to detect runtime-only
variants (RustData, Collector, and #[cfg(feature = "heap_debug")] Sentinel) and
fail with a clear pack-level error naming the offending object/index;
specifically, implement a function (e.g., validate_objects_before_pack or a
method on ObjectPool) that iterates objects and returns a Result<(), PackError>
if any invalid Object variants are found, call this validation from the
pack-writer entry point before invoking Object::serialize, and return the
pack-level error instead of allowing serde to emit a mid-stream
serde::ser::Error from Object::serialize.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ef9c6716-6ffb-4739-8f34-324d79cadd62

📥 Commits

Reviewing files that changed from the base of the PR and between ea9547a and 6f2d49e.

⛔ Files ignored due to path filters (6)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snap is excluded by !**/*.snap
📒 Files selected for processing (39)
  • baml_language/Cargo.toml
  • baml_language/crates/baml_base/Cargo.toml
  • baml_language/crates/baml_base/src/attr.rs
  • baml_language/crates/baml_base/src/core_types.rs
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_panics/panics.baml
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_sys/sys.baml
  • baml_language/crates/baml_builtins2_codegen/src/codegen.rs
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_cli/Cargo.toml
  • baml_language/crates/baml_cli/src/commands.rs
  • baml_language/crates/baml_cli/src/lib.rs
  • baml_language/crates/baml_cli/src/main.rs
  • baml_language/crates/baml_cli/src/pack_command.rs
  • baml_language/crates/baml_cli/src/project_load.rs
  • baml_language/crates/baml_cli/src/run_command.rs
  • baml_language/crates/baml_cli/tests/common/mod.rs
  • baml_language/crates/baml_cli/tests/pack_e2e.rs
  • baml_language/crates/baml_exec/Cargo.toml
  • baml_language/crates/baml_exec/src/auto_cli.rs
  • baml_language/crates/baml_exec/src/dispatch.rs
  • baml_language/crates/baml_exec/src/envelope.rs
  • baml_language/crates/baml_exec/src/json_coerce.rs
  • baml_language/crates/baml_exec/src/lib.rs
  • baml_language/crates/baml_exec/src/output.rs
  • baml_language/crates/baml_pack_host/Cargo.toml
  • baml_language/crates/baml_pack_host/build.rs
  • baml_language/crates/baml_pack_host/src/main.rs
  • baml_language/crates/baml_type/Cargo.toml
  • baml_language/crates/baml_type/src/lib.rs
  • baml_language/crates/bex_engine/src/lib.rs
  • baml_language/crates/bex_vm/src/errors.rs
  • baml_language/crates/bex_vm/src/package_baml/sys.rs
  • baml_language/crates/bex_vm/src/vm.rs
  • baml_language/crates/bex_vm_types/Cargo.toml
  • baml_language/crates/bex_vm_types/src/bytecode.rs
  • baml_language/crates/bex_vm_types/src/heap_ptr.rs
  • baml_language/crates/bex_vm_types/src/indexable.rs
  • baml_language/crates/bex_vm_types/src/types.rs
  • baml_language/stow.toml

Comment thread baml_language/crates/baml_cli/src/pack_command.rs
Comment on lines +293 to +313
fn write_executable(host_bytes: &[u8], data: &[u8], writer: &mut std::fs::File) -> Result<()> {
let target = std::env::consts::OS;
if target.contains("linux") {
libsui::Elf::new(host_bytes)
.append(PACK_SECTION_NAME, data, writer)
.context("Failed to write ELF binary")?;
} else if target.contains("windows") {
libsui::PortableExecutable::from(host_bytes)
.context("Failed to parse PE binary")?
.write_resource(PACK_SECTION_NAME, data.to_vec())
.context("Failed to write PE resource")?
.build(writer)
.context("Failed to build PE binary")?;
} else {
libsui::Macho::from(host_bytes.to_vec())
.context("Failed to parse Mach-O binary")?
.write_section(PACK_SECTION_NAME, data.to_vec())
.context("Failed to write Mach-O section")?
.build_and_sign(writer)
.context("Failed to build Mach-O binary")?;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t treat every non-Linux/non-Windows target as Mach-O.

On targets like FreeBSD/OpenBSD, this falls into the Mach-O branch and tries to write a macOS binary format. Bail explicitly for unsupported OSes instead of producing an invalid package path.

Suggested OS dispatch fix
 fn write_executable(host_bytes: &[u8], data: &[u8], writer: &mut std::fs::File) -> Result<()> {
-    let target = std::env::consts::OS;
-    if target.contains("linux") {
-        libsui::Elf::new(host_bytes)
-            .append(PACK_SECTION_NAME, data, writer)
-            .context("Failed to write ELF binary")?;
-    } else if target.contains("windows") {
-        libsui::PortableExecutable::from(host_bytes)
-            .context("Failed to parse PE binary")?
-            .write_resource(PACK_SECTION_NAME, data.to_vec())
-            .context("Failed to write PE resource")?
-            .build(writer)
-            .context("Failed to build PE binary")?;
-    } else {
-        libsui::Macho::from(host_bytes.to_vec())
-            .context("Failed to parse Mach-O binary")?
-            .write_section(PACK_SECTION_NAME, data.to_vec())
-            .context("Failed to write Mach-O section")?
-            .build_and_sign(writer)
-            .context("Failed to build Mach-O binary")?;
-    }
+    match std::env::consts::OS {
+        "linux" => {
+            libsui::Elf::new(host_bytes)
+                .append(PACK_SECTION_NAME, data, writer)
+                .context("Failed to write ELF binary")?;
+        }
+        "windows" => {
+            libsui::PortableExecutable::from(host_bytes)
+                .context("Failed to parse PE binary")?
+                .write_resource(PACK_SECTION_NAME, data.to_vec())
+                .context("Failed to write PE resource")?
+                .build(writer)
+                .context("Failed to build PE binary")?;
+        }
+        "macos" => {
+            libsui::Macho::from(host_bytes.to_vec())
+                .context("Failed to parse Mach-O binary")?
+                .write_section(PACK_SECTION_NAME, data.to_vec())
+                .context("Failed to write Mach-O section")?
+                .build_and_sign(writer)
+                .context("Failed to build Mach-O binary")?;
+        }
+        other => anyhow::bail!("`baml pack` is not supported on `{other}` yet"),
+    }
     Ok(())
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn write_executable(host_bytes: &[u8], data: &[u8], writer: &mut std::fs::File) -> Result<()> {
let target = std::env::consts::OS;
if target.contains("linux") {
libsui::Elf::new(host_bytes)
.append(PACK_SECTION_NAME, data, writer)
.context("Failed to write ELF binary")?;
} else if target.contains("windows") {
libsui::PortableExecutable::from(host_bytes)
.context("Failed to parse PE binary")?
.write_resource(PACK_SECTION_NAME, data.to_vec())
.context("Failed to write PE resource")?
.build(writer)
.context("Failed to build PE binary")?;
} else {
libsui::Macho::from(host_bytes.to_vec())
.context("Failed to parse Mach-O binary")?
.write_section(PACK_SECTION_NAME, data.to_vec())
.context("Failed to write Mach-O section")?
.build_and_sign(writer)
.context("Failed to build Mach-O binary")?;
}
fn write_executable(host_bytes: &[u8], data: &[u8], writer: &mut std::fs::File) -> Result<()> {
match std::env::consts::OS {
"linux" => {
libsui::Elf::new(host_bytes)
.append(PACK_SECTION_NAME, data, writer)
.context("Failed to write ELF binary")?;
}
"windows" => {
libsui::PortableExecutable::from(host_bytes)
.context("Failed to parse PE binary")?
.write_resource(PACK_SECTION_NAME, data.to_vec())
.context("Failed to write PE resource")?
.build(writer)
.context("Failed to build PE binary")?;
}
"macos" => {
libsui::Macho::from(host_bytes.to_vec())
.context("Failed to parse Mach-O binary")?
.write_section(PACK_SECTION_NAME, data.to_vec())
.context("Failed to write Mach-O section")?
.build_and_sign(writer)
.context("Failed to build Mach-O binary")?;
}
other => anyhow::bail!("`baml pack` is not supported on `{other}` yet"),
}
Ok(())
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/pack_command.rs` around lines 293 - 313,
The write_executable function currently treats any non-Linux/non-Windows OS as
Mach-O; change the OS dispatch to explicitly check for macOS (e.g., target ==
"macos" or target.contains("darwin")) and only run the libsui::Macho code when
macOS is detected, and for all other targets return an explicit error (use
anyhow::bail or return Err with context) indicating the OS is unsupported for
embedding PACK_SECTION_NAME; keep the existing paths for libsui::Elf (Linux) and
libsui::PortableExecutable (Windows) and ensure the error message names
write_executable and the unsupported target string for easier debugging.

Comment on lines +57 to +73
pub fn ensure_built() -> &'static BuiltPaths {
BUILT.get_or_init(|| {
let cargo = std::env::var("CARGO").unwrap_or_else(|_| "cargo".to_string());
let status = Command::new(&cargo)
.args(["build", "-p", "baml_cli", "-p", "baml_pack_host"])
.status()
.expect("spawn cargo build");
assert!(
status.success(),
"cargo build for baml_cli + baml_pack_host failed — see output above",
);

let target = target_dir();
let profile = profile();
let bin_dir = target.join(&profile);
let baml_cli = bin_dir.join(bin_name("baml-cli"));
let baml_pack_host = bin_dir.join(bin_name("baml-pack-host"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Build the same profile that the harness looks up.

When tests run with cargo test --release, profile() returns release, but the nested command still runs plain cargo build, producing debug binaries. The later lookup in target/release can fail or accidentally use stale artifacts.

🧪 Proposed fix
 pub fn ensure_built() -> &'static BuiltPaths {
     BUILT.get_or_init(|| {
         let cargo = std::env::var("CARGO").unwrap_or_else(|_| "cargo".to_string());
-        let status = Command::new(&cargo)
-            .args(["build", "-p", "baml_cli", "-p", "baml_pack_host"])
-            .status()
+        let profile = profile();
+        let mut cmd = Command::new(&cargo);
+        cmd.args(["build", "-p", "baml_cli", "-p", "baml_pack_host"]);
+        if profile == "release" {
+            cmd.arg("--release");
+        }
+        let status = cmd.status()
             .expect("spawn cargo build");
         assert!(
             status.success(),
             "cargo build for baml_cli + baml_pack_host failed — see output above",
         );
 
         let target = target_dir();
-        let profile = profile();
         let bin_dir = target.join(&profile);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/tests/common/mod.rs` around lines 57 - 73, The
build step in ensure_built() always runs plain `cargo build` so it can produce
debug binaries while profile() may return "release"; modify the Command
construction in ensure_built() so it builds the same profile that profile()
returns — e.g., when profile() == "release" add the "--release" flag (or
otherwise pass the appropriate cargo flag for the requested profile) to the args
for Command::new(&cargo). Ensure BUILT.get_or_init closure uses the adjusted
args so baml_cli and baml_pack_host are built into the same target/<profile>
directory you later look up.

Comment on lines +167 to +184
fn pack_e2e_output_format_debug_baked_in() {
let built = common::ensure_built();
let (_tmp, bin) = pack_project(
built,
"function main() -> string { \"hello\" }\n",
&["--output-format", "debug"],
);
let out = run(&bin, &[]);
assert!(out.status.success());
let stdout = String::from_utf8_lossy(&out.stdout);
// Debug format writes the string; a JSON-default build would also
// write a string — but in debug mode it's NOT wrapped in the JSON
// pretty-printer, so there's no extra whitespace / quoting artifacts.
// Easiest distinguishing check: debug output is one line ending
// with the string; JSON pretty-print output would be the same for
// a simple string, so we verify the content is there and trust the
// envelope roundtrip test below for separation.
assert!(stdout.contains("hello"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make the debug-format assertion distinguish it from JSON.

This currently passes for both debug and JSON output because both contain hello, so the test would not catch the host ignoring the baked-in output_format.

🧪 Suggested direction
-    assert!(stdout.contains("hello"));
+    assert!(
+        serde_json::from_str::<serde_json::Value>(stdout.trim()).is_err(),
+        "debug output should not be valid JSON; got: {stdout}"
+    );

If debug output for this fixture is intentionally JSON-compatible, use a fixture with known divergent debug/JSON rendering and assert the exact output.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/tests/pack_e2e.rs` around lines 167 - 184, The
test pack_e2e_output_format_debug_baked_in currently only asserts
stdout.contains("hello"), which passes for both debug and JSON outputs; update
the assertion to distinguish debug formatting by asserting exact expected debug
output (for example assert_eq!(stdout, "hello\n") or
assert_eq!(stdout.trim_end(), "hello") and assert!(!stdout.starts_with('\"'))),
or else change the fixture passed to pack_project to a value that renders
differently in debug vs JSON and assert the exact debug rendering; locate the
pack_e2e_output_format_debug_baked_in function and adjust the assertions around
the run(&bin, &[]) / stdout handling (and/or change the source string passed
into pack_project) accordingly.

Comment on lines +86 to +100
let func_info = engine
.user_functions()
.into_iter()
.find(|f| {
f.qualified_name == target_name
|| f.display_name == target_name.strip_prefix("user.").unwrap_or(target_name)
})
.ok_or_else(|| anyhow!("Function `{target_name}` not found"))?;

let args = build_args_from_signature(
cli_tokens,
json_args.as_ref(),
&func_info.param_names,
&func_info.param_types,
)?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Enforce the reserved help parameter at the dispatch boundary.

validate_help_param is defined here, but dispatch_target does not call it. Current in-tree callers validate separately, but this public shared dispatcher can still execute a target whose signature collides with --help.

Suggested dispatch-boundary guard
     let func_info = engine
         .user_functions()
         .into_iter()
         .find(|f| {
             f.qualified_name == target_name
                 || f.display_name == target_name.strip_prefix("user.").unwrap_or(target_name)
         })
         .ok_or_else(|| anyhow!("Function `{target_name}` not found"))?;
 
+    validate_help_param(engine.as_ref(), &func_info.qualified_name)?;
+
     let args = build_args_from_signature(
         cli_tokens,
         json_args.as_ref(),
         &func_info.param_names,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let func_info = engine
.user_functions()
.into_iter()
.find(|f| {
f.qualified_name == target_name
|| f.display_name == target_name.strip_prefix("user.").unwrap_or(target_name)
})
.ok_or_else(|| anyhow!("Function `{target_name}` not found"))?;
let args = build_args_from_signature(
cli_tokens,
json_args.as_ref(),
&func_info.param_names,
&func_info.param_types,
)?;
let func_info = engine
.user_functions()
.into_iter()
.find(|f| {
f.qualified_name == target_name
|| f.display_name == target_name.strip_prefix("user.").unwrap_or(target_name)
})
.ok_or_else(|| anyhow!("Function `{target_name}` not found"))?;
validate_help_param(engine.as_ref(), &func_info.qualified_name)?;
let args = build_args_from_signature(
cli_tokens,
json_args.as_ref(),
&func_info.param_names,
&func_info.param_types,
)?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/dispatch.rs` around lines 86 - 100,
dispatch_target fails to enforce the reserved --help parameter because it never
calls validate_help_param; update dispatch_target to call validate_help_param
with the resolved func_info (e.g., pass &func_info.param_names and target_name
or the function/method signature) before calling build_args_from_signature so
any function signature that collides with the reserved "help" param is rejected
at the dispatch boundary; ensure the call occurs after locating func_info and
before building args so validate_help_param can short-circuit invalid targets.

Comment on lines +58 to +68
BexExternalValue::Instance {
class_name, fields, ..
} => {
let mut map: serde_json::Map<String, serde_json::Value> = fields
.iter()
.map(|(k, v)| (k.clone(), external_to_json(v)))
.collect();
if !class_name.is_empty() {
map.insert("__type".to_string(), serde_json::json!(class_name));
}
serde_json::Value::Object(map)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve user fields named __type.

For class instances, this overwrites a real field named __type, so JSON output can silently drop user data.

Minimal data-preserving fix
             if !class_name.is_empty() {
-                map.insert("__type".to_string(), serde_json::json!(class_name));
+                // Avoid overwriting a user field named `__type`.
+                if !map.contains_key("__type") {
+                    map.insert("__type".to_string(), serde_json::json!(class_name));
+                }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/output.rs` around lines 58 - 68, The
BexExternalValue::Instance arm in output.rs currently unconditionally inserts
"__type" into the JSON object, overwriting any user field with that name; modify
the Instance handling (where you build `map` using `external_to_json`) to only
insert the type when the map does not already contain the key "__type" (e.g.,
check `map.contains_key("__type")` before calling `map.insert("__type", ...)`)
so user-provided "__type" fields are preserved.

Comment on lines +126 to +136
/// Generate a placeholder example value for a type (used in `--help` output).
pub fn example_value(ty: &Ty) -> &'static str {
match ty {
Ty::String { .. } => "\"value\"",
Ty::Int { .. } => "42",
Ty::Float { .. } => "3.14",
Ty::Bool { .. } => "true",
Ty::Null { .. } => "null",
Ty::Enum(..) => "VariantName",
_ => "...",
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use valid JSON placeholders for complex CLI examples.

auto_cli::parse_cli_value requires JSON for class/map/list/union parameters, but example_value falls back to ..., so generated help prints examples users cannot run.

Suggested placeholder fix
 pub fn example_value(ty: &Ty) -> &'static str {
     match ty {
         Ty::String { .. } => "\"value\"",
         Ty::Int { .. } => "42",
         Ty::Float { .. } => "3.14",
         Ty::Bool { .. } => "true",
         Ty::Null { .. } => "null",
         Ty::Enum(..) => "VariantName",
+        Ty::Class(..) | Ty::Map { .. } => "{}",
+        Ty::List(..) => "[]",
+        Ty::Union(..) => "null",
         _ => "...",
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Generate a placeholder example value for a type (used in `--help` output).
pub fn example_value(ty: &Ty) -> &'static str {
match ty {
Ty::String { .. } => "\"value\"",
Ty::Int { .. } => "42",
Ty::Float { .. } => "3.14",
Ty::Bool { .. } => "true",
Ty::Null { .. } => "null",
Ty::Enum(..) => "VariantName",
_ => "...",
}
/// Generate a placeholder example value for a type (used in `--help` output).
pub fn example_value(ty: &Ty) -> &'static str {
match ty {
Ty::String { .. } => "\"value\"",
Ty::Int { .. } => "42",
Ty::Float { .. } => "3.14",
Ty::Bool { .. } => "true",
Ty::Null { .. } => "null",
Ty::Enum(..) => "VariantName",
Ty::Class(..) | Ty::Map { .. } => "{}",
Ty::List(..) => "[]",
Ty::Union(..) => "null",
_ => "...",
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/output.rs` around lines 126 - 136, The
help example generator example_value currently falls back to "..." which is not
valid JSON for complex types; update the match in example_value to return valid
JSON placeholders for structured types used by auto_cli::parse_cli_value — e.g.,
add arms for Ty::Class {..} => "{}", Ty::Map {..} => "{}", Ty::List {..} =>
"[]", and Ty::Union(..) => "{}" (keep existing literals for Ty::String, Ty::Int,
Ty::Float, Ty::Bool, Ty::Null, Ty::Enum) so generated CLI examples are
parseable.

bitcode = { workspace = true }
libsui = { workspace = true }
serde = { workspace = true }
tokio = { workspace = true, features = [ "rt" ] }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Tokio feature declarations:"
rg -n 'tokio\s*=.*features' --iglob 'Cargo.toml' -C 2

echo
echo "Runtime::new call sites:"
rg -n 'Runtime::new\s*\(' --type=rust -C 2

Repository: BoundaryML/baml

Length of output: 49914


🏁 Script executed:

cat -n baml_language/Cargo.toml | grep -A 30 "\[workspace\]"

Repository: BoundaryML/baml

Length of output: 1386


🏁 Script executed:

grep -n "tokio" baml_language/Cargo.toml | head -20

Repository: BoundaryML/baml

Length of output: 175


🏁 Script executed:

# Check if baml_pack_host has any other dependencies or features that might inherit tokio features
cat -n baml_language/crates/baml_pack_host/Cargo.toml

Repository: BoundaryML/baml

Length of output: 1199


🏁 Script executed:

# Check which crates baml_pack_host depends on and their tokio features
echo "Direct dependencies of baml_pack_host:"
cat baml_language/crates/baml_pack_host/Cargo.toml | grep -A 30 "\[dependencies\]" | grep -E "^\w"

echo -e "\n\nChecking if any direct dependency enables rt-multi-thread:"
for crate in baml_exec bex_engine bex_events bex_vm_types sys_native; do
  echo "=== $crate ==="
  grep -n "tokio.*features" "baml_language/crates/${crate}/Cargo.toml" 2>/dev/null || echo "No tokio with features"
done

Repository: BoundaryML/baml

Length of output: 1073


🏁 Script executed:

# Verify Tokio 1.36 requires rt-multi-thread for Runtime::new()
# Search for any documentation or comments about tokio features in the codebase
rg "rt-multi-thread" -B 2 -A 2 --type=toml | head -50

Repository: BoundaryML/baml

Length of output: 3323


Add the rt-multi-thread feature to Tokio.

baml_language/crates/baml_pack_host/src/main.rs line 116 calls tokio::runtime::Runtime::new(), which requires Tokio's multi-threaded runtime feature. The current declaration includes only rt, which provides single-threaded runtime support. This will fail to compile when built independently.

🔧 Proposed fix
-tokio = { workspace = true, features = [ "rt" ] }
+tokio = { workspace = true, features = [ "rt", "rt-multi-thread" ] }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
tokio = { workspace = true, features = [ "rt" ] }
tokio = { workspace = true, features = [ "rt", "rt-multi-thread" ] }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_pack_host/Cargo.toml` at line 27, The Tokio
dependency in Cargo.toml currently only enables the single-threaded "rt" feature
but main.rs calls tokio::runtime::Runtime::new(), which requires the
multi-threaded runtime; update the tokio entry to also include the
"rt-multi-thread" feature so the runtime creation compiles (i.e., modify the
tokio dependency features in Cargo.toml to include "rt-multi-thread" along with
"rt").

Comment on lines +153 to +164
impl Serialize for HeapPtr {
fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
serializer.serialize_unit()
}
}

impl<'de> Deserialize<'de> for HeapPtr {
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
<()>::deserialize(deserializer)?;
Ok(HeapPtr::null())
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify that no compile-time (pre-resolve) code path constructs Object::Instance,
# Object::Variant, or Object::Closure with a non-null HeapPtr that could end up
# inside a packed Program.
rg -nP --type=rust -C3 '\b(Instance|Variant|Closure)\s*\{\s*(class|enm|function)\s*:' \
  -g '!**/heap*.rs' -g '!**/vm.rs' -g '!**/target/**'

# Also check what ends up inside PackEnvelope / Program before ser.
ast-grep --pattern $'pub struct PackEnvelope { $$$ }'

Repository: BoundaryML/baml

Length of output: 1550


🏁 Script executed:

# First, examine the HeapPtr structure
cat -n baml_language/crates/bex_vm_types/src/heap_ptr.rs | head -60

Repository: BoundaryML/baml

Length of output: 2690


🏁 Script executed:

# Find where Instance, Variant, Closure are defined
fd -t f '\.rs$' | xargs rg -l 'struct Instance|struct Variant|struct Closure' | head -10

Repository: BoundaryML/baml

Length of output: 670


🏁 Script executed:

# Check what fields Instance, Variant, Closure have
rg -A5 'pub struct Instance' baml_language/crates/bex_vm_types/src/ 2>/dev/null || \
rg -A5 'pub struct Instance' baml_language/crates/ --type=rust | head -20

Repository: BoundaryML/baml

Length of output: 521


🏁 Script executed:

# Check the serialization flow - when PackEnvelope gets serialized
rg -n 'serialize.*Program\|Program.*serialize' baml_language/crates/ --type=rust | head -10

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Look for any compile-time object construction that might embed HeapPtr
rg -n 'HeapPtr::' baml_language/crates/ --type=rust | head -20

Repository: BoundaryML/baml

Length of output: 1224


🏁 Script executed:

# Find Variant and Closure definitions
rg -A3 'pub struct Variant' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 150


🏁 Script executed:

# Find Closure definition
rg -A3 'pub struct Closure' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 239


🏁 Script executed:

# Check the Program structure
rg -A10 'pub struct Program' baml_language/crates/ --type=rust | head -40

Repository: BoundaryML/baml

Length of output: 961


🏁 Script executed:

# Look for where Program is constructed at pack time
rg -n 'Program\s*\{' baml_language/crates/baml_exec/src/ --type=rust | head -15

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check if Instance/Variant/Closure have Serialize/Deserialize derives
rg -B2 'pub struct Instance|pub struct Variant|pub struct Closure' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 429


🏁 Script executed:

# Check what ConstValue is
rg -A15 'pub enum ConstValue|pub struct ConstValue' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 495


🏁 Script executed:

# Check ObjectPool definition
rg -A10 'pub struct ObjectPool' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check if ConstValue ever wraps Instance/Variant/Closure
rg -n 'ConstValue' baml_language/crates/bex_vm_types/src/types.rs | head -30

Repository: BoundaryML/baml

Length of output: 837


🏁 Script executed:

# Check how Program is built at pack time - look for Object creation
rg -n 'Object::Instance\|Object::Variant\|Object::Closure' baml_language/crates/bex_compiler/ --type=rust | head -15

Repository: BoundaryML/baml

Length of output: 133


🏁 Script executed:

# Check if Instance/Variant/Closure appear as enum variants or in different context
rg -n 'enum Value|pub enum Object' baml_language/crates/bex_vm_types/src/types.rs -A20 | head -60

Repository: BoundaryML/baml

Length of output: 1681


🏁 Script executed:

# Look at how Instance/Variant/Closure are actually constructed
rg -n 'Instance\s*\{|Variant\s*\{|Closure\s*\{' baml_language/crates/bex_vm_types/src/ --type=rust -A2 | head -40

Repository: BoundaryML/baml

Length of output: 1703


🏁 Script executed:

# Check if ObjectPool stores Objects and whether it's serialized
rg -A5 'pub struct ObjectPool' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check if Instance/Variant/Closure are ever created at compile time (non-runtime)
fd -t f '\.rs$' baml_language/crates/ | xargs grep -l 'Instance {' | head -5

Repository: BoundaryML/baml

Length of output: 354


🏁 Script executed:

# Look specifically at Object enum serialization
rg -A30 'pub enum Object' baml_language/crates/bex_vm_types/src/types.rs | grep -A30 'Serialize'

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Verify whether ObjectPool itself is Serialize
rg -B3 'pub struct ObjectPool' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Find ObjectPool definition and how it's constructed
rg -B5 -A15 'pub struct ObjectPool' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check if Object enum itself has Serialize/Deserialize
rg -B2 'pub enum Object' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 240


🏁 Script executed:

# Look for where Objects containing Instance/Variant/Closure are created at compile time
rg -n 'Object::Instance|Object::Variant|Object::Closure' baml_language/crates/ --type=rust | grep -v test | head -20

Repository: BoundaryML/baml

Length of output: 2219


🏁 Script executed:

# Check how ObjectPool gets populated at compile/pack time
rg -n 'add_object\|ObjectPool::new\|objects\.push' baml_language/crates/ --type=rust | head -20

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Directly check if Object enum derives Serialize
rg -A1 'pub enum Object' baml_language/crates/bex_vm_types/src/types.rs | head -5

Repository: BoundaryML/baml

Length of output: 194


🏁 Script executed:

# Get the full ObjectPool definition
rg -A20 'pub struct ObjectPool' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Confirm Object does NOT derive Serialize
rg -B5 -A50 'pub enum Object' baml_language/crates/bex_vm_types/src/types.rs | head -60

Repository: BoundaryML/baml

Length of output: 1708


🏁 Script executed:

# Check if Program and ObjectPool derive Serialize
rg -B2 'pub struct Program' baml_language/crates/bex_vm_types/src/types.rs | head -10

Repository: BoundaryML/baml

Length of output: 222


🏁 Script executed:

# Check what fields Program serializes - look for #[serde] attributes
rg -B3 -A20 'pub struct Program' baml_language/crates/bex_vm_types/src/types.rs | head -40

Repository: BoundaryML/baml

Length of output: 1220


🏁 Script executed:

# Most importantly: trace through the serialization flow
# What calls serialize on Program/PackEnvelope?
rg -n 'to_bytes\|serialize' baml_language/crates/baml_exec/src/ --type=rust | head -20

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Get full ObjectPool definition with derives
rg -B5 -A30 'pub struct ObjectPool' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check if ObjectPool has custom serialization
rg -n 'impl Serialize for ObjectPool\|impl Deserialize for ObjectPool' baml_language/crates/bex_vm_types/src/ --type=rust

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check what's in PackEnvelope and whether it serializes Program.objects
rg -A10 'pub struct PackEnvelope' baml_language/crates/baml_exec/src/envelope.rs

Repository: BoundaryML/baml

Length of output: 512


🏁 Script executed:

# Look at what gets serialized in the packing process
rg -n 'serde_json\|bincode\|to_bytes' baml_language/crates/baml_exec/src/ --type=rust -A2 | head -30

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Search more carefully for ObjectPool - it might be a type alias
rg -n 'ObjectPool' baml_language/crates/bex_vm_types/src/types.rs | head -20

Repository: BoundaryML/baml

Length of output: 167


🏁 Script executed:

# Check the full types.rs file structure around ObjectPool
cat -n baml_language/crates/bex_vm_types/src/types.rs | grep -A15 -B5 'ObjectPool'

Repository: BoundaryML/baml

Length of output: 2101


🏁 Script executed:

# Look for where Instance/Variant/Closure could appear in globals or serialized structures
rg -n 'Instance\|Variant\|Closure' baml_language/crates/bex_vm_types/src/types.rs | grep -i 'global\|const\|pack'

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check if there's a custom serialization for Program that might include live objects
rg -B5 -A30 'impl.*Serialize.*Program\|impl.*Deserialize.*Program' baml_language/crates/bex_vm_types/src/ --type=rust

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Most directly: Can Instance/Variant/Closure ever contain non-null HeapPtr at pack time?
# Check what can be stored in Program at compile time
rg -n 'ConstValue' baml_language/crates/bex_vm_types/src/types.rs -A5 | grep -A5 'enum ConstValue'

Repository: BoundaryML/baml

Length of output: 234


🏁 Script executed:

# ObjectPool is imported from indexable - find its definition
fd -t f indexable.rs | head -5

Repository: BoundaryML/baml

Length of output: 186


🏁 Script executed:

# Search for ObjectPool definition
rg -n 'pub.*ObjectPool\|type ObjectPool' baml_language/crates/bex_vm_types/src/ --type=rust

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check the indexable module
find baml_language/crates/bex_vm_types/src -name '*indexable*'

Repository: BoundaryML/baml

Length of output: 109


🏁 Script executed:

# See if ObjectPool derives Serialize
rg -B3 'ObjectPool' baml_language/crates/bex_vm_types/src/lib.rs

Repository: BoundaryML/baml

Length of output: 201


🏁 Script executed:

# Check what's imported and exported
cat baml_language/crates/bex_vm_types/src/lib.rs | head -40

Repository: BoundaryML/baml

Length of output: 1256


🏁 Script executed:

# Check ObjectPool in indexable.rs
rg -A20 'pub.*ObjectPool' baml_language/crates/bex_vm_types/src/indexable.rs

Repository: BoundaryML/baml

Length of output: 590


🏁 Script executed:

# Check if it's a struct or type alias
rg -n 'ObjectPool' baml_language/crates/bex_vm_types/src/indexable.rs | head -10

Repository: BoundaryML/baml

Length of output: 110


🏁 Script executed:

# Get the full indexable.rs to understand ObjectPool
cat -n baml_language/crates/bex_vm_types/src/indexable.rs | head -100

Repository: BoundaryML/baml

Length of output: 3706


🏁 Script executed:

# Critical check: Does Object enum derive Serialize?
rg -B3 -A60 'pub enum Object' baml_language/crates/bex_vm_types/src/types.rs | head -70

Repository: BoundaryML/baml

Length of output: 2048


🏁 Script executed:

# Check if Pool struct derives Serialize
rg -B5 -A30 'pub struct Pool' baml_language/crates/bex_vm_types/src/indexable.rs

Repository: BoundaryML/baml

Length of output: 996


🏁 Script executed:

# Most importantly: when Program is serialized, what happens to objects field?
# Check if there's a custom serialize skip
rg -B2 -A15 'pub struct Program' baml_language/crates/bex_vm_types/src/types.rs | grep -A15 'Serialize\|objects'

Repository: BoundaryML/baml

Length of output: 778


🏁 Script executed:

# Critical: Check if objects field has #[serde] attributes
rg -B5 -A20 'pub struct Program' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 1258


🏁 Script executed:

# Verify Object enum does NOT derive Serialize
rg -B10 'pub enum Object' baml_language/crates/bex_vm_types/src/types.rs | head -20

Repository: BoundaryML/baml

Length of output: 736


🏁 Script executed:

# Check the complete Serialize derive for Program
cat -n baml_language/crates/bex_vm_types/src/types.rs | sed -n '29,70p'

Repository: BoundaryML/baml

Length of output: 2179


🏁 Script executed:

# Critical: check if Object has a custom Serialize impl
rg -n 'impl.*Serialize.*Object\|impl Serialize for Object' baml_language/crates/bex_vm_types/src/ --type=rust

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check if objects field has #[serde(skip)] or similar
rg -B2 -A2 'pub objects:' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 252


🏁 Script executed:

# Verify by looking at actual serialization in baml_exec
cat baml_language/crates/baml_exec/src/envelope.rs | head -60

Repository: BoundaryML/baml

Length of output: 1416


🏁 Script executed:

# CRITICAL: If Object doesn't derive Serialize, how does Pool<Object, K> compile?
# Does the crate even compile?
cd baml_language && cargo check --lib 2>&1 | grep -i 'serialize\|object' | head -20

Repository: BoundaryML/baml

Length of output: 145


🏁 Script executed:

# Check if there's a test or build to validate Object serialization
rg -n 'Object' baml_language/crates/bex_vm_types/src/types.rs | grep -i 'serialize'

Repository: BoundaryML/baml

Length of output: 199


🏁 Script executed:

# Most directly: try to find if Object ever gets serialized in practice
rg -n 'serialize.*Object\|Object.*serialize' baml_language/crates/ --type=rust | grep -v test | head -20

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check for custom serde impl on Object
rg -B5 -A30 '#\[derive.*Serialize' baml_language/crates/bex_vm_types/src/types.rs | grep -B5 -A10 'Object'

Repository: BoundaryML/baml

Length of output: 6499


🏁 Script executed:

# Now that we found custom Object Serialize, check if Instance/Variant/Closure serialize properly
# Look at the complete ObjectSerde implementation
rg -A50 'enum ObjectSerde' baml_language/crates/bex_vm_types/src/types.rs | head -80

Repository: BoundaryML/baml

Length of output: 2073


🏁 Script executed:

# Check the deserialize impl for Object
rg -A40 'impl.*Deserialize.*Object' baml_language/crates/bex_vm_types/src/types.rs

Repository: BoundaryML/baml

Length of output: 2124


🏁 Script executed:

# Verify: are Instance, Variant, Closure actually used in compile-time Program.objects?
# Check if they ever appear outside of runtime heap
rg -n 'Object::Instance\|Object::Variant\|Object::Closure' baml_language/crates/baml_exec/src/ --type=rust

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Final key check: are Instance/Variant/Closure actually ever created during compilation?
# Search the entire baml_exec crate for where Objects are constructed
rg -n 'Object::' baml_language/crates/baml_exec/src/ --type=rust | head -30

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check how Program is built - look for add_object calls
rg -n 'add_object\|objects\.push' baml_language/crates/ --type=rust -l | head -10

Repository: BoundaryML/baml

Length of output: 41


🏁 Script executed:

# Check the baml_compiler crate structure
fd -t d 'compiler' baml_language/crates | head -5

Repository: BoundaryML/baml

Length of output: 265


🏁 Script executed:

# Look for where Instance/Variant/Closure objects might be constructed at compile time
fd -t f '\.rs$' -path '*/baml_compiler*' | head -5

Repository: BoundaryML/baml

Length of output: 227


Silent data loss on non-null HeapPtr serialization.

Serialize unconditionally emits () and Deserialize always produces null, regardless of the actual pointer value. This relies on an implicit invariant that no non-null HeapPtr ever reaches the serializer (i.e., Instance, Variant, Closure, and Value::Object variants never appear in a packed Program with live pointers).

In practice, at compile time, object references use ObjectIndex via ConstValue, which are converted to HeapPtr at load time. However, if a future change (e.g., a compile-time optimization) ends up storing a pre-resolved HeapPtr into an Instance, Variant, or Closure that gets packed into Program.objects, serialization would silently discard the pointer and produce a dangling null on load, with no diagnostic.

Consider adding a debug_assert!(self.ptr.is_null()) in serialize to make the invariant self-enforcing, or document it prominently.

🛡️ Proposed defensive check
 impl Serialize for HeapPtr {
     fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
+        debug_assert!(
+            self.ptr.is_null(),
+            "HeapPtr serialization expects null pointers only; \
+             non-null pointers should be represented via ObjectIndex/ConstValue at pack time",
+        );
         serializer.serialize_unit()
     }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
impl Serialize for HeapPtr {
fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
serializer.serialize_unit()
}
}
impl<'de> Deserialize<'de> for HeapPtr {
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
<()>::deserialize(deserializer)?;
Ok(HeapPtr::null())
}
}
impl Serialize for HeapPtr {
fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
debug_assert!(
self.ptr.is_null(),
"HeapPtr serialization expects null pointers only; \
non-null pointers should be represented via ObjectIndex/ConstValue at pack time",
);
serializer.serialize_unit()
}
}
impl<'de> Deserialize<'de> for HeapPtr {
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
<()>::deserialize(deserializer)?;
Ok(HeapPtr::null())
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/heap_ptr.rs` around lines 153 - 164,
The Serialize implementation for HeapPtr currently always emits unit and
Deserialize always returns HeapPtr::null(), silently discarding non-null
pointers; update the serialize method (impl Serialize for HeapPtr::serialize) to
assert and guard the invariant by calling debug_assert!(self.ptr.is_null()) and,
to prevent silent data loss in release builds, return a serialization error (via
S::Error::custom) if self.ptr is not null; keep Deserialize as-is (returning
HeapPtr::null()) but document the invariant on HeapPtr and Program packing so
future changes don't violate it.

Comment thread baml_language/stow.toml
Comment on lines +88 to 89
allowed_crates = ["baml_compiler_emit", "baml_compiler2_emit", "baml_project", "baml_cli", "baml_exec", "baml_pack_host"]
reason = "Only compiler_emit and baml_project crates should depend on bex_vm_types (pure data layer)."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update the stow failure reason to match the new allow-list.

Line 88 now permits baml_cli, baml_exec, and baml_pack_host, but Line 89 still says only compiler emit and baml_project crates may depend on bex_vm_types.

Suggested wording update
- reason = "Only compiler_emit and baml_project crates should depend on bex_vm_types (pure data layer)."
+ reason = "Only compiler emit, project, and surface execution crates should depend on bex_vm_types (pure data layer)."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
allowed_crates = ["baml_compiler_emit", "baml_compiler2_emit", "baml_project", "baml_cli", "baml_exec", "baml_pack_host"]
reason = "Only compiler_emit and baml_project crates should depend on bex_vm_types (pure data layer)."
allowed_crates = ["baml_compiler_emit", "baml_compiler2_emit", "baml_project", "baml_cli", "baml_exec", "baml_pack_host"]
reason = "Only compiler emit, project, and surface execution crates should depend on bex_vm_types (pure data layer)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/stow.toml` around lines 88 - 89, Update the explanatory reason
string for the stow rule to reflect the current allowed_crates list: change the
reason that currently says "Only compiler_emit and baml_project crates should
depend on bex_vm_types (pure data layer)" so it mentions the newly allowed
crates (`baml_cli`, `baml_exec`, `baml_pack_host`) along with the compiler emit
and `baml_project` crates; edit the reason adjacent to allowed_crates in the
stow.toml block so the text accurately describes that these specific crates may
depend on `bex_vm_types`.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
baml_language/crates/baml_cli/src/run_command.rs (1)

208-214: Minor: event_sink is unconditionally Some.

event_sink is built as Some(...) in both branches of the if let, so every subsequent if let Some(sink) = &event_sink (here and in run_expression) always matches. Not a bug, but the Option wrapper no longer carries information — consider dropping it to Arc<dyn EventSink> or keeping the Option only if there's a planned "no sink" path.

Also applies to: 303-305

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/run_command.rs` around lines 208 - 214, The
variable event_sink is always constructed as Some(...) so the Option wrapper is
redundant; change its type from Option<Arc<dyn bex_events::EventSink>> to
Arc<dyn bex_events::EventSink> (or, if you intend to support a "no sink" path
later, add that path and keep the Option). Update the initialization to assign
the Arc returned by bex_events_native::start(...) or
bex_events_native::start_stderr() directly and then remove/adjust subsequent
pattern matches like if let Some(sink) = &event_sink in run_expression and
elsewhere to use the concrete Arc (or handle the None case if you choose to keep
Option).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@baml_language/crates/baml_cli/src/run_command.rs`:
- Around line 208-214: The variable event_sink is always constructed as
Some(...) so the Option wrapper is redundant; change its type from
Option<Arc<dyn bex_events::EventSink>> to Arc<dyn bex_events::EventSink> (or, if
you intend to support a "no sink" path later, add that path and keep the
Option). Update the initialization to assign the Arc returned by
bex_events_native::start(...) or bex_events_native::start_stderr() directly and
then remove/adjust subsequent pattern matches like if let Some(sink) =
&event_sink in run_expression and elsewhere to use the concrete Arc (or handle
the None case if you choose to keep Option).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bcd8fe28-d3ac-402a-a7d8-915ab8196222

📥 Commits

Reviewing files that changed from the base of the PR and between 6f2d49e and 8d47d56.

⛔ Files ignored due to path filters (5)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
📒 Files selected for processing (4)
  • baml_language/Cargo.toml
  • baml_language/crates/baml_base/src/core_types.rs
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_cli/src/run_command.rs
✅ Files skipped from review due to trivial changes (1)
  • baml_language/crates/baml_base/src/core_types.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • baml_language/Cargo.toml

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Apr 23, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 15 untouched benchmarks
⏩ 105 skipped benchmarks1


Comparing avery/pkg (c3d19af) with canary (beead95)

Open in CodSpeed

Footnotes

  1. 105 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

Binary size checks passed

7 passed

Artifact Platform Gzip Baseline Delta Status
bridge_cffi Linux 6.0 MB 5.7 MB +361.1 KB (+6.4%) OK
bridge_cffi-stripped Linux 6.0 MB 5.7 MB +329.1 KB (+5.8%) OK
bridge_cffi macOS 5.0 MB 4.6 MB +342.0 KB (+7.4%) OK
bridge_cffi-stripped macOS 4.9 MB 4.7 MB +272.4 KB (+5.8%) OK
bridge_cffi Windows 5.0 MB 4.6 MB +341.9 KB (+7.4%) OK
bridge_cffi-stripped Windows 4.9 MB 4.7 MB +284.4 KB (+6.1%) OK
bridge_wasm WASM 3.3 MB 3.2 MB +40.2 KB (+1.3%) OK

Generated by cargo size-gate · workflow run

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
baml_language/crates/bex_vm_types/src/types.rs (1)

565-576: ⚠️ Potential issue | 🟠 Major

Don't serialize HeapPtr through Value.

Value::Object(HeapPtr) now round-trips raw heap addresses. Any serialized Instance/Closure/Future/collection that contains object refs will deserialize with dangling pointers, so a packaged artifact can silently corrupt runtime state instead of failing fast. Please switch serde to an index-based representation here (like ConstValue) or reject Object values during serialization.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/types.rs` around lines 565 - 576, The
Value enum currently serializes raw pointers via Value::Object(HeapPtr) which
yields dangling addresses on deserialize; update the serde behavior for Value
(in types.rs) so Object variants are not serialized as raw pointers: implement
custom Serialize/Deserialize for Value that either (a) serializes
Object(HeapPtr) as an index-based handle (consult the existing ConstValue/index
mapping pattern in the crate to translate HeapPtr -> index and back during
(de)serialization), or (b) refuses to serialize Object by returning a
serialization error for that variant; apply this change to the Value enum (and
update any helpers that construct/consume Value) so serialized artifacts never
contain raw heap addresses.
baml_language/crates/baml_cli/src/lib.rs (1)

43-65: ⚠️ Potential issue | 🟠 Major

Validate ExitCode::Exit to a portable process-exit range.

std::process::exit(i32) has platform-specific behavior: on Unix, only the least significant 8 bits are visible to the parent process (via WEXITSTATUS), while on Windows the full 32-bit value is preserved. Forwarding baml.sys.exit(n) without validation means a user-controlled exit code like 300 exits with code 44 on Unix but 300 on Windows—a silent cross-platform inconsistency. Please reject or clamp out-of-range values (typically 0–255) before constructing this variant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/lib.rs` around lines 43 - 65, The
From<ExitCode> for i32 impl must normalize user-supplied Exit(n) to a portable
range; update the Exit(…) arm in the match to validate and clamp the integer
into the 0–255 range (handling negative values too) before returning it to
satisfy Unix/WIN differences, e.g., coerce the incoming i32 to a signed wide
type if needed, clamp to 0..=255, then cast to i32; keep all other arms
unchanged and ensure the logic lives in the impl From<ExitCode> for i32 where
ExitCode::Exit is matched.
♻️ Duplicate comments (3)
baml_language/crates/baml_pack_host/Cargo.toml (1)

27-27: ⚠️ Potential issue | 🟠 Major

Declare rt-multi-thread explicitly for Runtime::new().

baml_language/crates/baml_pack_host/src/main.rs constructs tokio::runtime::Runtime::new(), but this manifest only enables rt. That makes this crate depend on some other dependency incidentally pulling in rt-multi-thread, which is fragile and can break standalone builds.

🔧 Suggested fix
-tokio = { workspace = true, features = [ "rt" ] }
+tokio = { workspace = true, features = [ "rt", "rt-multi-thread" ] }
#!/bin/bash
set -euo pipefail

echo "Tokio features declared by baml_pack_host:"
sed -n '17,30p' baml_language/crates/baml_pack_host/Cargo.toml

echo
echo "Runtime::new() call site:"
sed -n '112,120p' baml_language/crates/baml_pack_host/src/main.rs

echo
echo "Search direct workspace deps for rt-multi-thread:"
for crate in baml_exec bex_engine bex_events bex_vm_types sys_native; do
  manifest="baml_language/crates/$crate/Cargo.toml"
  if [ -f "$manifest" ]; then
    echo "== $manifest =="
    rg -n 'tokio|rt-multi-thread' "$manifest" || true
  fi
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_pack_host/Cargo.toml` at line 27, The Cargo.toml
entry currently enables tokio = { workspace = true, features = [ "rt" ] } but
main.rs calls tokio::runtime::Runtime::new(), which requires the rt-multi-thread
runtime; update the features to include "rt-multi-thread" (e.g. features =
["rt-multi-thread"]) while keeping workspace = true so this crate explicitly
requests the multi-thread runtime; locate the tokio declaration in Cargo.toml
and change the features array accordingly and then verify the call site in
src/main.rs (where Runtime::new() is used) builds standalone without relying on
incidental transitive deps.
baml_language/crates/baml_cli/src/pack_command.rs (2)

221-226: ⚠️ Potential issue | 🟡 Minor

Fix the standalone-file remediation message.

This branch already committed to hermetic .baml mode, but the error tells the user to use --function <name>, which is rejected with a positional target on Lines 191-193. Tell them to add a main to the file, or pack from a project and select a function there instead.

🔧 Suggested wording
                 if !engine.function_exists("main") {
                     anyhow::bail!(
                         "Standalone file `{target}` has no `main` function. \
-                         Use `--function <name>` to pack a specific function."
+                         Add a `main` function to the file, or pack from a \
+                         project and use `--function <name>` there."
                     );
                 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/pack_command.rs` around lines 221 - 226,
The error message in the .baml standalone-file branch (when target
ends_with(".baml") and engine.function_exists("main") is false) is misleading;
update the bail message used in that branch to instruct the user to either add a
`main` function to the standalone .baml file or instead pack from a project and
select a function there (rather than telling them to use `--function <name>`
which is invalid for positional .baml targets). Keep the check using
engine.function_exists("main") and only change the text returned by
anyhow::bail! to the new remediation guidance.

293-313: ⚠️ Potential issue | 🟠 Major

Don’t fall through to Mach-O on unsupported OSes.

Any non-Linux/non-Windows target currently takes the Mach-O path. On FreeBSD/OpenBSD/etc. that writes the wrong binary format instead of failing fast with a clear error.

🔧 Suggested fix
 fn write_executable(host_bytes: &[u8], data: &[u8], writer: &mut std::fs::File) -> Result<()> {
-    let target = std::env::consts::OS;
-    if target.contains("linux") {
-        libsui::Elf::new(host_bytes)
-            .append(PACK_SECTION_NAME, data, writer)
-            .context("Failed to write ELF binary")?;
-    } else if target.contains("windows") {
-        libsui::PortableExecutable::from(host_bytes)
-            .context("Failed to parse PE binary")?
-            .write_resource(PACK_SECTION_NAME, data.to_vec())
-            .context("Failed to write PE resource")?
-            .build(writer)
-            .context("Failed to build PE binary")?;
-    } else {
-        libsui::Macho::from(host_bytes.to_vec())
-            .context("Failed to parse Mach-O binary")?
-            .write_section(PACK_SECTION_NAME, data.to_vec())
-            .context("Failed to write Mach-O section")?
-            .build_and_sign(writer)
-            .context("Failed to build Mach-O binary")?;
+    match std::env::consts::OS {
+        "linux" => {
+            libsui::Elf::new(host_bytes)
+                .append(PACK_SECTION_NAME, data, writer)
+                .context("Failed to write ELF binary")?;
+        }
+        "windows" => {
+            libsui::PortableExecutable::from(host_bytes)
+                .context("Failed to parse PE binary")?
+                .write_resource(PACK_SECTION_NAME, data.to_vec())
+                .context("Failed to write PE resource")?
+                .build(writer)
+                .context("Failed to build PE binary")?;
+        }
+        "macos" => {
+            libsui::Macho::from(host_bytes.to_vec())
+                .context("Failed to parse Mach-O binary")?
+                .write_section(PACK_SECTION_NAME, data.to_vec())
+                .context("Failed to write Mach-O section")?
+                .build_and_sign(writer)
+                .context("Failed to build Mach-O binary")?;
+        }
+        other => anyhow::bail!("`baml pack` is not supported on `{other}` yet"),
     }
     Ok(())
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/pack_command.rs` around lines 293 - 313,
The write_executable function currently treats any non-Linux/non-Windows OS as
macOS and attempts Mach-O modifications; change the branching to explicitly
detect macOS (std::env::consts::OS == "macos" or contains "macos") and run the
Mach-O path only for that case, and for any other OS return an error (e.g.,
bail/context with "Unsupported OS: {target}") so we fail fast instead of writing
the wrong binary; update handling around write_executable and PACK_SECTION_NAME
accordingly.
🧹 Nitpick comments (4)
baml_language/crates/bex_vm/src/vm.rs (1)

1252-1284: Add a focused unit test for the new Exit panic mapping.

Please add a unit test asserting that Line 1283 maps VmPanic::Exit { code } to PanicClass::Exit with Value::Int(code), and run cargo test --lib for touched crates before merge.

As per coding guidelines, **/*.rs: Prefer writing Rust unit tests over integration tests where possible, and always run cargo test --lib if Rust code changed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm/src/vm.rs` around lines 1252 - 1284, Add a
focused unit test for panic_to_exception_value that constructs a VmPanic::Exit {
code } (use a representative code like 42), calls panic_to_exception_value, and
asserts the resulting Value encodes PanicClass::Exit with a single field
Value::Int(code); place the test in the same module/file under #[cfg(test)] so
it’s a Rust unit test, reference the panic_to_exception_value function and the
VmPanic::Exit / PanicClass::Exit / Value::Int symbols in your assertions, and
run cargo test --lib for the crate before merging.
baml_language/crates/baml_exec/src/json_coerce.rs (1)

176-187: Union coercion swallows all but the last variant's error.

When every union variant fails, callers only see the error from the last variant — which is rarely the one closest to what the user intended (e.g. for int | string | SomeClass, a malformed object surfaces "Expected array for ..." or similar, depending on variant order). Aggregating errors per variant makes debugging --json-args much easier.

♻️ Sketch
 fn coerce_json_union(value: &serde_json::Value, variants: &[Ty]) -> Result<BexExternalValue> {
-    let mut last_err: Option<anyhow::Error> = None;
+    let mut errors: Vec<String> = Vec::with_capacity(variants.len());
     for variant in variants {
         match json_to_external_with_ty(value, variant) {
             Ok(v) => return Ok(v),
-            Err(e) => last_err = Some(e),
+            Err(e) => errors.push(format!("  - {variant}: {e}")),
         }
     }
-    Err(last_err.unwrap_or_else(|| anyhow!("No union variant matched value `{value}`")))
+    Err(anyhow!(
+        "No union variant matched value `{value}`:\n{}",
+        errors.join("\n")
+    ))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/json_coerce.rs` around lines 176 - 187,
coerce_json_union currently only returns the last variant error, making failures
hard to debug; modify coerce_json_union to collect each Err from
json_to_external_with_ty (include variant identity from Ty or its debug/display)
into a Vec<anyhow::Error> and when all variants fail return a single aggregated
error (e.g., join messages or use anyhow::Context to attach each variant's
error) so callers see per-variant failure reasons for BexExternalValue coercion.
baml_language/crates/baml_exec/src/dispatch.rs (2)

30-39: Consider #[must_use] on DispatchResult.

The whole point of returning DispatchResult (vs. Result<()>) is that Exit(code) and TargetError must be propagated to process exit handling in baml-pack-host and run_command. Accidentally dropping it via let _ = dispatch_target(...).await?; would silently turn an exit-code target into a success. Marking the enum #[must_use] makes that mistake a lint.

 /// Result of dispatching a target.
+#[must_use]
 pub enum DispatchResult {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/dispatch.rs` around lines 30 - 39, Add the
#[must_use] attribute to the DispatchResult enum so callers cannot accidentally
ignore its value; update the enum declaration for DispatchResult (which contains
variants Ok, TargetError, and Exit(i64)) by prefixing it with #[must_use] so
dropped return values (like Exit or TargetError) become a lint, ensuring callers
of dispatch_target(...).await propagate the result to process exit handling.

169-186: Unknown-key warning conflates two very different cases.

After merged has params consumed, what remains is any JSON key (or CLI-derived key) that didn't match a parameter name. A --json-args typo like {"nam": "alice"} for a target with param name will produce both a "Warning: unknown argument(s) ignored: nam" and "Missing required argument --name" — which is useful, but right now it's only a warning, so a user running a non-required-param typo in a packaged binary will appear to succeed with a silent wrong answer.

At minimum, consider upgrading this to an error (or gating behind a --strict/environment flag), and making the message mention where the key came from (JSON vs CLI) so typos are easier to diagnose.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/dispatch.rs` around lines 169 - 186, The
current logic in dispatch.rs that checks leftover keys in merged after consuming
params (the loop over param_names and the subsequent merged.is_empty() branch)
treats any unmatched keys as a mere warning; change this to surface a clear
error (or gate it behind a new --strict/ENV flag) and include the key origin
(JSON vs CLI) in the message so typos are diagnosable. To implement: when
constructing merged, track the source of each entry (e.g. mark keys added from
--json-args vs CLI flags), then replace the warning block that collects unknown
with an error return (anyhow::bail!) when strict mode is enabled (or always if
you choose to fail fast), formatting the message to list keys and their origins
and reference the missing/unknown alongside the existing Missing required
argument bail in the param consumption loop; update function/method dispatch
logic to accept the new strict flag or env toggle and surface that to callers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_cli/src/run_command.rs`:
- Around line 169-173: The current check_project_diagnostics method logs only
w.message which drops file/line/column context; change the vlog call in
check_project_diagnostics to emit the rendered/fully formatted diagnostic (the
full text provided by the diagnostic object) rather than the bare message—e.g.
log w.rendered or the diagnostic's formatted/string representation
(format!("{}", w) / w.to_string()) so verbose output retains location info from
crate::project_load::check_project_diagnostics.

In `@baml_language/crates/baml_exec/src/auto_cli.rs`:
- Around line 156-163: print_target_help currently omits the "(pass JSON)" hint
for Ty::Union even though parse_cli_value routes Ty::Union through JSON parsing;
update print_target_help to treat Ty::Union like Ty::Class, Ty::Map and Ty::List
by appending the "(pass JSON)" hint to the parameter/help text for union types
(apply same change in the second occurrence noted around the other block
referenced), ensuring the function name print_target_help and the type variant
Ty::Union are used to locate where to add the hint so help text matches
parse_cli_value behavior.
- Around line 56-64: The parsing logic in parse_auto_cli_args (and similarly in
extract_flag_keys) wrongly treats the next token as a value without checking if
it's another flag; update the branches where you handle the no-`=` case (the
block using raw, tokens, and i) to only consume tokens[i] as the value if it
exists and does not start with a '-' (or "--"); otherwise return an error (or
bail) for a missing value for `--{raw}`. Apply the same safeguard to the
analogous code path around the other block (the second occurrence you noted) so
flags like `--first --second=1` do not get misinterpreted as `--first` having
value `--second=1` and ensure extract_flag_keys also ignores tokens beginning
with '-' when deciding if a flag has an attached value.

In `@baml_language/crates/baml_exec/src/dispatch.rs`:
- Around line 122-126: The Err(e) branch in dispatch.rs truncates the error
chain by using eprintln!("Error: {e}"); change it to print the full anyhow error
chain (e.g., eprintln!("Error: {:#}", e) or eprintln!("{:?}", e)) so contextual
.context(...) layers and sources are shown when returning
Ok(DispatchResult::TargetError) from the match handling
bex_engine::EngineError::Exit and the Err(e) arm.

In `@baml_language/crates/baml_exec/src/json_coerce.rs`:
- Around line 84-97: The Ty::Int branch currently only accepts
serde_json::Number values that as_i64()/as_u64() provide, which rejects numeric
tokens parsed via the float path (e.g., 1.0); update the match in json_coerce.rs
(the Ty::Int { .. } arm that returns BexExternalValue::Int) to also handle
n.as_f64(): if as_f64() is finite and its fractional part is zero and the value
fits into an i64, coerce it to i64 and return BexExternalValue::Int; otherwise
preserve the existing error paths and messages (including the out-of-range error
for unsigned-to-signed conversion and the "Expected integer" bail).

In `@baml_language/crates/baml_pack_host/src/main.rs`:
- Around line 107-112: The current shortcut treats any occurrence of "--help" or
"-h" in cli_tokens as a request for target help; update the condition in main.rs
so target help is only triggered when the forwarded tokens consist solely of a
single help token. Concretely, keep using target_is_typed(&engine,
&envelope.target_name) and handle_help(&engine, &envelope.target_name) but
change the cli_tokens check to require that cli_tokens has length 1 and that the
sole element is "--help" or "-h" (i.e., match an exact single forwarded help
token instead of any occurrence); this ensures ./app --msg --help forwards the
literal "--help" to the typed parameter while still showing help for a lone
token.

In `@baml_language/crates/bex_vm_types/src/indexable.rs`:
- Around line 38-48: The Index<K> impls currently serialize/deserialize the
inner usize, which is architecture-dependent; instead serialize the index as a
fixed-width integer (e.g. u64) in the Serialize for Index<K> impl and in
Deserialize for Index<K> deserialize a u64 and convert it to usize using TryFrom
(or usize::try_from), returning a serde deserializer error on overflow;
reference the Serialize/Deserialize impls for Index<K>, the inner tuple value
Self(v, PhantomData), and use serde error creation (e.g. D::Error::custom) when
the u64 cannot fit into usize.

---

Outside diff comments:
In `@baml_language/crates/baml_cli/src/lib.rs`:
- Around line 43-65: The From<ExitCode> for i32 impl must normalize
user-supplied Exit(n) to a portable range; update the Exit(…) arm in the match
to validate and clamp the integer into the 0–255 range (handling negative values
too) before returning it to satisfy Unix/WIN differences, e.g., coerce the
incoming i32 to a signed wide type if needed, clamp to 0..=255, then cast to
i32; keep all other arms unchanged and ensure the logic lives in the impl
From<ExitCode> for i32 where ExitCode::Exit is matched.

In `@baml_language/crates/bex_vm_types/src/types.rs`:
- Around line 565-576: The Value enum currently serializes raw pointers via
Value::Object(HeapPtr) which yields dangling addresses on deserialize; update
the serde behavior for Value (in types.rs) so Object variants are not serialized
as raw pointers: implement custom Serialize/Deserialize for Value that either
(a) serializes Object(HeapPtr) as an index-based handle (consult the existing
ConstValue/index mapping pattern in the crate to translate HeapPtr -> index and
back during (de)serialization), or (b) refuses to serialize Object by returning
a serialization error for that variant; apply this change to the Value enum (and
update any helpers that construct/consume Value) so serialized artifacts never
contain raw heap addresses.

---

Duplicate comments:
In `@baml_language/crates/baml_cli/src/pack_command.rs`:
- Around line 221-226: The error message in the .baml standalone-file branch
(when target ends_with(".baml") and engine.function_exists("main") is false) is
misleading; update the bail message used in that branch to instruct the user to
either add a `main` function to the standalone .baml file or instead pack from a
project and select a function there (rather than telling them to use `--function
<name>` which is invalid for positional .baml targets). Keep the check using
engine.function_exists("main") and only change the text returned by
anyhow::bail! to the new remediation guidance.
- Around line 293-313: The write_executable function currently treats any
non-Linux/non-Windows OS as macOS and attempts Mach-O modifications; change the
branching to explicitly detect macOS (std::env::consts::OS == "macos" or
contains "macos") and run the Mach-O path only for that case, and for any other
OS return an error (e.g., bail/context with "Unsupported OS: {target}") so we
fail fast instead of writing the wrong binary; update handling around
write_executable and PACK_SECTION_NAME accordingly.

In `@baml_language/crates/baml_pack_host/Cargo.toml`:
- Line 27: The Cargo.toml entry currently enables tokio = { workspace = true,
features = [ "rt" ] } but main.rs calls tokio::runtime::Runtime::new(), which
requires the rt-multi-thread runtime; update the features to include
"rt-multi-thread" (e.g. features = ["rt-multi-thread"]) while keeping workspace
= true so this crate explicitly requests the multi-thread runtime; locate the
tokio declaration in Cargo.toml and change the features array accordingly and
then verify the call site in src/main.rs (where Runtime::new() is used) builds
standalone without relying on incidental transitive deps.

---

Nitpick comments:
In `@baml_language/crates/baml_exec/src/dispatch.rs`:
- Around line 30-39: Add the #[must_use] attribute to the DispatchResult enum so
callers cannot accidentally ignore its value; update the enum declaration for
DispatchResult (which contains variants Ok, TargetError, and Exit(i64)) by
prefixing it with #[must_use] so dropped return values (like Exit or
TargetError) become a lint, ensuring callers of dispatch_target(...).await
propagate the result to process exit handling.
- Around line 169-186: The current logic in dispatch.rs that checks leftover
keys in merged after consuming params (the loop over param_names and the
subsequent merged.is_empty() branch) treats any unmatched keys as a mere
warning; change this to surface a clear error (or gate it behind a new
--strict/ENV flag) and include the key origin (JSON vs CLI) in the message so
typos are diagnosable. To implement: when constructing merged, track the source
of each entry (e.g. mark keys added from --json-args vs CLI flags), then replace
the warning block that collects unknown with an error return (anyhow::bail!)
when strict mode is enabled (or always if you choose to fail fast), formatting
the message to list keys and their origins and reference the missing/unknown
alongside the existing Missing required argument bail in the param consumption
loop; update function/method dispatch logic to accept the new strict flag or env
toggle and surface that to callers.

In `@baml_language/crates/baml_exec/src/json_coerce.rs`:
- Around line 176-187: coerce_json_union currently only returns the last variant
error, making failures hard to debug; modify coerce_json_union to collect each
Err from json_to_external_with_ty (include variant identity from Ty or its
debug/display) into a Vec<anyhow::Error> and when all variants fail return a
single aggregated error (e.g., join messages or use anyhow::Context to attach
each variant's error) so callers see per-variant failure reasons for
BexExternalValue coercion.

In `@baml_language/crates/bex_vm/src/vm.rs`:
- Around line 1252-1284: Add a focused unit test for panic_to_exception_value
that constructs a VmPanic::Exit { code } (use a representative code like 42),
calls panic_to_exception_value, and asserts the resulting Value encodes
PanicClass::Exit with a single field Value::Int(code); place the test in the
same module/file under #[cfg(test)] so it’s a Rust unit test, reference the
panic_to_exception_value function and the VmPanic::Exit / PanicClass::Exit /
Value::Int symbols in your assertions, and run cargo test --lib for the crate
before merging.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d018c8f2-e258-446f-b4c6-c44a0978b4fb

📥 Commits

Reviewing files that changed from the base of the PR and between 8d47d56 and 3d7023e.

⛔ Files ignored due to path filters (8)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
📒 Files selected for processing (40)
  • baml_language/Cargo.toml
  • baml_language/crates/baml_base/Cargo.toml
  • baml_language/crates/baml_base/src/attr.rs
  • baml_language/crates/baml_base/src/core_types.rs
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_panics/panics.baml
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_sys/sys.baml
  • baml_language/crates/baml_builtins2_codegen/src/codegen.rs
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_cli/Cargo.toml
  • baml_language/crates/baml_cli/src/commands.rs
  • baml_language/crates/baml_cli/src/lib.rs
  • baml_language/crates/baml_cli/src/main.rs
  • baml_language/crates/baml_cli/src/pack_command.rs
  • baml_language/crates/baml_cli/src/project_load.rs
  • baml_language/crates/baml_cli/src/run_command.rs
  • baml_language/crates/baml_cli/tests/common/mod.rs
  • baml_language/crates/baml_cli/tests/pack_e2e.rs
  • baml_language/crates/baml_exec/Cargo.toml
  • baml_language/crates/baml_exec/src/auto_cli.rs
  • baml_language/crates/baml_exec/src/dispatch.rs
  • baml_language/crates/baml_exec/src/envelope.rs
  • baml_language/crates/baml_exec/src/json_coerce.rs
  • baml_language/crates/baml_exec/src/lib.rs
  • baml_language/crates/baml_exec/src/output.rs
  • baml_language/crates/baml_pack_host/Cargo.toml
  • baml_language/crates/baml_pack_host/build.rs
  • baml_language/crates/baml_pack_host/src/main.rs
  • baml_language/crates/baml_tests/tests/exceptions.rs
  • baml_language/crates/baml_type/Cargo.toml
  • baml_language/crates/baml_type/src/lib.rs
  • baml_language/crates/bex_engine/src/lib.rs
  • baml_language/crates/bex_vm/src/errors.rs
  • baml_language/crates/bex_vm/src/package_baml/sys.rs
  • baml_language/crates/bex_vm/src/vm.rs
  • baml_language/crates/bex_vm_types/Cargo.toml
  • baml_language/crates/bex_vm_types/src/bytecode.rs
  • baml_language/crates/bex_vm_types/src/heap_ptr.rs
  • baml_language/crates/bex_vm_types/src/indexable.rs
  • baml_language/crates/bex_vm_types/src/types.rs
  • baml_language/stow.toml
✅ Files skipped from review due to trivial changes (9)
  • baml_language/crates/baml_base/Cargo.toml
  • baml_language/crates/baml_type/Cargo.toml
  • baml_language/crates/baml_pack_host/build.rs
  • baml_language/crates/baml_cli/Cargo.toml
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_sys/sys.baml
  • baml_language/crates/baml_base/src/core_types.rs
  • baml_language/crates/baml_base/src/attr.rs
  • baml_language/crates/baml_exec/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (11)
  • baml_language/crates/bex_vm_types/Cargo.toml
  • baml_language/crates/baml_exec/src/envelope.rs
  • baml_language/crates/bex_vm_types/src/heap_ptr.rs
  • baml_language/Cargo.toml
  • baml_language/crates/baml_type/src/lib.rs
  • baml_language/crates/baml_cli/tests/common/mod.rs
  • baml_language/crates/baml_exec/src/output.rs
  • baml_language/crates/baml_exec/src/lib.rs
  • baml_language/crates/baml_cli/src/project_load.rs
  • baml_language/stow.toml
  • baml_language/crates/baml_cli/tests/pack_e2e.rs

Comment on lines 169 to 173
fn check_project_diagnostics(&self, db: &ProjectDatabase, bail_context: &str) -> Result<()> {
let project = db
.get_project()
.ok_or_else(|| anyhow!("No project context"))?;
let source_files = db.get_source_files();
let diagnostics = baml_project::collect_diagnostics(db, project, &source_files);

let errors: Vec<_> = diagnostics
.iter()
.filter(|d| d.severity == Severity::Error)
.collect();
let warnings: Vec<_> = diagnostics
.iter()
.filter(|d| d.severity == Severity::Warning)
.collect();

let needs_sources = !errors.is_empty() || (self.verbose && !warnings.is_empty());
let (sources, file_paths) = if needs_sources {
let mut sources = HashMap::new();
let mut file_paths = HashMap::new();
for sf in &source_files {
let file_id = sf.file_id(db);
sources.insert(file_id, sf.text(db).to_string());
file_paths.insert(file_id, sf.path(db));
}
(sources, file_paths)
} else {
(HashMap::new(), HashMap::new())
};

if self.verbose && !warnings.is_empty() {
let rendered = render::render_diagnostics(
&warnings.iter().copied().cloned().collect::<Vec<_>>(),
&sources,
&file_paths,
&render::RenderConfig::cli(),
);
eprintln!("{rendered}");
}

if !errors.is_empty() {
let rendered = render::render_diagnostics(
&errors.iter().copied().cloned().collect::<Vec<_>>(),
&sources,
&file_paths,
&render::RenderConfig::cli(),
);
eprintln!("{rendered}");
anyhow::bail!("{bail_context}");
let warnings = crate::project_load::check_project_diagnostics(db, bail_context)?;
for w in &warnings {
self.vlog(format_args!("warning: {}", w.message));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t strip location info from verbose warnings.

This path logs only w.message, so verbose mode drops the file/line/column context that makes warnings actionable. Users end up back at warning: ... with no way to locate the source. Emit the rendered diagnostic text here instead of the bare message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/run_command.rs` around lines 169 - 173, The
current check_project_diagnostics method logs only w.message which drops
file/line/column context; change the vlog call in check_project_diagnostics to
emit the rendered/fully formatted diagnostic (the full text provided by the
diagnostic object) rather than the bare message—e.g. log w.rendered or the
diagnostic's formatted/string representation (format!("{}", w) / w.to_string())
so verbose output retains location info from
crate::project_load::check_project_diagnostics.

Comment on lines +56 to +64
let (key, val_str) = if let Some(eq_pos) = raw.find('=') {
(&raw[..eq_pos], &raw[eq_pos + 1..])
} else {
i += 1;
if i >= tokens.len() {
anyhow::bail!("Missing value for `--{raw}`");
}
(raw, tokens[i].as_str())
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't treat the next flag as a value.

Both walkers blindly consume the token after --name even when that token is another flag. That misparses inputs like --first --second=1: parse_auto_cli_args binds "--second=1" as the first value, and extract_flag_keys drops second entirely.

🛠️ Proposed fix
         let (key, val_str) = if let Some(eq_pos) = raw.find('=') {
             (&raw[..eq_pos], &raw[eq_pos + 1..])
         } else {
             i += 1;
-            if i >= tokens.len() {
+            if i >= tokens.len() || tokens[i].starts_with("--") {
                 anyhow::bail!("Missing value for `--{raw}`");
             }
             (raw, tokens[i].as_str())
         };
@@
             if !key.is_empty() {
                 keys.push(key.to_string());
             }
-            if !raw.contains('=') {
+            if !raw.contains('=')
+                && tokens.get(i + 1).is_some_and(|next| !next.starts_with("--"))
+            {
                 i += 1;
             }
         }

Also applies to: 94-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/auto_cli.rs` around lines 56 - 64, The
parsing logic in parse_auto_cli_args (and similarly in extract_flag_keys)
wrongly treats the next token as a value without checking if it's another flag;
update the branches where you handle the no-`=` case (the block using raw,
tokens, and i) to only consume tokens[i] as the value if it exists and does not
start with a '-' (or "--"); otherwise return an error (or bail) for a missing
value for `--{raw}`. Apply the same safeguard to the analogous code path around
the other block (the second occurrence you noted) so flags like `--first
--second=1` do not get misinterpreted as `--first` having value `--second=1` and
ensure extract_flag_keys also ignores tokens beginning with '-' when deciding if
a flag has an attached value.

Comment on lines +156 to +163
Ty::Class(..) | Ty::Map { .. } | Ty::List(..) | Ty::Union(..) => {
match serde_json::from_str::<serde_json::Value>(raw) {
Ok(json) => json_to_external_with_ty(&json, ty),
Err(_) => anyhow::bail!(
"Parameter type `{ty}` requires JSON.\n\
Use `--json-args '{{...}}'` or pass a JSON string for this parameter."
),
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Mark union parameters as JSON in the generated help.

parse_cli_value routes Ty::Union(..) through JSON parsing, but print_target_help omits the (pass JSON) hint for unions, so the help text is misleading for exactly the inputs that will otherwise fail.

📝 Proposed fix
-                Ty::Class(..) | Ty::Map { .. } | Ty::List(..) => " (pass JSON)".to_string(),
+                Ty::Class(..) | Ty::Map { .. } | Ty::List(..) | Ty::Union(..) => {
+                    " (pass JSON)".to_string()
+                }

Also applies to: 199-203

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/auto_cli.rs` around lines 156 - 163,
print_target_help currently omits the "(pass JSON)" hint for Ty::Union even
though parse_cli_value routes Ty::Union through JSON parsing; update
print_target_help to treat Ty::Union like Ty::Class, Ty::Map and Ty::List by
appending the "(pass JSON)" hint to the parameter/help text for union types
(apply same change in the second occurrence noted around the other block
referenced), ensuring the function name print_target_help and the type variant
Ty::Union are used to locate where to add the hint so help text matches
parse_cli_value behavior.

Comment on lines +122 to +126
Err(bex_engine::EngineError::Exit { code }) => Ok(DispatchResult::Exit(code)),
Err(e) => {
eprintln!("Error: {e}");
Ok(DispatchResult::TargetError)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Error chain is truncated on target failure.

eprintln!("Error: {e}") prints only the outermost anyhow::Error message, hiding all .context(...) layers and the source cause. Most target-side failures in this codebase attach context (json coercion path, parameter name, etc.), so users see a decontextualized message here. Prefer the alternate or debug formatter so the full chain is surfaced on stderr.

♻️ Proposed fix
-        Err(e) => {
-            eprintln!("Error: {e}");
-            Ok(DispatchResult::TargetError)
-        }
+        Err(e) => {
+            eprintln!("Error: {e:#}");
+            Ok(DispatchResult::TargetError)
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/dispatch.rs` around lines 122 - 126, The
Err(e) branch in dispatch.rs truncates the error chain by using
eprintln!("Error: {e}"); change it to print the full anyhow error chain (e.g.,
eprintln!("Error: {:#}", e) or eprintln!("{:?}", e)) so contextual .context(...)
layers and sources are shown when returning Ok(DispatchResult::TargetError) from
the match handling bex_engine::EngineError::Exit and the Err(e) arm.

Comment on lines +84 to +97
Ty::Int { .. } => match value {
J::Number(n) => {
if let Some(i) = n.as_i64() {
Ok(BexExternalValue::Int(i))
} else if let Some(u) = n.as_u64() {
i64::try_from(u)
.map(BexExternalValue::Int)
.map_err(|_| anyhow!("Integer out of range for int: {u}"))
} else {
anyhow::bail!("Expected integer, got `{value}`")
}
}
_ => anyhow::bail!("Expected integer, got `{value}`"),
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Integer coercion rejects JSON numbers in float form.

n.as_i64() / n.as_u64() both return None for any JSON number that serde_json parsed through its floating-point path (e.g. 1.0, 1e3, or values written with a decimal point). For a Ty::Int parameter, --json-args '{"n": 1.0}' will therefore fail with "Expected integer" even though the value is mathematically integral. If users commonly pipe JSON from tools that emit whole numbers as floats (jq with certain ops, many languages' default JSON encoders), this will surprise them.

Consider also accepting a finite as_f64() whose fractional part is zero and that fits in i64:

♻️ Sketch
         Ty::Int { .. } => match value {
             J::Number(n) => {
                 if let Some(i) = n.as_i64() {
                     Ok(BexExternalValue::Int(i))
                 } else if let Some(u) = n.as_u64() {
                     i64::try_from(u)
                         .map(BexExternalValue::Int)
                         .map_err(|_| anyhow!("Integer out of range for int: {u}"))
+                } else if let Some(f) = n.as_f64().filter(|f| f.is_finite() && f.fract() == 0.0
+                    && *f >= i64::MIN as f64 && *f <= i64::MAX as f64)
+                {
+                    Ok(BexExternalValue::Int(f as i64))
                 } else {
                     anyhow::bail!("Expected integer, got `{value}`")
                 }
             }
             _ => anyhow::bail!("Expected integer, got `{value}`"),
         },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Ty::Int { .. } => match value {
J::Number(n) => {
if let Some(i) = n.as_i64() {
Ok(BexExternalValue::Int(i))
} else if let Some(u) = n.as_u64() {
i64::try_from(u)
.map(BexExternalValue::Int)
.map_err(|_| anyhow!("Integer out of range for int: {u}"))
} else {
anyhow::bail!("Expected integer, got `{value}`")
}
}
_ => anyhow::bail!("Expected integer, got `{value}`"),
},
Ty::Int { .. } => match value {
J::Number(n) => {
if let Some(i) = n.as_i64() {
Ok(BexExternalValue::Int(i))
} else if let Some(u) = n.as_u64() {
i64::try_from(u)
.map(BexExternalValue::Int)
.map_err(|_| anyhow!("Integer out of range for int: {u}"))
} else if let Some(f) = n.as_f64().filter(|f| f.is_finite() && f.fract() == 0.0
&& *f >= i64::MIN as f64 && *f <= i64::MAX as f64)
{
Ok(BexExternalValue::Int(f as i64))
} else {
anyhow::bail!("Expected integer, got `{value}`")
}
}
_ => anyhow::bail!("Expected integer, got `{value}`"),
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/json_coerce.rs` around lines 84 - 97, The
Ty::Int branch currently only accepts serde_json::Number values that
as_i64()/as_u64() provide, which rejects numeric tokens parsed via the float
path (e.g., 1.0); update the match in json_coerce.rs (the Ty::Int { .. } arm
that returns BexExternalValue::Int) to also handle n.as_f64(): if as_f64() is
finite and its fractional part is zero and the value fits into an i64, coerce it
to i64 and return BexExternalValue::Int; otherwise preserve the existing error
paths and messages (including the out-of-range error for unsigned-to-signed
conversion and the "Expected integer" bail).

Comment on lines +107 to +112
// `--help` is reserved on typed targets only; parameterless `main()`
// owns its full argv and can handle `--help` itself.
if target_is_typed(&engine, &envelope.target_name)
&& cli_tokens.iter().any(|t| t == "--help" || t == "-h")
&& handle_help(&engine, &envelope.target_name)
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only reserve help when it is the sole forwarded token.

This short-circuits on help if any CLI token equals --help/-h, so a packaged binary cannot pass the literal string --help to a typed parameter (./app --msg --help prints help instead of calling the function). baml run only treats a lone forwarded help token as target help; the host should match that.

🔧 Suggested fix
-    if target_is_typed(&engine, &envelope.target_name)
-        && cli_tokens.iter().any(|t| t == "--help" || t == "-h")
+    if target_is_typed(&engine, &envelope.target_name)
+        && matches!(cli_tokens, [t] if t == "--help" || t == "-h")
         && handle_help(&engine, &envelope.target_name)
     {
         return ExitCode::SUCCESS;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// `--help` is reserved on typed targets only; parameterless `main()`
// owns its full argv and can handle `--help` itself.
if target_is_typed(&engine, &envelope.target_name)
&& cli_tokens.iter().any(|t| t == "--help" || t == "-h")
&& handle_help(&engine, &envelope.target_name)
{
// `--help` is reserved on typed targets only; parameterless `main()`
// owns its full argv and can handle `--help` itself.
if target_is_typed(&engine, &envelope.target_name)
&& matches!(cli_tokens, [t] if t == "--help" || t == "-h")
&& handle_help(&engine, &envelope.target_name)
{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_pack_host/src/main.rs` around lines 107 - 112, The
current shortcut treats any occurrence of "--help" or "-h" in cli_tokens as a
request for target help; update the condition in main.rs so target help is only
triggered when the forwarded tokens consist solely of a single help token.
Concretely, keep using target_is_typed(&engine, &envelope.target_name) and
handle_help(&engine, &envelope.target_name) but change the cli_tokens check to
require that cli_tokens has length 1 and that the sole element is "--help" or
"-h" (i.e., match an exact single forwarded help token instead of any
occurrence); this ensures ./app --msg --help forwards the literal "--help" to
the typed parameter while still showing help for a lone token.

Comment on lines +38 to +48
impl<K> Serialize for Index<K> {
fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
self.0.serialize(serializer)
}
}

impl<'de, K> Deserialize<'de> for Index<K> {
fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
usize::deserialize(deserializer).map(|v| Self(v, PhantomData))
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Index serde implementation =="
rg -n "impl<.*Serialize for Index|impl<'de, K> Deserialize<'de> for Index|usize::deserialize" \
  baml_language/crates/bex_vm_types/src/indexable.rs

echo
echo "== Pack/envelope serialization boundaries =="
rg -n "PackEnvelope|serialize|deserialize|to_vec|from_slice|bincode|postcard|rmp|serde_json" \
  --type rust baml_language/crates

echo
echo "== 32-bit / wasm targets present in runtime tree =="
rg -n "wasm32|target_arch\\s*=\\s*\"wasm32\"" --type rust baml_language/crates

Repository: BoundaryML/baml

Length of output: 50372


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Checking for Cargo.toml with wasm32/32-bit targets =="
find baml_language/crates -name "Cargo.toml" -exec grep -l "wasm32\|x86\|armv7\|target" {} \; | head -20

echo
echo "== Index usage in file I/O or serialization contexts =="
rg -n "Index<|Pool<" baml_language/crates/bex_vm_types/src/indexable.rs | head -20

echo
echo "== Search for encode_to_vec, to_vec, bincode in bex_vm_types or related =="
rg -n "bincode|postcard|encode_to_vec|to_bytes" baml_language/crates/bex_vm_types --type rust | head -20

echo
echo "== Check if Index is serialized in artifacts/packages =="
rg -n "Index.*serialize\|Pool.*serialize" baml_language/crates --type rust | head -20

Repository: BoundaryML/baml

Length of output: 1337


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check bridge_wasm/Cargo.toml for wasm targets =="
grep -A 10 "target" baml_language/crates/bridge_wasm/Cargo.toml | head -20

echo
echo "== Check bex_vm_types Cargo.toml =="
cat baml_language/crates/bex_vm_types/Cargo.toml | grep -A 5 "serde\|features"

echo
echo "== Search for actual Index serialization usage (serde_json, bincode, etc) =="
rg -n "serde::ser\|serde::de\|serialize.*Index\|Index.*serialize" baml_language/crates --type rust -A 3 | head -40

echo
echo "== Check if Index is used in network/file APIs =="
rg -n "Index<.*to_vec\|Index.*encode\|Pool.*to_vec\|Pool.*encode" baml_language/crates --type rust | head -20

echo
echo "== Check what serde data format is used for artifact storage =="
rg -n "serde_json::to_string\|bincode::serialize\|postcard::to_bytes\|rmp::encode" baml_language/crates/bex_vm_types --type rust

Repository: BoundaryML/baml

Length of output: 781


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check if bex_vm_types types are serialized to disk/network =="
rg -n "Index\|Pool" baml_language/crates/bex_vm_types/src/lib.rs | head -20

echo
echo "== Look for actual serialization of bex_vm_types (file writes, network sends) =="
rg -n "encode_to_vec\|serde_json::to_string\|bincode\|to_vec.*Index\|to_vec.*Pool" \
  baml_language/crates --type rust -B 3 -A 3 | grep -E "(Index|Pool|encode_to_vec|to_string)" | head -40

echo
echo "== Check if bex_vm_types is exported/used across boundaries =="
rg -n "pub use.*Index\|pub use.*Pool\|pub struct.*Index\|pub struct.*Pool" \
  baml_language/crates --type rust | head -20

echo
echo "== Examine bridge code for serialization of VM types =="
ls -la baml_language/crates/bridge*/src/ 2>/dev/null | head -20

echo
echo "== Check what types are encoded in bridge_python/nodejs/wasm =="
rg -n "encode_to_vec\|to_vec" baml_language/crates/bridge_{python,nodejs,wasm}/src --type rust | head -20

Repository: BoundaryML/baml

Length of output: 123


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== List bex_vm_types source files =="
find baml_language/crates/bex_vm_types/src -type f -name "*.rs"

echo
echo "== Read the indexable.rs file to understand Index/Pool types =="
wc -l baml_language/crates/bex_vm_types/src/indexable.rs
head -200 baml_language/crates/bex_vm_types/src/indexable.rs

echo
echo "== Check if bex_vm_types is used to serialize runtime state =="
rg -n "bex_vm_types" baml_language/crates --type rust | grep -E "(serialize|encode|to_vec|to_string)" | head -20

echo
echo "== Look for wasm32-unknown-unknown build target usage =="
rg -n "wasm32" baml_language/crates --type rust --type toml | head -10

Repository: BoundaryML/baml

Length of output: 7622


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check if Index/Pool are part of serialized bytecode or artifacts =="
rg -n "Index\|Pool" baml_language/crates/bex_vm_types/src/lib.rs -A 3

echo
echo "== Search for bytecode serialization that might include Index values =="
cat baml_language/crates/bex_vm_types/src/bytecode.rs | head -100

echo
echo "== Check if bytecode crosses architecture/process boundaries =="
rg -n "bytecode.*serde\|Bytecode.*Serialize\|encode_to_vec.*bytecode" baml_language/crates --type rust | head -20

echo
echo "== Look at what bex_events serializes (cross-process event stream) =="
head -50 baml_language/crates/bex_events/src/serialize.rs

Repository: BoundaryML/baml

Length of output: 129


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check bex_vm_types/lib.rs for what is exported =="
cat baml_language/crates/bex_vm_types/src/lib.rs

echo
echo "== Check bytecode.rs first 100 lines =="
head -100 baml_language/crates/bex_vm_types/src/bytecode.rs

echo
echo "== Check if bridge_wasm depends on bex_vm_types =="
grep -E "bex_vm|dependencies" baml_language/crates/bridge_wasm/Cargo.toml

Repository: BoundaryML/baml

Length of output: 10424


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check if Instruction enum contains Index types =="
grep -n "pub enum Instruction" baml_language/crates/bex_vm_types/src/bytecode.rs -A 200 | head -150

echo
echo "== Check if Bytecode contains serializable Instructions =="
grep -n "pub struct Bytecode\|pub.*Instruction" baml_language/crates/bex_vm_types/src/bytecode.rs | head -20

echo
echo "== Look for where bytecode is actually serialized/persisted =="
rg -n "Bytecode.*encode\|Bytecode.*serialize\|encode_to_vec\|to_vec" baml_language/crates/bex_vm --type rust | head -15

Repository: BoundaryML/baml

Length of output: 6563


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search more carefully for bytecode serialization =="
rg -n "serialize.*Bytecode\|Bytecode.*to_vec\|encode.*Bytecode\|serde_json\|bincode" \
  baml_language/crates/bex_vm --type rust | head -30

echo
echo "== Check bex_events serialize.rs which appeared earlier =="
grep -n "Bytecode\|Index\|serialize" baml_language/crates/bex_events/src/serialize.rs | head -40

echo
echo "== Check if Bytecode is part of Program which gets serialized =="
grep -n "pub struct Program\|Bytecode" baml_language/crates/bex_vm_types/src/types.rs | head -30

Repository: BoundaryML/baml

Length of output: 113


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check Program struct definition =="
grep -n "pub struct Program" baml_language/crates/bex_vm_types/src/types.rs -A 20

echo
echo "== Check if Program is serialized =="
rg -n "Program.*Serialize\|Program.*serialize\|Program.*to_vec" baml_language/crates --type rust | head -20

echo
echo "== Look at actual Index usage in Instruction =="
grep -n "GlobalIndex\|ObjectIndex\|StackIndex" baml_language/crates/bex_vm_types/src/bytecode.rs | head -30

echo
echo "== Check indexable.rs exports again to see type aliases =="
tail -100 baml_language/crates/bex_vm_types/src/indexable.rs

Repository: BoundaryML/baml

Length of output: 1126


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Search for Bytecode Serialize derive =="
rg -n "#\[derive.*Serialize" baml_language/crates/bex_vm_types/src/bytecode.rs | head -20

echo
echo "== Check what comes right before pub struct Bytecode =="
grep -B 5 "pub struct Bytecode" baml_language/crates/bex_vm_types/src/bytecode.rs

echo
echo "== Check Instruction enum serialize derive =="
grep -B 5 "pub enum Instruction" baml_language/crates/bex_vm_types/src/bytecode.rs

echo
echo "== Find where GlobalIndex type alias is defined =="
rg -n "type GlobalIndex\|pub.*GlobalIndex" baml_language/crates/bex_vm_types/src/indexable.rs

Repository: BoundaryML/baml

Length of output: 1932


🏁 Script executed:

#!/bin/bash
set -euo pymakefile

echo "== Find type aliases for GlobalIndex, ObjectIndex, StackIndex =="
rg -n "GlobalIndex|ObjectIndex|StackIndex" baml_language/crates/bex_vm_types/src/indexable.rs | tail -20

Repository: BoundaryML/baml

Length of output: 114


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Find type aliases for GlobalIndex, ObjectIndex, StackIndex =="
rg -n "GlobalIndex|ObjectIndex|StackIndex" baml_language/crates/bex_vm_types/src/indexable.rs | tail -20

Repository: BoundaryML/baml

Length of output: 788


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== Check if there are any tests for Bytecode serialization =="
rg -n "test.*bytecode\|#\[test\].*serialize" baml_language/crates/bex_vm_types/src --type rust -A 5 | head -40

echo
echo "== Look for cargo test --lib instructions =="
grep -r "cargo test" baml_language/crates/bex_vm_types --type rust | head -10

Repository: BoundaryML/baml

Length of output: 120


Use a fixed-width integer for serialized Index<K.

Serializing Index<K> as usize makes the on-disk/on-wire format architecture-dependent. The codebase explicitly supports WASM (wasm32-unknown-unknown), and Bytecode containing Index values (via GlobalIndex, ObjectIndex) derives Serialize/Deserialize. If bytecode is persisted and consumed on a different architecture (64-bit vs 32-bit/WASM), deserialization will be incorrect or fail.

💡 Proposed fix
 impl<K> Serialize for Index<K> {
     fn serialize<S: serde::Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
-        self.0.serialize(serializer)
+        (self.0 as u64).serialize(serializer)
     }
 }

 impl<'de, K> Deserialize<'de> for Index<K> {
     fn deserialize<D: serde::Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
-        usize::deserialize(deserializer).map(|v| Self(v, PhantomData))
+        let raw = u64::deserialize(deserializer)?;
+        let raw = usize::try_from(raw).map_err(D::Error::custom)?;
+        Ok(Self(raw, PhantomData))
     }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/indexable.rs` around lines 38 - 48, The
Index<K> impls currently serialize/deserialize the inner usize, which is
architecture-dependent; instead serialize the index as a fixed-width integer
(e.g. u64) in the Serialize for Index<K> impl and in Deserialize for Index<K>
deserialize a u64 and convert it to usize using TryFrom (or usize::try_from),
returning a serde deserializer error on overflow; reference the
Serialize/Deserialize impls for Index<K>, the inner tuple value Self(v,
PhantomData), and use serde error creation (e.g. D::Error::custom) when the u64
cannot fit into usize.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
baml_language/crates/bex_vm_types/src/indexable.rs (1)

274-346: ⚠️ Potential issue | 🟠 Major

ObjectIndex has two incompatible serialized shapes depending on the heap_debug feature.

With heap_debug enabled, ObjectIndex is a struct that derives Serialize and serializes as { raw: usize, epoch: u32 }. Without the feature, ObjectIndex = Index<ObjectKind>, which serializes via Index<K>'s custom impl as a bare usize. Since ObjectIndex appears directly in the Instruction enum (AllocInstance, AllocVariant, MakeClosure) and in ConstValue::Object, and these are embedded in the Program wrapped by PackEnvelope, a binary packed with heap_debug=on will produce a different wire format than one packed with heap_debug=off. When loading, a mismatch causes deserialization failure or silent data corruption. No validation exists in PackEnvelope to detect or reject feature-flag mismatches.

Options:

  • Skip epoch from the serialized form (always serialize as a single integer, reconstruct epoch = 0 on deserialize), so the wire format is feature-invariant.
  • Or make the pack/envelope format explicitly reject mismatched feature flags with a clear error.
💡 Sketch: make the wire format feature-invariant
 #[cfg(feature = "heap_debug")]
-#[derive(Clone, Copy, Eq, PartialEq, Ord, PartialOrd, Hash, Serialize, Deserialize)]
+#[derive(Clone, Copy, Eq, PartialEq, Ord, PartialOrd, Hash)]
 pub struct ObjectIndex {
     raw: usize,
     epoch: u32,
 }
+
+#[cfg(feature = "heap_debug")]
+impl Serialize for ObjectIndex {
+    fn serialize<S: serde::Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
+        (self.raw as u64).serialize(s)
+    }
+}
+
+#[cfg(feature = "heap_debug")]
+impl<'de> Deserialize<'de> for ObjectIndex {
+    fn deserialize<D: serde::Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
+        use serde::de::Error as _;
+        let raw = u64::deserialize(d)?;
+        let raw = usize::try_from(raw).map_err(D::Error::custom)?;
+        Ok(Self { raw, epoch: 0 })
+    }
+}

(Combine with the u64 fix suggested for Index<K> so both variants share the same bare-integer encoding.)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/indexable.rs` around lines 274 - 346,
ObjectIndex currently serializes differently with and without heap_debug,
breaking the wire format for Instruction
(AllocInstance/AllocVariant/MakeClosure) and ConstValue::Object inside
PackEnvelope; make the wire format feature-invariant by adding a custom
Serialize/Deserialize impl for ObjectIndex (the #[cfg(feature = "heap_debug")]
struct) that serializes/deserializes as a single integer (the raw usize) and
reconstructs epoch = 0 on deserialize so it matches Index<ObjectKind>'s
bare-integer representation, then run serialization tests of
Program/PackEnvelope to confirm compatibility; alternatively (if you prefer
rejecting mismatches) add explicit feature-flag validation inside PackEnvelope
deserialization to fail with a clear error when heap_debug presence differs.
baml_language/crates/baml_cli/src/run_command.rs (1)

227-246: ⚠️ Potential issue | 🟠 Major

Keep standalone .baml runs hermetic.

After load_and_compile_standalone, this path still reads self.from/baml.toml, validates [scripts], and uses them for --list. That means a standalone file can fail because of unrelated script errors in the caller's project, which breaks the single-file contract documented above.

Suggested fix
-        let toml_content = std::fs::read_to_string(self.from.join("baml.toml")).unwrap_or_default();
-        let scripts = Self::parse_scripts(&toml_content);
+        let standalone_target = self.target.as_ref().is_some_and(|t| t.ends_with(".baml"));
+        let (toml_content, scripts) = if standalone_target {
+            (String::new(), HashMap::new())
+        } else {
+            let toml_content =
+                std::fs::read_to_string(self.from.join("baml.toml")).unwrap_or_default();
+            let scripts = Self::parse_scripts(&toml_content);
+            (toml_content, scripts)
+        };
         let namespaces = collect_namespaces(&engine);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/run_command.rs` around lines 227 - 246, The
current flow still reads self.from.join("baml.toml") and validates scripts even
when load_and_compile_standalone(...) was used, breaking hermetic single-file
runs; move the baml.toml read/parse/validation behind the non-standalone branch
so that when target.as_ref().is_some_and(|t| t.ends_with(".baml")) you skip
reading self.from/baml.toml (e.g. set toml_content to empty and scripts to an
empty list or parse only embedded scripts) and do not call validate_scripts on
external project files; keep collect_namespaces(&engine) and print_list(&engine,
&scripts, &namespaces, &self.output) working with the standalone scripts value.
♻️ Duplicate comments (5)
baml_language/stow.toml (1)

88-89: ⚠️ Potential issue | 🟡 Minor

Reason string is stale after expanding the allow-list.

Line 88 now additionally permits baml_cli, baml_exec, and baml_pack_host, but Line 89 still claims only compiler_emit and baml_project may depend on bex_vm_types. Update the reason to reflect the surface execution crates so the error message matches reality.

Suggested wording
-reason = "Only compiler_emit and baml_project crates should depend on bex_vm_types (pure data layer)."
+reason = "Only compiler emit, baml_project, and surface execution crates (baml_cli, baml_exec, baml_pack_host) should depend on bex_vm_types (pure data layer)."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/stow.toml` around lines 88 - 89, The reason string for the
bex_vm_types allow-list is stale: update the "reason" value used with
allowed_crates to reflect the added crates (baml_cli, baml_exec, baml_pack_host)
so the message matches the actual allow-list; modify the reason text (near
allowed_crates and reason) to mention the surface execution/CLI/host crates in
addition to baml_compiler_emit and baml_project so it accurately explains why
those crates may depend on bex_vm_types.
baml_language/crates/baml_exec/src/auto_cli.rs (1)

199-203: ⚠️ Potential issue | 🟡 Minor

Mention JSON input for union parameters in target help.

parse_cli_value() routes Ty::Union(..) through JSON parsing, but the help text omits the same (pass JSON) hint it gives for class/map/list params. That makes the generated usage text misleading.

Suggested fix
-                Ty::Class(..) | Ty::Map { .. } | Ty::List(..) => " (pass JSON)".to_string(),
+                Ty::Class(..) | Ty::Map { .. } | Ty::List(..) | Ty::Union(..) => {
+                    " (pass JSON)".to_string()
+                }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/auto_cli.rs` around lines 199 - 203, The
help text generator omits the JSON hint for union types even though
parse_cli_value() handles Ty::Union via JSON parsing; update the type hint logic
where `type_hint` is computed (the match on `ty`) to include `Ty::Union(..)` in
the same arm as `Ty::Class`, `Ty::Map`, and `Ty::List` (i.e., return " (pass
JSON)"), so union parameters show the correct usage hint in help output.
baml_language/crates/baml_cli/src/run_command.rs (1)

170-172: ⚠️ Potential issue | 🟠 Major

Don't drop location data from verbose warnings.

This still logs only w.message, which strips the file/line/column context that makes warnings actionable. Verbose mode should emit the rendered/display form of the diagnostic here, not just the bare message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_cli/src/run_command.rs` around lines 170 - 172, The
verbose warning currently logs only w.message (in the self.vlog call inside the
run command after check_project_diagnostics), which discards file/line/column
context; update that call to print the diagnostic's full rendered/display form
(or its formatted string that includes location and context) instead of just
w.message so verbose mode emits the complete diagnostic (e.g., use the
diagnostic's display/to_string/render method or include w.location/file/line/col
fields) when iterating over warnings.
baml_language/crates/baml_exec/src/output.rs (2)

126-136: ⚠️ Potential issue | 🟡 Minor

Use parseable placeholders for structured help examples.

print_target_help() feeds these directly into example invocations, but ... is not valid input for class/map/list/union parameters. The generated examples are therefore misleading for exactly the cases that require JSON.

Suggested fix
     match ty {
         Ty::String { .. } => "\"value\"",
         Ty::Int { .. } => "42",
         Ty::Float { .. } => "3.14",
         Ty::Bool { .. } => "true",
         Ty::Null { .. } => "null",
         Ty::Enum(..) => "VariantName",
+        Ty::Class(..) | Ty::Map { .. } => "{}",
+        Ty::List(..) => "[]",
+        Ty::Union(..) => "null",
         _ => "...",
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/output.rs` around lines 126 - 136, The
current example_value(ty: &Ty) returns "..." for non-scalar types which is not
parseable JSON; update example_value to return valid, parseable placeholders for
structured types used by print_target_help(): add specific match arms for
Ty::Class (e.g. "{\"field\": \"value\"}"), Ty::Map (e.g. "{\"key\":
\"value\"}"), Ty::List (e.g. "[\"value\"]"), Ty::Union (e.g. "{\"Variant\":
{}}"/or a small JSON object matching union shape) and any other structured Ty
variants instead of falling back to "_ => \"...\""; ensure each returned string
is valid input for the parser that consumes the help examples so example
invocations are runnable.

58-68: ⚠️ Potential issue | 🟠 Major

Preserve user fields named __type.

This still overwrites a real instance field with the synthetic type tag, so JSON output can silently lose user data.

Suggested fix
-            if !class_name.is_empty() {
-                map.insert("__type".to_string(), serde_json::json!(class_name));
+            if !class_name.is_empty() && !map.contains_key("__type") {
+                map.insert("__type".to_string(), serde_json::json!(class_name));
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/output.rs` around lines 58 - 68, The
current serializer in BexExternalValue::Instance inserts a synthetic "__type"
into the JSON object and unconditionally overwrites any user field named
"__type"; change the logic in the block that builds the serde_json::Map (the
code that uses class_name, fields and external_to_json) so it preserves an
existing user "__type" field—e.g., check map.contains_key("__type") and if true,
store the synthetic tag under an alternate reserved key like "__baml_type" (or
skip inserting entirely); ensure the code inserts the synthetic tag only when
class_name is non-empty and does not clobber an existing "__type" entry.
🧹 Nitpick comments (2)
baml_language/crates/bex_vm_types/src/bytecode.rs (1)

861-873: Document the resolved_constants invariant after deserialization.

#[serde(skip)] on resolved_constants correctly keeps runtime HeapPtrs out of the serialized envelope, but any code path that loads a Bytecode via Deserialize will get an empty resolved_constants vector. Executing LoadConst on such a Bytecode without first calling resolve_constants(...) will silently yield the wrong value (index into an empty/stale vec). Consider a brief doc note on the field (or on Bytecode) stating that callers must invoke resolve_constants after deserialization before execution — this will help prevent subtle bugs in the new baml_exec envelope path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bex_vm_types/src/bytecode.rs` around lines 861 - 873,
Add documentation to the Bytecode type explaining the post-deserialization
invariant: because resolved_constants is marked #[serde(skip)] it will be empty
after Deserialize and must be populated by calling
Bytecode::resolve_constants(...) before executing instructions that use
LoadConst; update the doc comment on the resolved_constants field (or top-level
Bytecode struct) to state that callers loading Bytecode from Deserialize must
call resolve_constants (or an equivalent initializer) prior to execution to
avoid indexing into an empty/stale resolved_constants vector.
baml_language/crates/baml_builtins2_codegen/src/codegen.rs (1)

24-35: Add a targeted regression test for baml.sys.exit fallibility.

This behavior is important and currently not directly asserted in the tests below.

Proposed unit-test addition
@@
     #[test]
+    fn test_sys_exit_is_generated_as_fallible() {
+        let (builtins, _io_builtins, class_defs) = extract_native_builtins().unwrap();
+        let output = generate_native_trait(&builtins, &class_defs);
+
+        assert!(
+            output.contains("fn exit(code: i64) -> Result<(), VmRustFnError>;"),
+            "baml.sys.exit should be generated as fallible:\n{output}"
+        );
+    }
+
+    #[test]
     fn test_generate_produces_class_traits() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_builtins2_codegen/src/codegen.rs` around lines 24 -
35, The tests are missing a targeted regression asserting that "baml.sys.exit"
is considered fallible by is_fallible; add a unit test that calls the
is_fallible function (or the public wrapper used in tests) with the string
"baml.sys.exit" and asserts true. Place the test alongside the existing
builtins/codegen tests, name it to indicate it's a regression for baml.sys.exit
fallibility, and ensure it directly references the is_fallible behavior so
future changes to is_fallible (or its match list) will fail the test if
"baml.sys.exit" is removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_exec/src/dispatch.rs`:
- Around line 12-14: Update the stale module-level comment to reflect the new
return type of dispatch_target (Result<DispatchResult>) instead of the old
Ok(true)/Ok(false) boolean contract: describe the three possible outcomes
returned via the Ok variant (DispatchResult::Ok, DispatchResult::TargetError,
DispatchResult::Exit(i64)) and indicate that infrastructure errors are still
propagated via Err; reference dispatch_target and the DispatchResult enum to
ensure the doc matches the current API.

---

Outside diff comments:
In `@baml_language/crates/baml_cli/src/run_command.rs`:
- Around line 227-246: The current flow still reads self.from.join("baml.toml")
and validates scripts even when load_and_compile_standalone(...) was used,
breaking hermetic single-file runs; move the baml.toml read/parse/validation
behind the non-standalone branch so that when target.as_ref().is_some_and(|t|
t.ends_with(".baml")) you skip reading self.from/baml.toml (e.g. set
toml_content to empty and scripts to an empty list or parse only embedded
scripts) and do not call validate_scripts on external project files; keep
collect_namespaces(&engine) and print_list(&engine, &scripts, &namespaces,
&self.output) working with the standalone scripts value.

In `@baml_language/crates/bex_vm_types/src/indexable.rs`:
- Around line 274-346: ObjectIndex currently serializes differently with and
without heap_debug, breaking the wire format for Instruction
(AllocInstance/AllocVariant/MakeClosure) and ConstValue::Object inside
PackEnvelope; make the wire format feature-invariant by adding a custom
Serialize/Deserialize impl for ObjectIndex (the #[cfg(feature = "heap_debug")]
struct) that serializes/deserializes as a single integer (the raw usize) and
reconstructs epoch = 0 on deserialize so it matches Index<ObjectKind>'s
bare-integer representation, then run serialization tests of
Program/PackEnvelope to confirm compatibility; alternatively (if you prefer
rejecting mismatches) add explicit feature-flag validation inside PackEnvelope
deserialization to fail with a clear error when heap_debug presence differs.

---

Duplicate comments:
In `@baml_language/crates/baml_cli/src/run_command.rs`:
- Around line 170-172: The verbose warning currently logs only w.message (in the
self.vlog call inside the run command after check_project_diagnostics), which
discards file/line/column context; update that call to print the diagnostic's
full rendered/display form (or its formatted string that includes location and
context) instead of just w.message so verbose mode emits the complete diagnostic
(e.g., use the diagnostic's display/to_string/render method or include
w.location/file/line/col fields) when iterating over warnings.

In `@baml_language/crates/baml_exec/src/auto_cli.rs`:
- Around line 199-203: The help text generator omits the JSON hint for union
types even though parse_cli_value() handles Ty::Union via JSON parsing; update
the type hint logic where `type_hint` is computed (the match on `ty`) to include
`Ty::Union(..)` in the same arm as `Ty::Class`, `Ty::Map`, and `Ty::List` (i.e.,
return " (pass JSON)"), so union parameters show the correct usage hint in help
output.

In `@baml_language/crates/baml_exec/src/output.rs`:
- Around line 126-136: The current example_value(ty: &Ty) returns "..." for
non-scalar types which is not parseable JSON; update example_value to return
valid, parseable placeholders for structured types used by print_target_help():
add specific match arms for Ty::Class (e.g. "{\"field\": \"value\"}"), Ty::Map
(e.g. "{\"key\": \"value\"}"), Ty::List (e.g. "[\"value\"]"), Ty::Union (e.g.
"{\"Variant\": {}}"/or a small JSON object matching union shape) and any other
structured Ty variants instead of falling back to "_ => \"...\""; ensure each
returned string is valid input for the parser that consumes the help examples so
example invocations are runnable.
- Around line 58-68: The current serializer in BexExternalValue::Instance
inserts a synthetic "__type" into the JSON object and unconditionally overwrites
any user field named "__type"; change the logic in the block that builds the
serde_json::Map (the code that uses class_name, fields and external_to_json) so
it preserves an existing user "__type" field—e.g., check
map.contains_key("__type") and if true, store the synthetic tag under an
alternate reserved key like "__baml_type" (or skip inserting entirely); ensure
the code inserts the synthetic tag only when class_name is non-empty and does
not clobber an existing "__type" entry.

In `@baml_language/stow.toml`:
- Around line 88-89: The reason string for the bex_vm_types allow-list is stale:
update the "reason" value used with allowed_crates to reflect the added crates
(baml_cli, baml_exec, baml_pack_host) so the message matches the actual
allow-list; modify the reason text (near allowed_crates and reason) to mention
the surface execution/CLI/host crates in addition to baml_compiler_emit and
baml_project so it accurately explains why those crates may depend on
bex_vm_types.

---

Nitpick comments:
In `@baml_language/crates/baml_builtins2_codegen/src/codegen.rs`:
- Around line 24-35: The tests are missing a targeted regression asserting that
"baml.sys.exit" is considered fallible by is_fallible; add a unit test that
calls the is_fallible function (or the public wrapper used in tests) with the
string "baml.sys.exit" and asserts true. Place the test alongside the existing
builtins/codegen tests, name it to indicate it's a regression for baml.sys.exit
fallibility, and ensure it directly references the is_fallible behavior so
future changes to is_fallible (or its match list) will fail the test if
"baml.sys.exit" is removed.

In `@baml_language/crates/bex_vm_types/src/bytecode.rs`:
- Around line 861-873: Add documentation to the Bytecode type explaining the
post-deserialization invariant: because resolved_constants is marked
#[serde(skip)] it will be empty after Deserialize and must be populated by
calling Bytecode::resolve_constants(...) before executing instructions that use
LoadConst; update the doc comment on the resolved_constants field (or top-level
Bytecode struct) to state that callers loading Bytecode from Deserialize must
call resolve_constants (or an equivalent initializer) prior to execution to
avoid indexing into an empty/stale resolved_constants vector.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e73cdfc0-8fdb-4352-9ea7-3281c117e303

📥 Commits

Reviewing files that changed from the base of the PR and between 3d7023e and c3d19af.

⛔ Files ignored due to path filters (8)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
📒 Files selected for processing (40)
  • baml_language/Cargo.toml
  • baml_language/crates/baml_base/Cargo.toml
  • baml_language/crates/baml_base/src/attr.rs
  • baml_language/crates/baml_base/src/core_types.rs
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_panics/panics.baml
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_sys/sys.baml
  • baml_language/crates/baml_builtins2_codegen/src/codegen.rs
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_cli/Cargo.toml
  • baml_language/crates/baml_cli/src/commands.rs
  • baml_language/crates/baml_cli/src/lib.rs
  • baml_language/crates/baml_cli/src/main.rs
  • baml_language/crates/baml_cli/src/pack_command.rs
  • baml_language/crates/baml_cli/src/project_load.rs
  • baml_language/crates/baml_cli/src/run_command.rs
  • baml_language/crates/baml_cli/tests/common/mod.rs
  • baml_language/crates/baml_cli/tests/pack_e2e.rs
  • baml_language/crates/baml_exec/Cargo.toml
  • baml_language/crates/baml_exec/src/auto_cli.rs
  • baml_language/crates/baml_exec/src/dispatch.rs
  • baml_language/crates/baml_exec/src/envelope.rs
  • baml_language/crates/baml_exec/src/json_coerce.rs
  • baml_language/crates/baml_exec/src/lib.rs
  • baml_language/crates/baml_exec/src/output.rs
  • baml_language/crates/baml_pack_host/Cargo.toml
  • baml_language/crates/baml_pack_host/build.rs
  • baml_language/crates/baml_pack_host/src/main.rs
  • baml_language/crates/baml_tests/tests/exceptions.rs
  • baml_language/crates/baml_type/Cargo.toml
  • baml_language/crates/baml_type/src/lib.rs
  • baml_language/crates/bex_engine/src/lib.rs
  • baml_language/crates/bex_vm/src/errors.rs
  • baml_language/crates/bex_vm/src/package_baml/sys.rs
  • baml_language/crates/bex_vm/src/vm.rs
  • baml_language/crates/bex_vm_types/Cargo.toml
  • baml_language/crates/bex_vm_types/src/bytecode.rs
  • baml_language/crates/bex_vm_types/src/heap_ptr.rs
  • baml_language/crates/bex_vm_types/src/indexable.rs
  • baml_language/crates/bex_vm_types/src/types.rs
  • baml_language/stow.toml
✅ Files skipped from review due to trivial changes (15)
  • baml_language/crates/baml_type/Cargo.toml
  • baml_language/crates/bex_vm_types/Cargo.toml
  • baml_language/crates/baml_base/Cargo.toml
  • baml_language/crates/baml_pack_host/build.rs
  • baml_language/crates/baml_exec/src/envelope.rs
  • baml_language/crates/baml_exec/Cargo.toml
  • baml_language/crates/baml_cli/src/main.rs
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_pack_host/Cargo.toml
  • baml_language/crates/baml_base/src/attr.rs
  • baml_language/crates/baml_base/src/core_types.rs
  • baml_language/crates/baml_type/src/lib.rs
  • baml_language/crates/baml_cli/tests/common/mod.rs
  • baml_language/crates/baml_pack_host/src/main.rs
  • baml_language/crates/baml_cli/Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (8)
  • baml_language/crates/bex_vm/src/errors.rs
  • baml_language/crates/baml_cli/src/commands.rs
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_sys/sys.baml
  • baml_language/crates/bex_vm_types/src/heap_ptr.rs
  • baml_language/crates/baml_tests/tests/exceptions.rs
  • baml_language/crates/baml_cli/src/pack_command.rs
  • baml_language/crates/bex_vm_types/src/types.rs
  • baml_language/crates/baml_cli/tests/pack_e2e.rs

Comment on lines +12 to +14
// Returns `Ok(true)` on success, `Ok(false)` on a caller-visible target
// error (e.g. missing argument, target runtime error), and propagates
// infrastructure errors via `Err`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Stale module doc — Ok(true)/Ok(false) no longer reflects the signature.

dispatch_target now returns Result<DispatchResult> (Ok / TargetError / Exit(i64)), but the module-level comment still describes the old boolean return contract. Worth updating so future readers don't conflate "target error" with the Err arm.

📝 Proposed doc fix
-// Returns `Ok(true)` on success, `Ok(false)` on a caller-visible target
-// error (e.g. missing argument, target runtime error), and propagates
-// infrastructure errors via `Err`.
+// Returns `Ok(DispatchResult::Ok)` on success, `Ok(DispatchResult::TargetError)`
+// on a caller-visible target error (e.g. missing argument, target runtime
+// error), `Ok(DispatchResult::Exit(code))` when the target called
+// `baml.sys.exit(code)`, and propagates infrastructure errors via `Err`.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Returns `Ok(true)` on success, `Ok(false)` on a caller-visible target
// error (e.g. missing argument, target runtime error), and propagates
// infrastructure errors via `Err`.
// Returns `Ok(DispatchResult::Ok)` on success, `Ok(DispatchResult::TargetError)`
// on a caller-visible target error (e.g. missing argument, target runtime
// error), `Ok(DispatchResult::Exit(code))` when the target called
// `baml.sys.exit(code)`, and propagates infrastructure errors via `Err`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_exec/src/dispatch.rs` around lines 12 - 14, Update
the stale module-level comment to reflect the new return type of dispatch_target
(Result<DispatchResult>) instead of the old Ok(true)/Ok(false) boolean contract:
describe the three possible outcomes returned via the Ok variant
(DispatchResult::Ok, DispatchResult::TargetError, DispatchResult::Exit(i64)) and
indicate that infrastructure errors are still propagated via Err; reference
dispatch_target and the DispatchResult enum to ensure the doc matches the
current API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant