Skip to content

Add baml.fs.read_dir, mkdir, baml.glob namespace, and fix parser string literal bug#3400

Open
aaronvg wants to merge 3 commits intocanaryfrom
aaron/add-readdir-function-to-bamlfs-module
Open

Add baml.fs.read_dir, mkdir, baml.glob namespace, and fix parser string literal bug#3400
aaronvg wants to merge 3 commits intocanaryfrom
aaron/add-readdir-function-to-bamlfs-module

Conversation

@aaronvg
Copy link
Copy Markdown
Contributor

@aaronvg aaronvg commented Apr 23, 2026

Artifacts | Task

What problems was I solving

BAML had no built-in way to list directory contents or create directories from user code. Users needed read_dir, mkdir, and glob pattern matching (baml.glob.new("**/*.ts").scan(root)) to build file-processing workflows. Additionally, the BAML parser had a long-standing bug where */ inside string literals (e.g. "**/*.ts") was incorrectly tokenized as a block-comment end, making glob patterns with recursive wildcards impossible to express.

What user-facing changes did I ship

  • baml.fs.read_dir(path) - Lists directory entries, returning DirEntry[] with name, is_dir, is_file, is_symlink fields
  • baml.fs.mkdir(path, options) - Creates directories with optional recursive: true for nested creation
  • baml.glob.new(pattern) - Creates a glob pattern object supporting *, **, ?, [...], {a,b} syntax
  • Glob.scan(root) / Glob.matches(path) - Scans filesystem or tests paths against glob patterns
  • Parser fix - String literals containing */ or // now parse correctly (previously broke block-comment tokenization)

How I implemented it

Codegen infrastructure (Phase 1)

  • crates/baml_builtins2_codegen/src/codegen_io.rs - Added emit_into_result_call helper for Vec<ClassName> return types that can't implement AsBexExternalValue due to orphan rules. Uses new into_result_mapped on SysOpOutput to convert element-by-element. Added BamlType::List arm to emit_result_conversion for deserializing arrays back from BexExternalValue::Array.
  • crates/sys_types/src/lib.rs - Added SysOpOutput::into_result_mapped() for custom value mapping when the return type doesn't directly implement AsBexExternalValue.

BAML declarations (Phase 2-3)

  • crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml - Added DirEntry class, MkdirOptions class, read_dir function, and mkdir function declarations.
  • crates/baml_builtins2/baml_std/baml/ns_glob/glob.baml - New file: ScanOptions class, Glob class with _handle, scan/matches methods, and new function.
  • crates/baml_builtins2/src/lib.rs - Registered ns_glob/glob.baml as a builtin module.

Native implementations (Phase 2-3)

  • crates/sys_native/src/io_impls.rs - read_dir using tokio::fs::read_dir, mkdir using tokio::fs::create_dir[_all], glob new/scan/matches using walkdir + custom GlobPattern.
  • crates/sys_native/src/glob_utils.rs - New file: GlobPattern struct converting glob syntax to regex, supporting **, *, ?, [...], {a,b} patterns. Includes unit tests.
  • crates/sys_ops/src/lib.rs - DefaultIoOps stubs returning Unsupported for all new operations. IoSysOpsBuilder wiring for with_fs_instance (read_dir, mkdir) and new with_glob_instance method.

WASM implementations (Phase 4)

  • crates/bridge_wasm/src/wasm_io_fs.rs - New file: WasmIoFs delegating fs operations to the JS VFS layer.
  • crates/bridge_wasm/src/wasm_io_glob.rs - New file: WASM glob implementation with inline glob-to-regex converter.
  • crates/bridge_wasm/src/wasm_fs.rs - Added pub(crate) accessor methods on WasmVfs for cross-module use. Changed WasmFs::new to accept Arc<WasmVfs>.
  • crates/bridge_wasm/src/lib.rs - Wired .with_fs_instance() and .with_glob_instance() into SysOpsBuilder.

Parser bugfix

  • crates/baml_compiler_parser/src/parser.rs - Added at_end_raw() method. Changed parse_string, parse_byte_string, and parse_raw_string_content to use at_end_raw() instead of at_end(), preventing */ and // inside string literals from being misinterpreted as comment delimiters.

Tests

  • crates/baml_tests/tests/fs.rs - 8 new tests: fs_read_dir_returns_entries, fs_read_dir_type_flags, fs_read_dir_nonexistent_errors, fs_read_dir_empty_dir, fs_mkdir_creates_directory, fs_mkdir_recursive_creates_parents, fs_mkdir_non_recursive_errors_when_parent_missing, fs_mkdir_recursive_is_idempotent.
  • crates/baml_tests/tests/glob.rs - 9 glob tests including glob_matches_recursive_wildcard (exercises the parser fix with "**/*.ts").
  • crates/baml_tests/projects/builtin_io/fs_ops.baml - Added ReadDir() function for CST/pipeline snapshot coverage.
  • All snapshot tests updated across __baml_std__, builtin_io, and other test projects.

Deviations from the plan

Implemented as planned

  • All four phases (codegen fix, fs operations, glob operations, WASM bridge) were implemented as outlined
  • DirEntry, MkdirOptions, read_dir, mkdir declarations and implementations match the plan
  • GlobPattern with glob-to-regex conversion, walkdir-based scan, native + WASM implementations

Deviations/surprises

  • Glob.match renamed to Glob.matches to avoid Rust keyword escaping (r#match) throughout the codebase
  • Added emit_into_result_call codegen helper for the serialization direction (Vec<owned::T> -> BexExternalValue::Array), which the plan only addressed for the deserialization direction
  • WASM WasmVfs sharing via Arc instead of clone, requiring a signature change to WasmFs::new
  • WasmVfs accessor methods added as pub(crate) wrappers since wasm_bindgen-generated methods are module-private
  • GlobPattern drops original field — only stores the compiled Regex, not the source pattern string
  • Test paths use fully-qualified BAML names (baml.fs.DirEntry[], baml.fs.MkdirOptions) rather than bare names

Additions not in plan

  • Parser bugfix for */ inside string literals (at_end_raw() in parse_string, parse_byte_string, parse_raw_string_content)
  • glob_utils.rs unit tests (4 tests for the glob-to-regex converter)
  • ReadDir() snapshot test function in builtin_io/fs_ops.baml for full pipeline coverage
  • glob_utils.rs handles [!...] character class negation (POSIX-compatible edge case)

Items planned but not implemented

  • Snapshot assertion in fs_read_dir_returns_entries — omitted in favor of runtime-only assertions
  • Recursive **/*.txt glob scan integration test — recursive matching covered by glob_utils.rs unit tests and glob_matches_recursive_wildcard but not a full scan integration test

How to verify it

Setup

git fetch
git checkout aaron/add-readdir-function-to-bamlfs-module
cd baml_language

Automated Tests

# All snapshot tests (1351 tests)
cargo test -p baml_tests --lib

# fs integration tests (read_dir + mkdir)
cargo test -p baml_tests --test fs

# Glob integration tests (including **/*.ts parser fix)
cargo test -p baml_tests --test glob

# Full test suite
cargo test -p baml_tests

Manual Testing

  • Verify baml.fs.read_dir returns correct entries with type flags
  • Verify baml.fs.mkdir creates nested directories with recursive: true
  • Verify baml.glob.new("**/*.ts").matches("src/index.ts") returns true
  • Verify string literals containing */ parse correctly (was previously broken)

Description for the changelog

Add baml.fs.read_dir, baml.fs.mkdir, and baml.glob namespace with pattern matching and filesystem scanning. Fix parser bug where */ inside string literals was incorrectly treated as block-comment end.


Note

Medium Risk
Adds new filesystem- and glob-related sysops across native and WASM runtimes and changes parser behavior for string literals, which could affect file I/O behavior and parsing edge-cases. Coverage is improved with new integration/unit tests and updated snapshots, but the runtime-wiring/codegen changes broaden the blast radius.

Overview
Adds new BAML builtins for filesystem and glob workflows: baml.fs.read_dir() (returns DirEntry[]) and baml.fs.mkdir() (with MkdirOptions.recursive), plus a new baml.glob namespace with new(pattern), Glob.scan(...), and Glob.matches(...).

Implements these ops in sys_native (Tokio + walkdir + glob-to-regex via regex) and wires them through sys_ops/builder plumbing; bridge_wasm gains JS-VFS-backed fs support and a WASM glob implementation, with WasmVfs sharing switched to Arc for reuse across namespaces.

Fixes a parser bug where comment delimiters like */ or // inside string/byte/raw string literals could prematurely terminate parsing by using raw-token end checks. Updates codegen/value-conversion to support returning lists of generated classes from IO ops, and adds/updates extensive tests and snapshots for the new APIs.

Reviewed by Cursor Bugbot for commit 68d9108. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by CodeRabbit

  • New Features

    • Added filesystem directory APIs: read_dir() returns directory entries (name, is_dir/is_file/is_symlink) and mkdir(path, { recursive }) supports recursive creation.
    • Added globbing APIs: Glob type (constructor), ScanOptions, Glob.scan() to list matches and Glob.matches() to test paths.
  • Tests

    • Added comprehensive native and WASM tests covering filesystem and glob behavior.

…ser string literal bug

Implement directory listing (read_dir) and creation (mkdir) in the baml.fs
module, add a new baml.glob namespace with pattern matching and filesystem
scanning, and fix a parser bug where `*/` inside string literals was
incorrectly tokenized as a block-comment end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
beps Ready Ready Preview, Comment Apr 28, 2026 7:22pm
promptfiddle Ready Ready Preview, Comment Apr 28, 2026 7:22pm

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

📝 Walkthrough

Walkthrough

Adds filesystem directory APIs (DirEntry, MkdirOptions, read_dir, mkdir) and a globbing API (ScanOptions, Glob with new/scan/matches); implements native and WASM adapters, updates sys-op result mapping and codegen for list returns, parser raw-string fixes, and adds extensive tests for FS and glob behavior.

Changes

Cohort / File(s) Summary
BAML std defs
baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml, baml_language/crates/baml_builtins2/baml_std/baml/ns_glob/glob.baml, baml_language/crates/baml_builtins2/src/lib.rs
Adds DirEntry, MkdirOptions, ScanOptions, Glob types; exports read_dir/mkdir and glob.new/methods; registers glob builtin.
Codegen for IO results
baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
Introduces emit_into_result_call and per-element conversion helpers; supports List/Named returns and deeper element conversions for generated sys-op glue.
Parser fixes
baml_language/crates/baml_compiler_parser/src/parser.rs
Adds raw-EOF helpers and switches string/raw-string/Jinja parsing to raw bumping to avoid mis-parsing comment-like sequences; adds unit tests.
Native IO & glob
baml_language/crates/sys_native/src/glob_utils.rs, baml_language/crates/sys_native/src/io_impls.rs, baml_language/crates/sys_native/src/lib.rs, baml_language/crates/sys_native/Cargo.toml
Adds glob-to-regex compiler, implements read_dir and mkdir (recursive option) and glob scan/match using WalkDir; adds regex and walkdir deps.
WASM bridge & adapters
baml_language/crates/bridge_wasm/src/lib.rs, .../wasm_fs.rs, .../wasm_io_fs.rs, .../wasm_io_glob.rs, baml_language/crates/bridge_wasm/Cargo.toml, .../tests/wasm_fs_glob.rs
Adds WasmVfs helpers, WasmIoFs and WasmIoGlob adapters that call JS VFS callbacks (readDir, readMany, etc.); refactors WasmFs::new to accept Arc<WasmVfs>; adds tests and regex dep.
Sys-op infra
baml_language/crates/sys_ops/src/lib.rs, baml_language/crates/sys_types/src/lib.rs
Extends DefaultIoOps stubs for fs/glob hooks; adds builder APIs with_glob_instance/with_glob; adds into_result_mapped() for SysOpOutput result mapping.
Tests & typecheck fixtures
baml_language/crates/baml_tests/projects/builtin_io/fs_ops.baml, baml_language/crates/baml_tests/tests/fs.rs, baml_language/crates/baml_tests/tests/glob.rs
Adds typecheck fixture and extensive unit/integration tests covering read_dir, mkdir (recursive behavior, exists handling) and glob matches/scan behaviors.
Bridge WASM VFS wiring
baml_language/crates/bridge_wasm/src/wasm_fs.rs
Refactors to accept Arc<WasmVfs> and exposes crate-visible VFS helpers for JS binding calls.
Misc Cargo
baml_language/crates/bridge_wasm/Cargo.toml, baml_language/crates/sys_native/Cargo.toml
Adds regex (workspace) to bridge_wasm and regex/walkdir to sys_native dependencies.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BAML_Runtime as "BAML Runtime"
    participant SysOps as "SysOps (io namespace)"
    participant NativeFS as "Native FS / VFS"

    Client->>BAML_Runtime: call baml.fs.read_dir(path)
    BAML_Runtime->>SysOps: invoke io::read_dir(path)
    SysOps->>NativeFS: read_dir / metadata calls
    NativeFS-->>SysOps: Vec<DirEntry>
    SysOps->>BAML_Runtime: SysOpResult (array)
    BAML_Runtime->>Client: return DirEntry[] (BexExternalValue::Array)
Loading
sequenceDiagram
    participant Client
    participant BAML_Runtime as "BAML Runtime"
    participant Glob_NS as "Glob Namespace"
    participant VFS as "Filesystem / readMany"
    participant Regex as "Regex Engine"

    Client->>BAML_Runtime: glob.new(pattern)
    BAML_Runtime->>Glob_NS: create handle (store pattern)
    Glob_NS-->>BAML_Runtime: Glob handle

    Client->>BAML_Runtime: glob.scan(root/options)
    BAML_Runtime->>VFS: enumerate candidates (walk/readMany)
    VFS-->>BAML_Runtime: candidate paths
    BAML_Runtime->>Regex: compile/apply glob_to_regex
    Regex-->>BAML_Runtime: match results
    BAML_Runtime->>Client: filtered path list (string[])
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through paths both near and far,

read_dir named leaves like tiny star.
Mkdir dug burrows, parent paths made neat,
Glob chased patterns on little feet.
A rabbit hums — the files all meet!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.34% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main additions: filesystem APIs (read_dir, mkdir), glob namespace, and a parser string literal bug fix. It is concise and covers all significant changes in the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aaron/add-readdir-function-to-bamlfs-module

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 68d9108. Configure here.

re.push_str(&alts);
if i < bytes.len() {
i += 1; // consume closing '}'
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glob wildcards inside brace alternation produce invalid regex

Medium Severity

The {...} alternation handler in glob_to_regex treats all characters inside braces via the default ch => arm, which only escapes a fixed set of regex metacharacters. Glob wildcards like * and ? are not in that escape set and are not converted to their glob equivalents ([^/]*, [^/]). Since * and ? are regex quantifiers, a pattern like {*.txt,*.rs} produces the regex (?:*\.txt|*\.rs) where * has no preceding expression, causing regex::Regex::new to fail at runtime with an "invalid regex" error.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 68d9108. Configure here.

SysOpOutput::err(OpErrorKind::Other(msg))
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WASM glob scan ignores root directory parameter

Medium Severity

The WASM Glob.scan() implementation ignores the _root parameter entirely and delegates only the glob pattern to vfs_read_many. The native implementation uses root as the walk directory, extracts ScanOptions fields (cwd, dot, absolute, follow_symlinks, only_files, etc.), and filters results accordingly. In WASM, calling g.scan("/some/dir") disregards the directory argument, producing incorrect results.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 68d9108. Configure here.

} else {
Ok(regex)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated glob-to-regex code across two crates

Low Severity

The glob_to_regex function is fully duplicated between sys_native/src/glob_utils.rs and bridge_wasm/src/wasm_io_glob.rs. Both copies share the same brace-alternation wildcard bug, illustrating the maintenance risk: any fix to one copy must be manually replicated to the other. A shared utility crate (as the inline comment itself suggests) would eliminate this duplication.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 68d9108. Configure here.

Ok(()) => SysOpOutput::ok(()),
Err(e) => SysOpOutput::err(OpErrorKind::Other(format!("{e:?}"))),
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WASM mkdir ignores recursive option flag

Low Severity

The WASM mkdir implementation ignores _options: owned::fs::MkdirOptions (including recursive: bool), always delegating to vfs_create_dir. The native implementation checks options.recursive to choose between create_dir and create_dir_all. With recursive: false, WASM may succeed in creating nested paths when it semantically shouldn't, breaking behavioral parity with native.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 68d9108. Configure here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
baml_language/crates/baml_compiler_parser/src/parser.rs (1)

1506-1536: ⚠️ Potential issue | 🟠 Major

Propagate raw EOF checks into raw-string content helpers.

Line 1506 uses at_end_raw(), but the delegated raw-string consumers still loop on at_end(). That means content like #"//"# can still be treated as a skipped comment before bump_raw() consumes it, causing an unclosed raw string / iteration-limit error.

🐛 Proposed fix
-            while !p.at_end() && depth > 0 {
+            while !p.at_end_raw() && depth > 0 {
                 if p.at_raw(TokenKind::Quote)
                     && p.count_consecutive_hashes_after_quote() == opening_hashes
                 {
@@
-            while !p.at_end() {
+            while !p.at_end_raw() {
                 if p.at_raw(TokenKind::Quote)
                     && p.count_consecutive_hashes_after_quote() == opening_hashes
                 {
@@
-            while !p.at_end() {
+            while !p.at_end_raw() {
                 if p.at_raw(TokenKind::Quote)
                     && p.count_consecutive_hashes_after_quote() == opening_hashes
                 {
@@
-            while !p.at_end() {
+            while !p.at_end_raw() {
                 // Check for closing delimiter
                 if p.at_raw(TokenKind::Quote) {

Please also add a parser unit test for raw strings containing // and */, e.g. #"//"# and #"*/"#. As per coding guidelines, **/*.rs: “Prefer writing Rust unit tests over integration tests where possible” and “Always run cargo test --lib if you changed any Rust code”.

Also applies to: 1567-1593, 1609-1624, 1635-1650, 1660-1675

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_compiler_parser/src/parser.rs` around lines 1506 -
1536, The delegated raw-string content consumers (parse_jinja_expression,
parse_jinja_statement, parse_jinja_comment, parse_prompt_text and any helpers
they call) must check at_end_raw() instead of at_end() so they stop on the
raw-string EOF delimiter semantics; update those methods to use
at_end_raw()/bump_raw() (or propagate a raw-context flag) wherever they
currently loop on at_end(), and ensure any places that currently call bump()
inside raw parsing use bump_raw(); then add unit tests exercising raw strings
that contain comment sequences (e.g. #"//"# and #"*/"#) to confirm the parser
does not treat those as comments and that raw strings close correctly.
🧹 Nitpick comments (1)
baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml (1)

64-82: LGTM.

API surface matches the native + WASM implementations and the new tests. One optional thought: making MkdirOptions.recursive optional with a documented default would avoid forcing every caller to spell MkdirOptions { recursive: false }, but that's a UX call and can be deferred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml` around lines
64 - 82, Make MkdirOptions.recursive optional and default to false so callers
don't have to always pass MkdirOptions { recursive: false }; update the
MkdirOptions definition (symbol MkdirOptions and its field recursive) to be
optional/nullable and adjust the mkdir binding/implementation (symbol mkdir) to
treat a missing recursive as false and document the default in the API
comment/definition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs`:
- Around line 2144-2210: emit_result_conversion_inner currently falls back to
Ok(__val) for non-scalar/Named element types which lets Vec<BexExternalValue>
flow through and causes type mismatches; update emit_result_conversion_inner to
handle the same BamlType variants as emit_result_conversion (at minimum Float
and Uint8Array and recursively handle Optional, Map and nested List) or extract
the scalar/Named/Optional conversion into a shared helper used by both
emit_result_conversion and emit_result_conversion_inner so each list element
maps to the correct Rust type (and keep the list-specific error messages like
"expected X in list, got ..."); refer to the functions
emit_result_conversion_inner and emit_result_conversion and the BamlType
variants (Float, Uint8Array, Optional, Map, List, Named, String, Int, Bool) when
implementing the fixes.

In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs`:
- Around line 254-277: In read_dir, normalize the join by trimming a trailing
'/' from path (or use a small join helper) before building full so
format!("{path}/{name}") cannot produce double slashes; when calling
self.vfs().vfs_metadata(&full) treat Err(_) as unknown rather than a file (set
is_dir=false, is_file=false, is_symlink=false) and emit a log/warn with the
metadata error so failures (permission, symlink, etc.) aren’t silently
misclassified; update the loop around self.vfs().vfs_read_dir, the full
construction, the vfs_metadata match, and the owned::fs::DirEntry population
accordingly in read_dir.
- Around line 279-293: The mkdir implementation currently ignores
owned::fs::MkdirOptions; update the mkdir function in wasm_io_fs.rs to enforce
the documented semantics: if options.recursive is false, first check the parent
path exists (using vfs().vfs_exists or vfs().vfs_metadata) and ensure the target
does not already exist, returning an appropriate OpError when preconditions
fail, then call vfs_create_dir; if options.recursive is true, allow idempotency
by treating "already exists" as success (probe target with vfs_exists and return
SysOpOutput::ok(()) if it exists) and otherwise call vfs_create_dir to create
parents as the host supports or create missing parents iteratively via
vfs_create_dir on ancestor paths; keep error wrapping using the existing
SysOpOutput::err pattern.

In `@baml_language/crates/bridge_wasm/src/wasm_io_glob.rs`:
- Around line 61-125: scan currently relies solely on WasmVfs.vfs_read_many (TS
globToRegex) while matches uses Rust glob_to_regex, causing inconsistent results
for advanced patterns; fix by compiling the same pattern with glob_to_regex
inside WasmIoGlob::scan (same as in matches) and filter the paths returned from
self.vfs.vfs_read_many(&pattern) with re.is_match(path) before returning,
returning an OpErrorKind::Other if glob_to_regex fails so scan and matches share
identical matching semantics (refer to functions WasmIoGlob::scan,
WasmIoGlob::matches, glob_to_regex and method WasmVfs.vfs_read_many).

In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs`:
- Around line 380-401: The test runtime_creates_with_fs_and_glob_wired is a
no-op: it only Reflect::get's the BamlWasmRuntime constructor and drops values,
so it never calls BamlWasmRuntime::create or asserts behavior; either update the
test to actually call BamlWasmRuntime::create(mock_callbacks(), mock_vfs(...))
and assert the result is Ok (and optionally perform one FS/glob operation to
exercise IoNamespaceFs/IoNamespaceGlob), or delete the test entirely and rely on
the existing mock_vfs/mock_callbacks tests that already assert contract
behavior; locate the test function runtime_creates_with_fs_and_glob_wired and
replace the no-op with a real create+assert or remove it.

In `@baml_language/crates/sys_native/src/glob_utils.rs`:
- Around line 85-118: The brace-handling code in glob_utils.rs (the '{' match
branch) currently treats nested braces as literal characters causing patterns
like "{a,{b,c}}.txt" to compile to incorrect regex; update the handler to either
return a parse error for nested braces or recursively parse inner brace groups
and emit nested non-capturing alternation groups (i.e., produce
"(?:a|(?:b|c))"), ensure commas at any depth are converted to '|' for the
correct depth when recursively building the alts string, and mirror the same fix
in the duplicate implementation in bridge_wasm/src/wasm_io_glob.rs; add unit
tests using GlobPattern::new for nested_alternation and negation as suggested to
validate both behaviors.

In `@baml_language/crates/sys_native/src/io_impls.rs`:
- Around line 567-641: The ScanOptions parsing in the SysOpOutput::async_op
block duplicates six identical fields.get(...).and_then(...).unwrap_or(...)
patterns and misuses throw_error_on_broken_symlink: extract string/bool fields
via small helpers (e.g., get_string(fields, "cwd", ".") and get_bool(fields,
"dot", false)) and replace the repeated matches in the root match arm; then
adjust the walkdir error handling so throw_error_on_broken_symlink only triggers
for genuine broken-symlink errors by inspecting walkdir::Error::io_error() and
matching io::ErrorKind::NotFound (or else rename the flag to throw_on_walk_error
if you intend to cover all walk errors), updating the throw_on_broken variable
and its checks accordingly.

---

Outside diff comments:
In `@baml_language/crates/baml_compiler_parser/src/parser.rs`:
- Around line 1506-1536: The delegated raw-string content consumers
(parse_jinja_expression, parse_jinja_statement, parse_jinja_comment,
parse_prompt_text and any helpers they call) must check at_end_raw() instead of
at_end() so they stop on the raw-string EOF delimiter semantics; update those
methods to use at_end_raw()/bump_raw() (or propagate a raw-context flag)
wherever they currently loop on at_end(), and ensure any places that currently
call bump() inside raw parsing use bump_raw(); then add unit tests exercising
raw strings that contain comment sequences (e.g. #"//"# and #"*/"#) to confirm
the parser does not treat those as comments and that raw strings close
correctly.

---

Nitpick comments:
In `@baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml`:
- Around line 64-82: Make MkdirOptions.recursive optional and default to false
so callers don't have to always pass MkdirOptions { recursive: false }; update
the MkdirOptions definition (symbol MkdirOptions and its field recursive) to be
optional/nullable and adjust the mkdir binding/implementation (symbol mkdir) to
treat a missing recursive as false and document the default in the API
comment/definition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8281391c-019c-4277-8b82-3bf1b468d46a

📥 Commits

Reviewing files that changed from the base of the PR and between ea9547a and 68d9108.

⛔ Files ignored due to path filters (74)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__01_lexer__fs_ops.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__02_parser__fs_ops.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__10_formatter__fs_ops.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_name_type_error/baml_tests__test_expr_name_type_error__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_name_type_error/baml_tests__test_expr_name_type_error__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_name_type_error/baml_tests__test_expr_name_type_error__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_with_runner/baml_tests__test_expr_with_runner__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_with_runner/baml_tests__test_expr_with_runner__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_with_runner/baml_tests__test_expr_with_runner__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_wrong_runner/baml_tests__test_expr_wrong_runner__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_wrong_runner/baml_tests__test_expr_wrong_runner__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_expr_wrong_runner/baml_tests__test_expr_wrong_runner__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_invalid_contexts/baml_tests__test_invalid_contexts__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_invalid_contexts/baml_tests__test_invalid_contexts__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_invalid_contexts/baml_tests__test_invalid_contexts__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_with_runner_ambiguity/baml_tests__test_with_runner_ambiguity__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_with_runner_ambiguity/baml_tests__test_with_runner_ambiguity__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/test_with_runner_ambiguity/baml_tests__test_with_runner_ambiguity__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
📒 Files selected for processing (20)
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml
  • baml_language/crates/baml_builtins2/baml_std/baml/ns_glob/glob.baml
  • baml_language/crates/baml_builtins2/src/lib.rs
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_compiler_parser/src/parser.rs
  • baml_language/crates/baml_tests/projects/builtin_io/fs_ops.baml
  • baml_language/crates/baml_tests/tests/fs.rs
  • baml_language/crates/baml_tests/tests/glob.rs
  • baml_language/crates/bridge_wasm/Cargo.toml
  • baml_language/crates/bridge_wasm/src/lib.rs
  • baml_language/crates/bridge_wasm/src/wasm_fs.rs
  • baml_language/crates/bridge_wasm/src/wasm_io_fs.rs
  • baml_language/crates/bridge_wasm/src/wasm_io_glob.rs
  • baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs
  • baml_language/crates/sys_native/Cargo.toml
  • baml_language/crates/sys_native/src/glob_utils.rs
  • baml_language/crates/sys_native/src/io_impls.rs
  • baml_language/crates/sys_native/src/lib.rs
  • baml_language/crates/sys_ops/src/lib.rs
  • baml_language/crates/sys_types/src/lib.rs

Comment thread baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs Outdated
Comment on lines +254 to +277
match self.vfs().vfs_read_dir(&path) {
Ok(arr) => {
let mut entries = Vec::new();
for v in arr.iter() {
let Some(name) = v.as_string() else { continue };
// WasmVfs.readDir returns string[] of entry names.
// Probe metadata to distinguish files from directories.
let full = format!("{path}/{name}");
let (is_dir, is_file) = match self.vfs().vfs_metadata(&full) {
Ok(meta) => (meta.file_type == "directory", meta.file_type == "file"),
Err(_) => (false, true),
};
entries.push(owned::fs::DirEntry {
name,
is_dir,
is_file,
is_symlink: false,
});
}
SysOpOutput::ok(entries)
}
Err(e) => SysOpOutput::err(OpErrorKind::Other(format!("{e:?}"))),
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

read_dir has a couple of rough edges.

  • Line 261: format!("{path}/{name}") can produce "//name" when path ends with / (common for roots like /). Most JS VFS implementations will tolerate this, but worth normalizing (e.g., trim a trailing / off path, or use a small join helper).
  • Line 264: On vfs_metadata failure the entry is classified as (is_dir=false, is_file=true). That silently mislabels symlinks / permission-denied entries as regular files and also leaves is_symlink hard-coded to false with no path to ever set it true (the WasmVfs fileType protocol only knows "file" / "directory"). Consider at least logging the metadata error, and/or extending the JS contract to report "symlink".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs` around lines 254 - 277,
In read_dir, normalize the join by trimming a trailing '/' from path (or use a
small join helper) before building full so format!("{path}/{name}") cannot
produce double slashes; when calling self.vfs().vfs_metadata(&full) treat Err(_)
as unknown rather than a file (set is_dir=false, is_file=false,
is_symlink=false) and emit a log/warn with the metadata error so failures
(permission, symlink, etc.) aren’t silently misclassified; update the loop
around self.vfs().vfs_read_dir, the full construction, the vfs_metadata match,
and the owned::fs::DirEntry population accordingly in read_dir.

Comment on lines +279 to +293
fn mkdir(
&self,
_h: &Arc<BexHeap>,
_c: CallId,
path: String,
_options: owned::fs::MkdirOptions,
_ctx: &SysOpContext,
) -> SysOpOutput<()> {
// WasmVfs.createDir doesn't expose a recursive flag — the JS host
// is expected to handle recursive creation transparently.
match self.vfs().vfs_create_dir(&path) {
Ok(()) => SysOpOutput::ok(()),
Err(e) => SysOpOutput::err(OpErrorKind::Other(format!("{e:?}"))),
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

mkdir silently ignores options.recursive, diverging from the documented API.

The ns_fs/fs.baml contract for mkdir(path, options) — and the native implementation — require:

  • recursive: false → error if any parent is missing, and error if the target already exists.
  • recursive: true → create parents as needed, idempotent when target exists.

Here options is dropped on the floor and every call is forwarded to vfs_create_dir unconditionally. If a JS host implements createDir as "create this path, making parents if needed" (as your mock VFS does in tests/wasm_fs_glob.rs), then in WASM:

  • baml.fs.mkdir("/no/parent", MkdirOptions { recursive: false }) will succeed instead of erroring.
  • baml.fs.mkdir(existing, MkdirOptions { recursive: false }) will likely succeed instead of erroring.

That contradicts fs_mkdir_non_recursive_errors_when_parent_missing / fs_mkdir_recursive_is_idempotent in baml_tests/tests/fs.rs and makes baml.fs.mkdir behave differently across targets. Consider either (a) enforcing the semantics in Rust (probe with vfs_exists / vfs_metadata on parent and target before calling vfs_create_dir), or (b) extending the WasmVfs JS interface to accept a recursive flag and delegating enforcement there.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs` around lines 279 - 293,
The mkdir implementation currently ignores owned::fs::MkdirOptions; update the
mkdir function in wasm_io_fs.rs to enforce the documented semantics: if
options.recursive is false, first check the parent path exists (using
vfs().vfs_exists or vfs().vfs_metadata) and ensure the target does not already
exist, returning an appropriate OpError when preconditions fail, then call
vfs_create_dir; if options.recursive is true, allow idempotency by treating
"already exists" as success (probe target with vfs_exists and return
SysOpOutput::ok(()) if it exists) and otherwise call vfs_create_dir to create
parents as the host supports or create missing parents iteratively via
vfs_create_dir on ancestor paths; keep error wrapping using the existing
SysOpOutput::err pattern.

Comment thread baml_language/crates/bridge_wasm/src/wasm_io_glob.rs
Comment thread baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs
Comment on lines +85 to +118
b'{' => {
let mut depth = 1usize;
let mut alts = String::from("(?:");
i += 1;
while i < bytes.len() && depth > 0 {
match bytes[i] {
b'{' => {
depth += 1;
alts.push('{');
}
b'}' => {
depth -= 1;
if depth == 0 {
break;
}
alts.push('}');
}
b',' if depth == 1 => alts.push('|'),
ch => {
let c = ch as char;
if ".+^${}()|[]\\".contains(c) {
alts.push('\\');
}
alts.push(c);
}
}
i += 1;
}
alts.push(')');
re.push_str(&alts);
if i < bytes.len() {
i += 1; // consume closing '}'
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm Rust's `regex` 1.10 treats bare '{' literally (i.e. the current code
# doesn't fail to compile on nested-brace patterns) — to rule out a crash.
cat <<'EOF'
// Sanity check - this regex compile behavior governs the severity.
// "(?:a|{b,c})" should compile in `regex = "1.10"` and match "a" or literal "{b,c}".
EOF
rg -nP 'regex\s*=\s*"1\.' --type=toml

Repository: BoundaryML/baml

Length of output: 251


🏁 Script executed:

#!/bin/bash
# Locate and inspect the glob_utils.rs file
fd -t f glob_utils.rs

Repository: BoundaryML/baml

Length of output: 108


🏁 Script executed:

#!/bin/bash
# Read the brace handling code in glob_utils.rs
cat -n baml_language/crates/sys_native/src/glob_utils.rs | head -150

Repository: BoundaryML/baml

Length of output: 5936


🏁 Script executed:

#!/bin/bash
# Check for wasm_io_glob.rs and glob_to_regex implementation
fd -t f wasm_io_glob.rs

Repository: BoundaryML/baml

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Read the rest of glob_utils.rs to see all tests
cat -n baml_language/crates/sys_native/src/glob_utils.rs | tail -n +150

Repository: BoundaryML/baml

Length of output: 1081


🏁 Script executed:

#!/bin/bash
# Read wasm_io_glob.rs to check for duplicate code
cat -n baml_language/crates/bridge_wasm/src/wasm_io_glob.rs | head -200

Repository: BoundaryML/baml

Length of output: 8751


🏁 Script executed:

#!/bin/bash
# Continue reading wasm_io_glob.rs brace handling
cat -n baml_language/crates/bridge_wasm/src/wasm_io_glob.rs | sed -n '195,240p'

Repository: BoundaryML/baml

Length of output: 1874


🏁 Script executed:

#!/bin/bash
# Check if there are any tests for nested braces or negation in glob_utils.rs
rg -A 5 "nested|negation" baml_language/crates/sys_native/src/glob_utils.rs

Repository: BoundaryML/baml

Length of output: 166


🏁 Script executed:

#!/bin/bash
# Check if negation feature is actually tested or mentioned
rg -n "negat" baml_language/crates/sys_native/src/glob_utils.rs

Repository: BoundaryML/baml

Length of output: 280


🏁 Script executed:

#!/bin/bash
# Test the actual behavior of nested braces by creating a small test
# to trace what regex pattern gets generated for {a,{b,c}}
python3 << 'EOF'
# Simulate the glob_to_regex logic for {a,{b,c}}.txt
glob = "{a,{b,c}}.txt"
bytes_list = list(glob.encode())
i = 0
re = "^"

while i < len(bytes_list):
    if bytes_list[i:i+1] == b'{':
        depth = 1
        alts = "(?:"
        i += 1
        while i < len(bytes_list) and depth > 0:
            b = bytes_list[i]
            if b == ord('{'):
                depth += 1
                alts += '{'
            elif b == ord('}'):
                depth -= 1
                if depth == 0:
                    break
                alts += '}'
            elif b == ord(',') and depth == 1:
                alts += '|'
            else:
                c = chr(b)
                if c in ".+^${}()|[]\\":
                    alts += '\\'
                alts += c
            i += 1
        alts += ')'
        re += alts
        if i < len(bytes_list):
            i += 1  # consume closing }
    else:
        c = chr(bytes_list[i])
        if c in ".+^${}()|[]\\":
            re += '\\'
        re += c
        i += 1

re += "$"
print(f"Input glob: {glob}")
print(f"Generated regex: {re}")
print(f"Expected regex for correct nested handling: ^(?:a|(?:b|c))\\.txt$")
print(f"This regex would incorrectly match: {'{b,c}'}.txt (literal braces)")
EOF

Repository: BoundaryML/baml

Length of output: 250


🏁 Script executed:

#!/bin/bash
# One more check: does the negation implementation work correctly?
# Just verify the negation code path exists and is syntactically correct
sed -n '132,139p' baml_language/crates/sys_native/src/glob_utils.rs

Repository: BoundaryML/baml

Length of output: 345


Nested {...} alternation silently produces incorrect regex.

The brace handler at lines 91–93 and 100 pushes inner { and } as literal characters (when depth > 1), and only converts , to | at depth == 1. This means a pattern like {a,{b,c}}.txt generates the regex (?:a|{b,c})\.txt, which matches the literal string "{b,c}.txt" instead of "a.txt", "b.txt", or "c.txt".

The identical code also appears in baml_language/crates/bridge_wasm/src/wasm_io_glob.rs (lines 195–227). The TODO comment there (lines 128–133) already suggests consolidating both implementations into a shared crate to avoid duplication and reduce maintenance burden.

Recommended fixes:

  • Reject nested braces with a clear parse error, or
  • Recursively emit nested (?:b|c) groups for inner alternations.

Add unit tests covering nested braces and negation (currently untested):

💚 Suggested tests
#[test]
fn nested_alternation() {
    let g = GlobPattern::new("{a,{b,c}}.txt").unwrap();
    assert!(g.is_match("a.txt"));
    assert!(g.is_match("b.txt"));
    assert!(g.is_match("c.txt"));
    assert!(!g.is_match("d.txt"));
}

#[test]
fn negation() {
    let g = GlobPattern::new("!*.ts").unwrap();
    assert!(g.is_match("foo.rs"));
    assert!(!g.is_match("foo.ts"));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/sys_native/src/glob_utils.rs` around lines 85 - 118, The
brace-handling code in glob_utils.rs (the '{' match branch) currently treats
nested braces as literal characters causing patterns like "{a,{b,c}}.txt" to
compile to incorrect regex; update the handler to either return a parse error
for nested braces or recursively parse inner brace groups and emit nested
non-capturing alternation groups (i.e., produce "(?:a|(?:b|c))"), ensure commas
at any depth are converted to '|' for the correct depth when recursively
building the alts string, and mirror the same fix in the duplicate
implementation in bridge_wasm/src/wasm_io_glob.rs; add unit tests using
GlobPattern::new for nested_alternation and negation as suggested to validate
both behaviors.

Comment on lines +567 to +641
SysOpOutput::async_op(async move {
let handle = downcast_glob_handle(&glob)?;

// Parse root: either a string or a ScanOptions instance
let (cwd, dot, absolute, follow_symlinks, throw_on_broken, only_files) = match &root {
BexExternalValue::String(s) => (s.clone(), false, false, false, false, true),
BexExternalValue::Instance { fields, .. } => {
let cwd = fields
.get("cwd")
.and_then(|v| {
if let BexExternalValue::String(s) = v {
Some(s.clone())
} else {
None
}
})
.unwrap_or_else(|| ".".to_string());
let dot = fields
.get("dot")
.and_then(|v| {
if let BexExternalValue::Bool(b) = v {
Some(*b)
} else {
None
}
})
.unwrap_or(false);
let absolute = fields
.get("absolute")
.and_then(|v| {
if let BexExternalValue::Bool(b) = v {
Some(*b)
} else {
None
}
})
.unwrap_or(false);
let follow_symlinks = fields
.get("follow_symlinks")
.and_then(|v| {
if let BexExternalValue::Bool(b) = v {
Some(*b)
} else {
None
}
})
.unwrap_or(false);
let throw_on_broken = fields
.get("throw_error_on_broken_symlink")
.and_then(|v| {
if let BexExternalValue::Bool(b) = v {
Some(*b)
} else {
None
}
})
.unwrap_or(false);
let only_files = fields
.get("only_files")
.and_then(|v| {
if let BexExternalValue::Bool(b) = v {
Some(*b)
} else {
None
}
})
.unwrap_or(true);
(cwd, dot, absolute, follow_symlinks, throw_on_broken, only_files)
}
_ => {
return Err(OpErrorKind::Other(
"scan argument must be a string or ScanOptions".into(),
))
}
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

ScanOptions parsing: verbose duplication + misleading throw_on_broken semantics.

Two minor observations on the scan implementation:

  1. The six fields.get(...).and_then(...).unwrap_or(...) blocks are structurally identical and hurt readability. Consider small helpers for bool/string field extraction:
♻️ Optional refactor
fn get_bool(fields: &IndexMap<String, BexExternalValue>, key: &str, default: bool) -> bool {
    match fields.get(key) {
        Some(BexExternalValue::Bool(b)) => *b,
        _ => default,
    }
}
fn get_string(fields: &IndexMap<String, BexExternalValue>, key: &str, default: &str) -> String {
    match fields.get(key) {
        Some(BexExternalValue::String(s)) => s.clone(),
        _ => default.to_string(),
    }
}
  1. throw_error_on_broken_symlink is wired to if throw_on_broken { return Err(...) } for every walkdir error, not just broken-symlink errors. Permission-denied, IO failures, etc. will all surface under this flag. Additionally, with follow_links=false, a dangling symlink typically does not produce a walkdir::Error at all — the entry is yielded successfully with file_type().is_symlink() == true — so the option may not actually fire on broken symlinks in its most natural usage. Either rename (e.g. throw_on_walk_error) or detect broken-symlink errors specifically via e.io_error() + io::ErrorKind::NotFound and target the option accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/sys_native/src/io_impls.rs` around lines 567 - 641, The
ScanOptions parsing in the SysOpOutput::async_op block duplicates six identical
fields.get(...).and_then(...).unwrap_or(...) patterns and misuses
throw_error_on_broken_symlink: extract string/bool fields via small helpers
(e.g., get_string(fields, "cwd", ".") and get_bool(fields, "dot", false)) and
replace the repeated matches in the root match arm; then adjust the walkdir
error handling so throw_error_on_broken_symlink only triggers for genuine
broken-symlink errors by inspecting walkdir::Error::io_error() and matching
io::ErrorKind::NotFound (or else rename the flag to throw_on_walk_error if you
intend to cover all walk errors), updating the throw_on_broken variable and its
checks accordingly.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 23, 2026

Binary size checks passed

7 passed

Artifact Platform Gzip Baseline Delta Status
bridge_cffi Linux 6.1 MB 5.7 MB +407.3 KB (+7.2%) OK
bridge_cffi-stripped Linux 6.1 MB 5.7 MB +374.9 KB (+6.6%) OK
bridge_cffi macOS 5.0 MB 4.6 MB +377.5 KB (+8.2%) OK
bridge_cffi-stripped macOS 5.0 MB 4.7 MB +313.4 KB (+6.7%) OK
bridge_cffi Windows 5.0 MB 4.6 MB +385.0 KB (+8.3%) OK
bridge_cffi-stripped Windows 5.0 MB 4.7 MB +325.5 KB (+7.0%) OK
bridge_wasm WASM 3.3 MB 3.2 MB +65.3 KB (+2.0%) OK

Generated by cargo size-gate · workflow run

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented Apr 23, 2026

Merging this PR will improve performance by 15.33%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
✅ 14 untouched benchmarks
⏩ 105 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
WallTime bench_lexer_only_simple 24.3 µs 21.1 µs +15.33%

Comparing aaron/add-readdir-function-to-bamlfs-module (f54a106) with canary (96b6edb)

Open in CodSpeed

Footnotes

  1. 105 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (3)
baml_language/crates/baml_compiler_parser/src/parser.rs (1)

5210-5233: Add parity tests for normal/byte strings, and confirm full lib test run.

Nice regression tests for raw strings. Since parse_string and parse_byte_string were also changed, add equivalent tests for "//", "*/", b"//", and b"*/" to lock behavior across all touched paths. Also please confirm cargo test --lib was run for this Rust change.

Suggested test additions
+    #[test]
+    fn string_keeps_comment_markers_as_text() {
+        let source = r#"
+function Demo() -> string {
+  "//"
+  "*/"
+}
+"#;
+
+        let (_root, errors) = parse_source(source);
+        assert_no_errors(&errors);
+    }
+
+    #[test]
+    fn byte_string_keeps_comment_markers_as_text() {
+        let source = r#"
+function Demo() -> string {
+  b"//"
+  b"*/"
+}
+"#;
+
+        let (_root, errors) = parse_source(source);
+        assert_no_errors(&errors);
+    }

As per coding guidelines: "Always run cargo test --lib if you changed any Rust code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_compiler_parser/src/parser.rs` around lines 5210 -
5233, Add matching unit tests for normal and byte strings mirroring the raw
string tests: create tests similar to
raw_string_keeps_line_comment_marker_as_text and
raw_string_keeps_block_comment_end_marker_as_text but using normal string
literals and byte string literals (e.g., assert parse_source on "\"//\"" and
"\"*/\"" and b"\"//\"" and b"\"*/\"" or equivalent tests that exercise
parse_string and parse_byte_string) so the behavior of parse_string and
parse_byte_string is covered; place them near the existing tests that call
parse_source and assert_no_errors to keep parity, and then run cargo test --lib
to verify the full library test suite passes.
baml_language/crates/sys_types/src/lib.rs (1)

337-355: Add a unit test for into_result_mapped's async/error path.

This helper now sits on the serialization path for list returns, but there’s no coverage here that the Async branch both applies f and wraps failures with OpError::new(op, ...). A small local test would lock that down cheaply.

As per coding guidelines, **/*.rs: Prefer writing Rust unit tests over integration tests where possible.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/sys_types/src/lib.rs` around lines 337 - 355, Add a unit
test that exercises SysOpOutput::into_result_mapped's Async branch to ensure the
mapping function f is applied to successful values and failures are wrapped with
OpError::new(op, ...); specifically, create an async-ready SysOpOutput::Async
(pin a future that returns Ok or Err), call into_result_mapped with a simple f
-> BexExternalValue conversion and a concrete SysOp, await the resulting
SysOpResult::Async, and assert that Ok paths produce the mapped BexExternalValue
and Err paths produce an OpError whose op and kind match expectations; place the
test in the same lib.rs tests module so it runs as a unit test.
baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs (1)

1425-1461: Use emit_into_result_call on the class-method glue path too.

The new helper only gets wired into emit_free_fn_glue; emit_glue_method() still hardcodes .into_result(...). The next class method returning Class[] will fall back to the old orphan-rule problem, so it's worth routing both generators through the same helper now.

Also applies to: 1473-1547

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs` around lines
1425 - 1461, emit_glue_method currently hardcodes `.into_result(...)`, so
class-method glue doesn't use the new list-mapping logic in
emit_into_result_call and will hit the orphan-rule bug for Class[] returns;
update the class-method path to call emit_into_result_call instead of appending
`.into_result(...)`. Specifically, in emit_glue_method replace the hardcoded
call that uses SysOp::`#variant_ident` (the same pattern used in
emit_free_fn_glue) with a call to emit_into_result_call(return_type,
variant_ident, call_expr, class_ns_map, paths) so List(Named(...)) returns are
handled via into_result_mapped; ensure the same variables (return_type,
variant_ident, call_expr, class_ns_map, paths) are available/forwarded to match
the free-fn glue usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs`:
- Around line 330-356: The mkdir implementation currently treats any existing
path as "exists" without checking whether it's a directory; change checks that
call self.vfs().vfs_exists(&path) to instead query the actual type (e.g.,
self.vfs().vfs_is_dir(&path) or vfs_metadata and inspect the file type) so that:
for recursive=true you return success only if the existing path is a directory
and otherwise return an error indicating "Not a directory" (or similar), and for
recursive=false you still error when the path exists but make the error specific
if it exists and is not a directory; likewise when validating the parent path
(the parent_path(&path) check) ensure you verify the parent is an existing
directory (not just exists). Update the same logic in the other block referenced
(lines ~359-380) to use these directory-aware checks (replace vfs_exists usages
with directory/type checks and adjust error messages accordingly).

In `@baml_language/crates/bridge_wasm/src/wasm_io_glob.rs`:
- Around line 61-104: The scan implementation ignores the _root parameter
causing matches outside the requested root to leak; fix by extracting a root
path string from the _root BexExternalValue (similar to how you downcast
glob._handle to String) and enforce the root when filtering results from
self.vfs.vfs_read_many(&pattern): either call a vfs_read_many variant that
accepts a root if available, or normalize both root and each returned path and
skip any path that is not under the root before applying glob_regex_matches;
update scan to perform this root-prefix/path-contains check (use
functions/values glob._handle, _root, self.vfs.vfs_read_many,
glob_regex_matches, and the local pattern/re variables) so output matches native
behavior.

In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs`:
- Around line 170-186: The mock readMany currently ignores its "glob" parameter;
update the JS function created in wasm_fs_glob.rs (the Function passed to
js_sys::Reflect::set for "readMany") to filter Object.keys(this._files) by the
incoming glob argument before pushing results. Convert the glob string into a
RegExp (e.g., replace "**" -> ".*", "*" -> "[^/]*", escape other regex
metacharacters) or otherwise match the pattern, and only push [key,
TextEncoder().encode(value)] when the key matches the compiled pattern so tests
for "**/*.txt" vs "**/*.rs" exercise distinct behavior.

In `@baml_language/crates/sys_native/src/io_impls.rs`:
- Around line 669-679: The Glob.matches implementation currently forwards the
raw path to downcast_glob_handle/...handle.is_match which skips the
normalization done in scan; modify the matches function to normalize Windows
separators by replacing '\' with '/' on the incoming path before calling
handle.is_match (preserve ownership of String), so downcast_glob_handle(&glob)
and SysOpOutput::ok(handle.is_match(&normalized_path)) are used with the
normalized path and existing error handling remains unchanged.

---

Nitpick comments:
In `@baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs`:
- Around line 1425-1461: emit_glue_method currently hardcodes
`.into_result(...)`, so class-method glue doesn't use the new list-mapping logic
in emit_into_result_call and will hit the orphan-rule bug for Class[] returns;
update the class-method path to call emit_into_result_call instead of appending
`.into_result(...)`. Specifically, in emit_glue_method replace the hardcoded
call that uses SysOp::`#variant_ident` (the same pattern used in
emit_free_fn_glue) with a call to emit_into_result_call(return_type,
variant_ident, call_expr, class_ns_map, paths) so List(Named(...)) returns are
handled via into_result_mapped; ensure the same variables (return_type,
variant_ident, call_expr, class_ns_map, paths) are available/forwarded to match
the free-fn glue usage.

In `@baml_language/crates/baml_compiler_parser/src/parser.rs`:
- Around line 5210-5233: Add matching unit tests for normal and byte strings
mirroring the raw string tests: create tests similar to
raw_string_keeps_line_comment_marker_as_text and
raw_string_keeps_block_comment_end_marker_as_text but using normal string
literals and byte string literals (e.g., assert parse_source on "\"//\"" and
"\"*/\"" and b"\"//\"" and b"\"*/\"" or equivalent tests that exercise
parse_string and parse_byte_string) so the behavior of parse_string and
parse_byte_string is covered; place them near the existing tests that call
parse_source and assert_no_errors to keep parity, and then run cargo test --lib
to verify the full library test suite passes.

In `@baml_language/crates/sys_types/src/lib.rs`:
- Around line 337-355: Add a unit test that exercises
SysOpOutput::into_result_mapped's Async branch to ensure the mapping function f
is applied to successful values and failures are wrapped with OpError::new(op,
...); specifically, create an async-ready SysOpOutput::Async (pin a future that
returns Ok or Err), call into_result_mapped with a simple f -> BexExternalValue
conversion and a concrete SysOp, await the resulting SysOpResult::Async, and
assert that Ok paths produce the mapped BexExternalValue and Err paths produce
an OpError whose op and kind match expectations; place the test in the same
lib.rs tests module so it runs as a unit test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: b82e0397-d7fd-474b-9f89-58d7e67b720e

📥 Commits

Reviewing files that changed from the base of the PR and between 68d9108 and 1b6afd7.

⛔ Files ignored due to path filters (16)
  • baml_language/Cargo.lock is excluded by !**/*.lock
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__03_hir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_tir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__06_codegen.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snap is excluded by !**/*.snap
📒 Files selected for processing (10)
  • baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs
  • baml_language/crates/baml_compiler_parser/src/parser.rs
  • baml_language/crates/bridge_wasm/src/wasm_fs.rs
  • baml_language/crates/bridge_wasm/src/wasm_io_fs.rs
  • baml_language/crates/bridge_wasm/src/wasm_io_glob.rs
  • baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs
  • baml_language/crates/sys_native/src/glob_utils.rs
  • baml_language/crates/sys_native/src/io_impls.rs
  • baml_language/crates/sys_ops/src/lib.rs
  • baml_language/crates/sys_types/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • baml_language/crates/bridge_wasm/src/wasm_fs.rs
  • baml_language/crates/sys_native/src/glob_utils.rs
  • baml_language/crates/sys_ops/src/lib.rs

Comment on lines +330 to +356
if !options.recursive {
match self.vfs().vfs_exists(&path) {
Ok(true) => {
return SysOpOutput::err(OpErrorKind::Other(format!(
"Directory already exists: {path}"
)));
}
Ok(false) => {}
Err(e) => return SysOpOutput::err(js_err(&e)),
}

if let Some(parent) = parent_path(&path) {
match self.vfs().vfs_exists(&parent) {
Ok(true) => {}
Ok(false) => {
return SysOpOutput::err(OpErrorKind::Other(format!(
"Parent directory does not exist: {parent}"
)));
}
Err(e) => return SysOpOutput::err(js_err(&e)),
}
}

return match self.vfs().vfs_create_dir(&path) {
Ok(()) => SysOpOutput::ok(()),
Err(e) => SysOpOutput::err(js_err(&e)),
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don't treat any existing path as a directory in mkdir.

Both branches short-circuit on vfs_exists, so an existing file at path is treated as success for recursive: true and as "Directory already exists" for recursive: false. That diverges from native mkdir semantics and can let callers continue with a non-directory target.

Also applies to: 359-380

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs` around lines 330 - 356,
The mkdir implementation currently treats any existing path as "exists" without
checking whether it's a directory; change checks that call
self.vfs().vfs_exists(&path) to instead query the actual type (e.g.,
self.vfs().vfs_is_dir(&path) or vfs_metadata and inspect the file type) so that:
for recursive=true you return success only if the existing path is a directory
and otherwise return an error indicating "Not a directory" (or similar), and for
recursive=false you still error when the path exists but make the error specific
if it exists and is not a directory; likewise when validating the parent path
(the parent_path(&path) check) ensure you verify the parent is an existing
directory (not just exists). Update the same logic in the other block referenced
(lines ~359-380) to use these directory-aware checks (replace vfs_exists usages
with directory/type checks and adjust error messages accordingly).

Comment on lines +61 to +104
fn scan(
&self,
_h: &Arc<BexHeap>,
_c: CallId,
glob: owned::glob::Glob,
_root: BexExternalValue,
_ctx: &SysOpContext,
) -> SysOpOutput<Vec<String>> {
// Downcast the handle to get the raw pattern string.
let pattern = match glob._handle.clone().downcast::<String>() {
Ok(p) => (*p).clone(),
Err(_) => {
return SysOpOutput::err(OpErrorKind::Other(
"Invalid glob handle: expected String pattern".into(),
));
}
};
let (re, negated) = match glob_to_regex(&pattern) {
Ok(glob) => glob,
Err(e) => return SysOpOutput::err(OpErrorKind::Other(e)),
};

// Delegate to WasmVfs.readMany which accepts a glob string and returns
// `[string, Uint8Array][]`. Filter with the same Rust matcher used by
// matches() so the host and native implementations agree on patterns.
match self.vfs.vfs_read_many(&pattern) {
Ok(arr) => {
let paths: Vec<String> = arr
.iter()
.filter_map(|item| {
// Each item is [path: string, data: Uint8Array]
let pair: js_sys::Array = item.dyn_into().ok()?;
pair.get(0).as_string()
})
.filter(|path| glob_regex_matches(&re, negated, path))
.collect();
SysOpOutput::ok(paths)
}
Err(e) => {
let msg = e.as_string().unwrap_or_else(|| format!("{e:?}"));
SysOpOutput::err(OpErrorKind::Other(msg))
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Glob.scan(root) is currently ignoring root.

scan() never uses the _root argument; it calls vfs_read_many(&pattern) and then regex-filters whatever comes back. That means matches outside the requested root can leak into the result set, unlike the native implementation that walks from root.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bridge_wasm/src/wasm_io_glob.rs` around lines 61 - 104,
The scan implementation ignores the _root parameter causing matches outside the
requested root to leak; fix by extracting a root path string from the _root
BexExternalValue (similar to how you downcast glob._handle to String) and
enforce the root when filtering results from self.vfs.vfs_read_many(&pattern):
either call a vfs_read_many variant that accepts a root if available, or
normalize both root and each returned path and skip any path that is not under
the root before applying glob_regex_matches; update scan to perform this
root-prefix/path-contains check (use functions/values glob._handle, _root,
self.vfs.vfs_read_many, glob_regex_matches, and the local pattern/re variables)
so output matches native behavior.

Comment thread baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs
Comment on lines +669 to +679
fn matches(
&self,
_heap: &Arc<BexHeap>,
_call_id: CallId,
glob: owned::glob::Glob,
path: String,
_ctx: &SysOpContext,
) -> SysOpOutput<bool> {
match downcast_glob_handle(&glob) {
Ok(handle) => SysOpOutput::ok(handle.is_match(&path)),
Err(e) => SysOpOutput::err(e),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Normalize path separators in Glob.matches for parity with scan

scan normalizes \/ before matching, but matches does not. This can make Glob.matches(path) fail for Windows-style input paths that would otherwise match in scan.

💡 Proposed fix
     fn matches(
         &self,
         _heap: &Arc<BexHeap>,
         _call_id: CallId,
         glob: owned::glob::Glob,
         path: String,
         _ctx: &SysOpContext,
     ) -> SysOpOutput<bool> {
         match downcast_glob_handle(&glob) {
-            Ok(handle) => SysOpOutput::ok(handle.is_match(&path)),
+            Ok(handle) => {
+                let normalized = path.replace('\\', "/");
+                SysOpOutput::ok(handle.is_match(&normalized))
+            }
             Err(e) => SysOpOutput::err(e),
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/sys_native/src/io_impls.rs` around lines 669 - 679, The
Glob.matches implementation currently forwards the raw path to
downcast_glob_handle/...handle.is_match which skips the normalization done in
scan; modify the matches function to normalize Windows separators by replacing
'\' with '/' on the incoming path before calling handle.is_match (preserve
ownership of String), so downcast_glob_handle(&glob) and
SysOpOutput::ok(handle.is_match(&normalized_path)) are used with the normalized
path and existing error handling remains unchanged.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs`:
- Around line 439-442: The test function MkdirRecursive currently only checks
baml.fs.exists("/generated/nested"), which won't fail if the bridge creates just
the child; update the test to also assert that the parent directory exists by
calling baml.fs.exists("/generated") (or equivalent assert on baml.fs.exists for
"/generated") after the mkdir call so the recursive:true behavior is actually
verified; locate MkdirRecursive and the baml.fs.mkdir / baml.fs.exists calls to
add the additional existence check.
- Around line 454-457: The test uses an absolute pattern and scan("/") so it
will pass even if the wasm bridge ignores the scan(root) argument; update the
fixture in GlobScanTxt to use a relative pattern and a non-root scan target so
scan(root) must be forwarded: change the pattern passed to baml.glob.new from
"/workspace/data/**/*.txt" to "data/**/*.txt" and call glob.scan with a non-root
path such as "/workspace" (reference: function GlobScanTxt, baml.glob.new,
glob.scan).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: dbae7a94-2745-46ab-9e08-16d8dd9551b2

📥 Commits

Reviewing files that changed from the base of the PR and between 1b6afd7 and f54a106.

⛔ Files ignored due to path filters (3)
  • baml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_5_mir.snap is excluded by !**/*.snap
  • baml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_5_mir.snap is excluded by !**/*.snap
📒 Files selected for processing (2)
  • baml_language/crates/bridge_wasm/src/lib.rs
  • baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs

Comment on lines +439 to +442
function MkdirRecursive() -> bool {
baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true });
baml.fs.exists("/generated/nested")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assert parent creation so recursive is actually covered.

This still passes if the bridge only creates "/generated/nested" and ignores the recursive parent-creation behavior. Check "/generated" too so the test fails when recursive: true is dropped.

🧪 Minimal test tweak
 function MkdirRecursive() -> bool {
   baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true });
-  baml.fs.exists("/generated/nested")
+  baml.fs.exists("/generated") && baml.fs.exists("/generated/nested")
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function MkdirRecursive() -> bool {
baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true });
baml.fs.exists("/generated/nested")
}
function MkdirRecursive() -> bool {
baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true });
baml.fs.exists("/generated") && baml.fs.exists("/generated/nested")
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs` around lines 439 -
442, The test function MkdirRecursive currently only checks
baml.fs.exists("/generated/nested"), which won't fail if the bridge creates just
the child; update the test to also assert that the parent directory exists by
calling baml.fs.exists("/generated") (or equivalent assert on baml.fs.exists for
"/generated") after the mkdir call so the recursive:true behavior is actually
verified; locate MkdirRecursive and the baml.fs.mkdir / baml.fs.exists calls to
add the additional existence check.

Comment on lines +454 to +457
function GlobScanTxt() -> string[] {
let glob = baml.glob.new("/workspace/data/**/*.txt");
glob.scan("/")
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

This fixture does not prove scan(root) is forwarded.

With an absolute pattern plus scan("/"), the test can still pass if the wasm bridge ignores the root argument entirely. Use a relative glob and a non-root scan target so this exercises the new API contract.

🧪 Minimal test tweak
 function GlobScanTxt() -> string[] {
-  let glob = baml.glob.new("/workspace/data/**/*.txt");
-  glob.scan("/")
+  let glob = baml.glob.new("**/*.txt");
+  glob.scan("/workspace/data")
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs` around lines 454 -
457, The test uses an absolute pattern and scan("/") so it will pass even if the
wasm bridge ignores the scan(root) argument; update the fixture in GlobScanTxt
to use a relative pattern and a non-root scan target so scan(root) must be
forwarded: change the pattern passed to baml.glob.new from
"/workspace/data/**/*.txt" to "data/**/*.txt" and call glob.scan with a non-root
path such as "/workspace" (reference: function GlobScanTxt, baml.glob.new,
glob.scan).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants