Add baml.fs.read_dir, mkdir, baml.glob namespace, and fix parser string literal bug#3400
Add baml.fs.read_dir, mkdir, baml.glob namespace, and fix parser string literal bug#3400
Conversation
…ser string literal bug Implement directory listing (read_dir) and creation (mkdir) in the baml.fs module, add a new baml.glob namespace with pattern matching and filesystem scanning, and fix a parser bug where `*/` inside string literals was incorrectly tokenized as a block-comment end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
📝 WalkthroughWalkthroughAdds filesystem directory APIs ( Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant BAML_Runtime as "BAML Runtime"
participant SysOps as "SysOps (io namespace)"
participant NativeFS as "Native FS / VFS"
Client->>BAML_Runtime: call baml.fs.read_dir(path)
BAML_Runtime->>SysOps: invoke io::read_dir(path)
SysOps->>NativeFS: read_dir / metadata calls
NativeFS-->>SysOps: Vec<DirEntry>
SysOps->>BAML_Runtime: SysOpResult (array)
BAML_Runtime->>Client: return DirEntry[] (BexExternalValue::Array)
sequenceDiagram
participant Client
participant BAML_Runtime as "BAML Runtime"
participant Glob_NS as "Glob Namespace"
participant VFS as "Filesystem / readMany"
participant Regex as "Regex Engine"
Client->>BAML_Runtime: glob.new(pattern)
BAML_Runtime->>Glob_NS: create handle (store pattern)
Glob_NS-->>BAML_Runtime: Glob handle
Client->>BAML_Runtime: glob.scan(root/options)
BAML_Runtime->>VFS: enumerate candidates (walk/readMany)
VFS-->>BAML_Runtime: candidate paths
BAML_Runtime->>Regex: compile/apply glob_to_regex
Regex-->>BAML_Runtime: match results
BAML_Runtime->>Client: filtered path list (string[])
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 4 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 68d9108. Configure here.
| re.push_str(&alts); | ||
| if i < bytes.len() { | ||
| i += 1; // consume closing '}' | ||
| } |
There was a problem hiding this comment.
Glob wildcards inside brace alternation produce invalid regex
Medium Severity
The {...} alternation handler in glob_to_regex treats all characters inside braces via the default ch => arm, which only escapes a fixed set of regex metacharacters. Glob wildcards like * and ? are not in that escape set and are not converted to their glob equivalents ([^/]*, [^/]). Since * and ? are regex quantifiers, a pattern like {*.txt,*.rs} produces the regex (?:*\.txt|*\.rs) where * has no preceding expression, causing regex::Regex::new to fail at runtime with an "invalid regex" error.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 68d9108. Configure here.
| SysOpOutput::err(OpErrorKind::Other(msg)) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
WASM glob scan ignores root directory parameter
Medium Severity
The WASM Glob.scan() implementation ignores the _root parameter entirely and delegates only the glob pattern to vfs_read_many. The native implementation uses root as the walk directory, extracts ScanOptions fields (cwd, dot, absolute, follow_symlinks, only_files, etc.), and filters results accordingly. In WASM, calling g.scan("/some/dir") disregards the directory argument, producing incorrect results.
Reviewed by Cursor Bugbot for commit 68d9108. Configure here.
| } else { | ||
| Ok(regex) | ||
| } | ||
| } |
There was a problem hiding this comment.
Duplicated glob-to-regex code across two crates
Low Severity
The glob_to_regex function is fully duplicated between sys_native/src/glob_utils.rs and bridge_wasm/src/wasm_io_glob.rs. Both copies share the same brace-alternation wildcard bug, illustrating the maintenance risk: any fix to one copy must be manually replicated to the other. A shared utility crate (as the inline comment itself suggests) would eliminate this duplication.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 68d9108. Configure here.
| Ok(()) => SysOpOutput::ok(()), | ||
| Err(e) => SysOpOutput::err(OpErrorKind::Other(format!("{e:?}"))), | ||
| } | ||
| } |
There was a problem hiding this comment.
WASM mkdir ignores recursive option flag
Low Severity
The WASM mkdir implementation ignores _options: owned::fs::MkdirOptions (including recursive: bool), always delegating to vfs_create_dir. The native implementation checks options.recursive to choose between create_dir and create_dir_all. With recursive: false, WASM may succeed in creating nested paths when it semantically shouldn't, breaking behavioral parity with native.
Reviewed by Cursor Bugbot for commit 68d9108. Configure here.
There was a problem hiding this comment.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
baml_language/crates/baml_compiler_parser/src/parser.rs (1)
1506-1536:⚠️ Potential issue | 🟠 MajorPropagate raw EOF checks into raw-string content helpers.
Line 1506 uses
at_end_raw(), but the delegated raw-string consumers still loop onat_end(). That means content like#"//"#can still be treated as a skipped comment beforebump_raw()consumes it, causing an unclosed raw string / iteration-limit error.🐛 Proposed fix
- while !p.at_end() && depth > 0 { + while !p.at_end_raw() && depth > 0 { if p.at_raw(TokenKind::Quote) && p.count_consecutive_hashes_after_quote() == opening_hashes { @@ - while !p.at_end() { + while !p.at_end_raw() { if p.at_raw(TokenKind::Quote) && p.count_consecutive_hashes_after_quote() == opening_hashes { @@ - while !p.at_end() { + while !p.at_end_raw() { if p.at_raw(TokenKind::Quote) && p.count_consecutive_hashes_after_quote() == opening_hashes { @@ - while !p.at_end() { + while !p.at_end_raw() { // Check for closing delimiter if p.at_raw(TokenKind::Quote) {Please also add a parser unit test for raw strings containing
//and*/, e.g.#"//"#and#"*/"#. As per coding guidelines,**/*.rs: “Prefer writing Rust unit tests over integration tests where possible” and “Always runcargo test --libif you changed any Rust code”.Also applies to: 1567-1593, 1609-1624, 1635-1650, 1660-1675
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_compiler_parser/src/parser.rs` around lines 1506 - 1536, The delegated raw-string content consumers (parse_jinja_expression, parse_jinja_statement, parse_jinja_comment, parse_prompt_text and any helpers they call) must check at_end_raw() instead of at_end() so they stop on the raw-string EOF delimiter semantics; update those methods to use at_end_raw()/bump_raw() (or propagate a raw-context flag) wherever they currently loop on at_end(), and ensure any places that currently call bump() inside raw parsing use bump_raw(); then add unit tests exercising raw strings that contain comment sequences (e.g. #"//"# and #"*/"#) to confirm the parser does not treat those as comments and that raw strings close correctly.
🧹 Nitpick comments (1)
baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml (1)
64-82: LGTM.API surface matches the native + WASM implementations and the new tests. One optional thought: making
MkdirOptions.recursiveoptional with a documented default would avoid forcing every caller to spellMkdirOptions { recursive: false }, but that's a UX call and can be deferred.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml` around lines 64 - 82, Make MkdirOptions.recursive optional and default to false so callers don't have to always pass MkdirOptions { recursive: false }; update the MkdirOptions definition (symbol MkdirOptions and its field recursive) to be optional/nullable and adjust the mkdir binding/implementation (symbol mkdir) to treat a missing recursive as false and document the default in the API comment/definition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs`:
- Around line 2144-2210: emit_result_conversion_inner currently falls back to
Ok(__val) for non-scalar/Named element types which lets Vec<BexExternalValue>
flow through and causes type mismatches; update emit_result_conversion_inner to
handle the same BamlType variants as emit_result_conversion (at minimum Float
and Uint8Array and recursively handle Optional, Map and nested List) or extract
the scalar/Named/Optional conversion into a shared helper used by both
emit_result_conversion and emit_result_conversion_inner so each list element
maps to the correct Rust type (and keep the list-specific error messages like
"expected X in list, got ..."); refer to the functions
emit_result_conversion_inner and emit_result_conversion and the BamlType
variants (Float, Uint8Array, Optional, Map, List, Named, String, Int, Bool) when
implementing the fixes.
In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs`:
- Around line 254-277: In read_dir, normalize the join by trimming a trailing
'/' from path (or use a small join helper) before building full so
format!("{path}/{name}") cannot produce double slashes; when calling
self.vfs().vfs_metadata(&full) treat Err(_) as unknown rather than a file (set
is_dir=false, is_file=false, is_symlink=false) and emit a log/warn with the
metadata error so failures (permission, symlink, etc.) aren’t silently
misclassified; update the loop around self.vfs().vfs_read_dir, the full
construction, the vfs_metadata match, and the owned::fs::DirEntry population
accordingly in read_dir.
- Around line 279-293: The mkdir implementation currently ignores
owned::fs::MkdirOptions; update the mkdir function in wasm_io_fs.rs to enforce
the documented semantics: if options.recursive is false, first check the parent
path exists (using vfs().vfs_exists or vfs().vfs_metadata) and ensure the target
does not already exist, returning an appropriate OpError when preconditions
fail, then call vfs_create_dir; if options.recursive is true, allow idempotency
by treating "already exists" as success (probe target with vfs_exists and return
SysOpOutput::ok(()) if it exists) and otherwise call vfs_create_dir to create
parents as the host supports or create missing parents iteratively via
vfs_create_dir on ancestor paths; keep error wrapping using the existing
SysOpOutput::err pattern.
In `@baml_language/crates/bridge_wasm/src/wasm_io_glob.rs`:
- Around line 61-125: scan currently relies solely on WasmVfs.vfs_read_many (TS
globToRegex) while matches uses Rust glob_to_regex, causing inconsistent results
for advanced patterns; fix by compiling the same pattern with glob_to_regex
inside WasmIoGlob::scan (same as in matches) and filter the paths returned from
self.vfs.vfs_read_many(&pattern) with re.is_match(path) before returning,
returning an OpErrorKind::Other if glob_to_regex fails so scan and matches share
identical matching semantics (refer to functions WasmIoGlob::scan,
WasmIoGlob::matches, glob_to_regex and method WasmVfs.vfs_read_many).
In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs`:
- Around line 380-401: The test runtime_creates_with_fs_and_glob_wired is a
no-op: it only Reflect::get's the BamlWasmRuntime constructor and drops values,
so it never calls BamlWasmRuntime::create or asserts behavior; either update the
test to actually call BamlWasmRuntime::create(mock_callbacks(), mock_vfs(...))
and assert the result is Ok (and optionally perform one FS/glob operation to
exercise IoNamespaceFs/IoNamespaceGlob), or delete the test entirely and rely on
the existing mock_vfs/mock_callbacks tests that already assert contract
behavior; locate the test function runtime_creates_with_fs_and_glob_wired and
replace the no-op with a real create+assert or remove it.
In `@baml_language/crates/sys_native/src/glob_utils.rs`:
- Around line 85-118: The brace-handling code in glob_utils.rs (the '{' match
branch) currently treats nested braces as literal characters causing patterns
like "{a,{b,c}}.txt" to compile to incorrect regex; update the handler to either
return a parse error for nested braces or recursively parse inner brace groups
and emit nested non-capturing alternation groups (i.e., produce
"(?:a|(?:b|c))"), ensure commas at any depth are converted to '|' for the
correct depth when recursively building the alts string, and mirror the same fix
in the duplicate implementation in bridge_wasm/src/wasm_io_glob.rs; add unit
tests using GlobPattern::new for nested_alternation and negation as suggested to
validate both behaviors.
In `@baml_language/crates/sys_native/src/io_impls.rs`:
- Around line 567-641: The ScanOptions parsing in the SysOpOutput::async_op
block duplicates six identical fields.get(...).and_then(...).unwrap_or(...)
patterns and misuses throw_error_on_broken_symlink: extract string/bool fields
via small helpers (e.g., get_string(fields, "cwd", ".") and get_bool(fields,
"dot", false)) and replace the repeated matches in the root match arm; then
adjust the walkdir error handling so throw_error_on_broken_symlink only triggers
for genuine broken-symlink errors by inspecting walkdir::Error::io_error() and
matching io::ErrorKind::NotFound (or else rename the flag to throw_on_walk_error
if you intend to cover all walk errors), updating the throw_on_broken variable
and its checks accordingly.
---
Outside diff comments:
In `@baml_language/crates/baml_compiler_parser/src/parser.rs`:
- Around line 1506-1536: The delegated raw-string content consumers
(parse_jinja_expression, parse_jinja_statement, parse_jinja_comment,
parse_prompt_text and any helpers they call) must check at_end_raw() instead of
at_end() so they stop on the raw-string EOF delimiter semantics; update those
methods to use at_end_raw()/bump_raw() (or propagate a raw-context flag)
wherever they currently loop on at_end(), and ensure any places that currently
call bump() inside raw parsing use bump_raw(); then add unit tests exercising
raw strings that contain comment sequences (e.g. #"//"# and #"*/"#) to confirm
the parser does not treat those as comments and that raw strings close
correctly.
---
Nitpick comments:
In `@baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml`:
- Around line 64-82: Make MkdirOptions.recursive optional and default to false
so callers don't have to always pass MkdirOptions { recursive: false }; update
the MkdirOptions definition (symbol MkdirOptions and its field recursive) to be
optional/nullable and adjust the mkdir binding/implementation (symbol mkdir) to
treat a missing recursive as false and document the default in the API
comment/definition.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 8281391c-019c-4277-8b82-3bf1b468d46a
⛔ Files ignored due to path filters (74)
baml_language/Cargo.lockis excluded by!**/*.lockbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__01_lexer__fs_ops.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__02_parser__fs_ops.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__10_formatter__fs_ops.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_basic/baml_tests__test_expr_basic__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_name_concat/baml_tests__test_expr_name_concat__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_name_type_error/baml_tests__test_expr_name_type_error__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_name_type_error/baml_tests__test_expr_name_type_error__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_name_type_error/baml_tests__test_expr_name_type_error__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_throwing_body/baml_tests__test_expr_throwing_body__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_with_runner/baml_tests__test_expr_with_runner__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_with_runner/baml_tests__test_expr_with_runner__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_with_runner/baml_tests__test_expr_with_runner__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_wrong_runner/baml_tests__test_expr_wrong_runner__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_wrong_runner/baml_tests__test_expr_wrong_runner__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_expr_wrong_runner/baml_tests__test_expr_wrong_runner__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_invalid_contexts/baml_tests__test_invalid_contexts__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_invalid_contexts/baml_tests__test_invalid_contexts__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_invalid_contexts/baml_tests__test_invalid_contexts__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_old_and_new/baml_tests__test_old_and_new__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_raw_string_name/baml_tests__test_raw_string_name__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_with_not_keyword/baml_tests__test_with_not_keyword__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_with_runner_ambiguity/baml_tests__test_with_runner_ambiguity__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_with_runner_ambiguity/baml_tests__test_with_runner_ambiguity__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/test_with_runner_ambiguity/baml_tests__test_with_runner_ambiguity__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_basic/baml_tests__testset_basic__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_nested/baml_tests__testset_nested__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snapis excluded by!**/*.snap
📒 Files selected for processing (20)
baml_language/crates/baml_builtins2/baml_std/baml/ns_fs/fs.bamlbaml_language/crates/baml_builtins2/baml_std/baml/ns_glob/glob.bamlbaml_language/crates/baml_builtins2/src/lib.rsbaml_language/crates/baml_builtins2_codegen/src/codegen_io.rsbaml_language/crates/baml_compiler_parser/src/parser.rsbaml_language/crates/baml_tests/projects/builtin_io/fs_ops.bamlbaml_language/crates/baml_tests/tests/fs.rsbaml_language/crates/baml_tests/tests/glob.rsbaml_language/crates/bridge_wasm/Cargo.tomlbaml_language/crates/bridge_wasm/src/lib.rsbaml_language/crates/bridge_wasm/src/wasm_fs.rsbaml_language/crates/bridge_wasm/src/wasm_io_fs.rsbaml_language/crates/bridge_wasm/src/wasm_io_glob.rsbaml_language/crates/bridge_wasm/tests/wasm_fs_glob.rsbaml_language/crates/sys_native/Cargo.tomlbaml_language/crates/sys_native/src/glob_utils.rsbaml_language/crates/sys_native/src/io_impls.rsbaml_language/crates/sys_native/src/lib.rsbaml_language/crates/sys_ops/src/lib.rsbaml_language/crates/sys_types/src/lib.rs
| match self.vfs().vfs_read_dir(&path) { | ||
| Ok(arr) => { | ||
| let mut entries = Vec::new(); | ||
| for v in arr.iter() { | ||
| let Some(name) = v.as_string() else { continue }; | ||
| // WasmVfs.readDir returns string[] of entry names. | ||
| // Probe metadata to distinguish files from directories. | ||
| let full = format!("{path}/{name}"); | ||
| let (is_dir, is_file) = match self.vfs().vfs_metadata(&full) { | ||
| Ok(meta) => (meta.file_type == "directory", meta.file_type == "file"), | ||
| Err(_) => (false, true), | ||
| }; | ||
| entries.push(owned::fs::DirEntry { | ||
| name, | ||
| is_dir, | ||
| is_file, | ||
| is_symlink: false, | ||
| }); | ||
| } | ||
| SysOpOutput::ok(entries) | ||
| } | ||
| Err(e) => SysOpOutput::err(OpErrorKind::Other(format!("{e:?}"))), | ||
| } | ||
| } |
There was a problem hiding this comment.
read_dir has a couple of rough edges.
- Line 261:
format!("{path}/{name}")can produce"//name"whenpathends with/(common for roots like/). Most JS VFS implementations will tolerate this, but worth normalizing (e.g., trim a trailing/offpath, or use a small join helper). - Line 264: On
vfs_metadatafailure the entry is classified as(is_dir=false, is_file=true). That silently mislabels symlinks / permission-denied entries as regular files and also leavesis_symlinkhard-coded tofalsewith no path to ever set it true (the WasmVfsfileTypeprotocol only knows"file"/"directory"). Consider at least logging the metadata error, and/or extending the JS contract to report"symlink".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs` around lines 254 - 277,
In read_dir, normalize the join by trimming a trailing '/' from path (or use a
small join helper) before building full so format!("{path}/{name}") cannot
produce double slashes; when calling self.vfs().vfs_metadata(&full) treat Err(_)
as unknown rather than a file (set is_dir=false, is_file=false,
is_symlink=false) and emit a log/warn with the metadata error so failures
(permission, symlink, etc.) aren’t silently misclassified; update the loop
around self.vfs().vfs_read_dir, the full construction, the vfs_metadata match,
and the owned::fs::DirEntry population accordingly in read_dir.
| fn mkdir( | ||
| &self, | ||
| _h: &Arc<BexHeap>, | ||
| _c: CallId, | ||
| path: String, | ||
| _options: owned::fs::MkdirOptions, | ||
| _ctx: &SysOpContext, | ||
| ) -> SysOpOutput<()> { | ||
| // WasmVfs.createDir doesn't expose a recursive flag — the JS host | ||
| // is expected to handle recursive creation transparently. | ||
| match self.vfs().vfs_create_dir(&path) { | ||
| Ok(()) => SysOpOutput::ok(()), | ||
| Err(e) => SysOpOutput::err(OpErrorKind::Other(format!("{e:?}"))), | ||
| } | ||
| } |
There was a problem hiding this comment.
mkdir silently ignores options.recursive, diverging from the documented API.
The ns_fs/fs.baml contract for mkdir(path, options) — and the native implementation — require:
recursive: false→ error if any parent is missing, and error if the target already exists.recursive: true→ create parents as needed, idempotent when target exists.
Here options is dropped on the floor and every call is forwarded to vfs_create_dir unconditionally. If a JS host implements createDir as "create this path, making parents if needed" (as your mock VFS does in tests/wasm_fs_glob.rs), then in WASM:
baml.fs.mkdir("/no/parent", MkdirOptions { recursive: false })will succeed instead of erroring.baml.fs.mkdir(existing, MkdirOptions { recursive: false })will likely succeed instead of erroring.
That contradicts fs_mkdir_non_recursive_errors_when_parent_missing / fs_mkdir_recursive_is_idempotent in baml_tests/tests/fs.rs and makes baml.fs.mkdir behave differently across targets. Consider either (a) enforcing the semantics in Rust (probe with vfs_exists / vfs_metadata on parent and target before calling vfs_create_dir), or (b) extending the WasmVfs JS interface to accept a recursive flag and delegating enforcement there.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs` around lines 279 - 293,
The mkdir implementation currently ignores owned::fs::MkdirOptions; update the
mkdir function in wasm_io_fs.rs to enforce the documented semantics: if
options.recursive is false, first check the parent path exists (using
vfs().vfs_exists or vfs().vfs_metadata) and ensure the target does not already
exist, returning an appropriate OpError when preconditions fail, then call
vfs_create_dir; if options.recursive is true, allow idempotency by treating
"already exists" as success (probe target with vfs_exists and return
SysOpOutput::ok(()) if it exists) and otherwise call vfs_create_dir to create
parents as the host supports or create missing parents iteratively via
vfs_create_dir on ancestor paths; keep error wrapping using the existing
SysOpOutput::err pattern.
| b'{' => { | ||
| let mut depth = 1usize; | ||
| let mut alts = String::from("(?:"); | ||
| i += 1; | ||
| while i < bytes.len() && depth > 0 { | ||
| match bytes[i] { | ||
| b'{' => { | ||
| depth += 1; | ||
| alts.push('{'); | ||
| } | ||
| b'}' => { | ||
| depth -= 1; | ||
| if depth == 0 { | ||
| break; | ||
| } | ||
| alts.push('}'); | ||
| } | ||
| b',' if depth == 1 => alts.push('|'), | ||
| ch => { | ||
| let c = ch as char; | ||
| if ".+^${}()|[]\\".contains(c) { | ||
| alts.push('\\'); | ||
| } | ||
| alts.push(c); | ||
| } | ||
| } | ||
| i += 1; | ||
| } | ||
| alts.push(')'); | ||
| re.push_str(&alts); | ||
| if i < bytes.len() { | ||
| i += 1; // consume closing '}' | ||
| } | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Confirm Rust's `regex` 1.10 treats bare '{' literally (i.e. the current code
# doesn't fail to compile on nested-brace patterns) — to rule out a crash.
cat <<'EOF'
// Sanity check - this regex compile behavior governs the severity.
// "(?:a|{b,c})" should compile in `regex = "1.10"` and match "a" or literal "{b,c}".
EOF
rg -nP 'regex\s*=\s*"1\.' --type=tomlRepository: BoundaryML/baml
Length of output: 251
🏁 Script executed:
#!/bin/bash
# Locate and inspect the glob_utils.rs file
fd -t f glob_utils.rsRepository: BoundaryML/baml
Length of output: 108
🏁 Script executed:
#!/bin/bash
# Read the brace handling code in glob_utils.rs
cat -n baml_language/crates/sys_native/src/glob_utils.rs | head -150Repository: BoundaryML/baml
Length of output: 5936
🏁 Script executed:
#!/bin/bash
# Check for wasm_io_glob.rs and glob_to_regex implementation
fd -t f wasm_io_glob.rsRepository: BoundaryML/baml
Length of output: 111
🏁 Script executed:
#!/bin/bash
# Read the rest of glob_utils.rs to see all tests
cat -n baml_language/crates/sys_native/src/glob_utils.rs | tail -n +150Repository: BoundaryML/baml
Length of output: 1081
🏁 Script executed:
#!/bin/bash
# Read wasm_io_glob.rs to check for duplicate code
cat -n baml_language/crates/bridge_wasm/src/wasm_io_glob.rs | head -200Repository: BoundaryML/baml
Length of output: 8751
🏁 Script executed:
#!/bin/bash
# Continue reading wasm_io_glob.rs brace handling
cat -n baml_language/crates/bridge_wasm/src/wasm_io_glob.rs | sed -n '195,240p'Repository: BoundaryML/baml
Length of output: 1874
🏁 Script executed:
#!/bin/bash
# Check if there are any tests for nested braces or negation in glob_utils.rs
rg -A 5 "nested|negation" baml_language/crates/sys_native/src/glob_utils.rsRepository: BoundaryML/baml
Length of output: 166
🏁 Script executed:
#!/bin/bash
# Check if negation feature is actually tested or mentioned
rg -n "negat" baml_language/crates/sys_native/src/glob_utils.rsRepository: BoundaryML/baml
Length of output: 280
🏁 Script executed:
#!/bin/bash
# Test the actual behavior of nested braces by creating a small test
# to trace what regex pattern gets generated for {a,{b,c}}
python3 << 'EOF'
# Simulate the glob_to_regex logic for {a,{b,c}}.txt
glob = "{a,{b,c}}.txt"
bytes_list = list(glob.encode())
i = 0
re = "^"
while i < len(bytes_list):
if bytes_list[i:i+1] == b'{':
depth = 1
alts = "(?:"
i += 1
while i < len(bytes_list) and depth > 0:
b = bytes_list[i]
if b == ord('{'):
depth += 1
alts += '{'
elif b == ord('}'):
depth -= 1
if depth == 0:
break
alts += '}'
elif b == ord(',') and depth == 1:
alts += '|'
else:
c = chr(b)
if c in ".+^${}()|[]\\":
alts += '\\'
alts += c
i += 1
alts += ')'
re += alts
if i < len(bytes_list):
i += 1 # consume closing }
else:
c = chr(bytes_list[i])
if c in ".+^${}()|[]\\":
re += '\\'
re += c
i += 1
re += "$"
print(f"Input glob: {glob}")
print(f"Generated regex: {re}")
print(f"Expected regex for correct nested handling: ^(?:a|(?:b|c))\\.txt$")
print(f"This regex would incorrectly match: {'{b,c}'}.txt (literal braces)")
EOFRepository: BoundaryML/baml
Length of output: 250
🏁 Script executed:
#!/bin/bash
# One more check: does the negation implementation work correctly?
# Just verify the negation code path exists and is syntactically correct
sed -n '132,139p' baml_language/crates/sys_native/src/glob_utils.rsRepository: BoundaryML/baml
Length of output: 345
Nested {...} alternation silently produces incorrect regex.
The brace handler at lines 91–93 and 100 pushes inner { and } as literal characters (when depth > 1), and only converts , to | at depth == 1. This means a pattern like {a,{b,c}}.txt generates the regex (?:a|{b,c})\.txt, which matches the literal string "{b,c}.txt" instead of "a.txt", "b.txt", or "c.txt".
The identical code also appears in baml_language/crates/bridge_wasm/src/wasm_io_glob.rs (lines 195–227). The TODO comment there (lines 128–133) already suggests consolidating both implementations into a shared crate to avoid duplication and reduce maintenance burden.
Recommended fixes:
- Reject nested braces with a clear parse error, or
- Recursively emit nested
(?:b|c)groups for inner alternations.
Add unit tests covering nested braces and negation (currently untested):
💚 Suggested tests
#[test]
fn nested_alternation() {
let g = GlobPattern::new("{a,{b,c}}.txt").unwrap();
assert!(g.is_match("a.txt"));
assert!(g.is_match("b.txt"));
assert!(g.is_match("c.txt"));
assert!(!g.is_match("d.txt"));
}
#[test]
fn negation() {
let g = GlobPattern::new("!*.ts").unwrap();
assert!(g.is_match("foo.rs"));
assert!(!g.is_match("foo.ts"));
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/sys_native/src/glob_utils.rs` around lines 85 - 118, The
brace-handling code in glob_utils.rs (the '{' match branch) currently treats
nested braces as literal characters causing patterns like "{a,{b,c}}.txt" to
compile to incorrect regex; update the handler to either return a parse error
for nested braces or recursively parse inner brace groups and emit nested
non-capturing alternation groups (i.e., produce "(?:a|(?:b|c))"), ensure commas
at any depth are converted to '|' for the correct depth when recursively
building the alts string, and mirror the same fix in the duplicate
implementation in bridge_wasm/src/wasm_io_glob.rs; add unit tests using
GlobPattern::new for nested_alternation and negation as suggested to validate
both behaviors.
| SysOpOutput::async_op(async move { | ||
| let handle = downcast_glob_handle(&glob)?; | ||
|
|
||
| // Parse root: either a string or a ScanOptions instance | ||
| let (cwd, dot, absolute, follow_symlinks, throw_on_broken, only_files) = match &root { | ||
| BexExternalValue::String(s) => (s.clone(), false, false, false, false, true), | ||
| BexExternalValue::Instance { fields, .. } => { | ||
| let cwd = fields | ||
| .get("cwd") | ||
| .and_then(|v| { | ||
| if let BexExternalValue::String(s) = v { | ||
| Some(s.clone()) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| .unwrap_or_else(|| ".".to_string()); | ||
| let dot = fields | ||
| .get("dot") | ||
| .and_then(|v| { | ||
| if let BexExternalValue::Bool(b) = v { | ||
| Some(*b) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| .unwrap_or(false); | ||
| let absolute = fields | ||
| .get("absolute") | ||
| .and_then(|v| { | ||
| if let BexExternalValue::Bool(b) = v { | ||
| Some(*b) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| .unwrap_or(false); | ||
| let follow_symlinks = fields | ||
| .get("follow_symlinks") | ||
| .and_then(|v| { | ||
| if let BexExternalValue::Bool(b) = v { | ||
| Some(*b) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| .unwrap_or(false); | ||
| let throw_on_broken = fields | ||
| .get("throw_error_on_broken_symlink") | ||
| .and_then(|v| { | ||
| if let BexExternalValue::Bool(b) = v { | ||
| Some(*b) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| .unwrap_or(false); | ||
| let only_files = fields | ||
| .get("only_files") | ||
| .and_then(|v| { | ||
| if let BexExternalValue::Bool(b) = v { | ||
| Some(*b) | ||
| } else { | ||
| None | ||
| } | ||
| }) | ||
| .unwrap_or(true); | ||
| (cwd, dot, absolute, follow_symlinks, throw_on_broken, only_files) | ||
| } | ||
| _ => { | ||
| return Err(OpErrorKind::Other( | ||
| "scan argument must be a string or ScanOptions".into(), | ||
| )) | ||
| } | ||
| }; |
There was a problem hiding this comment.
ScanOptions parsing: verbose duplication + misleading throw_on_broken semantics.
Two minor observations on the scan implementation:
- The six
fields.get(...).and_then(...).unwrap_or(...)blocks are structurally identical and hurt readability. Consider small helpers for bool/string field extraction:
♻️ Optional refactor
fn get_bool(fields: &IndexMap<String, BexExternalValue>, key: &str, default: bool) -> bool {
match fields.get(key) {
Some(BexExternalValue::Bool(b)) => *b,
_ => default,
}
}
fn get_string(fields: &IndexMap<String, BexExternalValue>, key: &str, default: &str) -> String {
match fields.get(key) {
Some(BexExternalValue::String(s)) => s.clone(),
_ => default.to_string(),
}
}throw_error_on_broken_symlinkis wired toif throw_on_broken { return Err(...) }for everywalkdirerror, not just broken-symlink errors. Permission-denied, IO failures, etc. will all surface under this flag. Additionally, withfollow_links=false, a dangling symlink typically does not produce awalkdir::Errorat all — the entry is yielded successfully withfile_type().is_symlink() == true— so the option may not actually fire on broken symlinks in its most natural usage. Either rename (e.g.throw_on_walk_error) or detect broken-symlink errors specifically viae.io_error()+io::ErrorKind::NotFoundand target the option accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/sys_native/src/io_impls.rs` around lines 567 - 641, The
ScanOptions parsing in the SysOpOutput::async_op block duplicates six identical
fields.get(...).and_then(...).unwrap_or(...) patterns and misuses
throw_error_on_broken_symlink: extract string/bool fields via small helpers
(e.g., get_string(fields, "cwd", ".") and get_bool(fields, "dot", false)) and
replace the repeated matches in the root match arm; then adjust the walkdir
error handling so throw_error_on_broken_symlink only triggers for genuine
broken-symlink errors by inspecting walkdir::Error::io_error() and matching
io::ErrorKind::NotFound (or else rename the flag to throw_on_walk_error if you
intend to cover all walk errors), updating the throw_on_broken variable and its
checks accordingly.
Binary size checks passed✅ 7 passed
Generated by |
Merging this PR will improve performance by 15.33%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | WallTime | bench_lexer_only_simple |
24.3 µs | 21.1 µs | +15.33% |
Comparing aaron/add-readdir-function-to-bamlfs-module (f54a106) with canary (96b6edb)
Footnotes
-
105 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
baml_language/crates/baml_compiler_parser/src/parser.rs (1)
5210-5233: Add parity tests for normal/byte strings, and confirm full lib test run.Nice regression tests for raw strings. Since
parse_stringandparse_byte_stringwere also changed, add equivalent tests for"//","*/",b"//", andb"*/"to lock behavior across all touched paths. Also please confirmcargo test --libwas run for this Rust change.Suggested test additions
+ #[test] + fn string_keeps_comment_markers_as_text() { + let source = r#" +function Demo() -> string { + "//" + "*/" +} +"#; + + let (_root, errors) = parse_source(source); + assert_no_errors(&errors); + } + + #[test] + fn byte_string_keeps_comment_markers_as_text() { + let source = r#" +function Demo() -> string { + b"//" + b"*/" +} +"#; + + let (_root, errors) = parse_source(source); + assert_no_errors(&errors); + }As per coding guidelines: "Always run
cargo test --libif you changed any Rust code".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_compiler_parser/src/parser.rs` around lines 5210 - 5233, Add matching unit tests for normal and byte strings mirroring the raw string tests: create tests similar to raw_string_keeps_line_comment_marker_as_text and raw_string_keeps_block_comment_end_marker_as_text but using normal string literals and byte string literals (e.g., assert parse_source on "\"//\"" and "\"*/\"" and b"\"//\"" and b"\"*/\"" or equivalent tests that exercise parse_string and parse_byte_string) so the behavior of parse_string and parse_byte_string is covered; place them near the existing tests that call parse_source and assert_no_errors to keep parity, and then run cargo test --lib to verify the full library test suite passes.baml_language/crates/sys_types/src/lib.rs (1)
337-355: Add a unit test forinto_result_mapped's async/error path.This helper now sits on the serialization path for list returns, but there’s no coverage here that the
Asyncbranch both appliesfand wraps failures withOpError::new(op, ...). A small local test would lock that down cheaply.As per coding guidelines,
**/*.rs: Prefer writing Rust unit tests over integration tests where possible.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/sys_types/src/lib.rs` around lines 337 - 355, Add a unit test that exercises SysOpOutput::into_result_mapped's Async branch to ensure the mapping function f is applied to successful values and failures are wrapped with OpError::new(op, ...); specifically, create an async-ready SysOpOutput::Async (pin a future that returns Ok or Err), call into_result_mapped with a simple f -> BexExternalValue conversion and a concrete SysOp, await the resulting SysOpResult::Async, and assert that Ok paths produce the mapped BexExternalValue and Err paths produce an OpError whose op and kind match expectations; place the test in the same lib.rs tests module so it runs as a unit test.baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs (1)
1425-1461: Useemit_into_result_callon the class-method glue path too.The new helper only gets wired into
emit_free_fn_glue;emit_glue_method()still hardcodes.into_result(...). The next class method returningClass[]will fall back to the old orphan-rule problem, so it's worth routing both generators through the same helper now.Also applies to: 1473-1547
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs` around lines 1425 - 1461, emit_glue_method currently hardcodes `.into_result(...)`, so class-method glue doesn't use the new list-mapping logic in emit_into_result_call and will hit the orphan-rule bug for Class[] returns; update the class-method path to call emit_into_result_call instead of appending `.into_result(...)`. Specifically, in emit_glue_method replace the hardcoded call that uses SysOp::`#variant_ident` (the same pattern used in emit_free_fn_glue) with a call to emit_into_result_call(return_type, variant_ident, call_expr, class_ns_map, paths) so List(Named(...)) returns are handled via into_result_mapped; ensure the same variables (return_type, variant_ident, call_expr, class_ns_map, paths) are available/forwarded to match the free-fn glue usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs`:
- Around line 330-356: The mkdir implementation currently treats any existing
path as "exists" without checking whether it's a directory; change checks that
call self.vfs().vfs_exists(&path) to instead query the actual type (e.g.,
self.vfs().vfs_is_dir(&path) or vfs_metadata and inspect the file type) so that:
for recursive=true you return success only if the existing path is a directory
and otherwise return an error indicating "Not a directory" (or similar), and for
recursive=false you still error when the path exists but make the error specific
if it exists and is not a directory; likewise when validating the parent path
(the parent_path(&path) check) ensure you verify the parent is an existing
directory (not just exists). Update the same logic in the other block referenced
(lines ~359-380) to use these directory-aware checks (replace vfs_exists usages
with directory/type checks and adjust error messages accordingly).
In `@baml_language/crates/bridge_wasm/src/wasm_io_glob.rs`:
- Around line 61-104: The scan implementation ignores the _root parameter
causing matches outside the requested root to leak; fix by extracting a root
path string from the _root BexExternalValue (similar to how you downcast
glob._handle to String) and enforce the root when filtering results from
self.vfs.vfs_read_many(&pattern): either call a vfs_read_many variant that
accepts a root if available, or normalize both root and each returned path and
skip any path that is not under the root before applying glob_regex_matches;
update scan to perform this root-prefix/path-contains check (use
functions/values glob._handle, _root, self.vfs.vfs_read_many,
glob_regex_matches, and the local pattern/re variables) so output matches native
behavior.
In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs`:
- Around line 170-186: The mock readMany currently ignores its "glob" parameter;
update the JS function created in wasm_fs_glob.rs (the Function passed to
js_sys::Reflect::set for "readMany") to filter Object.keys(this._files) by the
incoming glob argument before pushing results. Convert the glob string into a
RegExp (e.g., replace "**" -> ".*", "*" -> "[^/]*", escape other regex
metacharacters) or otherwise match the pattern, and only push [key,
TextEncoder().encode(value)] when the key matches the compiled pattern so tests
for "**/*.txt" vs "**/*.rs" exercise distinct behavior.
In `@baml_language/crates/sys_native/src/io_impls.rs`:
- Around line 669-679: The Glob.matches implementation currently forwards the
raw path to downcast_glob_handle/...handle.is_match which skips the
normalization done in scan; modify the matches function to normalize Windows
separators by replacing '\' with '/' on the incoming path before calling
handle.is_match (preserve ownership of String), so downcast_glob_handle(&glob)
and SysOpOutput::ok(handle.is_match(&normalized_path)) are used with the
normalized path and existing error handling remains unchanged.
---
Nitpick comments:
In `@baml_language/crates/baml_builtins2_codegen/src/codegen_io.rs`:
- Around line 1425-1461: emit_glue_method currently hardcodes
`.into_result(...)`, so class-method glue doesn't use the new list-mapping logic
in emit_into_result_call and will hit the orphan-rule bug for Class[] returns;
update the class-method path to call emit_into_result_call instead of appending
`.into_result(...)`. Specifically, in emit_glue_method replace the hardcoded
call that uses SysOp::`#variant_ident` (the same pattern used in
emit_free_fn_glue) with a call to emit_into_result_call(return_type,
variant_ident, call_expr, class_ns_map, paths) so List(Named(...)) returns are
handled via into_result_mapped; ensure the same variables (return_type,
variant_ident, call_expr, class_ns_map, paths) are available/forwarded to match
the free-fn glue usage.
In `@baml_language/crates/baml_compiler_parser/src/parser.rs`:
- Around line 5210-5233: Add matching unit tests for normal and byte strings
mirroring the raw string tests: create tests similar to
raw_string_keeps_line_comment_marker_as_text and
raw_string_keeps_block_comment_end_marker_as_text but using normal string
literals and byte string literals (e.g., assert parse_source on "\"//\"" and
"\"*/\"" and b"\"//\"" and b"\"*/\"" or equivalent tests that exercise
parse_string and parse_byte_string) so the behavior of parse_string and
parse_byte_string is covered; place them near the existing tests that call
parse_source and assert_no_errors to keep parity, and then run cargo test --lib
to verify the full library test suite passes.
In `@baml_language/crates/sys_types/src/lib.rs`:
- Around line 337-355: Add a unit test that exercises
SysOpOutput::into_result_mapped's Async branch to ensure the mapping function f
is applied to successful values and failures are wrapped with OpError::new(op,
...); specifically, create an async-ready SysOpOutput::Async (pin a future that
returns Ok or Err), call into_result_mapped with a simple f -> BexExternalValue
conversion and a concrete SysOp, await the resulting SysOpResult::Async, and
assert that Ok paths produce the mapped BexExternalValue and Err paths produce
an OpError whose op and kind match expectations; place the test in the same
lib.rs tests module so it runs as a unit test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: b82e0397-d7fd-474b-9f89-58d7e67b720e
⛔ Files ignored due to path filters (16)
baml_language/Cargo.lockis excluded by!**/*.lockbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/__baml_std__/baml_tests____baml_std____06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/builtin_io/baml_tests__builtin_io__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__03_hir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_tir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__06_codegen.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/src/compiler2_tir/snapshots/baml_tests__compiler2_tir__phase5__snapshot_baml_package_items.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/tests/bytecode_format/snapshots/bytecode_format__bytecode_display_expanded_unoptimized.snapis excluded by!**/*.snap
📒 Files selected for processing (10)
baml_language/crates/baml_builtins2_codegen/src/codegen_io.rsbaml_language/crates/baml_compiler_parser/src/parser.rsbaml_language/crates/bridge_wasm/src/wasm_fs.rsbaml_language/crates/bridge_wasm/src/wasm_io_fs.rsbaml_language/crates/bridge_wasm/src/wasm_io_glob.rsbaml_language/crates/bridge_wasm/tests/wasm_fs_glob.rsbaml_language/crates/sys_native/src/glob_utils.rsbaml_language/crates/sys_native/src/io_impls.rsbaml_language/crates/sys_ops/src/lib.rsbaml_language/crates/sys_types/src/lib.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- baml_language/crates/bridge_wasm/src/wasm_fs.rs
- baml_language/crates/sys_native/src/glob_utils.rs
- baml_language/crates/sys_ops/src/lib.rs
| if !options.recursive { | ||
| match self.vfs().vfs_exists(&path) { | ||
| Ok(true) => { | ||
| return SysOpOutput::err(OpErrorKind::Other(format!( | ||
| "Directory already exists: {path}" | ||
| ))); | ||
| } | ||
| Ok(false) => {} | ||
| Err(e) => return SysOpOutput::err(js_err(&e)), | ||
| } | ||
|
|
||
| if let Some(parent) = parent_path(&path) { | ||
| match self.vfs().vfs_exists(&parent) { | ||
| Ok(true) => {} | ||
| Ok(false) => { | ||
| return SysOpOutput::err(OpErrorKind::Other(format!( | ||
| "Parent directory does not exist: {parent}" | ||
| ))); | ||
| } | ||
| Err(e) => return SysOpOutput::err(js_err(&e)), | ||
| } | ||
| } | ||
|
|
||
| return match self.vfs().vfs_create_dir(&path) { | ||
| Ok(()) => SysOpOutput::ok(()), | ||
| Err(e) => SysOpOutput::err(js_err(&e)), | ||
| }; |
There was a problem hiding this comment.
Don't treat any existing path as a directory in mkdir.
Both branches short-circuit on vfs_exists, so an existing file at path is treated as success for recursive: true and as "Directory already exists" for recursive: false. That diverges from native mkdir semantics and can let callers continue with a non-directory target.
Also applies to: 359-380
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/bridge_wasm/src/wasm_io_fs.rs` around lines 330 - 356,
The mkdir implementation currently treats any existing path as "exists" without
checking whether it's a directory; change checks that call
self.vfs().vfs_exists(&path) to instead query the actual type (e.g.,
self.vfs().vfs_is_dir(&path) or vfs_metadata and inspect the file type) so that:
for recursive=true you return success only if the existing path is a directory
and otherwise return an error indicating "Not a directory" (or similar), and for
recursive=false you still error when the path exists but make the error specific
if it exists and is not a directory; likewise when validating the parent path
(the parent_path(&path) check) ensure you verify the parent is an existing
directory (not just exists). Update the same logic in the other block referenced
(lines ~359-380) to use these directory-aware checks (replace vfs_exists usages
with directory/type checks and adjust error messages accordingly).
| fn scan( | ||
| &self, | ||
| _h: &Arc<BexHeap>, | ||
| _c: CallId, | ||
| glob: owned::glob::Glob, | ||
| _root: BexExternalValue, | ||
| _ctx: &SysOpContext, | ||
| ) -> SysOpOutput<Vec<String>> { | ||
| // Downcast the handle to get the raw pattern string. | ||
| let pattern = match glob._handle.clone().downcast::<String>() { | ||
| Ok(p) => (*p).clone(), | ||
| Err(_) => { | ||
| return SysOpOutput::err(OpErrorKind::Other( | ||
| "Invalid glob handle: expected String pattern".into(), | ||
| )); | ||
| } | ||
| }; | ||
| let (re, negated) = match glob_to_regex(&pattern) { | ||
| Ok(glob) => glob, | ||
| Err(e) => return SysOpOutput::err(OpErrorKind::Other(e)), | ||
| }; | ||
|
|
||
| // Delegate to WasmVfs.readMany which accepts a glob string and returns | ||
| // `[string, Uint8Array][]`. Filter with the same Rust matcher used by | ||
| // matches() so the host and native implementations agree on patterns. | ||
| match self.vfs.vfs_read_many(&pattern) { | ||
| Ok(arr) => { | ||
| let paths: Vec<String> = arr | ||
| .iter() | ||
| .filter_map(|item| { | ||
| // Each item is [path: string, data: Uint8Array] | ||
| let pair: js_sys::Array = item.dyn_into().ok()?; | ||
| pair.get(0).as_string() | ||
| }) | ||
| .filter(|path| glob_regex_matches(&re, negated, path)) | ||
| .collect(); | ||
| SysOpOutput::ok(paths) | ||
| } | ||
| Err(e) => { | ||
| let msg = e.as_string().unwrap_or_else(|| format!("{e:?}")); | ||
| SysOpOutput::err(OpErrorKind::Other(msg)) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Glob.scan(root) is currently ignoring root.
scan() never uses the _root argument; it calls vfs_read_many(&pattern) and then regex-filters whatever comes back. That means matches outside the requested root can leak into the result set, unlike the native implementation that walks from root.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/bridge_wasm/src/wasm_io_glob.rs` around lines 61 - 104,
The scan implementation ignores the _root parameter causing matches outside the
requested root to leak; fix by extracting a root path string from the _root
BexExternalValue (similar to how you downcast glob._handle to String) and
enforce the root when filtering results from self.vfs.vfs_read_many(&pattern):
either call a vfs_read_many variant that accepts a root if available, or
normalize both root and each returned path and skip any path that is not under
the root before applying glob_regex_matches; update scan to perform this
root-prefix/path-contains check (use functions/values glob._handle, _root,
self.vfs.vfs_read_many, glob_regex_matches, and the local pattern/re variables)
so output matches native behavior.
| fn matches( | ||
| &self, | ||
| _heap: &Arc<BexHeap>, | ||
| _call_id: CallId, | ||
| glob: owned::glob::Glob, | ||
| path: String, | ||
| _ctx: &SysOpContext, | ||
| ) -> SysOpOutput<bool> { | ||
| match downcast_glob_handle(&glob) { | ||
| Ok(handle) => SysOpOutput::ok(handle.is_match(&path)), | ||
| Err(e) => SysOpOutput::err(e), |
There was a problem hiding this comment.
Normalize path separators in Glob.matches for parity with scan
scan normalizes \ → / before matching, but matches does not. This can make Glob.matches(path) fail for Windows-style input paths that would otherwise match in scan.
💡 Proposed fix
fn matches(
&self,
_heap: &Arc<BexHeap>,
_call_id: CallId,
glob: owned::glob::Glob,
path: String,
_ctx: &SysOpContext,
) -> SysOpOutput<bool> {
match downcast_glob_handle(&glob) {
- Ok(handle) => SysOpOutput::ok(handle.is_match(&path)),
+ Ok(handle) => {
+ let normalized = path.replace('\\', "/");
+ SysOpOutput::ok(handle.is_match(&normalized))
+ }
Err(e) => SysOpOutput::err(e),
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/sys_native/src/io_impls.rs` around lines 669 - 679, The
Glob.matches implementation currently forwards the raw path to
downcast_glob_handle/...handle.is_match which skips the normalization done in
scan; modify the matches function to normalize Windows separators by replacing
'\' with '/' on the incoming path before calling handle.is_match (preserve
ownership of String), so downcast_glob_handle(&glob) and
SysOpOutput::ok(handle.is_match(&normalized_path)) are used with the normalized
path and existing error handling remains unchanged.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs`:
- Around line 439-442: The test function MkdirRecursive currently only checks
baml.fs.exists("/generated/nested"), which won't fail if the bridge creates just
the child; update the test to also assert that the parent directory exists by
calling baml.fs.exists("/generated") (or equivalent assert on baml.fs.exists for
"/generated") after the mkdir call so the recursive:true behavior is actually
verified; locate MkdirRecursive and the baml.fs.mkdir / baml.fs.exists calls to
add the additional existence check.
- Around line 454-457: The test uses an absolute pattern and scan("/") so it
will pass even if the wasm bridge ignores the scan(root) argument; update the
fixture in GlobScanTxt to use a relative pattern and a non-root scan target so
scan(root) must be forwarded: change the pattern passed to baml.glob.new from
"/workspace/data/**/*.txt" to "data/**/*.txt" and call glob.scan with a non-root
path such as "/workspace" (reference: function GlobScanTxt, baml.glob.new,
glob.scan).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: dbae7a94-2745-46ab-9e08-16d8dd9551b2
⛔ Files ignored due to path filters (3)
baml_language/crates/baml_tests/snapshots/testset_dynamic/baml_tests__testset_dynamic__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_vibes_nested/baml_tests__testset_vibes_nested__04_5_mir.snapis excluded by!**/*.snapbaml_language/crates/baml_tests/snapshots/testset_with_setup/baml_tests__testset_with_setup__04_5_mir.snapis excluded by!**/*.snap
📒 Files selected for processing (2)
baml_language/crates/bridge_wasm/src/lib.rsbaml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs
| function MkdirRecursive() -> bool { | ||
| baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true }); | ||
| baml.fs.exists("/generated/nested") | ||
| } |
There was a problem hiding this comment.
Assert parent creation so recursive is actually covered.
This still passes if the bridge only creates "/generated/nested" and ignores the recursive parent-creation behavior. Check "/generated" too so the test fails when recursive: true is dropped.
🧪 Minimal test tweak
function MkdirRecursive() -> bool {
baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true });
- baml.fs.exists("/generated/nested")
+ baml.fs.exists("/generated") && baml.fs.exists("/generated/nested")
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function MkdirRecursive() -> bool { | |
| baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true }); | |
| baml.fs.exists("/generated/nested") | |
| } | |
| function MkdirRecursive() -> bool { | |
| baml.fs.mkdir("/generated/nested", baml.fs.MkdirOptions { recursive: true }); | |
| baml.fs.exists("/generated") && baml.fs.exists("/generated/nested") | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs` around lines 439 -
442, The test function MkdirRecursive currently only checks
baml.fs.exists("/generated/nested"), which won't fail if the bridge creates just
the child; update the test to also assert that the parent directory exists by
calling baml.fs.exists("/generated") (or equivalent assert on baml.fs.exists for
"/generated") after the mkdir call so the recursive:true behavior is actually
verified; locate MkdirRecursive and the baml.fs.mkdir / baml.fs.exists calls to
add the additional existence check.
| function GlobScanTxt() -> string[] { | ||
| let glob = baml.glob.new("/workspace/data/**/*.txt"); | ||
| glob.scan("/") | ||
| } |
There was a problem hiding this comment.
This fixture does not prove scan(root) is forwarded.
With an absolute pattern plus scan("/"), the test can still pass if the wasm bridge ignores the root argument entirely. Use a relative glob and a non-root scan target so this exercises the new API contract.
🧪 Minimal test tweak
function GlobScanTxt() -> string[] {
- let glob = baml.glob.new("/workspace/data/**/*.txt");
- glob.scan("/")
+ let glob = baml.glob.new("**/*.txt");
+ glob.scan("/workspace/data")
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@baml_language/crates/bridge_wasm/tests/wasm_fs_glob.rs` around lines 454 -
457, The test uses an absolute pattern and scan("/") so it will pass even if the
wasm bridge ignores the scan(root) argument; update the fixture in GlobScanTxt
to use a relative pattern and a non-root scan target so scan(root) must be
forwarded: change the pattern passed to baml.glob.new from
"/workspace/data/**/*.txt" to "data/**/*.txt" and call glob.scan with a non-root
path such as "/workspace" (reference: function GlobScanTxt, baml.glob.new,
glob.scan).


Artifacts | Task
What problems was I solving
BAML had no built-in way to list directory contents or create directories from user code. Users needed
read_dir,mkdir, and glob pattern matching (baml.glob.new("**/*.ts").scan(root)) to build file-processing workflows. Additionally, the BAML parser had a long-standing bug where*/inside string literals (e.g."**/*.ts") was incorrectly tokenized as a block-comment end, making glob patterns with recursive wildcards impossible to express.What user-facing changes did I ship
baml.fs.read_dir(path)- Lists directory entries, returningDirEntry[]withname,is_dir,is_file,is_symlinkfieldsbaml.fs.mkdir(path, options)- Creates directories with optionalrecursive: truefor nested creationbaml.glob.new(pattern)- Creates a glob pattern object supporting*,**,?,[...],{a,b}syntaxGlob.scan(root)/Glob.matches(path)- Scans filesystem or tests paths against glob patterns*/or//now parse correctly (previously broke block-comment tokenization)How I implemented it
Codegen infrastructure (Phase 1)
crates/baml_builtins2_codegen/src/codegen_io.rs- Addedemit_into_result_callhelper forVec<ClassName>return types that can't implementAsBexExternalValuedue to orphan rules. Uses newinto_result_mappedonSysOpOutputto convert element-by-element. AddedBamlType::Listarm toemit_result_conversionfor deserializing arrays back fromBexExternalValue::Array.crates/sys_types/src/lib.rs- AddedSysOpOutput::into_result_mapped()for custom value mapping when the return type doesn't directly implementAsBexExternalValue.BAML declarations (Phase 2-3)
crates/baml_builtins2/baml_std/baml/ns_fs/fs.baml- AddedDirEntryclass,MkdirOptionsclass,read_dirfunction, andmkdirfunction declarations.crates/baml_builtins2/baml_std/baml/ns_glob/glob.baml- New file:ScanOptionsclass,Globclass with_handle,scan/matchesmethods, andnewfunction.crates/baml_builtins2/src/lib.rs- Registeredns_glob/glob.bamlas a builtin module.Native implementations (Phase 2-3)
crates/sys_native/src/io_impls.rs-read_dirusingtokio::fs::read_dir,mkdirusingtokio::fs::create_dir[_all], globnew/scan/matchesusingwalkdir+ customGlobPattern.crates/sys_native/src/glob_utils.rs- New file:GlobPatternstruct converting glob syntax to regex, supporting**,*,?,[...],{a,b}patterns. Includes unit tests.crates/sys_ops/src/lib.rs-DefaultIoOpsstubs returningUnsupportedfor all new operations.IoSysOpsBuilderwiring forwith_fs_instance(read_dir, mkdir) and newwith_glob_instancemethod.WASM implementations (Phase 4)
crates/bridge_wasm/src/wasm_io_fs.rs- New file:WasmIoFsdelegating fs operations to the JS VFS layer.crates/bridge_wasm/src/wasm_io_glob.rs- New file: WASM glob implementation with inline glob-to-regex converter.crates/bridge_wasm/src/wasm_fs.rs- Addedpub(crate)accessor methods onWasmVfsfor cross-module use. ChangedWasmFs::newto acceptArc<WasmVfs>.crates/bridge_wasm/src/lib.rs- Wired.with_fs_instance()and.with_glob_instance()intoSysOpsBuilder.Parser bugfix
crates/baml_compiler_parser/src/parser.rs- Addedat_end_raw()method. Changedparse_string,parse_byte_string, andparse_raw_string_contentto useat_end_raw()instead ofat_end(), preventing*/and//inside string literals from being misinterpreted as comment delimiters.Tests
crates/baml_tests/tests/fs.rs- 8 new tests:fs_read_dir_returns_entries,fs_read_dir_type_flags,fs_read_dir_nonexistent_errors,fs_read_dir_empty_dir,fs_mkdir_creates_directory,fs_mkdir_recursive_creates_parents,fs_mkdir_non_recursive_errors_when_parent_missing,fs_mkdir_recursive_is_idempotent.crates/baml_tests/tests/glob.rs- 9 glob tests includingglob_matches_recursive_wildcard(exercises the parser fix with"**/*.ts").crates/baml_tests/projects/builtin_io/fs_ops.baml- AddedReadDir()function for CST/pipeline snapshot coverage.__baml_std__,builtin_io, and other test projects.Deviations from the plan
Implemented as planned
DirEntry,MkdirOptions,read_dir,mkdirdeclarations and implementations match the planGlobPatternwith glob-to-regex conversion,walkdir-based scan, native + WASM implementationsDeviations/surprises
Glob.matchrenamed toGlob.matchesto avoid Rust keyword escaping (r#match) throughout the codebaseemit_into_result_callcodegen helper for the serialization direction (Vec<owned::T>->BexExternalValue::Array), which the plan only addressed for the deserialization directionWasmVfssharing viaArcinstead of clone, requiring a signature change toWasmFs::newWasmVfsaccessor methods added aspub(crate)wrappers sincewasm_bindgen-generated methods are module-privateGlobPatterndropsoriginalfield — only stores the compiledRegex, not the source pattern stringbaml.fs.DirEntry[],baml.fs.MkdirOptions) rather than bare namesAdditions not in plan
*/inside string literals (at_end_raw()inparse_string,parse_byte_string,parse_raw_string_content)glob_utils.rsunit tests (4 tests for the glob-to-regex converter)ReadDir()snapshot test function inbuiltin_io/fs_ops.bamlfor full pipeline coverageglob_utils.rshandles[!...]character class negation (POSIX-compatible edge case)Items planned but not implemented
fs_read_dir_returns_entries— omitted in favor of runtime-only assertions**/*.txtglob scan integration test — recursive matching covered byglob_utils.rsunit tests andglob_matches_recursive_wildcardbut not a full scan integration testHow to verify it
Setup
git fetch git checkout aaron/add-readdir-function-to-bamlfs-module cd baml_languageAutomated Tests
Manual Testing
baml.fs.read_dirreturns correct entries with type flagsbaml.fs.mkdircreates nested directories withrecursive: truebaml.glob.new("**/*.ts").matches("src/index.ts")returns true*/parse correctly (was previously broken)Description for the changelog
Add
baml.fs.read_dir,baml.fs.mkdir, andbaml.globnamespace with pattern matching and filesystem scanning. Fix parser bug where*/inside string literals was incorrectly treated as block-comment end.Note
Medium Risk
Adds new filesystem- and glob-related sysops across native and WASM runtimes and changes parser behavior for string literals, which could affect file I/O behavior and parsing edge-cases. Coverage is improved with new integration/unit tests and updated snapshots, but the runtime-wiring/codegen changes broaden the blast radius.
Overview
Adds new BAML builtins for filesystem and glob workflows:
baml.fs.read_dir()(returnsDirEntry[]) andbaml.fs.mkdir()(withMkdirOptions.recursive), plus a newbaml.globnamespace withnew(pattern),Glob.scan(...), andGlob.matches(...).Implements these ops in
sys_native(Tokio +walkdir+ glob-to-regex viaregex) and wires them throughsys_ops/builder plumbing;bridge_wasmgains JS-VFS-backedfssupport and a WASMglobimplementation, withWasmVfssharing switched toArcfor reuse across namespaces.Fixes a parser bug where comment delimiters like
*/or//inside string/byte/raw string literals could prematurely terminate parsing by using raw-token end checks. Updates codegen/value-conversion to support returning lists of generated classes from IO ops, and adds/updates extensive tests and snapshots for the new APIs.Reviewed by Cursor Bugbot for commit 68d9108. Bugbot is set up for automated code reviews on this repo. Configure here.
Summary by CodeRabbit
New Features
Tests