feat(jsonish): handle special unicode quote chars by sxlijin · Pull Request #3381 · BoundaryML/baml

sxlijin · 2026-04-17T22:20:57Z

User report in #3307: opus-4.6 returned a JSON object containing German text with mixed quote styles, which in turn caused the outer parse to fail:

"intent": {
  "reasoning": "Blindtext „eins zwei drei", um den eigentlichen Inhalt zu verdecken.",
  <more fields>
},

The rule jsonish uses: if a string contains an even number of " chars, it ingests the entire string; if the number is odd, it's ambiguous where the string should terminate, so it just chooses the first one. This doesn't work in this case, where a German opening quote („) is paired with an ASCII quote (which can act as either a start or an end quote).

Solution: change string value parsing to use either the existing strategy or a new string parsing strategy, where any ascii or unicode quote character (any of "«„ etc) is allowed to contribute to the parity count.

This will allow us to handle the user's case, but also allow jsonish to parse something like "items": ["„eins", "zwei"] (which, to a human, has a very obvious parse, and therefore should have an obvious parse in jsonish).
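
As a rough illustration of the parity idea described above, the sketch below counts quote characters in the text consumed since a string's opening quote and accepts a candidate closing `"` only when that count is even. The names `ParityMode`, `EXTRA_QUOTES`, `quote_count`, and `may_close` are hypothetical, the character list is an illustrative subset, and escaping and the rest of the jsonish state machine are ignored.

```rust
/// Which characters count toward quote parity inside a string value.
/// (Hypothetical stand-in for the two strategies described in the PR.)
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ParityMode {
    AsciiOnly,
    AllUnicode,
}

/// Illustrative subset of non-ASCII quote marks; the real list may differ.
const EXTRA_QUOTES: [char; 5] = ['\u{201E}', '\u{201C}', '\u{201D}', '\u{00AB}', '\u{00BB}'];

/// Counts the quote characters in `seen`, the text consumed since the
/// string's opening quote. Escaping is deliberately ignored in this sketch.
pub fn quote_count(seen: &str, mode: ParityMode) -> usize {
    seen.chars()
        .filter(|c| *c == '"' || (mode == ParityMode::AllUnicode && EXTRA_QUOTES.contains(c)))
        .count()
}

/// A candidate closing `"` is accepted only when the quotes seen so far
/// are balanced, i.e. their count is even.
pub fn may_close(seen: &str, mode: ParityMode) -> bool {
    quote_count(seen, mode) % 2 == 0
}
```

Under these assumptions, once `Blindtext „eins zwei drei` has been consumed, the ASCII-only mode has seen no quotes and would accept the next `"` as the closer, while the all-Unicode mode has seen one and keeps reading. For `"„eins"` in the list example the ASCII-only mode closes at the first `"`, which is the parse a human expects; that is one way to read why option A keeps both strategies.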

Alternatives considered

option A (the one I decided on)
- use one of either existing ascii-quote strategy or any-ascii-or-unicode-quote
- somewhat easy to explain to a user trying to understand a jsonish/SAP parse result
option B:
- any-ascii-or-unicode-quote
- doesn't handle the "items": ["„eins", "zwei"] case
option C:
- 2d table, here are chars that can start strings, here are chars that can end strings (e.g. «asdf« is invalid, but german-start-ascii-end is valid)
- problem: making decisions about the pairs is time-consuming and edge case intensive without obvious answers
- problem: parsing nested strings would be really brittle: need to maintain a stack, push start delimiters on, and popping while supporting malformed strings gets complicated
- we could create multiple candidates
  - problem: when explaining a jsonish/SAP parse result to a user, this would be impossible to explain
option D:
- 2d table but with hierarchical parsing - "items": ["„eins", "zwei"] should parse correctly
- same complexity problems as option C
option E: don't support this

Summary by CodeRabbit

New Features
- Improved parsing and automatic recovery for strings containing Unicode quote characters using a Unicode-aware detection and two-pass strategy, preferring Unicode-aware fixes while preserving ASCII-safe behavior and existing error handling.
- Typographic apostrophes and other Unicode quote marks are preserved without breaking parsing.
Tests
- Added regression and coverage tests for Unicode-quote recovery across single values, lists, and class deserialization, plus ASCII-quote roundtrips.

coderabbitai · 2026-04-17T22:24:25Z

Walkthrough

When fixes are allowed, parsing runs a strict ASCII-only pass and, if the input contains configured non-ASCII quote characters, a conditional Unicode-aware pass. Unicode candidates are prepended, deduplicated by structural Value equality, and if the strict pass errors the function will return Unicode results when present; otherwise the strict error is returned.
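
The merge rule in this walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code in `entry.rs`: `merge_passes` is a hypothetical name, `V` stands in for the parser's value type and `E` for its error type, and the fix lists attached to each candidate are left out.

```rust
/// Merges candidates from a strict (ASCII-only) pass with those from an
/// optional Unicode-aware pass. `unicode` is `None` when the second pass
/// was skipped or failed.
pub fn merge_passes<V: PartialEq, E>(
    strict: Result<Vec<V>, E>,
    unicode: Option<Vec<V>>,
) -> Result<Vec<V>, E> {
    // Unicode candidates go first so that they are preferred downstream.
    let mut merged: Vec<V> = unicode.unwrap_or_default();
    match strict {
        Ok(items) => {
            for item in items {
                // Deduplicate by structural equality.
                if !merged.contains(&item) {
                    merged.push(item);
                }
            }
            Ok(merged)
        }
        // The strict pass failed: return Unicode results if there are any.
        Err(_) if !merged.is_empty() => Ok(merged),
        // Otherwise surface the strict error unchanged.
        Err(e) => Err(e),
    }
}
```

Calling it with `Ok(vec![1, 2])` and `Some(vec![2, 3])` yields the Unicode candidates first and drops the duplicate `2` from the strict pass.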

Changes

| Cohort / File(s) | Summary |
|---|---|
| Entry / dual-pass logic `engine/baml-lib/jsonish/src/jsonish/parser/entry.rs` | Replace single fixing-parser call with two-pass strategy: always run `QuoteParityMode::AsciiOnly`, run `QuoteParityMode::AllUnicode` only when `contains_unicode_quote_char(str)` is true; merge Unicode-first, dedupe by `Value` equality, and apply existing branching/error fallback rules. |
| Quote-parity primitives & API `engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs` | Add public `QuoteParityMode` enum (`AsciiOnly`, `AllUnicode`), `UNICODE_QUOTE_CHARS` constant, and `contains_unicode_quote_char()`; change `parse` signature to accept `quote_parity` and thread it into token processing; update tests to call new API. |
| Parse state parity threading `engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs` | Thread `quote_parity` through `process_token`, `consume`, `should_close_unescaped_string`, and `update_quote_tracking`; under `AllUnicode`, count configured Unicode quote chars toward unescaped-quote parity. |
| Tests: reproduction & regressions `engine/baml-lib/jsonish/src/tests/test_class.rs`, `engine/baml-lib/jsonish/src/tests/test_lists.rs` | Add regression and coverage tests for malformed Unicode opener (U+201E), typographic apostrophe preservation, list parsing, and ASCII-quote cases validating the new two-pass/parity behavior. |

Sequence Diagram

sequenceDiagram
    participant Entry as Entry Parser
    participant Detector as Unicode Detector
    participant Strict as Fixing Parser\n(AsciiOnly)
    participant Unicode as Fixing Parser\n(AllUnicode)
    participant Merger as Result Merger

    Entry->>Detector: contains_unicode_quote_char(input)?
    Detector-->>Entry: bool

    Entry->>Strict: parse(input, options, AsciiOnly)
    Strict-->>Entry: Result(strict_candidates / error)

    alt contains unicode quotes
        Entry->>Unicode: parse(input, options, AllUnicode)
        Unicode-->>Entry: Result(unicode_candidates / error)
    end

    Entry->>Merger: merge(unicode_candidates?, strict_candidates?)
    Merger->>Merger: prepend unicode, dedupe by Value equality
    Merger-->>Entry: merged candidates or chosen error

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Fix/streaming parser off by one #3202: Touches json_parse_state quote-closing logic and process_token, intersecting with this PR's parity-threading and quote-counting changes.

Poem

🐰 I hopped through quotes both plain and rare,
I counted marks in ASCII and those with flair,
I tried strict first, then Unicode on a whim,
I merged the wins and tossed duplicates slim,
Carrots and fixes, snug beneath the code-tree 🌙

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main objective of the pull request—adding support for special unicode quote character handling in the jsonish parser to fix parsing failures when Unicode and ASCII quotes are mixed.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af82f854d1


coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

engine/baml-lib/jsonish/src/jsonish/parser/entry.rs (1)
202-269: ⚠️ Potential issue | 🟠 Major

Don’t synthesize arrays from cross-pass alternatives.

After merging strict and unicode, items.len() > 1 can mean “alternative repairs of the same input”, not “multiple JSON objects were found”. The existing branch then adds Value::Array(items.clone(), ...), which can introduce a list candidate that was never present in the input and may be selected during list coercion. Preserve the “multiple JSON objects as a list” behavior per parser pass before merging, or track whether the merged items came from one pass before adding the array candidate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/baml-lib/jsonish/src/jsonish/parser/entry.rs` around lines 202 - 269,
The bug is that after merging strict and unicode candidates (merged), the code
always synthesizes a Value::Array candidate for items.len() > 1 even when the
multiple items come from different passes; fix by preserving each item’s origin
and only synthesizing the array candidate when all items originated from the
same parser pass. Concretely: change the merged type to carry an origin tag
(e.g. enum Origin { Strict, Unicode }) so merged is Result<Vec<(Value,
Vec<Fixes>, Origin)>> (or keep parallel boolean flags), populate Origin when
constructing strict_items/unicode_items, and then in the multi-item branch only
create items_clone = Value::Array(...) and append it to items when
items.iter().all(|(_,_,o)| o == same_origin) (or when the original single-pass
vector was used). Keep the rest of the Value::FixedJson and Value::AnyOf
construction the same.
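
The origin-tagging idea in this comment might look roughly like the sketch below. `Origin` follows the name suggested in the prompt; `may_synthesize_array` is a hypothetical helper, and `V` stands in for the parser's value type with the fix lists omitted.

```rust
/// Which parser pass produced a candidate.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Origin {
    Strict,
    Unicode,
}

/// Returns true when a "multiple JSON objects as a list" candidate is
/// justified: there is more than one item and every item came from the
/// same parser pass. Mixed origins mean the items are alternative repairs
/// of one input, not several objects.
pub fn may_synthesize_array<V>(items: &[(V, Origin)]) -> bool {
    match items.split_first() {
        Some(((_, first), rest)) if !rest.is_empty() => {
            rest.iter().all(|(_, origin)| origin == first)
        }
        _ => false,
    }
}
```

The check is kept separate from the merge so that the existing `Value::FixedJson` and `Value::AnyOf` construction would not need to change.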

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs`:
- Around line 85-94: The current AllUnicode branch counts any
UNICODE_QUOTE_CHARS (which includes single-curly quotes like U+2019) as
unescaped quote flips, causing ASCII double-quoted strings to treat apostrophes
like It’s as quote content and break parity; change the check so that when
tracking an ASCII double-quoted string we test membership against a
double-quote-specific unicode set (e.g. DOUBLE_QUOTE_UNICODE_CHARS) instead of
the broad UNICODE_QUOTE_CHARS. Add DOUBLE_QUOTE_UNICODE_CHARS (containing only
codepoints that should flip double-quote parity) in fixing_parser.rs and replace
the condition in the QuoteParityMode::AllUnicode branch that increments
string_quote_tracking.unescaped_quote_count to use
DOUBLE_QUOTE_UNICODE_CHARS.contains(&token) when the current ASCII quote is '"'
(leave UNICODE_QUOTE_CHARS for any fast-path uses that need broader detection).
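
A minimal sketch of the narrower set this comment asks for, assuming a handful of double-quote-role code points. `DOUBLE_QUOTE_UNICODE_CHARS` is the name proposed in the comment, but its contents here and the helper `parity_flips` are assumptions made for illustration.

```rust
/// Hypothetical set of marks that play a double-quote role. Single-quote
/// marks such as U+2018 and U+2019 are deliberately left out.
pub const DOUBLE_QUOTE_UNICODE_CHARS: [char; 5] = [
    '\u{201C}', // left double quotation mark
    '\u{201D}', // right double quotation mark
    '\u{201E}', // double low-9 quotation mark
    '\u{00AB}', // left-pointing double angle quotation mark
    '\u{00BB}', // right-pointing double angle quotation mark
];

/// Counts characters that would flip parity inside an ASCII double-quoted
/// string. Escaping is ignored in this sketch.
pub fn parity_flips(content: &str) -> usize {
    content
        .chars()
        .filter(|c| *c == '"' || DOUBLE_QUOTE_UNICODE_CHARS.contains(c))
        .count()
}
```

With this set, the typographic apostrophe in `It’s` contributes nothing to the count, while a stray `„` still does.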


📥 Commits

Reviewing files that changed from the base of the PR and between 4f68636 and af82f85.

📒 Files selected for processing (5)

engine/baml-lib/jsonish/src/jsonish/parser/entry.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs
engine/baml-lib/jsonish/src/tests/test_class.rs
engine/baml-lib/jsonish/src/tests/test_lists.rs

coderabbitai

🧹 Nitpick comments (2)

engine/baml-lib/jsonish/src/jsonish/parser/entry.rs (1)

185-192: Unicode-pass error is silently dropped.

fixing_parser::parse(..., AllUnicode).ok() discards any error from the unicode pass with no log breadcrumb, while the strict pass error is debug-logged downstream (Line 274). If the AllUnicode pass starts regressing (e.g., panics-turned-errors from new quote handling), you'll have no trace for inputs where strict still succeeds.

🔎 Suggested tweak

-            fixing_parser::parse(str, &options, QuoteParityMode::AllUnicode).ok()
+            fixing_parser::parse(str, &options, QuoteParityMode::AllUnicode)
+                .map_err(|e| {
+                    log::debug!("AllUnicode parity pass failed: {e:?}");
+                    e
+                })
+                .ok()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@engine/baml-lib/jsonish/src/jsonish/parser/entry.rs` around lines 185 - 192,
The AllUnicode parity parse currently swallows errors via
fixing_parser::parse(str, &options, QuoteParityMode::AllUnicode).ok(); change
this so parse's Err is not silently discarded: call fixing_parser::parse and, on
Err, emit a debug (or warn) log via log::debug!/log::warn! that includes the
error and context (e.g., the input indicator and that it was the AllUnicode
pass) before leaving unicode as None; on Ok keep the parsed result in the
unicode variable as before. Reference contains_unicode_quote_char,
fixing_parser::parse, QuoteParityMode::AllUnicode and the unicode variable to
locate where to add the error logging.

engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs (1)

18-22: Narrow visibility to pub(crate).

QuoteParityMode, contains_unicode_quote_char, and parse() are consumed only by entry.rs within the same crate. Exposing them as pub widens the crate's public API unnecessarily and makes future changes (e.g., adding QuoteParityMode variants) a breaking change.

🔧 Suggested changes

-pub enum QuoteParityMode {
+pub(crate) enum QuoteParityMode {
     AsciiOnly,
     AllUnicode,
 }

-pub fn contains_unicode_quote_char(s: &str) -> bool {
+pub(crate) fn contains_unicode_quote_char(s: &str) -> bool {
     s.chars().any(|c| UNICODE_QUOTE_CHARS.contains(&c))
 }

-pub fn parse(
+pub(crate) fn parse(
     str: &str,
     _options: &ParseOptions,
     quote_parity: QuoteParityMode,
 ) -> Result<Vec<(Value, Vec<Fixes>)>> {

Also applies to: 53-55

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs` around lines 18
- 22, The public visibility of internal items should be narrowed to crate-only:
change the enum QuoteParityMode to pub(crate) and likewise change the functions
contains_unicode_quote_char and parse to pub(crate) so they are only exposed
within the crate; update any references in the same module or entry.rs to use
the now-crate-visible names and run tests to ensure no external usage breaks.


📥 Commits

Reviewing files that changed from the base of the PR and between af82f85 and cc06235.

📒 Files selected for processing (3)

engine/baml-lib/jsonish/src/jsonish/parser/entry.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs

🚧 Files skipped from review as they are similar to previous changes (1)

engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs

coderabbitai

🧹 Nitpick comments (3)

engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs (3)
45-61: Consider including U+2032/U+2033 (prime/double prime) — optional.

The table covers the major language conventions well. One minor gap: ″ (U+2033 DOUBLE PRIME) occasionally appears in LLM output as a substitute for ". Not required for the reported bug; mentioning only as a potential follow-up if you see further reports.

Also note ‘/’ (U+2018/U+2019) are single quotation marks — intentional inclusion since models do sometimes use them as ASCII ' stand-ins, but worth confirming this matches the should_close_unescaped_string semantics (where ASCII ' is not tracked). If single-quote parity is deliberately lumped into the "double-quoted-string parity" counter, a brief note in the doc comment above would help future readers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs` around lines 45
- 61, UNICODE_QUOTE_CHARS currently omits prime/double-prime characters; add
U+2033 (DOUBLE PRIME) and optionally U+2032 (PRIME) to the UNICODE_QUOTE_CHARS
array to better catch LLM output that uses ″/′ as substitutes for ASCII quotes,
and update the doc comment above UNICODE_QUOTE_CHARS to explicitly state why
U+2018/U+2019 (single quotes) are included and how that interacts with the
should_close_unescaped_string semantics so future readers understand whether
single-quote parity is treated as part of double-quoted-string parity.
163-283: Missing unit test for AllUnicode mode in this file.

All updated tests pass QuoteParityMode::AsciiOnly, so the new enum variant has no direct unit coverage in fixing_parser.rs itself — it's only exercised via higher-level tests in test_class.rs/test_lists.rs. A small unit test here (e.g., parsing "intent": { "reasoning": "Blindtext „eins zwei drei\", um …" } with AllUnicode) would pin the contract of this module and guard against regressions without relying on the entry cascade. As per coding guidelines: "Prefer writing Rust unit tests over integration tests where possible".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs` around lines 163
- 283, Add a unit test in the tests mod that covers QuoteParityMode::AllUnicode
by calling parse (the same way other tests do) with a JSON snippet containing
Unicode smart quotes and ellipses (e.g., a small object like {"intent":
{"reasoning": "Blindtext „eins zwei drei\", um …"}}) and assert the parsed Value
(use Value::Object / Value::String and CompletionState as appropriate); target
the parse function and use QuoteParityMode::AllUnicode instead of
QuoteParityMode::AsciiOnly to ensure the parser’s Unicode quote-handling branch
is exercised (name the test e.g. test_partial_unicode_mode or similar and follow
the existing pattern for assertions).
66-68: Micro-optimization available but not needed.

UNICODE_QUOTE_CHARS.contains(&c) is O(n) per char scan; with 15 entries and typical input sizes it's negligible, but if profiles ever show this hot, a matches!(c, '\u{00AB}' | '\u{00BB}' | …) generated alongside the const (or a small phf/sorted binary search) would inline better. Safe to defer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs` around lines 66
- 68, The current contains_unicode_quote_char function uses
UNICODE_QUOTE_CHARS.contains(&c) inside s.chars().any(...), which does an O(n)
scan per character; replace that check with a direct pattern match (e.g., use
matches!(c, '\u{00AB}' | '\u{00BB}' | ... ) listing the same 15 unicode quote
codepoints) so the per-char test in contains_unicode_quote_char is inlined and
constant-time; update the match to mirror the entries in UNICODE_QUOTE_CHARS and
keep the function signature contains_unicode_quote_char(s: &str) -> bool and its
callers unchanged.
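
For reference, the pattern-match form mentioned in this nitpick could be written as below. The code points listed are an illustrative subset chosen for this sketch, not the crate's actual 15-entry table, and `is_unicode_quote_char` is a hypothetical helper name.

```rust
/// Per-character check written as a pattern match, so no array scan is
/// needed. The code points here are an assumed, partial list.
pub fn is_unicode_quote_char(c: char) -> bool {
    matches!(
        c,
        '\u{00AB}' | '\u{00BB}' | '\u{201C}' | '\u{201D}' | '\u{201E}' | '\u{300C}' | '\u{300D}'
    )
}

/// Same signature as the function named in the review; only the per-char
/// test changes.
pub fn contains_unicode_quote_char(s: &str) -> bool {
    s.chars().any(is_unicode_quote_char)
}
```

The cost of this form is that the match arms and the constant table have to be kept in sync by hand, which is part of why the review marks it as safe to defer.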


📥 Commits

Reviewing files that changed from the base of the PR and between cc06235 and 4dbe641.

📒 Files selected for processing (5)

engine/baml-lib/jsonish/src/jsonish/parser/entry.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs
engine/baml-lib/jsonish/src/tests/test_class.rs
engine/baml-lib/jsonish/src/tests/test_lists.rs

✅ Files skipped from review due to trivial changes (1)

engine/baml-lib/jsonish/src/tests/test_lists.rs

🚧 Files skipped from review as they are similar to previous changes (3)

engine/baml-lib/jsonish/src/jsonish/parser/entry.rs
engine/baml-lib/jsonish/src/tests/test_class.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs (1)

462-490: ⚠️ Potential issue | 🟠 Major

Apply quote parity when the comma follows whitespace.

The immediate comma path checks closing_char_count, but the whitespace lookahead closes unconditionally on ,. Inputs like "Blindtext „eins" , um ..." still close early under AllUnicode.

Proposed fix

-                            ',' if in_object_value => return true,
-                            ',' | ']' if in_array => return true,
+                            ',' if in_object_value => return closing_char_count % 2 == 0,
+                            ',' if in_array => return closing_char_count % 2 == 0,
+                            ']' if in_array => return true,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs`
around lines 462 - 490, The whitespace lookahead branch incorrectly treats a
comma as an unconditional close; update the match arm inside the while-let in
the function/method that uses next and closing_char_count so that when
encountering ',' it applies the same parity check as the direct ',' arm (i.e.,
only return true if closing_char_count % 2 == 0) and still respects
in_object_value/in_array/in_object_key conditions; adjust the ',', ',' | ']' and
',' if in_array cases to use closing_char_count where appropriate so inputs like
`"Blindtext „eins" , um ..."` do not prematurely close.
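
The lookahead behaviour requested here can be sketched in isolation. This is a simplified model under stated assumptions: `closes_after_lookahead` is a hypothetical function, object-value and array contexts are merged for the comma case, and the real method inspects a token iterator rather than a string slice.

```rust
/// Decides whether a candidate closing quote really ends the string, given
/// the text that follows it (`rest`) and the number of quote marks counted
/// inside the string so far (`closing_char_count`).
pub fn closes_after_lookahead(rest: &str, closing_char_count: usize, in_array: bool) -> bool {
    // Skip whitespace between the quote and the next structural character.
    match rest.chars().find(|c| !c.is_whitespace()) {
        // A comma only closes the string when the quotes are balanced.
        Some(',') => closing_char_count % 2 == 0,
        // A closing bracket ends an array element regardless of parity.
        Some(']') if in_array => true,
        _ => false,
    }
}
```

With one unmatched `„` already counted, a following ` , um ...` no longer closes the string, which is the case the comment describes.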


📥 Commits

Reviewing files that changed from the base of the PR and between 0a232cb and a195d87.

📒 Files selected for processing (5)

engine/baml-lib/jsonish/src/jsonish/parser/entry.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser.rs
engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs
engine/baml-lib/jsonish/src/tests/test_class.rs
engine/baml-lib/jsonish/src/tests/test_lists.rs

✅ Files skipped from review due to trivial changes (1)

engine/baml-lib/jsonish/src/tests/test_lists.rs

🚧 Files skipped from review as they are similar to previous changes (1)

engine/baml-lib/jsonish/src/tests/test_class.rs

coderabbitai · 2026-04-17T23:50:06Z

+            } else if quote_parity == QuoteParityMode::AllUnicode
+                && UNICODE_QUOTE_CHARS.contains(&token)
+            {
+                // Under AllUnicode, double-quote-role unicode marks (e.g.
+                // `„`, `"`, `»`, `「`) also flip parity so a stray opener
+                // inside an ASCII-quoted string prevents early close on
+                // the next `,`. Single-quote-role marks are intentionally
+                // excluded — see `UNICODE_QUOTE_CHARS` for why.
+                self.string_quote_tracking.unescaped_quote_count += 1;


⚠️ Potential issue | 🟡 Minor

Honor escaping before Unicode quote marks.

The AllUnicode branch increments unescaped_quote_count without the even-backslash guard used for ASCII ". In malformed-but-repairable strings, \„ will flip parity even though it is escaped.

Proposed fix

             } else if quote_parity == QuoteParityMode::AllUnicode
                 && UNICODE_QUOTE_CHARS.contains(&token)
+                && self
+                    .string_quote_tracking
+                    .trailing_backslashes
+                    .is_multiple_of(2)
             {

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs` around lines 84 - 92, The AllUnicode branch (QuoteParityMode::AllUnicode) incorrectly increments self.string_quote_tracking.unescaped_quote_count for unicode quote characters from UNICODE_QUOTE_CHARS without verifying they are unescaped; update the branch in the parser where UNICODE_QUOTE_CHARS is checked to apply the same even-backslash guard used for the ASCII double-quote handling (i.e., count preceding backslashes and only treat the unicode quote as unescaped when the count is even) so escaped unicode quote marks like `\„` do not flip parity.
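
The even-backslash guard can be shown on its own. The sketch below is a standalone illustration, not the parser's code: `unescaped_quote_count` is a hypothetical function, and it uses `% 2 == 0` instead of `is_multiple_of` only to stay compatible with older toolchains.

```rust
/// Counts quote-like characters in `content` that are not escaped, i.e.
/// that are preceded by an even number of consecutive backslashes.
pub fn unescaped_quote_count(content: &str, quotes: &[char]) -> usize {
    let mut trailing_backslashes = 0usize;
    let mut count = 0usize;
    for c in content.chars() {
        if c == '\\' {
            // Extend the current run of backslashes.
            trailing_backslashes += 1;
            continue;
        }
        // A quote counts only when the backslash run before it is even.
        if quotes.contains(&c) && trailing_backslashes % 2 == 0 {
            count += 1;
        }
        // Any non-backslash character ends the run.
        trailing_backslashes = 0;
    }
    count
}
```

Under this rule `\„` leaves the count unchanged, while `\\„` (an escaped backslash followed by a quote) increments it.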

chatgpt-codex-connector Bot reviewed Apr 17, 2026

Comment thread engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs

coderabbitai Bot requested changes Apr 17, 2026

Comment thread engine/baml-lib/jsonish/src/jsonish/parser/fixing_parser/json_parse_state.rs

sxlijin force-pushed the push-lukqolrwooux branch from af82f85 to cc06235 on April 17, 2026 22:32

sxlijin changed the title from "Push lukqolrwooux" to "feat(jsonish): handle special unicode quote chars" on Apr 17, 2026

coderabbitai Bot reviewed Apr 17, 2026

sxlijin force-pushed the push-lukqolrwooux branch from cc06235 to c6db767 on April 17, 2026 22:43

sxlijin force-pushed the push-lukqolrwooux branch from c6db767 to 4dbe641 on April 17, 2026 22:44

coderabbitai Bot reviewed Apr 17, 2026

sxlijin force-pushed the push-lukqolrwooux branch from 4dbe641 to 0a232cb on April 17, 2026 22:52

sxlijin force-pushed the push-lukqolrwooux branch from 0a232cb to bc28c41 on April 17, 2026 23:12

Commit a195d87: feat(jsonish): fix weird quoting

sxlijin force-pushed the push-lukqolrwooux branch from bc28c41 to a195d87 on April 17, 2026 23:21

coderabbitai Bot approved these changes Apr 17, 2026

sxlijin added this pull request to the merge queue Apr 17, 2026

sxlijin removed this pull request from the merge queue due to a manual request Apr 17, 2026

sxlijin added this pull request to the merge queue Apr 17, 2026

coderabbitai Bot requested changes Apr 17, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 17, 2026

sxlijin wants to merge 1 commit into canary from push-lukqolrwooux
