fix(metrics): make convert_types handle pd.NA / pd.NaT (#1844) #1882
Open
jbbqqf wants to merge 1 commit into
`convert_types` previously called `np.isnan(val)` unconditionally, which raises `TypeError: boolean value of NA is ambiguous` on `pd.NA` and a separate `TypeError` on `pd.NaT`. Any label dict reaching `ByLabelCountValue` from a nullable pandas dtype (e.g. `string`, `Int64`) crashed before the result could be serialized. Guard the NA branch with `pd.isna` so all pandas-flavored missing values collapse to `None`; keep `numpy.nan` flowing through as a float so the existing serializer (covered by `test_by_label_count_value`) still maps it to the string `"nan"`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes #1844 (and unblocks #1616, which has the same root cause).
Why
`convert_types` is called for every label key flowing into `ByLabelCountValue.counts` / `.shares`. On `main` it ends with an unconditional `np.isnan(val)` check, and that check fails on pandas missing values: evaluating `np.isnan(pd.NA)` as a condition raises `TypeError: boolean value of NA is ambiguous`, and `np.isnan(pd.NaT)` raises `TypeError: ufunc 'isnan' not supported for the input types ...`. Any preset (e.g. `DataSummaryPreset`, `DataDriftPreset`) that touches a column with a nullable pandas dtype (`"string"`, `"Int64"`, …) containing any `pd.NA` value blows up before the result can be built, surfacing as a confusing `pydantic.ValidationError: boolean value of NA is ambiguous`.
Fix
After the existing bool / int / str fast paths, guard the NA branch with `pd.isna(val)`, which (unlike `np.isnan`) recognizes all of `None` / `np.nan` / `pd.NaT` / `pd.NA`. Pandas-flavored NA collapses to `None`; `numpy.nan` is preserved as a float so the existing serializer keeps mapping it to the string `"nan"`, as the preserved `test_by_label_count_value` test checks.
Reproduce BEFORE/AFTER yourself (copy-paste)
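The failure mode and the fix described above can be sketched in isolation. `convert_before` and `convert_after` below are hypothetical stand-ins, not the real `convert_types`: they assume only what this PR states (bool / int / str fast paths, then an unconditional `np.isnan` check before, and a `pd.isna` guard after) and that label keys are scalars. Float values, `np.nan` included, are passed through untouched in the "after" version, because `pd.isna(np.nan)` is also `True` and would otherwise collapse it to `None`.

```python
import numpy as np
import pandas as pd


def convert_before(val):
    """Hypothetical stand-in for the old tail of `convert_types`."""
    if isinstance(val, (bool, int, str)):
        return val  # fast paths
    # Unconditional NaN check: np.isnan(pd.NA) returns pd.NA, whose truth
    # value is ambiguous (TypeError); pd.NaT is rejected by the ufunc itself.
    return float("nan") if np.isnan(val) else val


def convert_after(val):
    """Hypothetical stand-in for the patched tail (assumes scalar keys)."""
    if isinstance(val, (bool, int, str)):
        return val  # fast paths, unchanged
    if isinstance(val, (float, np.floating)):
        return val  # np.nan stays a float; the serializer maps it to "nan"
    if pd.isna(val):
        return None  # None, pd.NaT and pd.NA all collapse to None
    return val


for value in (pd.NA, pd.NaT, np.nan):
    try:
        before = repr(convert_before(value))
    except TypeError as exc:
        before = f"TypeError: {exc}"
    print(f"{value!r:>6}  before={before}  after={convert_after(value)!r}")
```

Checking the float branch before `pd.isna` is the design choice that keeps `np.nan` and pandas NA distinguishable downstream.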
What I ran locally
Failing assertion before the fix (copied verbatim from `pytest -v` on `origin/main` with the regression test applied):
Edge cases covered by the regression tests
- `pd.NA`: `TypeError: boolean value of NA is ambiguous` → `None`
- `pd.NaT`: `TypeError: ufunc 'isnan' not supported` → `None`
- `np.nan`: `nan` (serializer → `"nan"`)
- `None`: `None`
- `True`, `42`, `"class_a"`
- `ByLabelCountValue` with a `pd.NA` key (`__init__`)

Disclosure: I drafted this PR with help from Claude Code while triaging
older issues. The reproduction, fix, and test runs above were executed
locally; outputs are copied verbatim.