fix(metrics): make convert_types handle pd.NA / pd.NaT (#1844) #1882

Open
jbbqqf wants to merge 1 commit into evidentlyai:main from jbbqqf:fix/1844-convert-types-pd-na
@jbbqqf jbbqqf commented May 21, 2026

Closes #1844 (and unblocks #1616 — same root cause).

Why

convert_types is called for every label key flowing into
ByLabelCountValue.counts / .shares. On main it ends with:

if val is None or np.isnan(val):
    return val
raise ValueError(f"type {type(val)} not supported as Label")

With val = pd.NA, np.isnan(val) returns pd.NA rather than a bool, so
evaluating the if condition raises TypeError: boolean value of NA is ambiguous.
np.isnan(pd.NaT) raises directly with
TypeError: ufunc 'isnan' not supported for the input types ....
As a result, any preset (e.g. DataSummaryPreset, DataDriftPreset) that
touches a column with a nullable pandas dtype ("string", "Int64", …)
containing at least one pd.NA value blows up before the result can be
built, surfacing as a confusing
pydantic.ValidationError: boolean value of NA is ambiguous.
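The failure modes described above can be reproduced outside Evidently. This is a minimal sketch: `old_na_check` and `probe` are hypothetical helpers written for this illustration, and only the quoted condition is taken from the code on main.

```python
import numpy as np
import pandas as pd


def old_na_check(val):
    """Mimic the `val is None or np.isnan(val)` condition quoted above.

    Hypothetical helper; bool() stands in for the truth test done by `if`.
    """
    return bool(val is None or np.isnan(val))


def probe(val):
    """Return the check's result, or a description of the TypeError it raises."""
    try:
        return old_na_check(val)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Compare the old condition with pd.isna for each kind of missing value.
for candidate in (None, np.nan, pd.NA, pd.NaT):
    print(repr(candidate), "->", probe(candidate), "| pd.isna:", pd.isna(candidate))
```

Running it shows which inputs the old condition accepts and which make it raise, next to the plain boolean that `pd.isna` returns for the same value.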

Fix

After the existing bool / int / str fast paths, guard the NA branch with
pd.isna(val), which, unlike np.isnan, handles all of
None / np.nan / pd.NaT / pd.NA. Pandas-flavored NA collapses to
None; numpy.nan is preserved as a float so the existing serializer
keeps mapping it to the string "nan" — see the preserved
test_by_label_count_value test.
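The behavior described in this section could look roughly like the sketch below. It is not the code in this PR: `convert_label` is a hypothetical stand-in for `convert_types`, and the ordering of the branches is an assumption about one way to keep `numpy.nan` as a float while collapsing the other missing values.

```python
import numpy as np
import pandas as pd


def convert_label(val):
    """Hypothetical stand-in for convert_types; illustrates the guard only."""
    # Fast paths: plain labels pass through untouched.
    if isinstance(val, (bool, int, str)):
        return val
    # Keep float NaN as-is so a downstream serializer can still render it.
    # Assumption: only Python floats (and np.float64, a float subclass) are kept.
    if isinstance(val, float) and np.isnan(val):
        return val
    # pd.isna returns a plain bool for scalars and an array for list-likes,
    # so only a scalar True result counts as "missing" here.
    is_missing = pd.isna(val)
    if isinstance(is_missing, (bool, np.bool_)) and is_missing:
        return None  # None, pd.NA and pd.NaT all collapse to None
    raise ValueError(f"type {type(val)} not supported as Label")
```

Checking for float NaN before calling `pd.isna` matters in this sketch, because `pd.isna(np.nan)` is also true and would otherwise turn NaN into `None`.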

Reproduce BEFORE/AFTER yourself (copy-paste)

# from a fresh checkout
git clone https://github.com/evidentlyai/evidently.git && cd evidently
uv venv .venv && . .venv/bin/activate
uv pip install -e . pytest pytest-timeout pytest-asyncio

# === BEFORE (origin/main) — expected: 3 failures ===
git checkout origin/main
git fetch https://github.com/jbbqqf/evidently.git fix/1844-convert-types-pd-na
git checkout FETCH_HEAD -- tests/future/test_metric_types.py
pytest tests/future/test_metric_types.py::test_convert_types_handles_pandas_na tests/future/test_metric_types.py::test_by_label_count_value_handles_pd_na_key -v
# Expected: 3 failed — TypeError: boolean value of NA is ambiguous

# === AFTER (this PR) — expected: all green ===
git checkout FETCH_HEAD -- src/evidently/core/metric_types.py
pytest tests/future/test_metric_types.py -v
# Expected: 5 passed

What I ran locally

$ pytest tests/future/test_metric_types.py -v
tests/future/test_metric_types.py::test_by_label_count_value[input0-output0]         PASSED
tests/future/test_metric_types.py::test_convert_types_handles_pandas_na[value0]      PASSED
tests/future/test_metric_types.py::test_convert_types_handles_pandas_na[value1]      PASSED
tests/future/test_metric_types.py::test_convert_types_preserves_existing_contracts   PASSED
tests/future/test_metric_types.py::test_by_label_count_value_handles_pd_na_key       PASSED
=========================== 5 passed, 4 warnings in 0.03s ============================

$ pytest tests/future/metrics tests/future/test_metric_types.py tests/future/presets -q
1458 passed, 1363 warnings in 48.53s

Failing assertion before the fix (copied verbatim from pytest -v on
origin/main with the regression test applied):

E   pydantic.v1.error_wrappers.ValidationError: 2 validation errors for ByLabelCountValue
E   counts
E     boolean value of NA is ambiguous (type=type_error)
E   shares
E     boolean value of NA is ambiguous (type=type_error)

Edge cases covered by the regression tests

| Input | Old behavior | New behavior |
| --- | --- | --- |
| pd.NA | TypeError: boolean value of NA is ambiguous | returns None |
| pd.NaT | TypeError: ufunc 'isnan' not supported | returns None |
| np.nan | returns nan (serializer → "nan") | unchanged |
| None | returns None | unchanged |
| True, 42, "class_a" | returned untouched | unchanged |
| ByLabelCountValue with pd.NA key | crashes in __init__ | builds + serializes fine |
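For context on the pd.NA-key case listed above, the following sketch shows how such a key can arise from a nullable column and what collapsing it to `None` looks like. It uses only pandas; the variable names are illustrative and nothing here calls Evidently.

```python
import pandas as pd

# A nullable "string" column with one missing entry.
labels = pd.Series(["class_a", pd.NA, "class_a"], dtype="string")

# Counting with dropna=False keeps the missing value as a dictionary key.
counts = labels.value_counts(dropna=False).to_dict()
has_na_key = any(pd.isna(key) for key in counts)

# Collapse the missing key to None, in the spirit of the fix described above.
normalized = {(None if pd.isna(key) else key): int(n) for key, n in counts.items()}

print(has_na_key)
print(normalized)
```

A label dictionary like `counts` is the kind of input that reaches `ByLabelCountValue` when a preset processes a nullable column.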

Disclosure: I drafted this PR with help from Claude Code while triaging
older issues. The reproduction, fix, and test runs above were executed
locally; outputs are copied verbatim.

`convert_types` previously called `np.isnan(val)` unconditionally, which
raises `TypeError: boolean value of NA is ambiguous` on `pd.NA` and a
separate `TypeError` on `pd.NaT`. Any label dict reaching
`ByLabelCountValue` from a nullable pandas dtype (e.g. `string`, `Int64`)
crashed before the result could be serialized.

Guard the NA branch with `pd.isna` so all pandas-flavored missing values
collapse to `None`; keep `numpy.nan` flowing through as a float so the
existing serializer (`test_by_label_count_value`) still maps it to the
string `"nan"`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

convert_types raises TypeError on pd.NA labels
