feat: dataframe support via narwhals#3912
Conversation
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
docs | a62a381 | Commit Preview URL Branch Preview URL |
May 11 2026, 09:22 PM |
|
Test failures are unrelated |
rudolfix
left a comment
There was a problem hiding this comment.
I like it! we can add narwhals as main dep as this is pure python without any hard deps (nice!)
my main issue here is: changes touch a critical path - every extracted item goes through type check to tell the tabular data vs. Python objects. right now everything goes through narwhals which is ~50x slower than old method on devel. I include output from optimize-code skill below.
tell me what you want to do:
- I can just apply implementation plan that will fix do the speedups mentioned below
- you want to work on it further.
overall there's no problem with overall concept and structure. some minimal tests for ie. duckdb relation passed as tabular data would be nice PoC
Benchmark: per-item overhead of
|
| item | devel (3-check) | this PR (narwhals naive) | PR vs devel |
|---|---|---|---|
dict |
265 ns | 5,928 ns | 22.4× slower |
int |
263 ns | 5,794 ns | 22.0× slower |
None |
265 ns | 5,857 ns | 22.1× slower |
pandas.DataFrame |
179 ns | 11,468 ns | 64× slower |
polars.DataFrame |
253 ns | 5,976 ns | 23.6× slower |
polars.LazyFrame |
259 ns | 6,293 ns | 24.3× slower |
pyarrow.Table |
90 ns | 90 ns | par (short-circuits on is_arrow_object) |
Bare-process scenario (no DF library imported — most JSON/REST resources): old=155 ns/item → PR=5,566 ns/item → 36× regression.
Cost decomposition: narwhals.from_native(d, allow_series=False, pass_through=True) itself is ~6 µs/op; the isinstance after it is only 55 ns. Almost all the cost is from_native.
Real-world impact for a resource yielding 10M plain dict rows on the extract path:
- devel: 0.155 µs × 10M ≈ 1.6 s
- this PR: 5.6 µs × 10M ≈ 56 s (~54 s pure regression on detection alone)
Frame conversion (per-batch)
| kind | rows | direct (devel) | narwhals (PR) | PR vs devel |
|---|---|---|---|---|
| pandas DataFrame | 10 | 62 µs | 96 µs | 1.55× |
| pandas DataFrame | 1,000 | 79 µs | 103 µs | 1.30× |
| pandas DataFrame | 100,000 | 78 µs | 107 µs | 1.36× |
| polars DataFrame | 10 | 9.5 µs | 17.5 µs | 1.83× |
| polars DataFrame | 1,000 | 11.0 µs | 18.3 µs | 1.66× |
| polars DataFrame | 100,000 | 13.3 µs | 19.4 µs | 1.46× |
| polars LazyFrame | 100,000 | 13.2 µs | 23.6 µs | 1.79× |
df_to_arrow adds 30–96 % overhead per pandas/polars batch — narwhals routes through extra indirection rather than calling df.to_arrow() directly.
Proposed optimizations
1. Type-cache is_dataframe (per-item path)
# dlt/common/libs/narwhals.py
_BUILTIN_NON_FRAME = (dict, list, str, int, float, bool, tuple, bytes, type(None), set, frozenset)
_TYPE_IS_FRAME: dict[type, bool] = {}
def is_dataframe(obj: Any) -> bool:
t = type(obj)
if t in _BUILTIN_NON_FRAME:
return False
cached = _TYPE_IS_FRAME.get(t)
if cached is not None:
return cached
maybe = narwhals.from_native(obj, allow_series=False, pass_through=True)
result = isinstance(maybe, (narwhals.DataFrame, narwhals.LazyFrame))
_TYPE_IS_FRAME[t] = result
return result2. Type-dispatched df_to_arrow (per-batch path)
_CONVERTERS: dict[type, Callable[[Any], "pyarrow.Table"]] = {}
def df_to_arrow(df):
t = type(df)
fn = _CONVERTERS.get(t)
if fn is None:
mod = t.__module__
if mod.startswith("pandas."):
from dlt.common.libs.pandas import pandas_to_arrow
fn = pandas_to_arrow
elif mod.startswith("polars."):
from dlt.common.libs.polars import polars_to_arrow
fn = polars_to_arrow
else:
fn = _df_to_arrow_via_narwhals
_CONVERTERS[t] = fn
return fn(df)
def _df_to_arrow_via_narwhals(df):
nw_df = narwhals.from_native(df, allow_series=False)
if isinstance(nw_df, narwhals.LazyFrame):
nw_df = nw_df.collect()
return nw_df.to_arrow()Results with both patches applied (warm cache)
| item | devel | this PR | proposed | proposed vs PR | proposed vs devel |
|---|---|---|---|---|---|
dict |
265 ns | 5,928 ns | 126 ns | 47× faster | 2.1× faster |
int |
263 ns | 5,794 ns | 138 ns | 42× faster | 1.9× faster |
pandas.DataFrame |
179 ns | 11,468 ns | 187 ns | 61× faster | par |
polars.DataFrame |
253 ns | 5,976 ns | 171 ns | 35× faster | 1.5× faster |
| conversion | direct (devel) | narwhals (PR) | dispatched | dispatched vs PR |
|---|---|---|---|---|
| pandas (100k rows) | 78 µs | 107 µs | 67 µs | 1.60× faster |
| polars (100k rows) | 13.3 µs | 19.4 µs | 9.9 µs | 1.96× faster |
| polars LazyFrame (100k rows) | 13.2 µs | 23.6 µs | 12.0 µs | 1.97× faster |
Both patches together are ~25 LoC, preserve PR semantics, fully restore parity with devel for pandas/polars, and keep narwhals as the fallback for the new backends this PR is aiming to unlock.
|
Thanks for the review. I agree with the performance issues and we should find a better solution. I would spend more time trying to improve the PR (next week) if that's ok. I believe this will help with incremental logic Type guard clause for dataframesMain branch uses a cheap The SuggestionsThere are some extra layers (e.g., |
not sure if i've misunderstood, but would the |
|
@MarcoGorelli Thanks! Will check the aside: did the docs get an overhaul since last week? |
|
😄 yeah moved to zensical |
fd2ae4a to
5df5b4b
Compare
|
@rudolfix can you run the performance analysis? Switching to the native narwhals type checking should fix all performance concerns. Also, leaving this for follow-up task (can assign it to me):
|
| if is_polars_frame(item): | ||
| from dlt.common.libs.polars import polars_to_arrow |
There was a problem hiding this comment.
Notice that unlike is_polars_frame, narwhals.dependencies.is_into_dataframe returns False for a polars LazyFrame (and we don't have a lazy equivalent of is_into_dataframe - we can consider adding it to be honest, @MarcoGorelli 👀)
There was a problem hiding this comment.
Thanks to you and Marco for the reviews! It's saving us a lot of time actually.
@FBruzzesi it would be nice to have a type guard to that catches LazyFrame. Though, it doesn't block this PR. This polars support hasn't reach master branch and isn't released yet
@rudolfix IMO, the "eager arrow code path" shouldn't support lazyframes and raise "Received LazyFrame, call .collect() inside your @dlt.resource". Currently, we're hiding an expensive operation for minimal convenience. AFAIK, we're not returning LazyFrame objects back for downstream @dlt.transformer.
Lazy objects should be supported via the "lazy model code path" where we have Ibis expressions, SQLGlot, etc. for now.
Hopefully, we can unify both by implementing load package preparations and incremental logic via Narwhals
There was a problem hiding this comment.
Yeah I think calling collect on behalf of your users would only be justified if you were doing a lot of operations lazily before reaching collect which would be difficult for the user to do outside of dlt
instead of calling collect first thing, a clear error message like your one looks like a nice solution 👍
Summary
Instead of implementing logic per dataframe library, centralize logic around Narwhals.
Context
Narwhals is a dependency-free library that implements a (large) subset of the Polars API with the ability to delegate execution of the expression to many different backends (pandas, polars, duckdb, Ibis, PySpark, cuDF).
Benefits
Single logic to maintain for
dltand support more libraries for end-users. Testing and features are less brittle and better tested. Avoid Python plugin dependency hell (lazy importin Python 3.15 should help; see you in 5 years)