Description
(#2572) and, more recently, (expr-ir/logical-plan) add a lot of code, but very little documentation.
My thinking was to document the deviations from main, since very big chunks are the same and already well-documented.
I was also concerned about sinking much time into things while they were changing.
Well now, enough of the core is stable (enough) to start blabbing about, so it's time!
Documenting
Classes/docstrings
There are a lot, but many are tiny and only really need an understanding of a base class:
Look how smol
narwhals/narwhals/_plan/expressions/aggregation.py
Lines 46 to 53 in 51cebab
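To make the "tiny subclasses, one base class" point concrete, here is a minimal sketch of the pattern. This is not the code from the permalink above: `AggSketch`, `describe`, and the subclasses are hypothetical stand-ins, assumed only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AggSketch:
    """Hypothetical base class: the one place that needs a real docstring.

    `expr` is a stand-in for the expression being aggregated.
    """

    expr: str

    def describe(self) -> str:
        # Shared behavior lives here; only the class name differs per subclass.
        return f"{self.expr}.{type(self).__name__.lower()}()"


# Hypothetical subclasses: one line each, nothing new to document.
class Sum(AggSketch): ...
class Mean(AggSketch): ...
class Max(AggSketch): ...


print(Sum("a").describe())
print(Mean("b").describe())
```

If the base class is well documented, each subclass needs little more than a one-line summary.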
So these are my picks for where better docstrings would be most beneficial in narwhals._plan.
Reading this back, okay, this still looks like a lot 🤦‍♂️
Expressions
- _expr_ir.ExprIR (priority)
  - ExprIR
  - __init_subclass__ parameters (af46dec)
  - __init_subclass__ (dispatch)
  - ExprIR.__expr_ir_dispatch__ (4f1b453)
  - ExprIR.dispatch (7fada41)
  - ExprIR.map_ir (d0a6e72)
  - ExprIR.__expr_ir_dtype__ (44c9510)
  - ExprIR.__expr_ir_nodes__ (9c59bf0)
- ExprIRMeta (69886b0), (21a135f)
- expressions.aggregation.AggExpr (80c919d)
- expressions.literal.{Lit,LitSeries} (0672e95)
- expressions.expr.Alias
- expressions.name.KeepName
- expressions.name.RenameAlias
Selectors
- _expr_ir.SelectorIR (priority)
  - iter_expand_selector (1d6c986)
  - matches (0e4145e)
  - to_dtype_selector (257083d)
- expressions.expr.RootSelector
- expressions.selectors.DTypeSelector (dd3e4f3)
- expressions.selectors.BinarySelector (27f3f27)
- expressions.selectors.InvertSelector (4528b50)
- Expr/Selector special cases (d231dd0)
Functions
- _function.Function (priority)
  - __expr_ir_dispatch__
  - __expr_ir_dtype__
  - to_function_expr
  - __init_subclass__
  - __function_flags__
- FunctionFlags (priority)
- expressions.expr.FunctionExpr (high priority)
- _parameters.{Unary,Binary,Ternary,Variadic} (f9ec804)
- _dispatch.Dispatcher (medium priority)
- _dtype.ResolveDType (new in logical-plan)
Expansion
- _expr_ir.NamedIR
  - NamedIR.map_ir (ef1c03f)
  - NamedIR.expr, NamedIR.name (10c4fb6)
- _expansion.Expander (117d06d)
  - prepare_projection, expand_selectors (249fa22)
- meta.MetaNamespace (d99befb)
  - has_multiple_outputs behavior
    - pl.Expr.meta.has_multiple_outputs() is broken on main (pola-rs/polars#23708)
- schema.FrozenSchema
- Expander.iter_expand_expressions
- ExprIR.iter_expand
- ExprTraverser
  - iter_expand
  - iter_expand_by_combination (f1a8996), (786bcea)
- ExprNode
Misc
- _immutable.Immutable, _meta.ImmutableMeta (priority) (a546d0f)
- _meta.SlottedMeta (1dd5f6f)
- _nodes.node (68903b4)
- _nodes.nodes (b54927e)
- _nodes (5574a51)
Compliant
- compliant.expr.CompliantExpr
- compliant.scalar.CompliantScalar
- compliant.{concat,io,ranges,translate}.* (in flux in logical-plan)
- compliant.group_by.(Eager)DataFrameGroupBy (in flux in logical-plan)
- narwhals.dataframe.DataFrame.group_by
- group_by.GroupBy.agg
- _compliant.group_by.ParseKeysGroupBy
- polars_plan::plans::conversion::dsl_to_ir::resolve_group_by
Narrative
Docstrings can only go so far.
How do the pieces come together, why do they, and what even is an IR in the first place?
Note
Section needs more fluff
And now for something completely different
expr-ir/logical-plan adds (among other things) a new package, plans.
While it is still a work-in-progress (read: expect things to change), the overall idea is coming together.
The journey of a query looks like this:
Or more visually:
Show Mermaid Diagram
    stateDiagram-v2
        [*] --> init_lp
        init_lp: Starting a LogicalPlan
        extend_lp: Extending a LogicalPlan
        init_lp --> extend_lp
        extend_lp --> LogicalToResolved
        %% Resolving the plan
        %% Doesn't look good when in a nested state
        LogicalToResolved --> ResolvedPlan
        state fork_resolve <<fork>>
        ResolvedPlan --> fork_resolve
        fork_resolve --> Schema: collect_schema
        fork_resolve --> ResolvedToCompliant: collect/sink_parquet
        ResolvedToCompliant --> CompliantLazyFrame
        CompliantLazyFrame --> CompliantDataFrame: collect
        CompliantDataFrame --> DataFrame
        CompliantLazyFrame --> None: sink_parquet
        state init_lp {
            [*] --> ScanCsv: scan_csv
            --
            [*] --> ScanParquet: scan_parquet
            --
            [*] --> ScanDataFrame: DataFrame.lazy(None)
            --
            [*] --> ScanLazyFrame: LazyFrame.from_native
            [*] --> ScanLazyFrame: DataFrame.lazy(backend)
        }
        state extend_lp {
            [*] --> LazyFrame
            LazyFrame --> LogicalPlan
            LogicalPlan --> LazyFrame
            LazyFrame --> Collect: collect
            LazyFrame --> SinkParquet: sink_parquet
            LazyFrame --> [*] : collect_schema
        }
        note left of init_lp
            First node has a schema.
            Children store unresolved
            ExprIR/SelectorIR(s)
        end note
        note right of LogicalToResolved
            A Protocol with a builtin
            implementation (Resolver)
            based *heavily* on polars.
        end note
        note left of ResolvedPlan
            Nodes that alter the
            schema store an
            output_schema.
            ExprIR resolve to NamedIR.
            SelectorIR expand to
            tuple[str, ...]
        end note
        note right of Schema
            collect_schema() didn't
            need to go through
            CompliantLazyFrame
        end note
        note right of ResolvedToCompliant
            collect() means we need
            *more than just a Schema*.
            Time to evaluate our plan!
        end note
        note right of ResolvedToCompliant
            Another Protocol, (like
            LogicalToResolved)
            but backend-dependent
        end note
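
The stages in the diagram can also be sketched as a toy pipeline. This is only an illustration under heavy assumptions: the class names `LogicalPlan`, `ResolvedPlan`, `LogicalToResolved` and `Resolver` are borrowed from the diagram, but every field, method, and signature below is hypothetical and much simpler than whatever the plans package actually does. Plain strings stand in for ExprIR/SelectorIR, and a dict of lists stands in for a backend.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class LogicalPlan:
    """Unresolved plan: the first node has a schema, later steps only record requests."""

    schema: dict[str, str]
    selected: tuple[str, ...] = ()  # stand-in for unresolved ExprIR/SelectorIR

    def select(self, *names: str) -> LogicalPlan:
        # Extending the plan does no validation yet; that is the resolver's job.
        return LogicalPlan(self.schema, names)


@dataclass(frozen=True)
class ResolvedPlan:
    """Resolved plan: selections are checked and an output schema is stored."""

    output_schema: dict[str, str]


class LogicalToResolved(Protocol):
    def resolve(self, plan: LogicalPlan) -> ResolvedPlan: ...


class Resolver:
    """Toy builtin implementation of the protocol above."""

    def resolve(self, plan: LogicalPlan) -> ResolvedPlan:
        names = plan.selected or tuple(plan.schema)  # no selection means all columns
        missing = [name for name in names if name not in plan.schema]
        if missing:
            raise KeyError(f"columns not found: {missing}")
        return ResolvedPlan({name: plan.schema[name] for name in names})


def collect_schema(
    plan: LogicalPlan, resolver: LogicalToResolved | None = None
) -> dict[str, str]:
    # Only resolution is needed here; no data is touched.
    return (resolver or Resolver()).resolve(plan).output_schema


def collect(
    plan: LogicalPlan,
    data: dict[str, list],
    resolver: LogicalToResolved | None = None,
) -> dict[str, list]:
    # Stand-in for the backend-dependent step: resolve first, then evaluate on real data.
    resolved = (resolver or Resolver()).resolve(plan)
    return {name: list(data[name]) for name in resolved.output_schema}


plan = LogicalPlan({"a": "Int64", "b": "String"}).select("b")
print(collect_schema(plan))
print(collect(plan, {"a": [1, 2], "b": ["x", "y"]}))
```

The point of the split is visible even in the toy: `collect_schema` stops after resolution, while `collect` needs more than a schema and has to go on to evaluation.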