Another attempt at an astable flag #298
Changes from 27 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,6 +22,7 @@ In addition, DataFramesMeta provides | |
| convenient syntax. | ||
| * `@byrow` for applying functions to each row of a data frame (only supported inside other macros). | ||
| * `@passmissing` for propagating missing values inside row-wise DataFramesMeta.jl transformations. | ||
| * `@astable` to create multiple columns within a single transformation. | ||
| * `@chain`, from [Chain.jl](https://github.com/jkrumbiegel/Chain.jl) for piping the above macros together, similar to [magrittr](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html)'s | ||
| `%>%` in R. | ||
|
|
||
|
|
@@ -396,11 +397,38 @@ julia> @rtransform df @passmissing x = parse(Int, :x_str) | |
| 3 │ missing missing | ||
| ``` | ||
|
|
||
| ## Creating multiple columns at once with `@astable` | ||
|
|
||
| Often new variables may depend on the same intermediate calculations. `@astable` makes it easy to create multiple | ||
| new variables in the same operation, yet have them share | ||
| information. | ||
|
|
||
| In a single block, all assignments of the form `:y = f(:x)` | ||
| or `$y = f(:x)` at the top level generate new columns. In the second form, `y` | ||
| must be a string or `Symbol`. | ||
|
|
||
| ``` | ||
| julia> df = DataFrame(a = [1, 2, 3], b = [400, 500, 600]); | ||
|
|
||
| julia> @transform df @astable begin | ||
| ex = extrema(:b) | ||
| :b_first = :b .- first(ex) | ||
| :b_last = :b .- last(ex) | ||
| end | ||
| 3×4 DataFrame | ||
| Row │ a b b_first b_last | ||
| │ Int64 Int64 Int64 Int64 | ||
| ─────┼─────────────────────────────── | ||
| 1 │ 1 400 0 -200 | ||
| 2 │ 2 500 100 -100 | ||
| 3 │ 3 600 200 0 | ||
| ``` | ||
|
|
||
|
|
||
| ## [Working with column names programmatically with `$`](@id dollar) | ||
|
|
||
| DataFramesMeta provides the special syntax `$` for referring to | ||
| columns in a data frame via a `Symbol`, string, or column position as either | ||
| a literal or a variable. | ||
| columns in a data frame via a `Symbol`, string, or column position as either a literal or a variable. | ||
|
Member
While we are at it given our recent discussion on Discourse, I think it is essential to mention when the
Collaborator
Author
I will do this as another PR. In summary, you can't use other macros which use
Member
To be clear about why I stress this so much: with DataFrames.jl, my answer to users is that if you learn Julia Base, you will know exactly how DataFrames.jl works. With DataFramesMeta.jl this is unfortunately not the case, as it is a DSL, so the documentation needs to be very precise about how things work.
||
|
|
||
| ```julia | ||
| df = DataFrame(A = 1:3, B = [2, 1, 2]) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -282,11 +282,10 @@ macro byrow(args...) | |
| throw(ArgumentError("@byrow is deprecated outside of DataFramesMeta macros.")) | ||
| end | ||
|
|
||
|
|
||
| """ | ||
| passmissing(args...) | ||
| @passmissing(args...) | ||
|
|
||
| Propograte missing values inside DataFramesMeta.jl macros. | ||
| Propagate missing values inside DataFramesMeta.jl macros. | ||
|
||
|
|
||
|
|
||
| `@passmissing` is not a "real" Julia macro but rather serves as a "flag" | ||
|
|
@@ -350,6 +349,156 @@ macro passmissing(args...) | |
| throw(ArgumentError("@passmissing only works inside DataFramesMeta macros.")) | ||
| end | ||
|
|
||
| global astable_docstring_snippet = """ | ||
|
||
| Transformations can also use the macro-flag `@astable` for creating multiple | ||
|
||
| new columns at once and letting transformations share the same name-space. | ||
| See `? @astable` for more details. | ||
| """ | ||
|
|
||
| """ | ||
|
||
| @astable(args...) | ||
|
|
||
| Return a `NamedTuple` from a single transformation inside the DataFramesMeta.jl | ||
| macros, `@select`, `@transform`, and their mutating and row-wise equivalents. | ||
|
|
||
| `@astable` acts on a single block. It works through all top-level expressions | ||
| and collects all such expressions of the form `:y = ...` or `$(DOLLAR)y = ...`, i.e. assignments to a | ||
| `Symbol` or an escaped column identifier, which is a syntax error outside of | ||
| DataFramesMeta.jl macros. At the end of the expression, all assignments are collected | ||
| into a `NamedTuple` to be used with the `AsTable` destination in the DataFrames.jl | ||
| transformation mini-language. | ||
|
|
||
| Concretely, the expressions | ||
|
|
||
| ``` | ||
| df = DataFrame(a = 1) | ||
|
|
||
| @rtransform df @astable begin | ||
| :x = 1 | ||
| y = 50 | ||
| :z = :x + y + :a | ||
| end | ||
| ``` | ||
|
|
||
| become the pair | ||
|
|
||
| ``` | ||
| function f(a) | ||
| x_t = 1 | ||
|
||
| y = 50 | ||
| z_t = x_t + y + a | ||
|
|
||
| (; x = x_t, z = z_t) | ||
| end | ||
|
|
||
| transform(df, [:a] => ByRow(f) => AsTable) | ||
| ``` | ||
|
|
||
| `@astable` has two major advantages at the cost of increasing complexity. | ||
| First, `@astable` makes it easy to create multiple columns from a single | ||
| transformation, which share a scope. For example, `@astable` allows | ||
| for the following (where `:x` and `:x2` exist in the data frame already). | ||
|
|
||
| ``` | ||
| @transform df @astable begin | ||
| m = mean(:x) | ||
| :x_demeaned = :x .- m | ||
| :x2_demeaned = :x2 .- m | ||
| end | ||
| ``` | ||
|
|
||
| The creation of `:x_demeaned` and `:x2_demeaned` both share the variable `m`, | ||
|
||
| which does not need to be calculated twice. | ||
|
|
||
| Second, `@astable` is useful when performing intermediate calculations | ||
| and storing their results in new columns. For example, the following fails. | ||
|
|
||
| ``` | ||
| @rtransform df begin | ||
| :new_col_1 = :x + :y | ||
| :new_col_2 = :new_col_1 + :z | ||
| end | ||
| ``` | ||
|
|
||
| This is because DataFrames.jl does not guarantee sequential evaluation of | ||
| transformations. `@astable` solves this problem. | ||
|
|
||
| @rtransform df @astable begin | ||
| :new_col_1 = :x + :y | ||
| :new_col_2 = :new_col_1 + :z | ||
| end | ||
|
|
||
| Column assignment in `@astable` follows similar rules as | ||
| column assignment in other DataFramesMeta.jl macros. The left-hand | ||
| side of a column assignment can be either a `Symbol` or any | ||
| expression which evaluates to a `Symbol` or `AbstractString`. For example, | ||
| `:y = ...` and `$(DOLLAR)y = ...` are both valid ways of assigning a new column. | ||
| However, unlike other DataFramesMeta.jl macros, multi-column assignments via | ||
| `AsTable` are disallowed. The following will fail. | ||
|
|
||
| ``` | ||
| @transform df @astable begin | ||
| $AsTable = :x | ||
| end | ||
| ``` | ||
|
|
||
| References to existing columns also follow the same | ||
| rules as other DataFramesMeta.jl macros. | ||
|
|
||
| ### Examples | ||
|
|
||
| ``` | ||
| julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6]); | ||
|
|
||
| julia> d = @rtransform df @astable begin | ||
| :x = 1 | ||
| y = 5 | ||
| :z = :x + y | ||
| end | ||
| 3×4 DataFrame | ||
| Row │ a b x z | ||
| │ Int64 Int64 Int64 Int64 | ||
| ─────┼──────────────────────────── | ||
| 1 │ 1 4 1 6 | ||
| 2 │ 2 5 1 6 | ||
| 3 │ 3 6 1 6 | ||
|
|
||
| julia> df = DataFrame(a = [1, 1, 2, 2], b = [5, 6, 70, 80]); | ||
|
|
||
| julia> @by df :a @astable begin | ||
| ex = extrema(:b) | ||
| :min_b = first(ex) | ||
| :max_b = last(ex) | ||
| end | ||
|
Comment on lines +429 to +472
Member
This example can be achieved without Also, I wouldn't use long column names with spaces in them: better illustrate a single feature at a time.
Collaborator
Author
great. changed.
||
| 2×3 DataFrame | ||
| Row │ a min_b max_b | ||
| │ Int64 Int64 Int64 | ||
| ─────┼───────────────────── | ||
| 1 │ 1 5 6 | ||
| 2 │ 2 70 80 | ||
|
|
||
| julia> new_col = "New Column"; | ||
|
|
||
| julia> @rtransform df @astable begin | ||
| f_a = first(:a) | ||
| $(DOLLAR)new_col = :a + :b + f_a | ||
|
||
| :y = :a * :b | ||
| end | ||
| 4×4 DataFrame | ||
| Row │ a b New Column y | ||
| │ Int64 Int64 Int64 Int64 | ||
| ─────┼───────────────────────────────── | ||
| 1 │ 1 5 7 5 | ||
| 2 │ 1 6 8 6 | ||
| 3 │ 2 70 74 140 | ||
| 4 │ 2 80 84 160 | ||
| ``` | ||
|
|
||
| """ | ||
| macro astable(args...) | ||
| throw(ArgumentError("@astable only works inside DataFramesMeta macros.")) | ||
| end | ||
|
|
||
| ############################################################################## | ||
| ## | ||
| ## @with | ||
|
|
@@ -1097,6 +1246,8 @@ transformations by row, `@transform` allows `@byrow` at the | |
| beginning of a block of transformations (i.e. `@byrow begin... end`). | ||
| All transformations in the block will operate by row. | ||
|
|
||
| $astable_docstring_snippet | ||
|
|
||
| ### Examples | ||
|
|
||
| ```jldoctest | ||
|
|
@@ -1233,6 +1384,8 @@ transform!ations by row, `@transform!` allows `@byrow` at the | |
| beginning of a block of transform!ations (i.e. `@byrow begin... end`). | ||
| All transform!ations in the block will operate by row. | ||
|
|
||
| $astable_docstring_snippet | ||
|
|
||
| ### Examples | ||
|
|
||
| ```jldoctest | ||
|
|
@@ -1345,6 +1498,8 @@ transformations by row, `@select` allows `@byrow` at the | |
| beginning of a block of selectations (i.e. `@byrow begin... end`). | ||
| All transformations in the block will operate by row. | ||
|
|
||
| $astable_docstring_snippet | ||
|
|
||
| ### Examples | ||
|
|
||
| ```jldoctest | ||
|
|
@@ -1465,6 +1620,8 @@ transformations by row, `@select!` allows `@byrow` at the | |
| beginning of a block of select!ations (i.e. `@byrow begin... end`). | ||
| All transformations in the block will operate by row. | ||
|
|
||
| $astable_docstring_snippet | ||
|
|
||
| ### Examples | ||
|
|
||
| ```jldoctest | ||
|
|
@@ -1546,17 +1703,6 @@ function combine_helper(x, args...; deprecation_warning = false) | |
|
|
||
| exprs, outer_flags = create_args_vector(args...) | ||
|
|
||
| fe = first(exprs) | ||
| if length(exprs) == 1 && | ||
| get_column_expr(fe) === nothing && | ||
| !(fe.head == :(=) || fe.head == :kw) | ||
|
|
||
| @warn "Returning a Table object from @by and @combine now requires `$(DOLLAR)AsTable` on the LHS." | ||
|
|
||
| lhs = Expr(:$, :AsTable) | ||
| exprs = ((:($lhs = $fe)),) | ||
| end | ||
|
|
||
| t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs) | ||
|
|
||
| quote | ||
|
|
@@ -1592,6 +1738,8 @@ and | |
| @combine(df, :mx = mean(:x), :sx = std(:x)) | ||
| ``` | ||
|
|
||
| $astable_docstring_snippet | ||
|
|
||
| ### Examples | ||
|
|
||
| ```julia | ||
|
|
@@ -1666,16 +1814,6 @@ end | |
| function by_helper(x, what, args...) | ||
| # Only allow one argument when returning a Table object | ||
| exprs, outer_flags = create_args_vector(args...) | ||
| fe = first(exprs) | ||
| if length(exprs) == 1 && | ||
| get_column_expr(fe) === nothing && | ||
| !(fe.head == :(=) || fe.head == :kw) | ||
|
|
||
| @warn "Returning a Table object from @by and @combine now requires `\$AsTable` on the LHS." | ||
|
|
||
| lhs = Expr(:$, :AsTable) | ||
| exprs = ((:($lhs = $fe)),) | ||
| end | ||
|
|
||
| t = (fun_to_vec(ex; gensym_names = false, outer_flags = outer_flags) for ex in exprs) | ||
|
|
||
|
|
@@ -1718,6 +1856,8 @@ and | |
| @by(df, :g, mx = mean(:x), sx = std(:x)) | ||
| ``` | ||
|
|
||
| $astable_docstring_snippet | ||
|
|
||
| ### Examples | ||
|
|
||
| ```julia | ||
|
|
||