From c5d3e165ffe77220eef90536141cc80a64413095 Mon Sep 17 00:00:00 2001 From: Julius Krumbiegel Date: Mon, 10 Oct 2022 11:03:09 +0200 Subject: [PATCH 1/7] add mention of dataframemacros to the docs --- docs/Project.toml | 2 + docs/src/index.md | 5 +- docs/src/man/querying_frameworks.md | 82 ++++++++++++++++++++++++- docs/src/man/working_with_dataframes.md | 3 + 4 files changed, 89 insertions(+), 3 deletions(-) diff --git a/docs/Project.toml b/docs/Project.toml index 6b701748f0..ebe348a76b 100755 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,6 +1,8 @@ [deps] CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597" +Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc" +DataFrameMacros = "75880514-38bc-4a95-a458-c2aea5a3a702" DataFramesMeta = "1313f7d8-7da2-5740-9ea0-a2ca25f37964" Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28" diff --git a/docs/src/index.md b/docs/src/index.md index 8c2fea1734..134436570a 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -120,7 +120,10 @@ integrated they are with DataFrames.jl. DataFrames.jl, other tabular data libraries (more on those below), and even non-tabular data. Provides many convenience functions analogous to those in dplyr in R or [LINQ](https://en.wikipedia.org/wiki/Language_Integrated_Query). - - You can find more on both of these packages in the + - [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl): + Provides macro versions of the common DataFrames functions similar to DataFramesMeta, + with convenient syntax for the manipulation of multiple columns at once. + - You can find more information on these packages in the [Data manipulation frameworks](@ref) section of this manual. - **And More!** - [Graphs.jl](https://github.com/JuliaGraphs/Graphs.jl): A pure-Julia, diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md index 0d4e9c4990..bdaa99c9b6 100644 --- a/docs/src/man/querying_frameworks.md +++ b/docs/src/man/querying_frameworks.md @@ -1,7 +1,7 @@ # Data manipulation frameworks -Two popular frameworks provide convenience methods to manipulate `DataFrame`s: -DataFramesMeta.jl and Query.jl. They implement a functionality similar to +Three frameworks provide convenience methods to manipulate `DataFrame`s: +DataFramesMeta.jl, Query.jl and DataFrameMacros.jl. They implement a functionality similar to [dplyr](https://dplyr.tidyverse.org/) or [LINQ](https://en.wikipedia.org/wiki/Language_Integrated_Query). @@ -247,3 +247,81 @@ These examples only scratch the surface of what one can do with referred to the [Query.jl documentation](http://www.queryverse.org/Query.jl/stable/) for more information. + +## DataFrameMacros.jl + +[DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl) is +an alternative to `DataFramesMeta.jl` with an additional focus on convenient +solutions for the transformation of multiple columns at once. +The instructions below are for version 0.3 of DataFrameMacros.jl. + +First, install the DataFrameMacros.jl package: + +```julia +using Pkg +Pkg.add("DataFrameMacros") +``` + +In DataFrameMacros.jl, all but the `@combine` macro are row-wise by default. +There is also a `@groupby` that works like a `@transform` with `groupby` together, +for grouping by new columns without writing them out twice. + +In the below example, you can also see some of DataFrameMacros' multi-column +features, where `mean` is applied to both age columns at once by selecting +them with the `r"age"` regex. The new column names are then derived using the +`"{}"` shortcut which splices the transformed column names into a string. + +```jldoctest dataframemacros +using DataFrames, DataFrameMacros, Chain, Statistics + +julia> df = DataFrame(name=["John", "Sally", "Roger"], + age=[54.0, 34.0, 79.0], + children=[0, 2, 4]) +3×3 DataFrame + Row │ name age children + │ String Float64 Int64 +─────┼─────────────────────────── + 1 │ John 54.0 0 + 2 │ Sally 34.0 2 + 3 │ Roger 79.0 4 + +julia> @chain df begin + @transform :age_months = :age * 12 + @groupby :has_child = :children > 0 + @combine "mean_{}" = mean({r"age"}) + end +2×3 DataFrame + Row │ has_child mean_age mean_age_months + │ Bool Float64 Float64 +─────┼────────────────────────────────────── + 1 │ false 54.0 648.0 + 2 │ true 56.5 678.0 +``` + +There's also the capability to reference a group of multiple columns as a single unit, +for example to run aggregations over them, with the `{{ }}` syntax. +In the following example, the first quarter is compared to the maximum of the other three: + +```jldoctest dataframemacros +julia> df = DataFrame( + q1 = [12.0, 0.4, 42.7], + q2 = [6.4, 2.3, 40.9], + q3 = [9.5, 0.2, 13.6], + q4 = [6.3, 5.4, 39.3]) +3×4 DataFrame + Row │ q1 q2 q3 q4 + │ Float64 Float64 Float64 Float64 +─────┼──────────────────────────────────── + 1 │ 12.0 6.4 9.5 6.3 + 2 │ 0.4 2.3 0.2 5.4 + 3 │ 42.7 40.9 13.6 39.3 + +julia> @transform df :q1_best = :q1 > maximum({{Not(:q1)}}) +3×5 DataFrame + Row │ q1 q2 q3 q4 q1_best + │ Float64 Float64 Float64 Float64 Bool +─────┼───────────────────────────────────────────── + 1 │ 12.0 6.4 9.5 6.3 true + 2 │ 0.4 2.3 0.2 5.4 false + 3 │ 42.7 40.9 13.6 39.3 true +``` diff --git a/docs/src/man/working_with_dataframes.md b/docs/src/man/working_with_dataframes.md index 88b7557385..01e6953759 100755 --- a/docs/src/man/working_with_dataframes.md +++ b/docs/src/man/working_with_dataframes.md @@ -738,6 +738,9 @@ operations: - the [DataFramesMeta.jl](https://github.com/JuliaStats/DataFramesMeta.jl) package provides interfaces similar to LINQ and [dplyr](https://dplyr.tidyverse.org) +- the [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl) + package provides macros for most standard functions from DataFrames.jl, + with convenient syntax for the manipulation of multiple columns at once. See the [Data manipulation frameworks](@ref) section for more information. From 0375cb52f4a0064fb477a4cef97de03e806f9531 Mon Sep 17 00:00:00 2001 From: jkrumbiegel <22495855+jkrumbiegel@users.noreply.github.com> Date: Mon, 10 Oct 2022 12:02:04 +0200 Subject: [PATCH 2/7] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Bogumił Kamiński --- docs/src/index.md | 2 +- docs/src/man/querying_frameworks.md | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/src/index.md b/docs/src/index.md index 134436570a..cc48f0bf65 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -121,7 +121,7 @@ integrated they are with DataFrames.jl. non-tabular data. Provides many convenience functions analogous to those in dplyr in R or [LINQ](https://en.wikipedia.org/wiki/Language_Integrated_Query). - [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl): - Provides macro versions of the common DataFrames functions similar to DataFramesMeta, + Provides macro versions of the common DataFrames.jl functions similar to DataFramesMeta.jl, with convenient syntax for the manipulation of multiple columns at once. - You can find more information on these packages in the [Data manipulation frameworks](@ref) section of this manual. diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md index bdaa99c9b6..7143f2ce44 100644 --- a/docs/src/man/querying_frameworks.md +++ b/docs/src/man/querying_frameworks.md @@ -251,7 +251,7 @@ information. ## DataFrameMacros.jl [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl) is -an alternative to `DataFramesMeta.jl` with an additional focus on convenient +an alternative to DataFramesMeta.jl with an additional focus on convenient solutions for the transformation of multiple columns at once. The instructions below are for version 0.3 of DataFrameMacros.jl. @@ -266,13 +266,13 @@ In DataFrameMacros.jl, all but the `@combine` macro are row-wise by default. There is also a `@groupby` that works like a `@transform` with `groupby` together, for grouping by new columns without writing them out twice. -In the below example, you can also see some of DataFrameMacros' multi-column +In the below example, you can also see some of DataFrameMacros.jl' multi-column features, where `mean` is applied to both age columns at once by selecting them with the `r"age"` regex. The new column names are then derived using the `"{}"` shortcut which splices the transformed column names into a string. ```jldoctest dataframemacros -using DataFrames, DataFrameMacros, Chain, Statistics +julia> using DataFrames, DataFrameMacros, Chain, Statistics julia> df = DataFrame(name=["John", "Sally", "Roger"], age=[54.0, 34.0, 79.0], From 2f68e21eae4c95fff628189af74a750e95d344e6 Mon Sep 17 00:00:00 2001 From: Julius Krumbiegel Date: Mon, 10 Oct 2022 12:03:34 +0200 Subject: [PATCH 3/7] apply suggestion to change section order --- docs/src/man/querying_frameworks.md | 158 ++++++++++++++-------------- 1 file changed, 79 insertions(+), 79 deletions(-) diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md index 7143f2ce44..04e70db1ac 100644 --- a/docs/src/man/querying_frameworks.md +++ b/docs/src/man/querying_frameworks.md @@ -1,7 +1,7 @@ # Data manipulation frameworks Three frameworks provide convenience methods to manipulate `DataFrame`s: -DataFramesMeta.jl, Query.jl and DataFrameMacros.jl. They implement a functionality similar to +DataFramesMeta.jl, DataFrameMacros.jl and Query.jl. They implement a functionality similar to [dplyr](https://dplyr.tidyverse.org/) or [LINQ](https://en.wikipedia.org/wiki/Language_Integrated_Query). @@ -117,6 +117,84 @@ julia> @chain df begin You can find more details about how this package can be used on the [DataFramesMeta.jl GitHub page](https://github.com/JuliaData/DataFramesMeta.jl). +## DataFrameMacros.jl + +[DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl) is +an alternative to DataFramesMeta.jl with an additional focus on convenient +solutions for the transformation of multiple columns at once. +The instructions below are for version 0.3 of DataFrameMacros.jl. + +First, install the DataFrameMacros.jl package: + +```julia +using Pkg +Pkg.add("DataFrameMacros") +``` + +In DataFrameMacros.jl, all but the `@combine` macro are row-wise by default. +There is also a `@groupby` that works like a `@transform` with `groupby` together, +for grouping by new columns without writing them out twice. + +In the below example, you can also see some of DataFrameMacros.jl' multi-column +features, where `mean` is applied to both age columns at once by selecting +them with the `r"age"` regex. The new column names are then derived using the +`"{}"` shortcut which splices the transformed column names into a string. + +```jldoctest dataframemacros +julia> using DataFrames, DataFrameMacros, Chain, Statistics + +julia> df = DataFrame(name=["John", "Sally", "Roger"], + age=[54.0, 34.0, 79.0], + children=[0, 2, 4]) +3×3 DataFrame + Row │ name age children + │ String Float64 Int64 +─────┼─────────────────────────── + 1 │ John 54.0 0 + 2 │ Sally 34.0 2 + 3 │ Roger 79.0 4 + +julia> @chain df begin + @transform :age_months = :age * 12 + @groupby :has_child = :children > 0 + @combine "mean_{}" = mean({r"age"}) + end +2×3 DataFrame + Row │ has_child mean_age mean_age_months + │ Bool Float64 Float64 +─────┼────────────────────────────────────── + 1 │ false 54.0 648.0 + 2 │ true 56.5 678.0 +``` + +There's also the capability to reference a group of multiple columns as a single unit, +for example to run aggregations over them, with the `{{ }}` syntax. +In the following example, the first quarter is compared to the maximum of the other three: + +```jldoctest dataframemacros +julia> df = DataFrame( + q1 = [12.0, 0.4, 42.7], + q2 = [6.4, 2.3, 40.9], + q3 = [9.5, 0.2, 13.6], + q4 = [6.3, 5.4, 39.3]) +3×4 DataFrame + Row │ q1 q2 q3 q4 + │ Float64 Float64 Float64 Float64 +─────┼──────────────────────────────────── + 1 │ 12.0 6.4 9.5 6.3 + 2 │ 0.4 2.3 0.2 5.4 + 3 │ 42.7 40.9 13.6 39.3 + +julia> @transform df :q1_best = :q1 > maximum({{Not(:q1)}}) +3×5 DataFrame + Row │ q1 q2 q3 q4 q1_best + │ Float64 Float64 Float64 Float64 Bool +─────┼───────────────────────────────────────────── + 1 │ 12.0 6.4 9.5 6.3 true + 2 │ 0.4 2.3 0.2 5.4 false + 3 │ 42.7 40.9 13.6 39.3 true +``` + ## Query.jl The [Query.jl](https://github.com/queryverse/Query.jl) package provides advanced @@ -247,81 +325,3 @@ These examples only scratch the surface of what one can do with referred to the [Query.jl documentation](http://www.queryverse.org/Query.jl/stable/) for more information. - -## DataFrameMacros.jl - -[DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl) is -an alternative to DataFramesMeta.jl with an additional focus on convenient -solutions for the transformation of multiple columns at once. -The instructions below are for version 0.3 of DataFrameMacros.jl. - -First, install the DataFrameMacros.jl package: - -```julia -using Pkg -Pkg.add("DataFrameMacros") -``` - -In DataFrameMacros.jl, all but the `@combine` macro are row-wise by default. -There is also a `@groupby` that works like a `@transform` with `groupby` together, -for grouping by new columns without writing them out twice. - -In the below example, you can also see some of DataFrameMacros.jl' multi-column -features, where `mean` is applied to both age columns at once by selecting -them with the `r"age"` regex. The new column names are then derived using the -`"{}"` shortcut which splices the transformed column names into a string. - -```jldoctest dataframemacros -julia> using DataFrames, DataFrameMacros, Chain, Statistics - -julia> df = DataFrame(name=["John", "Sally", "Roger"], - age=[54.0, 34.0, 79.0], - children=[0, 2, 4]) -3×3 DataFrame - Row │ name age children - │ String Float64 Int64 -─────┼─────────────────────────── - 1 │ John 54.0 0 - 2 │ Sally 34.0 2 - 3 │ Roger 79.0 4 - -julia> @chain df begin - @transform :age_months = :age * 12 - @groupby :has_child = :children > 0 - @combine "mean_{}" = mean({r"age"}) - end -2×3 DataFrame - Row │ has_child mean_age mean_age_months - │ Bool Float64 Float64 -─────┼────────────────────────────────────── - 1 │ false 54.0 648.0 - 2 │ true 56.5 678.0 -``` - -There's also the capability to reference a group of multiple columns as a single unit, -for example to run aggregations over them, with the `{{ }}` syntax. -In the following example, the first quarter is compared to the maximum of the other three: - -```jldoctest dataframemacros -julia> df = DataFrame( - q1 = [12.0, 0.4, 42.7], - q2 = [6.4, 2.3, 40.9], - q3 = [9.5, 0.2, 13.6], - q4 = [6.3, 5.4, 39.3]) -3×4 DataFrame - Row │ q1 q2 q3 q4 - │ Float64 Float64 Float64 Float64 -─────┼──────────────────────────────────── - 1 │ 12.0 6.4 9.5 6.3 - 2 │ 0.4 2.3 0.2 5.4 - 3 │ 42.7 40.9 13.6 39.3 - -julia> @transform df :q1_best = :q1 > maximum({{Not(:q1)}}) -3×5 DataFrame - Row │ q1 q2 q3 q4 q1_best - │ Float64 Float64 Float64 Float64 Bool -─────┼───────────────────────────────────────────── - 1 │ 12.0 6.4 9.5 6.3 true - 2 │ 0.4 2.3 0.2 5.4 false - 3 │ 42.7 40.9 13.6 39.3 true -``` From 712082d6120e0de2b8dbd2ce7740a4911d03b6db Mon Sep 17 00:00:00 2001 From: Julius Krumbiegel Date: Mon, 10 Oct 2022 12:05:38 +0200 Subject: [PATCH 4/7] grammar --- docs/src/man/querying_frameworks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md index 04e70db1ac..a5809c4ac9 100644 --- a/docs/src/man/querying_frameworks.md +++ b/docs/src/man/querying_frameworks.md @@ -135,7 +135,7 @@ In DataFrameMacros.jl, all but the `@combine` macro are row-wise by default. There is also a `@groupby` that works like a `@transform` with `groupby` together, for grouping by new columns without writing them out twice. -In the below example, you can also see some of DataFrameMacros.jl' multi-column +In the example below, you can also see some of DataFrameMacros.jl' multi-column features, where `mean` is applied to both age columns at once by selecting them with the `r"age"` regex. The new column names are then derived using the `"{}"` shortcut which splices the transformed column names into a string. From 0a3715340139910a3d4d054997feceeb4bf3e597 Mon Sep 17 00:00:00 2001 From: jkrumbiegel <22495855+jkrumbiegel@users.noreply.github.com> Date: Tue, 18 Oct 2022 10:18:43 +0200 Subject: [PATCH 5/7] Update docs/src/man/querying_frameworks.md Co-authored-by: Milan Bouchet-Valat --- docs/src/man/querying_frameworks.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md index a5809c4ac9..58abb12a92 100644 --- a/docs/src/man/querying_frameworks.md +++ b/docs/src/man/querying_frameworks.md @@ -132,8 +132,9 @@ Pkg.add("DataFrameMacros") ``` In DataFrameMacros.jl, all but the `@combine` macro are row-wise by default. -There is also a `@groupby` that works like a `@transform` with `groupby` together, -for grouping by new columns without writing them out twice. +There is also a `@groupby` which allows creating grouping columns on the fly +using the same syntax as `@transform`, for grouping by new columns +without writing them out twice. In the example below, you can also see some of DataFrameMacros.jl' multi-column features, where `mean` is applied to both age columns at once by selecting From 508156349ab10d59551d07d5ea2c5f892616dc17 Mon Sep 17 00:00:00 2001 From: jkrumbiegel <22495855+jkrumbiegel@users.noreply.github.com> Date: Tue, 18 Oct 2022 10:20:47 +0200 Subject: [PATCH 6/7] Apply suggestions from code review Co-authored-by: Milan Bouchet-Valat --- docs/src/man/querying_frameworks.md | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md index 58abb12a92..47799c5d52 100644 --- a/docs/src/man/querying_frameworks.md +++ b/docs/src/man/querying_frameworks.md @@ -136,7 +136,7 @@ There is also a `@groupby` which allows creating grouping columns on the fly using the same syntax as `@transform`, for grouping by new columns without writing them out twice. -In the example below, you can also see some of DataFrameMacros.jl' multi-column +In the example below, you can also see some of DataFrameMacros.jl's multi-column features, where `mean` is applied to both age columns at once by selecting them with the `r"age"` regex. The new column names are then derived using the `"{}"` shortcut which splices the transformed column names into a string. @@ -145,8 +145,8 @@ them with the `r"age"` regex. The new column names are then derived using the julia> using DataFrames, DataFrameMacros, Chain, Statistics julia> df = DataFrame(name=["John", "Sally", "Roger"], - age=[54.0, 34.0, 79.0], - children=[0, 2, 4]) + age=[54.0, 34.0, 79.0], + children=[0, 2, 4]) 3×3 DataFrame Row │ name age children │ String Float64 Int64 @@ -173,11 +173,10 @@ for example to run aggregations over them, with the `{{ }}` syntax. In the following example, the first quarter is compared to the maximum of the other three: ```jldoctest dataframemacros -julia> df = DataFrame( - q1 = [12.0, 0.4, 42.7], - q2 = [6.4, 2.3, 40.9], - q3 = [9.5, 0.2, 13.6], - q4 = [6.3, 5.4, 39.3]) +julia> df = DataFrame(q1 = [12.0, 0.4, 42.7], + q2 = [6.4, 2.3, 40.9], + q3 = [9.5, 0.2, 13.6], + q4 = [6.3, 5.4, 39.3]) 3×4 DataFrame Row │ q1 q2 q3 q4 │ Float64 Float64 Float64 Float64 From 3cf70d58b85fc9fa8826f2fadc06a9f453d2ba7f Mon Sep 17 00:00:00 2001 From: Julius Krumbiegel Date: Tue, 18 Oct 2022 10:23:35 +0200 Subject: [PATCH 7/7] flip order of mentions --- docs/src/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/src/index.md b/docs/src/index.md index cc48f0bf65..63d828d462 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -115,14 +115,14 @@ integrated they are with DataFrames.jl. A range of convenience functions for DataFrames.jl that augment `select` and `transform` to provide a user experience similar to that provided by [dplyr](https://dplyr.tidyverse.org/) in R. + - [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl): + Provides macro versions of the common DataFrames.jl functions similar to DataFramesMeta.jl, + with convenient syntax for the manipulation of multiple columns at once. - [Query.jl](https://github.com/queryverse/Query.jl): Query.jl provides a single framework for data wrangling that works with a range of libraries, including DataFrames.jl, other tabular data libraries (more on those below), and even non-tabular data. Provides many convenience functions analogous to those in dplyr in R or [LINQ](https://en.wikipedia.org/wiki/Language_Integrated_Query). - - [DataFrameMacros.jl](https://github.com/jkrumbiegel/DataFrameMacros.jl): - Provides macro versions of the common DataFrames.jl functions similar to DataFramesMeta.jl, - with convenient syntax for the manipulation of multiple columns at once. - You can find more information on these packages in the [Data manipulation frameworks](@ref) section of this manual. - **And More!**