Add shuffle, shuffle! functions #2048
Comments
Now you can do @nalimilan - given we have settled to treat a |
Thanks, I didn't know about |
A bit less efficient (but more aesthetic) way to do it is |
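The inline snippets in the comments above did not survive, so the following is only a minimal sketch of one common way to shuffle rows, not necessarily what was suggested there. It assumes the `Random` standard library and indexes the rows with a random permutation; the toy data frame and the seeded `rng` are made up for illustration.

```julia
using DataFrames, Random

# Hypothetical toy data, only for illustration.
df = DataFrame(id = 1:5, x = [0.1, 0.2, 0.3, 0.4, 0.5])

# Seeded only so the sketch is reproducible; any AbstractRNG would do.
rng = MersenneTwister(42)

# Non-mutating shuffle: draw a random permutation of the row numbers
# and index with it. This returns a new DataFrame and leaves df unchanged.
perm = shuffle(rng, 1:nrow(df))
shuffled = df[perm, :]
```

Because whole rows are selected, the values in `id` and `x` stay paired after the shuffle.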
Maybe also consider offering column shuffling? |
We treat
or this:
should be used |
Reminds me of a similar discussion about Shuffling columns doesn't sound too common, is it? |
Also another pattern that can be used to shuffle columns is
An in-place operation is more challenging and will require a careful design.
OK - leaving this decision for post-1.0 (mostly because it is easy to do this without this function). |
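The pattern referred to above was lost from the page. As a hedged sketch of how columns can be permuted without a dedicated function (an assumption about the general idea, not a reconstruction of the original snippet), one can shuffle either the column positions or the column names; the example data is hypothetical.

```julia
using DataFrames, Random

# Hypothetical toy data, only for illustration.
df = DataFrame(a = 1:3, b = 4:6, c = 7:9)
rng = MersenneTwister(1)   # seeded only for reproducibility

# Non-mutating: permute the column positions and index with them.
col_perm = shuffle(rng, 1:ncol(df))
by_index = df[:, col_perm]

# The same idea by name: select accepts a vector of column names.
by_name = select(df, shuffle(rng, names(df)))
```

Both forms return a new data frame; neither reorders the columns of `df` in place.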
I haven't seen many column permutation examples, though I use it in my work. Appreciate the pointer on how to do it. When I'm deep in a language it is obvious. In this case I'm in multiple languages and frameworks and looking for convenience functions. |
Sure. I guess the point of @nalimilan is that we want to move towards 1.0 pretty soon. In general - as we try to look at |
I'd like to add a use case that is common in my work, for grouped data frames. I want to shuffle the groups, each of which in my case is an item with a time series of transactions. Then I want to take the first N groups after the shuffle (i.e. randomly select N groups). Maybe there is a similarly simple way to shuffle the grouped df? The following process demonstrates the steps I'm currently taking:
using DataFrames
df = DataFrame(time = [1, 2, 1, 2, 1, 2],
               amt = [19.00, 11.00, 35.50, 32.50, 5.99, 5.99],
               item = ["B001", "B001", "B020", "B020", "BX00", "BX00"])
6×3 DataFrame
│ Row │ time │ amt │ item │
│ │ Int64 │ Float64 │ String │
├─────┼───────┼─────────┼────────┤
│ 1 │ 1 │ 19.0 │ B001 │
│ 2 │ 2 │ 11.0 │ B001 │
│ 3 │ 1 │ 35.5 │ B020 │
│ 4 │ 2 │ 32.5 │ B020 │
│ 5 │ 1 │ 5.99 │ BX00 │
│ 6 │ 2 │ 5.99 │ BX00 │
using StatsBase, Pipe
# keep the result, since the filtering step below refers to it as `res`
res = @pipe df |> groupby(_, :item) |>
combine(_, :time, :amt, :item, :item => (x -> rand()) => :rando) |>
sort(_, :rando) |>
transform(_, :rando => denserank => :rnk_rnd)
6×5 DataFrame
│ Row │ item │ time │ amt │ rando │ rnk_rnd │
│ │ String │ Int64 │ Float64 │ Float64 │ Int64 │
├─────┼────────┼───────┼─────────┼──────────┼─────────┤
│ 1 │ BX00 │ 0 │ 5.99 │ 0.241881 │ 1 │
│ 2 │ BX00 │ 1 │ 5.99 │ 0.241881 │ 1 │
│ 3 │ B001 │ 0 │ 19.0 │ 0.292468 │ 2 │
│ 4 │ B001 │ 1 │ 11.0 │ 0.292468 │ 2 │
│ 5 │ B020 │ 0 │ 35.5 │ 0.70816 │ 3 │
│ 6 │ B020 │ 1 │ 32.5 │ 0.70816 │ 3 │
# I only want the original columns
@pipe filter(:rnk_rnd => <=(2), res) |>
select(_, :item, :time, :amt)
4×3 DataFrame
│ Row │ item │ time │ amt │
│ │ String │ Int64 │ Float64 │
├─────┼────────┼───────┼─────────┤
│ 1 │ BX00 │ 1 │ 5.99 │
│ 2 │ BX00 │ 2 │ 5.99 │
│ 3 │ B020 │ 1 │ 35.5 │
│ 4 │ B020 │ 2 │ 32.5 │ |
Got it:
using Random  # provides shuffle
# take the first 2 shuffled groups
@pipe df |> groupby(_, :item) |>
_[shuffle(1:end)] |>
combine(_[1:2], :)
4×3 DataFrame
│ Row │ item │ time │ amt │
│ │ String │ Int64 │ Float64 │
├─────┼────────┼───────┼─────────┤
│ 1 │ BX00 │ 0 │ 5.99 │
│ 2 │ BX00 │ 1 │ 5.99 │
│ 3 │ B001 │ 0 │ 19.0 │
│ 4 │ B001 │ 1 │ 11.0 │
I guess I'll put it up on Stack Overflow.
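For reference, here is the same idea written without `Pipe`, as a small sketch rather than a recommended API. It reuses the example data from this comment; the seeded `rng` and the `n_groups` name are assumptions added for illustration.

```julia
using DataFrames, Random

df = DataFrame(time = [1, 2, 1, 2, 1, 2],
               amt  = [19.00, 11.00, 35.50, 32.50, 5.99, 5.99],
               item = ["B001", "B001", "B020", "B020", "BX00", "BX00"])

rng = MersenneTwister(2048)   # seeded only for reproducibility
n_groups = 2                  # hypothetical: how many groups to sample

# Group by item, then pick n_groups group positions at random.
gd = groupby(df, :item)
picked = shuffle(rng, 1:length(gd))[1:n_groups]

# Indexing a GroupedDataFrame with a vector keeps only those groups;
# combine(..., :) then flattens them back into a single DataFrame.
sampled = combine(gd[picked], :)
```

Each sampled group keeps all of its rows, so with two rows per item the result has `2 * n_groups` rows.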
Adding this and |
Hi,
It would be helpful to have shuffle and shuffle! functions in DataFrames. They are used for randomizing machine learning mini-batches.
What do you think?
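To make the mini-batch use case concrete, here is a minimal sketch of how shuffled batches can be built today with row indexing. The data, the `batch_size` value, and the seeded `rng` are all hypothetical and chosen only for illustration.

```julia
using DataFrames, Random

# Hypothetical training data: 10 observations with a feature and a label.
df = DataFrame(x = collect(1.0:10.0), y = repeat([0, 1], 5))

rng = MersenneTwister(7)   # seeded only for reproducibility
batch_size = 4             # hypothetical batch size

# Shuffle the row numbers once, then cut them into consecutive chunks.
perm = shuffle(rng, 1:nrow(df))
batches = [df[collect(idx), :] for idx in Iterators.partition(perm, batch_size)]
```

The last batch is smaller when the number of rows is not a multiple of `batch_size`; here the batches have 4, 4, and 2 rows.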