Update comparisons with data.table info #2725
How about adding a way of taking a single column as a vector?
I think it is sometimes useful, and it corresponds nicely between `data.table` and `DataFrames.jl`.
I'm just not sure whether we should use `!` or `:` here, but I think a view should be preferred.
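For reference, on the data.table side a single column can be pulled out as a plain vector in a few equivalent ways. This is a minimal sketch only: the toy table `df` and its columns `x` and `y` are hypothetical and not taken from the PR.

```r
library(data.table)

# Hypothetical toy data, for illustration only.
df <- data.table(x = c(1L, 2L, 3L), y = c("a", "b", "c"))

v1 <- df[["x"]]  # list-style extraction by column name
v2 <- df$x       # the same column via `$`
v3 <- df[, x]    # a single unquoted column in j also returns a vector
```

All three forms give an ordinary integer vector rather than a one-column table.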
Great suggestion! Implemented.
Looks like we have some work to do on this one. I can't think of an easier way right now. There may be an outstanding issue or pull request, maybe @jangorecki @MichaelChirico recall. I never wanted to encourage wide data, so my focus was on long. But I know people like to go wide like this, perhaps for presenting results in a paper or web page, so this task should be easier.
It's cheating a bit, or maybe not, but I'd probably use `dcast` here.
The advantage of this approach is that it also scales well to cases where you want to collapse by group. I think the `unlist` approach would struggle here.
Mind you, grouping is something that the DataFrames.jl implementation automatically supports (and, to @mattdowle's point, might be conceptually simpler than my `dcast` workflow).
I would do `df[, c(lapply(.SD, max), lapply(.SD, min)), .SDcols = c("x", "y")]`. That should `GForce` as well, whereas the `unlist` one will not.
I guess it doesn't use GForce, and also, it results in duplicate names! ouch
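The duplicate-name problem can be reproduced on a small table, and one possible workaround is to rename each half explicitly. This is a rough sketch, not a recommendation from the thread: the toy data, the `cols` helper variable, and the `_max`/`_min` suffixes are all assumptions made for illustration.

```r
library(data.table)

# Hypothetical toy data; only the column names x and y come from the snippet above.
df <- data.table(x = c(3, 1, 2), y = c(10, 30, 20))

# The expression from the comment: both halves keep the names x and y.
res <- df[, c(lapply(.SD, max), lapply(.SD, min)), .SDcols = c("x", "y")]

# One way to avoid the clash: name each half before combining (suffixes are illustrative).
cols <- c("x", "y")
res2 <- df[, c(
  setNames(lapply(.SD, max), paste0(cols, "_max")),
  setNames(lapply(.SD, min), paste0(cols, "_min"))
), .SDcols = cols]
```

Here `names(res)` repeats `x` and `y`, while `res2` has one distinct column per statistic. Whether the renamed form is optimized when combined with `by` is not checked here.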
FWIW I would usually do this as `df[order(-x), .SD[1], by=grp]`.
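To make the idiom concrete, here is a small sketch of it on made-up data. The table contents are hypothetical; only the names `df`, `x`, and `grp` follow the snippet above.

```r
library(data.table)

# Hypothetical toy data, for illustration only.
df <- data.table(
  grp = c("a", "a", "b", "b"),
  x   = c(1, 5, 7, 2),
  y   = c("p", "q", "r", "s")
)

# Sort by x descending, then keep the first row of each group,
# i.e. the row holding the largest x per grp.
top <- df[order(-x), .SD[1], by = grp]
```

Groups appear in the order they are first encountered after sorting, so the group containing the overall largest `x` comes first.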
Agreed. For this use case, it is probably more idiomatic.
However, the goal here is to showcase a function that uses subdataframes. I am afraid that if we only use `first` and `.SD[1]`, this might seem more limited than using an actual function on `.SD`.
If you have another example to showcase using functions on `.SD`, I will be happy to take it!
great! FWIW I ran into the exact same issue writing the .SD vignette:
https://stackoverflow.com/a/47406952/3576984