Fix naming in j=c() under by= queries with lapply() optimization #4883
…64.10. oops. also added more tests. all passing now
Codecov Report
@@ Coverage Diff @@
## master #4883 +/- ##
==========================================
+ Coverage 99.38% 99.46% +0.07%
==========================================
Files 77 73 -4
Lines 14506 14427 -79
==========================================
- Hits 14417 14350 -67
+ Misses 89 77 -12
Some tests on a prefix having dot in its name would be useful, as well as prefix with a whitespace inside.
@jangorecki Good idea. I'll add those and reorganize the tests a bit. I also kind of want to run these all twice, once for option optimize=0 and once for optimize=3. Optimization is really the culprit here (particularly the difference between optimize=0 and optimize=1) and we want to be sure everything behaves the same regardless of optimization level. Is there a good way to loop through options, rerunning the same tests? Or should I just copy/paste then manually edit the test numbers?
Copy paste will be fine, if there are many tests you can wrap them in |
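For readers wondering what rerunning the same check at several optimization levels could look like, here is a minimal sketch. It is not the approach used in the PR's test suite; `names_at_level` is a hypothetical helper introduced only for illustration, and the query is a simplified variant of the ones discussed in this thread. The only data.table feature assumed is the `datatable.optimize` option.

```r
library(data.table)

M <- as.data.table(mtcars)

# Hypothetical helper (not part of data.table): evaluate one grouped query
# under a given datatable.optimize level and return the result's column names.
names_at_level <- function(level) {
  old <- options(datatable.optimize = level)
  on.exit(options(old), add = TRUE)  # restore the previous setting on exit
  names(M[, c(list(b = hp[1L]), lapply(.SD, mean)),
          by = "cyl", .SDcols = c("vs", "am")])
}

levels_to_check <- c(0L, 3L)
result_names <- lapply(levels_to_check, names_at_level)
names(result_names) <- paste0("optimize=", levels_to_check)

# Whether the entries agree can depend on the installed data.table version.
print(result_names)
```

Wrapping the option change in a function with `on.exit()` keeps the global setting from leaking into later tests, which is the main hazard of looping over options by hand.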
I need to look into nested concatenation. From the current build:
library(data.table)
M <- as.data.table(mtcars)
names(M[, c(list(mpg, b=hp), lapply(.SD, mean)), by="cyl", .SDcols=c("vs", "am")])
#> [1] "cyl" "V1" "b" "vs" "am"
names(M[, c(list(mpg, b=hp), c(lapply(.SD, mean))), by="cyl", .SDcols=c("vs", "am")])
#> [1] "cyl" "" "b" "vs" "am"
@jangorecki can you take a look at this? I added tests with "." and " " and seem to have fixed the nested c(c()) inconsistency. |
…iable name for the column created by x[1]. previously this could be an empty string column name in some circumstances
@jangorecki yep, it's added. See 2164.019 and 2164.119 |
@jangorecki Hey Jan is there any more testing you think should be done on this? This is ready from my perspective but if you or others have further suggestions I'm happy to implement them. |
@jangorecki Just another very happy user of |
@myoung3 Any chance you could fix those merge conflicts (and hopefully trigger another look by Jan or another DT core member)? For my part, I'll say that this PR looks good to go on my local system. All of the new features in the latest dev version of data.table are fantastic... but this one is high up on my wish list. I'm hoping it makes the cut before CRAN submission time ;-)
Hi @myoung3, would you like to update this PR for inclusion in the next release? |
@MichaelChirico Yeah I can probably take this on. It will take some time to reacquaint myself with the changes since the naming code is a bit esoteric. What is the release timeline? What are your thoughts on whether this change needs to be phased in gradually? See @jangorecki's comment above. On the one hand, it will change how columns are named which could break code. On the other hand, columns are currently named inconsistently (iirc, depending on whether there is a by statement) to the point of naming in this context being bugged, and I'm not sure a bug fix would warrant gradual introduction over multiple versions. |
Sorry, I haven't read anything carefully yet, I'll give you a better answer when I return to review (ideally after it's updated to current
O(months), no urgent rush. Per the new GOVERNANCE doc: Line 90 in 07fd933
We are looking at roughly July. |
Hi @myoung3 ! Thanks for this PR. Looks like this is a highly requested feature. Some minor conflicts for DESCRIPTION, NEWS and tests are left. @MichaelChirico thoughts on phasing this one in? Feel like this does fix a bug that ppl should have avoided in practice (but it will likely break in some places). |
Thanks for the PR! I've invited you to join Rdatatable/data.table as a member. That should make it easier to contribute+collaborate going forward, in particular by publishing branches directly to this repo. |
Re: possibility of breaking change -- let's see what revdeps tells us.
Test suite looks good, new behavior looks correct, consistent, especially a huge improvement for the ability to do c(sum=lapply(.SD, sum), max=lapply(.SD, max))
and get unique names out of it just like that. Great work!
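As an illustration of the naming pattern praised above, the sketch below first shows how base R's `c()` prefixes element names with the argument name, then runs the same `j` expression in a grouped data.table query. The base R output is standard behavior; the names printed by the grouped query are left unstated because they depend on whether the installed data.table version includes this PR. The variable names `cols`, `base_names`, and `res` are illustrative only.

```r
library(data.table)

M <- as.data.table(mtcars)
cols <- c("vs", "am")

# Base R reference: c() prefixes each element name with the argument name.
base_names <- names(c(sum = lapply(mtcars[cols], sum),
                      max = lapply(mtcars[cols], max)))
print(base_names)
#> [1] "sum.vs" "sum.am" "max.vs" "max.am"

# The same j expression in a grouped query. With the behavior described in
# this PR the non-grouping columns are expected to follow the base R
# prefixing; older versions may name them differently.
res <- M[, c(sum = lapply(.SD, sum), max = lapply(.SD, max)),
         by = "cyl", .SDcols = cols]
print(names(res))
```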
The code in [.data.table is a bit messy, but that region is already a huge mess that you've added to only marginally. It should be refactored for readability with some helpers, later.
atime failure is unrelated, #6481 is for that. |
Was just wishing for this functionality again, whilst working a new project this week. Huzzah! |
fixes #2311