Fix naming in j=c() under by= queries with lapply() optimization #4883

myoung3 · 2021-01-31T20:18:41Z


codecov · 2021-01-31T21:06:43Z

Codecov Report

Merging #4883 (819beaf) into master (7f0ce14) will increase coverage by 0.07%.
The diff coverage is 100.00%.

❗ Current head 819beaf differs from pull request most recent head 1c021f0. Consider uploading reports for the commit 1c021f0 to get more accurate results

@@            Coverage Diff             @@
##           master    #4883      +/-   ##
==========================================
+ Coverage   99.38%   99.46%   +0.07%     
==========================================
  Files          77       73       -4     
  Lines       14506    14427      -79     
==========================================
- Hits        14417    14350      -67     
+ Misses         89       77      -12

Impacted Files	Coverage Δ
R/data.table.R	`99.94% <100.00%> (-0.01%)`	⬇️
src/fastmean.c	`96.82% <0.00%> (-3.18%)`	⬇️
src/uniqlist.c	`98.26% <0.00%> (-1.27%)`	⬇️
src/fmelt.c	`99.05% <0.00%> (-0.95%)`	⬇️
src/between.c	`99.21% <0.00%> (-0.79%)`	⬇️
src/subset.c	`99.50% <0.00%> (-0.50%)`	⬇️
src/frollR.c	`99.53% <0.00%> (-0.47%)`	⬇️
src/forder.c	`99.61% <0.00%> (-0.39%)`	⬇️
src/dogroups.c	`99.66% <0.00%> (-0.34%)`	⬇️
src/assign.c	`99.70% <0.00%> (-0.15%)`	⬇️
... and 52 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

jangorecki

Some tests on a prefix having dot in its name would be useful, as well as prefix with a whitespace inside.

myoung3 · 2021-02-01T18:17:11Z

@jangorecki Good idea. I'll add those and reorganize the tests a bit. I also kind of want to run these all twice, once with optimize=0 and once with optimize=3. Optimization is really the culprit here (particularly the difference between optimize=0 and optimize=1), and we want to be sure everything behaves the same regardless of optimization level. Is there a good way to loop through options, rerunning the same tests? Or should I just copy/paste and then manually edit the test numbers?

jangorecki · 2021-02-02T08:02:12Z

Copy-paste will be fine. If there are many tests, you can wrap them in a { } block so it is easier to fold and to see where each branch starts and finishes.
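For readers following along, the idea of rerunning one query under several optimization levels can be sketched as below. This is only an illustration, not the project's test harness: it assumes the `datatable.optimize` option is the switch referred to as optimize=0 / optimize=3 above, and the query and the `names_by_level` variable are made up for the example. No expected output is shown, because the names differ between data.table versions before and after this PR.

```r
library(data.table)

M <- as.data.table(mtcars)
levels_to_try <- c(0L, 1L, 3L)

# Evaluate the same grouped query once per optimization level and collect
# the resulting column names so they can be compared side by side.
names_by_level <- lapply(levels_to_try, function(opt) {
  old <- options(datatable.optimize = opt)   # hypothetical choice of levels
  on.exit(options(old), add = TRUE)          # restore the previous setting
  names(M[, c(list(b = hp[1L]), lapply(.SD, mean)),
          by = "cyl", .SDcols = c("vs", "am")])
})
names(names_by_level) <- paste0("optimize=", levels_to_try)

print(names_by_level)
```

If the naming is consistent, every element of `names_by_level` should be the same character vector.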

myoung3 · 2021-02-02T08:09:20Z

I need to look into nested concatenation. From the current build:

library(data.table)
M <- as.data.table(mtcars)
names( M[, c(list(mpg,b=hp),lapply(.SD, mean)), by="cyl", .SDcols=c("vs", "am")])
#> [1] "cyl" "V1"  "b"   "vs"  "am"
names( M[, c(list(mpg,b=hp),c(lapply(.SD, mean))), by="cyl", .SDcols=c("vs", "am")])
#> [1] "cyl" ""    "b"   "vs"  "am"

myoung3 · 2021-02-20T05:03:28Z

@jangorecki can you take a look at this? I added tests with "." and " " and seem to have fixed the nested c(c()) inconsistency.

myoung3 · 2021-02-20T17:42:20Z

@jangorecki yep, it's added. See 2164.019 and 2164.119

myoung3 · 2021-03-27T16:52:44Z

@jangorecki Hey Jan is there any more testing you think should be done on this? This is ready from my perspective but if you or others have further suggestions I'm happy to implement them.

legendre6891 · 2021-05-20T06:09:02Z

@jangorecki Just another very happy user of data.table chiming in: it would be really great to have this PR in the next release. I hit the problem this PR solves quite often.

grantmcdermott · 2021-08-31T20:17:40Z

@myoung3 Any chance you could fix those merge conflicts (and hopefully trigger another look by Jan or other DT core member)?

For my part, I'll say that this PR looks good to go on my local system. All of the new features in the latest dev version of data.table are fantastic... but this one is high up on my wish list. I'm hoping it makes the cut before CRAN submission time ;-)

MichaelChirico · 2024-02-28T06:33:54Z

Hi @myoung3, would you like to update this PR for inclusion in the next release?

myoung3 · 2024-02-28T06:50:03Z

@MichaelChirico Yeah I can probably take this on. It will take some time to reacquaint myself with the changes since the naming code is a bit esoteric. What is the release timeline?

What are your thoughts on whether this change needs to be phased in gradually? See @jangorecki's comment above. On the one hand, it will change how columns are named which could break code. On the other hand, columns are currently named inconsistently (iirc, depending on whether there is a by statement) to the point of naming in this context being bugged, and I'm not sure a bug fix would warrant gradual introduction over multiple versions.

MichaelChirico · 2024-02-28T06:58:40Z

> What are your thoughts on whether this change needs to be phased in gradually?

Sorry, I haven't read anything carefully yet, I'll give you a better answer when I return to review (ideally after it's updated to current master to get the cleanest understanding of the PR possible).

> What is the release timeline?

O(months), no urgent rush. Per the new GOVERNANCE doc:

data.table/GOVERNANCE.md, line 90 in 07fd933:

* Regular CRAN releases should ideally occur twice per year, and can include new features.

We are looking at roughly July.

TysonStanley · 2024-07-15T04:24:38Z

Hi @myoung3! Thanks for this PR. Looks like this is a highly requested feature. Some minor conflicts in DESCRIPTION, NEWS and tests are left. @MichaelChirico thoughts on phasing this one in? I feel like this does fix a bug that people should have avoided in practice (but it will likely break code in some places).

MichaelChirico · 2024-09-06T06:45:40Z

Thanks for the PR! I've invited you to join Rdatatable/data.table as a member. That should make it easier to contribute+collaborate going forward, in particular by publishing branches directly to this repo.

MichaelChirico · 2024-09-06T07:08:44Z

Re: possibility of breaking change -- let's see what revdeps tells us.

MichaelChirico

Test suite looks good, new behavior looks correct, consistent, especially a huge improvement for the ability to do c(sum=lapply(.SD, sum), max=lapply(.SD, max)) and get unique names out of it just like that. Great work!

The code in [.data.table is a bit messy, but that region is already a huge mess that you've added to only marginally. It should be refactored for readability with some helpers, later.
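To make the `c(sum=lapply(.SD, sum), max=lapply(.SD, max))` pattern from the review concrete, here is a minimal sketch. The first part shows how base R's `c()` prefixes element names with the argument name, which is the convention at stake here. The second part runs the same shape of call in a grouped query on `mtcars` (the choice of dataset and columns is an assumption for illustration); its output is deliberately not shown, since the names depend on whether the installed data.table includes this fix.

```r
library(data.table)

M <- as.data.table(mtcars)

# Base R: c() prefixes each element's name with the name of the argument
# it came from.
base_names <- names(c(sum = list(vs = 1, am = 2), max = list(vs = 3, am = 4)))
print(base_names)
#> [1] "sum.vs" "sum.am" "max.vs" "max.am"

# The same shape of call inside a grouped data.table query.
res <- M[, c(sum = lapply(.SD, sum), max = lapply(.SD, max)),
         by = "cyl", .SDcols = c("vs", "am")]
print(names(res))
```

Comparing `names(res)[-1]` with `base_names` is a quick way to check whether a given installation names the columns the way base R would.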

MichaelChirico · 2024-09-06T07:30:39Z

atime failure is unrelated, #6481 is for that.

grantmcdermott · 2024-09-06T18:14:03Z

Was just wishing for this functionality again, whilst working on a new project this week. Huzzah!

Michael Young added 7 commits January 31, 2021 01:37

added failing tests

fdec7a1

successfully fixed 2164.1 but broke others. also, havent fixed named list yet

a221c6e

fixed random broken tests. only the c(a=list(),lapply(.SD,FUN)) situation left

9fbecf5

More tests. Some still failing

7fd276e

c(A=list()) shouldn't get a number prefixed

8d0052b

fixed previously failing test. the issue was that I had 2164.1 and 2164.10. oops. also added more tests. all passing now

e355f80

add to author list

26fd863

added github link to issue

99eec50

jangorecki reviewed Feb 1, 2021

Michael Young added 4 commits February 3, 2021 00:26

code style changes

c9ccc5a

reworked tests. now includes tests that fail only when optimize=0

75af869

fix breaking tests related to inconsistent naming of blank columnames

db69ee9

added more passing tests

6751d34

Michael Young added 2 commits February 19, 2021 22:33

cleaned up news item

fa0009e

added another passing test ensuring dt[, c(x[1], list(.),] gets a variable name for the column created by x[1]. previously this could be an empty string column name in some circumstances

6ce93bf

fix code style

fac24aa

avimallu mentioned this pull request May 22, 2021

Passing named lists to .SDcols / .SD #5020

Open

myoung3 mentioned this pull request Jul 1, 2021

Implement DT[, across(.SD, fun1, fun2, fun3), by=group] #4970

Open

MichaelChirico added this to the 1.14.1 milestone Aug 31, 2021

myoung3 added 2 commits August 31, 2021 17:26

resolved conflicts in description and tests.Rraw

37a3809

moved news item to the correct release

fe298c9

MichaelChirico modified the milestones: 1.16.0, 1.17.0 Jul 10, 2024

MichaelChirico changed the title ~~Named lapply~~ Fix naming in j=c() under by= queries with lapply() optimization Sep 6, 2024

MichaelChirico added 6 commits September 5, 2024 23:24

Merge branch 'master' into named_lapply

6933582

modernize: options= in test()

a48b28f

fix test numbers

99927ba

copy-edit NEWS

d85b5ff

redundant test objects

6c249c1

test in a loop, also test opt=2

07fd7e7

MichaelChirico self-requested a review as a code owner September 6, 2024 06:42

consistent GH name, reduce diff

f6214aa

MichaelChirico added 5 commits September 5, 2024 23:46

same

226e332

oops, undo that one. Prefer leaving released NEWS alone

a25a53e

Style touch-up, refer to tests over long comment

5efb327

Extra ')' in DESCRIPTION

279586b

nzchar again

22bede1

MichaelChirico added 3 commits September 6, 2024 00:13

Bring test code & result closer together for easier reading

0237bb1

bad copy-paste: optimize=opt

6f2bf66

more test style

578abec

MichaelChirico approved these changes Sep 6, 2024

MichaelChirico mentioned this pull request Sep 6, 2024

Only run atime tests on non-fork branches #6481

Merged

MichaelChirico merged commit cf05b4a into Rdatatable:master Sep 6, 2024
7 of 8 checks passed
