-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to Tables.jl API #20
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty cool.
I have not reviewed the tests.
Also it is quiet long so I started to flack out towards the end.
I think we probably should support taking the obsdim
as a kwarg.
It is what StatsBase
is moving towards if i recall discussions with @nalimilan
correctly
Unfortunately I don't think we have clearly agreed on the standard keyword argument for this. |
but we have at least settled that where possible it should be a kwarg |
fwiw CovarianceEstimation.jl also takes a |
That's true, but the annoying thing is that |
Based on a recommendation from Curtis I've introduced a |
Okay, I've resolved and applied most of the recommendations. I've commented on things that should be handled in a separate PR and responded if I disagree (e.g., I think |
Codecov Report
@@ Coverage Diff @@
## master #20 +/- ##
==========================================
+ Coverage 92.07% 97.81% +5.73%
==========================================
Files 9 10 +1
Lines 101 183 +82
==========================================
+ Hits 93 179 +86
+ Misses 8 4 -4
Continue to review full report at Codecov.
|
wv::AbstractWeights; | ||
limit::Float64=1.0, | ||
is_missing::Function=ismissing, | ||
on_complete::Function=complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we decided we were getting rid of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll be useful for handling custom imputation failure logic without needing a new context type. For example, you could change the on_complete
function to throw a warning instead of needing a try/catch blocks everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not extatic about it about it, but Ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, cool
round 2 of review done.
This PR is big enough it certainly deserves multiple rounds of through review.
|
||
for imputor in imputors | ||
imp = typeof(imputor)( | ||
(isa(x, AbstractContext) ? ctx : x for x in fieldvalues(imputor))... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say
Lets define:
Base.similar
on imputors,
that takes an Imputor,
and a new context
and does this.
But since this is bring removed 🤷♂
Base.depwarn( | ||
""" | ||
chain(data, args...) is deprecated. | ||
Please use result = imp1(data) |> imp2 |> imp3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An aside: does ∘
work on these?
Can we do:
(imp3 ∘ imp2 ∘ imp1)(data)
?
If one must use |>
then at least use it fully
Please use result = imp1(data) |> imp2 |> imp3 | |
Please use result = data |> imp1 |> imp2 |> imp3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes ∘
works. I'm not a fan of bare piping of data.
julia> df = dataset("boot", "neuro")
469×6 DataFrame
│ Row │ V1 │ V2 │ V3 │ V4 │ V5 │ V6 │
│ │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1 │ missing │ -203.7 │ -84.1 │ 18.5 │ missing │ missing │
│ 2 │ missing │ -203.0 │ -97.8 │ 25.8 │ 134.7 │ missing │
│ 3 │ missing │ -249.0 │ -92.1 │ 27.8 │ 177.1 │ missing │
│ 4 │ missing │ -231.5 │ -97.5 │ 27.0 │ 150.3 │ missing │
│ 5 │ missing │ missing │ -130.1 │ 25.8 │ 160.0 │ missing │
│ 6 │ missing │ -223.1 │ -70.7 │ 62.1 │ 197.5 │ missing │
│ 7 │ missing │ -164.8 │ -12.2 │ 76.8 │ 202.8 │ missing │
│ 8 │ missing │ -221.6 │ -81.9 │ 27.5 │ 144.5 │ missing │
│ 9 │ missing │ -153.7 │ -17.0 │ 76.1 │ 222.4 │ missing │
│ 10 │ missing │ -184.7 │ -47.3 │ 74.4 │ 208.9 │ missing │
│ 11 │ missing │ missing │ -148.8 │ 11.4 │ 137.7 │ missing │
│ 12 │ missing │ -197.6 │ -6.4 │ 137.1 │ missing │ missing │
│ 13 │ missing │ -247.8 │ -35.4 │ 80.9 │ 229.5 │ missing │
│ 14 │ missing │ -227.0 │ -104.7 │ 20.2 │ 140.2 │ missing │
│ 15 │ -233.6 │ -115.9 │ -10.5 │ 70.0 │ 202.6 │ missing │
│ 16 │ missing │ -232.4 │ -100.6 │ 16.8 │ 145.1 │ missing │
│ 17 │ missing │ -199.4 │ -58.2 │ 29.1 │ 184.4 │ missing │
│ 18 │ missing │ -195.7 │ -89.5 │ 26.4 │ 142.7 │ missing │
│ 19 │ missing │ -180.1 │ -65.0 │ 27.3 │ 171.1 │ missing │
│ 20 │ missing │ missing │ -85.2 │ 27.1 │ missing │ missing │
│ 21 │ missing │ -217.3 │ -77.1 │ 27.6 │ 151.5 │ missing │
│ 22 │ missing │ -139.7 │ -15.8 │ 83.0 │ 215.5 │ missing │
│ 23 │ -249.6 │ -132.8 │ -14.1 │ 78.1 │ 205.7 │ missing │
│ 24 │ missing │ -152.7 │ -36.9 │ 29.7 │ 149.8 │ missing │
│ 25 │ missing │ -224.1 │ -81.9 │ 29.1 │ 172.2 │ missing │
│ 26 │ missing │ missing │ -235.8 │ 6.0 │ 144.4 │ missing │
│ 27 │ missing │ -202.8 │ -45.1 │ 84.0 │ 227.3 │ missing │
│ 28 │ -240.9 │ -138.4 │ -21.5 │ 73.4 │ 210.6 │ missing │
│ 29 │ -247.1 │ -128.2 │ -31.3 │ 29.2 │ 143.1 │ missing │
│ 30 │ missing │ -185.4 │ -80.3 │ 23.9 │ 115.8 │ 222.7 │
│ 31 │ missing │ -182.5 │ -75.8 │ 27.5 │ 165.2 │ missing │
│ 32 │ missing │ -202.2 │ -99.1 │ 23.8 │ 136.3 │ 242.5 │
│ 33 │ missing │ -193.3 │ -82.6 │ 26.3 │ 160.5 │ missing │
│ 34 │ missing │ -189.4 │ -63.3 │ 27.6 │ 136.8 │ missing │
│ 35 │ missing │ -149.0 │ -31.0 │ 73.5 │ 187.8 │ missing │
│ 36 │ missing │ -162.4 │ -26.5 │ 72.6 │ missing │ missing │
⋮
│ 433 │ missing │ -220.6 │ -114.2 │ 9.7 │ 106.4 │ 227.9 │
│ 434 │ -219.9 │ -120.9 │ -1.3 │ 99.5 │ 207.6 │ missing │
│ 435 │ missing │ -240.5 │ -110.3 │ 26.1 │ 142.8 │ missing │
│ 436 │ missing │ -239.6 │ -121.4 │ 2.9 │ 124.9 │ missing │
│ 437 │ missing │ -139.8 │ -7.3 │ 121.0 │ missing │ missing │
│ 438 │ missing │ -212.0 │ -66.2 │ 50.4 │ 178.2 │ missing │
│ 439 │ missing │ -232.7 │ -109.2 │ 18.4 │ 127.5 │ missing │
│ 440 │ missing │ -236.3 │ -115.1 │ 5.1 │ 109.0 │ 212.0 │
│ 441 │ -241.2 │ -107.1 │ -9.1 │ 95.1 │ 198.6 │ missing │
│ 442 │ -226.7 │ -143.8 │ -30.4 │ 75.8 │ 196.6 │ missing │
│ 443 │ missing │ -131.8 │ -26.5 │ 64.7 │ 177.2 │ missing │
│ 444 │ missing │ -144.9 │ -0.9 │ 105.3 │ 230.9 │ missing │
│ 445 │ missing │ -214.0 │ -81.8 │ 66.1 │ 191.3 │ missing │
│ 446 │ missing │ -210.6 │ -94.3 │ 16.7 │ 125.5 │ 239.7 │
│ 447 │ -215.8 │ -114.8 │ -18.4 │ 65.3 │ 171.6 │ 249.7 │
│ 448 │ missing │ -156.0 │ -14.0 │ 113.7 │ 249.3 │ missing │
│ 449 │ missing │ -210.5 │ -41.9 │ missing │ missing │ missing │
│ 450 │ missing │ -189.2 │ -72.0 │ 56.8 │ 133.8 │ 246.7 │
│ 451 │ missing │ -214.2 │ -102.2 │ 5.5 │ 75.6 │ 154.3 │
│ 452 │ -219.6 │ -107.9 │ -16.0 │ 101.7 │ 186.0 │ missing │
│ 453 │ missing │ -153.0 │ -38.0 │ 61.3 │ 144.4 │ 245.9 │
│ 454 │ missing │ -179.8 │ -63.4 │ 56.0 │ 157.5 │ missing │
│ 455 │ missing │ -174.5 │ -44.8 │ 73.3 │ 179.7 │ missing │
│ 456 │ missing │ -206.8 │ -108.9 │ 3.7 │ 102.1 │ 210.3 │
│ 457 │ missing │ -169.5 │ -79.7 │ 27.9 │ 129.4 │ 242.8 │
│ 458 │ -222.2 │ -104.6 │ -2.4 │ 84.3 │ 204.7 │ missing │
│ 459 │ -236.3 │ -124.0 │ -6.8 │ 95.7 │ 196.0 │ missing │
│ 460 │ missing │ -216.5 │ -90.2 │ 27.8 │ 138.9 │ missing │
│ 461 │ missing │ -163.2 │ -43.6 │ 69.5 │ 173.9 │ missing │
│ 462 │ missing │ -207.3 │ -88.3 │ 9.6 │ 104.1 │ 218.0 │
│ 463 │ -242.6 │ -142.0 │ -21.8 │ 69.8 │ 148.7 │ missing │
│ 464 │ -235.9 │ -128.8 │ -33.1 │ 68.8 │ 177.1 │ missing │
│ 465 │ missing │ -140.8 │ -38.7 │ 58.1 │ 186.3 │ missing │
│ 466 │ missing │ -149.5 │ -40.3 │ 62.8 │ 139.7 │ 242.5 │
│ 467 │ -247.6 │ -157.8 │ -53.3 │ 28.3 │ 122.9 │ 227.6 │
│ 468 │ missing │ -154.9 │ -50.8 │ 28.1 │ 119.9 │ 201.1 │
│ 469 │ missing │ -180.7 │ -70.9 │ 33.7 │ 114.8 │ 222.5 │
julia> imp = Impute.interp() ∘ Impute.locf() ∘ Impute.nocb()
#52 (generic function with 1 method)
julia> imp(df)
469×6 DataFrame
│ Row │ V1 │ V2 │ V3 │ V4 │ V5 │ V6 │
│ │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1 │ -233.6 │ -203.7 │ -84.1 │ 18.5 │ 134.7 │ 222.7 │
│ 2 │ -233.6 │ -203.0 │ -97.8 │ 25.8 │ 134.7 │ 222.7 │
│ 3 │ -233.6 │ -249.0 │ -92.1 │ 27.8 │ 177.1 │ 222.7 │
│ 4 │ -233.6 │ -231.5 │ -97.5 │ 27.0 │ 150.3 │ 222.7 │
│ 5 │ -233.6 │ -223.1 │ -130.1 │ 25.8 │ 160.0 │ 222.7 │
│ 6 │ -233.6 │ -223.1 │ -70.7 │ 62.1 │ 197.5 │ 222.7 │
│ 7 │ -233.6 │ -164.8 │ -12.2 │ 76.8 │ 202.8 │ 222.7 │
│ 8 │ -233.6 │ -221.6 │ -81.9 │ 27.5 │ 144.5 │ 222.7 │
│ 9 │ -233.6 │ -153.7 │ -17.0 │ 76.1 │ 222.4 │ 222.7 │
│ 10 │ -233.6 │ -184.7 │ -47.3 │ 74.4 │ 208.9 │ 222.7 │
│ 11 │ -233.6 │ -197.6 │ -148.8 │ 11.4 │ 137.7 │ 222.7 │
│ 12 │ -233.6 │ -197.6 │ -6.4 │ 137.1 │ 229.5 │ 222.7 │
│ 13 │ -233.6 │ -247.8 │ -35.4 │ 80.9 │ 229.5 │ 222.7 │
│ 14 │ -233.6 │ -227.0 │ -104.7 │ 20.2 │ 140.2 │ 222.7 │
│ 15 │ -233.6 │ -115.9 │ -10.5 │ 70.0 │ 202.6 │ 222.7 │
│ 16 │ -249.6 │ -232.4 │ -100.6 │ 16.8 │ 145.1 │ 222.7 │
│ 17 │ -249.6 │ -199.4 │ -58.2 │ 29.1 │ 184.4 │ 222.7 │
│ 18 │ -249.6 │ -195.7 │ -89.5 │ 26.4 │ 142.7 │ 222.7 │
│ 19 │ -249.6 │ -180.1 │ -65.0 │ 27.3 │ 171.1 │ 222.7 │
│ 20 │ -249.6 │ -217.3 │ -85.2 │ 27.1 │ 151.5 │ 222.7 │
│ 21 │ -249.6 │ -217.3 │ -77.1 │ 27.6 │ 151.5 │ 222.7 │
│ 22 │ -249.6 │ -139.7 │ -15.8 │ 83.0 │ 215.5 │ 222.7 │
│ 23 │ -249.6 │ -132.8 │ -14.1 │ 78.1 │ 205.7 │ 222.7 │
│ 24 │ -240.9 │ -152.7 │ -36.9 │ 29.7 │ 149.8 │ 222.7 │
│ 25 │ -240.9 │ -224.1 │ -81.9 │ 29.1 │ 172.2 │ 222.7 │
│ 26 │ -240.9 │ -202.8 │ -235.8 │ 6.0 │ 144.4 │ 222.7 │
│ 27 │ -240.9 │ -202.8 │ -45.1 │ 84.0 │ 227.3 │ 222.7 │
│ 28 │ -240.9 │ -138.4 │ -21.5 │ 73.4 │ 210.6 │ 222.7 │
│ 29 │ -247.1 │ -128.2 │ -31.3 │ 29.2 │ 143.1 │ 222.7 │
│ 30 │ -247.0 │ -185.4 │ -80.3 │ 23.9 │ 115.8 │ 222.7 │
│ 31 │ -247.0 │ -182.5 │ -75.8 │ 27.5 │ 165.2 │ 242.5 │
│ 32 │ -247.0 │ -202.2 │ -99.1 │ 23.8 │ 136.3 │ 242.5 │
│ 33 │ -247.0 │ -193.3 │ -82.6 │ 26.3 │ 160.5 │ 237.9 │
│ 34 │ -247.0 │ -189.4 │ -63.3 │ 27.6 │ 136.8 │ 237.9 │
│ 35 │ -247.0 │ -149.0 │ -31.0 │ 73.5 │ 187.8 │ 237.9 │
│ 36 │ -247.0 │ -162.4 │ -26.5 │ 72.6 │ 158.5 │ 237.9 │
⋮
│ 433 │ -219.9 │ -220.6 │ -114.2 │ 9.7 │ 106.4 │ 227.9 │
│ 434 │ -219.9 │ -120.9 │ -1.3 │ 99.5 │ 207.6 │ 212.0 │
│ 435 │ -241.2 │ -240.5 │ -110.3 │ 26.1 │ 142.8 │ 212.0 │
│ 436 │ -241.2 │ -239.6 │ -121.4 │ 2.9 │ 124.9 │ 212.0 │
│ 437 │ -241.2 │ -139.8 │ -7.3 │ 121.0 │ 178.2 │ 212.0 │
│ 438 │ -241.2 │ -212.0 │ -66.2 │ 50.4 │ 178.2 │ 212.0 │
│ 439 │ -241.2 │ -232.7 │ -109.2 │ 18.4 │ 127.5 │ 212.0 │
│ 440 │ -241.2 │ -236.3 │ -115.1 │ 5.1 │ 109.0 │ 212.0 │
│ 441 │ -241.2 │ -107.1 │ -9.1 │ 95.1 │ 198.6 │ 239.7 │
│ 442 │ -226.7 │ -143.8 │ -30.4 │ 75.8 │ 196.6 │ 239.7 │
│ 443 │ -215.8 │ -131.8 │ -26.5 │ 64.7 │ 177.2 │ 239.7 │
│ 444 │ -215.8 │ -144.9 │ -0.9 │ 105.3 │ 230.9 │ 239.7 │
│ 445 │ -215.8 │ -214.0 │ -81.8 │ 66.1 │ 191.3 │ 239.7 │
│ 446 │ -215.8 │ -210.6 │ -94.3 │ 16.7 │ 125.5 │ 239.7 │
│ 447 │ -215.8 │ -114.8 │ -18.4 │ 65.3 │ 171.6 │ 249.7 │
│ 448 │ -219.6 │ -156.0 │ -14.0 │ 113.7 │ 249.3 │ 246.7 │
│ 449 │ -219.6 │ -210.5 │ -41.9 │ 56.8 │ 133.8 │ 246.7 │
│ 450 │ -219.6 │ -189.2 │ -72.0 │ 56.8 │ 133.8 │ 246.7 │
│ 451 │ -219.6 │ -214.2 │ -102.2 │ 5.5 │ 75.6 │ 154.3 │
│ 452 │ -219.6 │ -107.9 │ -16.0 │ 101.7 │ 186.0 │ 245.9 │
│ 453 │ -222.2 │ -153.0 │ -38.0 │ 61.3 │ 144.4 │ 245.9 │
│ 454 │ -222.2 │ -179.8 │ -63.4 │ 56.0 │ 157.5 │ 210.3 │
│ 455 │ -222.2 │ -174.5 │ -44.8 │ 73.3 │ 179.7 │ 210.3 │
│ 456 │ -222.2 │ -206.8 │ -108.9 │ 3.7 │ 102.1 │ 210.3 │
│ 457 │ -222.2 │ -169.5 │ -79.7 │ 27.9 │ 129.4 │ 242.8 │
│ 458 │ -222.2 │ -104.6 │ -2.4 │ 84.3 │ 204.7 │ 218.0 │
│ 459 │ -236.3 │ -124.0 │ -6.8 │ 95.7 │ 196.0 │ 218.0 │
│ 460 │ -242.6 │ -216.5 │ -90.2 │ 27.8 │ 138.9 │ 218.0 │
│ 461 │ -242.6 │ -163.2 │ -43.6 │ 69.5 │ 173.9 │ 218.0 │
│ 462 │ -242.6 │ -207.3 │ -88.3 │ 9.6 │ 104.1 │ 218.0 │
│ 463 │ -242.6 │ -142.0 │ -21.8 │ 69.8 │ 148.7 │ 242.5 │
│ 464 │ -235.9 │ -128.8 │ -33.1 │ 68.8 │ 177.1 │ 242.5 │
│ 465 │ -247.6 │ -140.8 │ -38.7 │ 58.1 │ 186.3 │ 242.5 │
│ 466 │ -247.6 │ -149.5 │ -40.3 │ 62.8 │ 139.7 │ 242.5 │
│ 467 │ -247.6 │ -157.8 │ -53.3 │ 28.3 │ 122.9 │ 227.6 │
│ 468 │ -247.6 │ -154.9 │ -50.8 │ 28.1 │ 119.9 │ 201.1 │
│ 469 │ -247.6 │ -180.7 │ -70.9 │ 33.7 │ 114.8 │ 222.5 │
end | ||
|
||
table = Tables.select(table, cnames...) |> materializer(table) | ||
return table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See again, I agree with Nick but will not block the PR over it
wv::AbstractWeights; | ||
limit::Float64=1.0, | ||
is_missing::Function=ismissing, | ||
on_complete::Function=complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not extatic about it about it, but Ok
@oxinabox Except having lots of comments makes it hard for github to load the page. If your comments can be added to a separate PR I'd appreciate if you made an issue. I also wasn't expecting people to review. |
Co-Authored-By: Lyndon White <[email protected]>
Co-Authored-By: Lyndon White <[email protected]>
* `on_complete::Function`: a function to run when imputation is complete | ||
""" | ||
function Context(; | ||
limit::Float64=0.1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the default here 0.1
(for WeightedContext
it is 1.0
)?
They should be consistent, at least
limit::Float64=0.1, | |
limit::Float64=1.0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that change is breaking.
so it can't be changed til the next major release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having them be inconsistent seems uncomfortable, but then again having the default be not 1.0 is also weird... so 🤷♂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default behaviour should be not to error since the threshold is somewhat arbitrary and data dependent.
end | ||
|
||
""" | ||
Context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be consistent with the docstring for WeightedContext
Context | |
Context(; limit=1.0, is_missing=ismissing, on_complete=complete) |
Althought presuming complete
is not exported, this should be Impute.complete
in the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not. I might be inclined to add an @ref
once I'm happy with this, but I'm not sure it's worth investing a lot of time to make the API nice and well documented before we decide that we want to use it.
src/imputors/drop.jl
Outdated
# since Tables.rows is just an iterator | ||
table = Iterators.filter(rows) do r | ||
!any(x -> ismissing(c, x), propertyvalues(r)) | ||
end |> materializer(table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it's "de facto" the style, because it's not used anywhere (becuase a bunch of people dislike it) -- with that in mind it would be a kindness to just move the materializer
call to the next line
(but obviously not gonna not-approve over this)
try | ||
imp.context() do c | ||
for x in var | ||
ismissing(c, x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am also confused by this
Co-Authored-By: Nick Robinson <[email protected]>
Co-Authored-By: Nick Robinson <[email protected]>
I do not know how to approve on Github but if someone points me to the button i'll approve :) |
Alright, since I don't have as strong of an argument for using a pipe to the |
Looks like we needed to do some cleanup of the
Context
type, but otherwise this transition was pretty smooth. It also looks like using the Tables interface has actually improved performance by minimizing data copies. Closes #21 #22 #24 #6Dataset:
Original:
New:
TODO: Tag new releases of