Switch to Tables.jl API #20

rofinn · 2019-07-02T22:57:21Z

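Looks like we needed to do some cleanup of the Context type, but otherwise this transition was pretty smooth. Using the Tables interface also appears to have improved performance by minimizing data copies: in the benchmarks below the median time drops from 767.009 μs to 318.367 μs and the memory estimate from 241.88 KiB to 196.75 KiB. Closes #21 #22 #24 #6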

Dataset:

julia> data = dataset("boot", "neuro")
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ missing  │ -203.7   │ -84.1   │ 18.5     │ missing  │ missing  │
│ 2   │ missing  │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ missing  │
│ 3   │ missing  │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ missing  │
│ 4   │ missing  │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ missing  │
│ 5   │ missing  │ missing  │ -130.1  │ 25.8     │ 160.0    │ missing  │
...
│ 465 │ missing  │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ missing  │
│ 466 │ missing  │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ missing  │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ missing  │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

Original:

julia> result = chain(
           data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
┌ Warning: `colwise(f, d::AbstractDataFrame)` is deprecated, use `[f(col) for col = eachcol(d)]` instead.
│   caller = impute!(::Impute.Interpolate, ::Impute.Context, ::DataFrame) at imputors.jl:70
└ @ Impute ~/.playground/share/tmp-impute/depot/packages/Impute/UX99F/src/imputors.jl:70
┌ Warning: `colwise(f, d::AbstractDataFrame)` is deprecated, use `[f(col) for col = eachcol(d)]` instead.
│   caller = impute!(::Impute.LOCF, ::Impute.Context, ::DataFrame) at imputors.jl:70
└ @ Impute ~/.playground/share/tmp-impute/depot/packages/Impute/UX99F/src/imputors.jl:70
┌ Warning: `colwise(f, d::AbstractDataFrame)` is deprecated, use `[f(col) for col = eachcol(d)]` instead.
│   caller = impute!(::Impute.NOCB, ::Impute.Context, ::DataFrame) at imputors.jl:70
└ @ Impute ~/.playground/share/tmp-impute/depot/packages/Impute/UX99F/src/imputors.jl:70
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ -233.6   │ -203.7   │ -84.1   │ 18.5     │ 134.7    │ 222.7    │
│ 2   │ -233.6   │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ 222.7    │
│ 3   │ -233.6   │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ 222.7    │
│ 4   │ -233.6   │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ 222.7    │
│ 5   │ -233.6   │ -227.3   │ -130.1  │ 25.8     │ 160.0    │ 222.7    │
...
│ 465 │ -239.8   │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ 236.375  │
│ 466 │ -243.7   │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ -247.6   │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ -247.6   │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

julia> @benchmark chain(
           $data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
BenchmarkTools.Trial:
  memory estimate:  241.88 KiB
  allocs estimate:  11304
  --------------
  minimum time:     753.368 μs (0.00% GC)
  median time:      767.009 μs (0.00% GC)
  mean time:        815.656 μs (2.97% GC)
  maximum time:     46.291 ms (98.26% GC)
  --------------
  samples:          6114
  evals/sample:     1

New:

julia> result = chain(
           data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ -233.6   │ -203.7   │ -84.1   │ 18.5     │ 134.7    │ 222.7    │
│ 2   │ -233.6   │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ 222.7    │
│ 3   │ -233.6   │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ 222.7    │
│ 4   │ -233.6   │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ 222.7    │
│ 5   │ -233.6   │ -227.3   │ -130.1  │ 25.8     │ 160.0    │ 222.7    │
...
│ 465 │ -239.8   │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ 236.375  │
│ 466 │ -243.7   │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ -247.6   │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ -247.6   │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

julia> @benchmark chain(
           $data,
           Impute.Interpolate(),
           Impute.LOCF(),
           Impute.NOCB();
           limit=1.0
       )
BenchmarkTools.Trial:
  memory estimate:  196.75 KiB
  allocs estimate:  10853
  --------------
  minimum time:     307.765 μs (0.00% GC)
  median time:      318.367 μs (0.00% GC)
  mean time:        347.685 μs (4.29% GC)
  maximum time:     45.272 ms (98.96% GC)
  --------------
  samples:          10000
  evals/sample:     1

TODO: Tag new releases of IterTools and StatsBase

src/Impute.jl

oxinabox

Looks pretty cool.

I have not reviewed the tests.
Also it is quite long so I started to flake out towards the end.

I think we probably should support taking the obsdim as a kwarg.
It is what StatsBase is moving towards if I recall discussions with @nalimilan
correctly.

src/Impute.jl

src/imputors/locf.jl

src/imputors/interp.jl

src/imputors/nocb.jl

nalimilan · 2019-07-09T13:13:15Z

I think we probably should support taking the obsdim as a kwarg.
It is what StatsBase is moving towards if I recall discussions with @nalimilan
correctly.

Unfortunately I don't think we have clearly agreed on the standard keyword argument for this. pairwise in Distances uses dims, like cov and co. in Statistics, but that's not super obvious. So we could use obsdim or vardim, but we need to discuss that somewhere. One issue is that for pairwise the question isn't really where are observations and where are variables, but rather what you want to compute (both could make sense). But maybe that's a special function and in most cases obsdim or vardim is OK.

oxinabox · 2019-07-09T13:18:34Z

but we have at least settled that where possible it should be a kwarg
and not just baked into the function itself. (and mentioned in doc string).

nickrobinson251 · 2019-07-09T18:07:22Z

fwiw CovarianceEstimation.jl also takes a dims kwarg for specifying observations

rofinn · 2019-07-09T19:16:10Z

That's true, but the annoying thing is that dims only makes sense when the input is a matrix. I'd also rather not need to pass kwargs to the internal impute functions because our handy functions extract the kwargs into an imputor type internally. If there was some nice pattern for consuming and forwarding kwargs then that might work... even if it is a little inconsistent.

rofinn · 2019-07-09T21:09:12Z

Based on a recommendation from Curtis I've introduced a vardim kwarg to the Imputor constructors and imputation convenience functions. The vardim field in each Imputor is ignored unless we're operating on matrices.
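
To make the idea of a vardim keyword concrete for matrix input, here is a minimal sketch that walks over variable slices with `selectdim`. The helper name `locf_matrix_sketch!` and its behaviour are assumptions made for illustration only; this is not the Impute.jl implementation.

```julia
# Hypothetical helper, not part of Impute.jl: carry the last observation
# forward along each variable of a matrix. `vardim` names the dimension that
# indexes variables (2 means each column is a variable).
function locf_matrix_sketch!(data::AbstractMatrix; vardim::Int=2)
    vardim in (1, 2) || throw(ArgumentError("vardim must be 1 or 2"))
    for i in axes(data, vardim)
        var = selectdim(data, vardim, i)   # a view, so writes reach `data`
        for j in firstindex(var)+1:lastindex(var)
            if ismissing(var[j]) && !ismissing(var[j-1])
                var[j] = var[j-1]
            end
        end
    end
    return data
end

a = [1.0 missing; missing 4.0; 3.0 missing]
by_cols = locf_matrix_sketch!(copy(a); vardim=2)   # each column is a variable
by_rows = locf_matrix_sketch!(copy(a); vardim=1)   # each row is a variable
```

With `vardim=2` the fill runs down each column, and with `vardim=1` it runs along each row, so the same matrix gives different results depending on where the variables live.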

rofinn · 2019-07-11T19:30:09Z

Okay, I've resolved and applied most of the recommendations. I've commented on things that should be handled in a separate PR and responded if I disagree (e.g., I think |> makes sense here and you can always use explicit function calls if you don't like that).

codecov · 2019-07-11T20:07:46Z

Codecov Report

Merging #20 into master will increase coverage by 5.73%.
The diff coverage is 98.22%.

@@            Coverage Diff             @@
##           master      #20      +/-   ##
==========================================
+ Coverage   92.07%   97.81%   +5.73%     
==========================================
  Files           9       10       +1     
  Lines         101      183      +82     
==========================================
+ Hits           93      179      +86     
+ Misses          8        4       -4

Impacted Files	Coverage Δ
src/imputors/nocb.jl	`100% <100%> (ø)`	⬆️
src/imputors/interp.jl	`100% <100%> (ø)`	⬆️
src/Impute.jl	`90% <100%> (+23.33%)`	⬆️
src/imputors/chain.jl	`100% <100%> (ø)`	⬆️
src/deprecated.jl	`100% <100%> (ø)`
src/context.jl	`100% <100%> (+6.25%)`	⬆️
src/imputors/locf.jl	`100% <100%> (ø)`	⬆️
src/imputors.jl	`100% <100%> (ø)`	⬆️
src/imputors/fill.jl	`100% <100%> (ø)`	⬆️
src/imputors/drop.jl	`92.5% <92.5%> (-7.5%)`	⬇️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 255ac57...051d6ce.

oxinabox · 2019-07-12T14:39:28Z

src/context.jl

+    wv::AbstractWeights;
+    limit::Float64=1.0,
+    is_missing::Function=ismissing,
+    on_complete::Function=complete


I thought we decided we were getting rid of this?

I think it'll be useful for handling custom imputation failure logic without needing a new context type. For example, you could change the on_complete function to throw a warning instead of needing try/catch blocks everywhere.

I'm not ecstatic about it, but OK
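
For readers following this thread, a rough sketch of the idea being discussed: a context that carries a callback, so a caller can swap "throw" for "warn" without a new context type. Every name here (`SketchContext`, `strict`, `lenient`, `check`) is hypothetical and only mirrors the `limit`, `is_missing`, and `on_complete` keywords shown in the diff; the real callback signature in Impute.jl may differ.

```julia
# Hypothetical stand-in for a context, not Impute.jl's actual type: it
# carries a missing-data limit and a callback that decides what happens
# once the data has been scanned.
struct SketchContext{F<:Function, G<:Function}
    limit::Float64
    is_missing::F
    on_complete::G
end

# One possible callback: throw when the missing ratio exceeds the limit.
function strict(ratio, limit)
    ratio > limit && error("missing ratio $ratio exceeds limit $limit")
    return ratio
end

# Alternative callback: log a warning and carry on.
function lenient(ratio, limit)
    ratio > limit && @warn "missing ratio exceeds limit" ratio limit
    return ratio
end

# Scan `data`, then hand the observed ratio to the callback.
function check(ctx::SketchContext, data)
    ratio = count(ctx.is_missing, data) / length(data)
    return ctx.on_complete(ratio, ctx.limit)
end

data = [1.0, missing, 3.0, missing]
warned = check(SketchContext(0.1, ismissing, lenient), data)  # warns, returns the ratio
threw = try
    check(SketchContext(0.1, ismissing, strict), data)
    false
catch
    true
end
```

The trade-off raised in the thread still applies: the callback keeps call sites free of try/catch, at the cost of one more field on the context.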

oxinabox

Cool, cool
round 2 of review done.

This PR is big enough that it certainly deserves multiple rounds of thorough review.

src/context.jl

oxinabox · 2019-07-12T14:45:28Z

src/deprecated.jl

+
+    for imputor in imputors
+        imp = typeof(imputor)(
+            (isa(x, AbstractContext) ? ctx : x for x in fieldvalues(imputor))...


I would say

Let's define:
Base.similar on imputors,
that takes an Imputor,
and a new context
and does this.

But since this is being removed 🤷‍♂

oxinabox · 2019-07-12T14:48:09Z

src/deprecated.jl

+    Base.depwarn(
+        """
+        chain(data, args...) is deprecated.
+        Please use result = imp1(data) |> imp2 |> imp3


An aside: does ∘ work on these?

Can we do:
(imp3 ∘ imp2 ∘ imp1)(data)
?

If one must use |> then at least use it fully

Suggested change

-        Please use result = imp1(data) |> imp2 |> imp3
+        Please use result = data |> imp1 |> imp2 |> imp3

Yes ∘ works. I'm not a fan of bare piping of data.

julia> df = dataset("boot", "neuro")
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ missing  │ -203.7   │ -84.1   │ 18.5     │ missing  │ missing  │
│ 2   │ missing  │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ missing  │
│ 3   │ missing  │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ missing  │
│ 4   │ missing  │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ missing  │
│ 5   │ missing  │ missing  │ -130.1  │ 25.8     │ 160.0    │ missing  │
│ 6   │ missing  │ -223.1   │ -70.7   │ 62.1     │ 197.5    │ missing  │
│ 7   │ missing  │ -164.8   │ -12.2   │ 76.8     │ 202.8    │ missing  │
│ 8   │ missing  │ -221.6   │ -81.9   │ 27.5     │ 144.5    │ missing  │
│ 9   │ missing  │ -153.7   │ -17.0   │ 76.1     │ 222.4    │ missing  │
│ 10  │ missing  │ -184.7   │ -47.3   │ 74.4     │ 208.9    │ missing  │
│ 11  │ missing  │ missing  │ -148.8  │ 11.4     │ 137.7    │ missing  │
│ 12  │ missing  │ -197.6   │ -6.4    │ 137.1    │ missing  │ missing  │
│ 13  │ missing  │ -247.8   │ -35.4   │ 80.9     │ 229.5    │ missing  │
│ 14  │ missing  │ -227.0   │ -104.7  │ 20.2     │ 140.2    │ missing  │
│ 15  │ -233.6   │ -115.9   │ -10.5   │ 70.0     │ 202.6    │ missing  │
│ 16  │ missing  │ -232.4   │ -100.6  │ 16.8     │ 145.1    │ missing  │
│ 17  │ missing  │ -199.4   │ -58.2   │ 29.1     │ 184.4    │ missing  │
│ 18  │ missing  │ -195.7   │ -89.5   │ 26.4     │ 142.7    │ missing  │
│ 19  │ missing  │ -180.1   │ -65.0   │ 27.3     │ 171.1    │ missing  │
│ 20  │ missing  │ missing  │ -85.2   │ 27.1     │ missing  │ missing  │
│ 21  │ missing  │ -217.3   │ -77.1   │ 27.6     │ 151.5    │ missing  │
│ 22  │ missing  │ -139.7   │ -15.8   │ 83.0     │ 215.5    │ missing  │
│ 23  │ -249.6   │ -132.8   │ -14.1   │ 78.1     │ 205.7    │ missing  │
│ 24  │ missing  │ -152.7   │ -36.9   │ 29.7     │ 149.8    │ missing  │
│ 25  │ missing  │ -224.1   │ -81.9   │ 29.1     │ 172.2    │ missing  │
│ 26  │ missing  │ missing  │ -235.8  │ 6.0      │ 144.4    │ missing  │
│ 27  │ missing  │ -202.8   │ -45.1   │ 84.0     │ 227.3    │ missing  │
│ 28  │ -240.9   │ -138.4   │ -21.5   │ 73.4     │ 210.6    │ missing  │
│ 29  │ -247.1   │ -128.2   │ -31.3   │ 29.2     │ 143.1    │ missing  │
│ 30  │ missing  │ -185.4   │ -80.3   │ 23.9     │ 115.8    │ 222.7    │
│ 31  │ missing  │ -182.5   │ -75.8   │ 27.5     │ 165.2    │ missing  │
│ 32  │ missing  │ -202.2   │ -99.1   │ 23.8     │ 136.3    │ 242.5    │
│ 33  │ missing  │ -193.3   │ -82.6   │ 26.3     │ 160.5    │ missing  │
│ 34  │ missing  │ -189.4   │ -63.3   │ 27.6     │ 136.8    │ missing  │
│ 35  │ missing  │ -149.0   │ -31.0   │ 73.5     │ 187.8    │ missing  │
│ 36  │ missing  │ -162.4   │ -26.5   │ 72.6     │ missing  │ missing  │
⋮
│ 433 │ missing  │ -220.6   │ -114.2  │ 9.7      │ 106.4    │ 227.9    │
│ 434 │ -219.9   │ -120.9   │ -1.3    │ 99.5     │ 207.6    │ missing  │
│ 435 │ missing  │ -240.5   │ -110.3  │ 26.1     │ 142.8    │ missing  │
│ 436 │ missing  │ -239.6   │ -121.4  │ 2.9      │ 124.9    │ missing  │
│ 437 │ missing  │ -139.8   │ -7.3    │ 121.0    │ missing  │ missing  │
│ 438 │ missing  │ -212.0   │ -66.2   │ 50.4     │ 178.2    │ missing  │
│ 439 │ missing  │ -232.7   │ -109.2  │ 18.4     │ 127.5    │ missing  │
│ 440 │ missing  │ -236.3   │ -115.1  │ 5.1      │ 109.0    │ 212.0    │
│ 441 │ -241.2   │ -107.1   │ -9.1    │ 95.1     │ 198.6    │ missing  │
│ 442 │ -226.7   │ -143.8   │ -30.4   │ 75.8     │ 196.6    │ missing  │
│ 443 │ missing  │ -131.8   │ -26.5   │ 64.7     │ 177.2    │ missing  │
│ 444 │ missing  │ -144.9   │ -0.9    │ 105.3    │ 230.9    │ missing  │
│ 445 │ missing  │ -214.0   │ -81.8   │ 66.1     │ 191.3    │ missing  │
│ 446 │ missing  │ -210.6   │ -94.3   │ 16.7     │ 125.5    │ 239.7    │
│ 447 │ -215.8   │ -114.8   │ -18.4   │ 65.3     │ 171.6    │ 249.7    │
│ 448 │ missing  │ -156.0   │ -14.0   │ 113.7    │ 249.3    │ missing  │
│ 449 │ missing  │ -210.5   │ -41.9   │ missing  │ missing  │ missing  │
│ 450 │ missing  │ -189.2   │ -72.0   │ 56.8     │ 133.8    │ 246.7    │
│ 451 │ missing  │ -214.2   │ -102.2  │ 5.5      │ 75.6     │ 154.3    │
│ 452 │ -219.6   │ -107.9   │ -16.0   │ 101.7    │ 186.0    │ missing  │
│ 453 │ missing  │ -153.0   │ -38.0   │ 61.3     │ 144.4    │ 245.9    │
│ 454 │ missing  │ -179.8   │ -63.4   │ 56.0     │ 157.5    │ missing  │
│ 455 │ missing  │ -174.5   │ -44.8   │ 73.3     │ 179.7    │ missing  │
│ 456 │ missing  │ -206.8   │ -108.9  │ 3.7      │ 102.1    │ 210.3    │
│ 457 │ missing  │ -169.5   │ -79.7   │ 27.9     │ 129.4    │ 242.8    │
│ 458 │ -222.2   │ -104.6   │ -2.4    │ 84.3     │ 204.7    │ missing  │
│ 459 │ -236.3   │ -124.0   │ -6.8    │ 95.7     │ 196.0    │ missing  │
│ 460 │ missing  │ -216.5   │ -90.2   │ 27.8     │ 138.9    │ missing  │
│ 461 │ missing  │ -163.2   │ -43.6   │ 69.5     │ 173.9    │ missing  │
│ 462 │ missing  │ -207.3   │ -88.3   │ 9.6      │ 104.1    │ 218.0    │
│ 463 │ -242.6   │ -142.0   │ -21.8   │ 69.8     │ 148.7    │ missing  │
│ 464 │ -235.9   │ -128.8   │ -33.1   │ 68.8     │ 177.1    │ missing  │
│ 465 │ missing  │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ missing  │
│ 466 │ missing  │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ missing  │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ missing  │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │

julia> imp = Impute.interp() ∘ Impute.locf() ∘ Impute.nocb()
#52 (generic function with 1 method)

julia> imp(df)
469×6 DataFrame
│ Row │ V1       │ V2       │ V3      │ V4       │ V5       │ V6       │
│     │ Float64⍰ │ Float64⍰ │ Float64 │ Float64⍰ │ Float64⍰ │ Float64⍰ │
├─────┼──────────┼──────────┼─────────┼──────────┼──────────┼──────────┤
│ 1   │ -233.6   │ -203.7   │ -84.1   │ 18.5     │ 134.7    │ 222.7    │
│ 2   │ -233.6   │ -203.0   │ -97.8   │ 25.8     │ 134.7    │ 222.7    │
│ 3   │ -233.6   │ -249.0   │ -92.1   │ 27.8     │ 177.1    │ 222.7    │
│ 4   │ -233.6   │ -231.5   │ -97.5   │ 27.0     │ 150.3    │ 222.7    │
│ 5   │ -233.6   │ -223.1   │ -130.1  │ 25.8     │ 160.0    │ 222.7    │
│ 6   │ -233.6   │ -223.1   │ -70.7   │ 62.1     │ 197.5    │ 222.7    │
│ 7   │ -233.6   │ -164.8   │ -12.2   │ 76.8     │ 202.8    │ 222.7    │
│ 8   │ -233.6   │ -221.6   │ -81.9   │ 27.5     │ 144.5    │ 222.7    │
│ 9   │ -233.6   │ -153.7   │ -17.0   │ 76.1     │ 222.4    │ 222.7    │
│ 10  │ -233.6   │ -184.7   │ -47.3   │ 74.4     │ 208.9    │ 222.7    │
│ 11  │ -233.6   │ -197.6   │ -148.8  │ 11.4     │ 137.7    │ 222.7    │
│ 12  │ -233.6   │ -197.6   │ -6.4    │ 137.1    │ 229.5    │ 222.7    │
│ 13  │ -233.6   │ -247.8   │ -35.4   │ 80.9     │ 229.5    │ 222.7    │
│ 14  │ -233.6   │ -227.0   │ -104.7  │ 20.2     │ 140.2    │ 222.7    │
│ 15  │ -233.6   │ -115.9   │ -10.5   │ 70.0     │ 202.6    │ 222.7    │
│ 16  │ -249.6   │ -232.4   │ -100.6  │ 16.8     │ 145.1    │ 222.7    │
│ 17  │ -249.6   │ -199.4   │ -58.2   │ 29.1     │ 184.4    │ 222.7    │
│ 18  │ -249.6   │ -195.7   │ -89.5   │ 26.4     │ 142.7    │ 222.7    │
│ 19  │ -249.6   │ -180.1   │ -65.0   │ 27.3     │ 171.1    │ 222.7    │
│ 20  │ -249.6   │ -217.3   │ -85.2   │ 27.1     │ 151.5    │ 222.7    │
│ 21  │ -249.6   │ -217.3   │ -77.1   │ 27.6     │ 151.5    │ 222.7    │
│ 22  │ -249.6   │ -139.7   │ -15.8   │ 83.0     │ 215.5    │ 222.7    │
│ 23  │ -249.6   │ -132.8   │ -14.1   │ 78.1     │ 205.7    │ 222.7    │
│ 24  │ -240.9   │ -152.7   │ -36.9   │ 29.7     │ 149.8    │ 222.7    │
│ 25  │ -240.9   │ -224.1   │ -81.9   │ 29.1     │ 172.2    │ 222.7    │
│ 26  │ -240.9   │ -202.8   │ -235.8  │ 6.0      │ 144.4    │ 222.7    │
│ 27  │ -240.9   │ -202.8   │ -45.1   │ 84.0     │ 227.3    │ 222.7    │
│ 28  │ -240.9   │ -138.4   │ -21.5   │ 73.4     │ 210.6    │ 222.7    │
│ 29  │ -247.1   │ -128.2   │ -31.3   │ 29.2     │ 143.1    │ 222.7    │
│ 30  │ -247.0   │ -185.4   │ -80.3   │ 23.9     │ 115.8    │ 222.7    │
│ 31  │ -247.0   │ -182.5   │ -75.8   │ 27.5     │ 165.2    │ 242.5    │
│ 32  │ -247.0   │ -202.2   │ -99.1   │ 23.8     │ 136.3    │ 242.5    │
│ 33  │ -247.0   │ -193.3   │ -82.6   │ 26.3     │ 160.5    │ 237.9    │
│ 34  │ -247.0   │ -189.4   │ -63.3   │ 27.6     │ 136.8    │ 237.9    │
│ 35  │ -247.0   │ -149.0   │ -31.0   │ 73.5     │ 187.8    │ 237.9    │
│ 36  │ -247.0   │ -162.4   │ -26.5   │ 72.6     │ 158.5    │ 237.9    │
⋮
│ 433 │ -219.9   │ -220.6   │ -114.2  │ 9.7      │ 106.4    │ 227.9    │
│ 434 │ -219.9   │ -120.9   │ -1.3    │ 99.5     │ 207.6    │ 212.0    │
│ 435 │ -241.2   │ -240.5   │ -110.3  │ 26.1     │ 142.8    │ 212.0    │
│ 436 │ -241.2   │ -239.6   │ -121.4  │ 2.9      │ 124.9    │ 212.0    │
│ 437 │ -241.2   │ -139.8   │ -7.3    │ 121.0    │ 178.2    │ 212.0    │
│ 438 │ -241.2   │ -212.0   │ -66.2   │ 50.4     │ 178.2    │ 212.0    │
│ 439 │ -241.2   │ -232.7   │ -109.2  │ 18.4     │ 127.5    │ 212.0    │
│ 440 │ -241.2   │ -236.3   │ -115.1  │ 5.1      │ 109.0    │ 212.0    │
│ 441 │ -241.2   │ -107.1   │ -9.1    │ 95.1     │ 198.6    │ 239.7    │
│ 442 │ -226.7   │ -143.8   │ -30.4   │ 75.8     │ 196.6    │ 239.7    │
│ 443 │ -215.8   │ -131.8   │ -26.5   │ 64.7     │ 177.2    │ 239.7    │
│ 444 │ -215.8   │ -144.9   │ -0.9    │ 105.3    │ 230.9    │ 239.7    │
│ 445 │ -215.8   │ -214.0   │ -81.8   │ 66.1     │ 191.3    │ 239.7    │
│ 446 │ -215.8   │ -210.6   │ -94.3   │ 16.7     │ 125.5    │ 239.7    │
│ 447 │ -215.8   │ -114.8   │ -18.4   │ 65.3     │ 171.6    │ 249.7    │
│ 448 │ -219.6   │ -156.0   │ -14.0   │ 113.7    │ 249.3    │ 246.7    │
│ 449 │ -219.6   │ -210.5   │ -41.9   │ 56.8     │ 133.8    │ 246.7    │
│ 450 │ -219.6   │ -189.2   │ -72.0   │ 56.8     │ 133.8    │ 246.7    │
│ 451 │ -219.6   │ -214.2   │ -102.2  │ 5.5      │ 75.6     │ 154.3    │
│ 452 │ -219.6   │ -107.9   │ -16.0   │ 101.7    │ 186.0    │ 245.9    │
│ 453 │ -222.2   │ -153.0   │ -38.0   │ 61.3     │ 144.4    │ 245.9    │
│ 454 │ -222.2   │ -179.8   │ -63.4   │ 56.0     │ 157.5    │ 210.3    │
│ 455 │ -222.2   │ -174.5   │ -44.8   │ 73.3     │ 179.7    │ 210.3    │
│ 456 │ -222.2   │ -206.8   │ -108.9  │ 3.7      │ 102.1    │ 210.3    │
│ 457 │ -222.2   │ -169.5   │ -79.7   │ 27.9     │ 129.4    │ 242.8    │
│ 458 │ -222.2   │ -104.6   │ -2.4    │ 84.3     │ 204.7    │ 218.0    │
│ 459 │ -236.3   │ -124.0   │ -6.8    │ 95.7     │ 196.0    │ 218.0    │
│ 460 │ -242.6   │ -216.5   │ -90.2   │ 27.8     │ 138.9    │ 218.0    │
│ 461 │ -242.6   │ -163.2   │ -43.6   │ 69.5     │ 173.9    │ 218.0    │
│ 462 │ -242.6   │ -207.3   │ -88.3   │ 9.6      │ 104.1    │ 218.0    │
│ 463 │ -242.6   │ -142.0   │ -21.8   │ 69.8     │ 148.7    │ 242.5    │
│ 464 │ -235.9   │ -128.8   │ -33.1   │ 68.8     │ 177.1    │ 242.5    │
│ 465 │ -247.6   │ -140.8   │ -38.7   │ 58.1     │ 186.3    │ 242.5    │
│ 466 │ -247.6   │ -149.5   │ -40.3   │ 62.8     │ 139.7    │ 242.5    │
│ 467 │ -247.6   │ -157.8   │ -53.3   │ 28.3     │ 122.9    │ 227.6    │
│ 468 │ -247.6   │ -154.9   │ -50.8   │ 28.1     │ 119.9    │ 201.1    │
│ 469 │ -247.6   │ -180.7   │ -70.9   │ 33.7     │ 114.8    │ 222.5    │
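
One thing worth spelling out when choosing between `|>` and `∘`: a pipe applies functions left to right, while `∘` applies the rightmost function first. The sketch below uses two toy fill functions on a plain vector; `locf_sketch` and `nocb_sketch` are hypothetical stand-ins written for this example, not the Impute.jl imputors. The ordering would be consistent with row 5 of V2 reading -223.1 in the composed output above but -227.3 in the earlier chain output, since the composition runs nocb first.

```julia
# Toy stand-ins for imputors (hypothetical, not Impute.jl functions): each
# takes a vector and returns a new vector with some `missing`s filled.
function locf_sketch(v)
    out = copy(v)
    for i in 2:length(out)
        ismissing(out[i]) && (out[i] = out[i-1])   # last observation carried forward
    end
    return out
end

function nocb_sketch(v)
    out = copy(v)
    for i in length(out)-1:-1:1
        ismissing(out[i]) && (out[i] = out[i+1])   # next observation carried backward
    end
    return out
end

v = [missing, 1.0, missing, 4.0, missing]

piped = v |> locf_sketch |> nocb_sketch       # locf runs first, then nocb
composed = (locf_sketch ∘ nocb_sketch)(v)     # nocb runs first, then locf
```

Here the third element ends up as 1.0 in `piped` and 4.0 in `composed`, so when translating a `chain(...)` call to `∘` the imputors need to be listed in reverse to keep the same order of application.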

src/imputors.jl

oxinabox · 2019-07-12T16:09:49Z

src/imputors/drop.jl

+    end
+
+    table = Tables.select(table, cnames...) |> materializer(table)
+    return table


See again, I agree with Nick but will not block the PR over it

oxinabox · 2019-07-12T16:19:34Z

src/context.jl

+    wv::AbstractWeights;
+    limit::Float64=1.0,
+    is_missing::Function=ismissing,
+    on_complete::Function=complete


I'm not ecstatic about it, but OK

test/runtests.jl

rofinn · 2019-07-12T20:46:55Z

@oxinabox Except having lots of comments makes it hard for github to load the page. If your comments can be added to a separate PR I'd appreciate if you made an issue. I also wasn't expecting people to review.

nickrobinson251 · 2019-07-15T10:56:55Z

src/context.jl

+* `on_complete::Function`: a function to run when imputation is complete
+"""
+function Context(;
+    limit::Float64=0.1,


Why is the default here 0.1 (for WeightedContext it is 1.0)?

They should be consistent, at least

Suggested change

-    limit::Float64=0.1,
+    limit::Float64=1.0,

Because that change is breaking, so it can't be changed until the next major release.

having them be inconsistent seems uncomfortable, but then again having the default be not 1.0 is also weird... so 🤷‍♂

The default behaviour should be not to error since the threshold is somewhat arbitrary and data dependent.

nickrobinson251 · 2019-07-15T10:58:15Z

src/context.jl

+end
+
+"""
+    Context


Should be consistent with the docstring for WeightedContext

Suggested change

-    Context
+    Context(; limit=1.0, is_missing=ismissing, on_complete=complete)

Although, presuming complete is not exported, this should be Impute.complete in the docstring

I'd rather not. I might be inclined to add an @ref once I'm happy with this, but I'm not sure it's worth investing a lot of time to make the API nice and well documented before we decide that we want to use it.

nickrobinson251 · 2019-07-15T11:04:02Z

src/imputors/drop.jl

+        # since Tables.rows is just an iterator
+        table = Iterators.filter(rows) do r
+            !any(x -> ismissing(c, x), propertyvalues(r))
+        end |> materializer(table)


yeah, it's "de facto" the style, because it's not used anywhere (because a bunch of people dislike it) -- with that in mind it would be a kindness to just move the materializer call to the next line

(but obviously not gonna not-approve over this)

nickrobinson251 · 2019-07-15T11:04:47Z

src/imputors/drop.jl

+        try
+            imp.context() do c
+                for x in var
+                    ismissing(c, x)


I am also confused by this

test/deprecated.jl

src/imputors/fill.jl

src/context.jl

nickrobinson251 · 2019-07-15T17:01:22Z

I do not know how to approve on GitHub but if someone points me to the button I'll approve :)

rofinn · 2019-07-15T17:08:25Z

Alright, since I don't have as strong an argument for using a pipe to the materializer I've opted to change that. I still have a strong preference for using Julia's |> and ∘ operators for composing imputation pipelines though.

rofinn mentioned this pull request Jul 3, 2019

Refactoring #17

Closed

8 tasks

rofinn added 6 commits July 5, 2019 14:08

Started work on using Tables API.

1d85605

Fixed up Context code to better fit with Tables interface changes.

78ba17f

Tests and bug fixes for working with Context types directly.

37b7ef2

Simplify exports deprecation.

45dfea9

API simplification.

7c6ceed

Fix automerge on Project.toml

f54e2e2

rofinn force-pushed the rf/tables branch from 2066b5b to f54e2e2 Compare July 5, 2019 19:09

Drop 0.7 tests and add the deprecated file.

a0ab2ea

rofinn force-pushed the rf/tables branch from 5768140 to a0ab2ea Compare July 5, 2019 20:02

rofinn added 3 commits July 5, 2019 15:45

Added a deprecation for switching to the column-major convention.

0b2bbe7

Updated tests to new API and moved existing deprecated tests to a different file.

1f99bbd

Added some more tests for Chain and mutating methods.

20f084e

rofinn force-pushed the rf/tables branch from aabe774 to 20f084e Compare July 7, 2019 18:42

invenia deleted a comment from codecov bot Jul 7, 2019

rofinn added 2 commits July 8, 2019 16:57

Introduce dropobs and dropvars and deprecate Drop.

aedd1ab

Add a test for broadcasted imputation over a groupby.

5f1f4d8

rofinn force-pushed the rf/tables branch from 2a51457 to 5f1f4d8 Compare July 9, 2019 04:32

oxinabox self-assigned this Jul 9, 2019

oxinabox reviewed Jul 9, 2019

oxinabox reviewed Jul 9, 2019

Review changes.

e512171

rofinn force-pushed the rf/tables branch from 6043da8 to e512171 Compare July 9, 2019 19:19

Introduce a vardim kwarg to make the column-major convention easier to handle.

83a4bf5

invenia deleted a comment from codecov bot Jul 9, 2019

rofinn requested a review from nickrobinson251 July 11, 2019 19:30

Missed PR review fixes.

81fc7f8

rofinn force-pushed the rf/tables branch from 1322225 to 81fc7f8 Compare July 11, 2019 19:45

invenia deleted a comment from codecov bot Jul 11, 2019

oxinabox reviewed Jul 12, 2019

oxinabox requested changes Jul 12, 2019

rofinn and others added 3 commits July 12, 2019 16:25

Update src/imputors.jl

4b18a0d

Co-Authored-By: Lyndon White <[email protected]>

Update src/context.jl

fafe219

Co-Authored-By: Lyndon White <[email protected]>

Throw MethodErrors in fallback table methods.

8f0f4b6

oxinabox approved these changes Jul 12, 2019

oxinabox mentioned this pull request Jul 12, 2019

Add tests on a row table #27

Closed

nickrobinson251 reviewed Jul 15, 2019

nickrobinson251 reviewed Jul 15, 2019

rofinn mentioned this pull request Jul 15, 2019

Deprecate interp to interpolate #28

Closed

rofinn and others added 2 commits July 15, 2019 11:34

Update src/imputors/fill.jl

d8b51d4

Co-Authored-By: Nick Robinson <[email protected]>

Update src/context.jl

7c70227

Co-Authored-By: Nick Robinson <[email protected]>

rofinn mentioned this pull request Jul 15, 2019

Deprecate ismissing(ctx, x) to ismissing!(ctx, x) #29

Closed

rofinn added 4 commits July 15, 2019 11:41

Use selectdim for obswise and varwise.

d5ff2c5

Use ∘ in tests to compose imputor pipelines.

5591076

Change !any(ismissing, ...) tests to all(!ismissing, ...)

ec902fe

Restrict RDatasets to >=0.6.2

e823cc2

nickrobinson251 approved these changes Jul 15, 2019

Don't pipe to materializer.

051d6ce

rofinn merged commit 77b9fa3 into master Jul 15, 2019
