[[ subsetting much slower than $ #780
The call to
Out of interest, do you know whether `$` was significantly quicker before tibble 3, or was performance roughly equal?
From tibble 2.1.1, on a different machine:

```r
df <- tibble::tibble(x = 1)
bench::mark(
  dollar = df$x,
  bracket = df[["x"]],
  iterations = 1000
)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 dollar     682.65ns   1.36us   514262.    6.28KB        0
#> 2 bracket      1.02us   1.36us   472464.     5.3KB        0
```

Created on 2020-06-10 by the reprex package (v0.2.1)

So it seems that `$` was only modestly faster on 2.1.1 (about 1.5x by minimum time, with identical medians) and is 25-30x faster on 3.0.1.
Thanks @ebein! All my machines were on tibble 3, and the hassle of a full reinstall to check meant I cheekily just asked the question. Very interesting: from a practical perspective, it may be better to train my muscle memory to use `$` where possible (though `[[` obviously has the advantage when the column name isn't a constant).
Once we remove the

Pure S3 dispatch without doing any actual work already costs 1.3 µs. Oh well...
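As a rough, base-R-only way to see dispatch overhead on its own, the sketch below defines a hypothetical class `my_tbl` whose `[[` method does nothing except call the non-dispatching internal accessor `.subset2()`, and times it against calling `.subset2()` directly. The class name and the helper `time_calls()` are illustrative assumptions, not part of tibble, and a `system.time()` loop is much noisier than the `bench::mark()` calls used in this thread.

```r
# Hypothetical class "my_tbl": a plain data.frame with one extra class,
# used only to isolate the cost of S3 dispatch on `[[`.
df <- data.frame(x = 1)
class(df) <- c("my_tbl", class(df))

# A `[[` method that does no real work: it only delegates to the internal,
# non-dispatching accessor .subset2(). Time spent here is mostly dispatch.
`[[.my_tbl` <- function(x, i, ...) .subset2(x, i)

# Crude timer (hypothetical helper): elapsed seconds for n calls of f().
n <- 1e5
time_calls <- function(f, n) {
  system.time(for (k in seq_len(n)) f())[["elapsed"]]
}

t_dispatch <- time_calls(function() df[["x"]], n)         # via `[[.my_tbl`
t_direct   <- time_calls(function() .subset2(df, "x"), n)  # no S3 dispatch

cat(sprintf("per call: dispatch %.2f us, direct %.2f us\n",
            1e6 * t_dispatch / n, 1e6 * t_direct / n))
```

The gap between the two timings approximates the per-call cost of S3 dispatch plus the method's own function call; absolute numbers vary by machine and R version.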
Now:

```r
df <- tibble::tibble(x = 1)
bench::mark(
  dollar = df$x,
  bracket = df[["x"]],
  iterations = 1000
)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 dollar       4.29µs   4.67µs   199184.    17.6KB        0
#> 2 bracket      3.91µs   4.25µs   219202.    90.6KB        0
```

Created on 2020-06-14 by the reprex package (v0.3.0)
We could strive for even faster subsetting (closer to 2 µs), but I suspect that would need a full rewrite in C. The current speed should be fast enough for most use cases.
tibble 3.0.2

- `[[` works with classed indexes again, e.g. created with `glue::glue()` (#778).
- `add_column()` works without warning for 0-column data frames (#786).
- `tribble()` now better handles named inputs (#775) and objects of non-vctrs classes like `lubridate::Period` (#784) and `formattable::formattable` (#785).
- Subsetting and subassignment are faster (#780, #790, #794).
- `is.null()` is preferred over `is_null()` for speed.
- Implement continuous benchmarking (#793).
- `is_vector_s3()` is no longer reexported from pillar (#789).
Starting with tibble 3.0.0, column subsetting using [[ is much slower than $. This causes slowdowns in functions that call [[ many times, for example data.matrix on a wide tibble.
Created on 2020-06-03 by the reprex package (v0.3.0)
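To make the `data.matrix()` point concrete, here is a minimal sketch that counts how often `[[` is dispatched when base R's `data.matrix()` converts a wide data frame. It assumes `data.matrix()` reads each column with `frame[[i]]` (as in current base R sources); the class `counted_df` and the counter `n_calls` are hypothetical names used only for illustration. With a tibble, each of those calls would go through tibble's own `[[` method instead.

```r
# Hypothetical class "counted_df": a data.frame whose `[[` method counts
# how often it is called before delegating to the internal .subset2().
n_calls <- 0L
`[[.counted_df` <- function(x, i, ...) {
  n_calls <<- n_calls + 1L  # record one column access
  .subset2(x, i)
}

# A "wide" frame: 3 rows x 500 numeric columns (V1 ... V500).
wide <- as.data.frame(matrix(1, nrow = 3, ncol = 500))
class(wide) <- c("counted_df", class(wide))

m <- data.matrix(wide)
cat("columns:", ncol(wide), " `[[` calls:", n_calls, "\n")
```

The exact count depends on the R version; the point is that it grows with the number of columns, so per-call overhead in `[[` is multiplied accordingly.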