-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
collapse_index on a broken index gives difficult-to-debug errors #56
Comments
I think there are two problems here. library(tibbletime)
library(dplyr)
data(FB)
FB_time <- as_tbl_time(FB, date)
head(FB_time)
#> # A time tibble: 6 x 8
#> # Index: date
#> symbol date open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400. 28.0
#> 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600. 27.8
#> 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400. 28.8
#> 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800. 29.4
#> 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300. 29.1
#> 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700. 30.6
FB_time2 <- rename(FB_time, date2 = date)
# Notice how "date" is the index but "date2" is the column name
FB_time2
#> # A time tibble: 1,008 x 8
#> # Index: date
#> symbol date2 open high low close volume adjusted
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2013-01-02 27.4 28.2 27.4 28.0 69846400. 28.0
#> 2 FB 2013-01-03 27.9 28.5 27.6 27.8 63140600. 27.8
#> 3 FB 2013-01-04 28.0 28.9 27.8 28.8 72715400. 28.8
#> 4 FB 2013-01-07 28.7 29.8 28.6 29.4 83781800. 29.4
#> 5 FB 2013-01-08 29.5 29.6 28.9 29.1 45871300. 29.1
#> 6 FB 2013-01-09 29.7 30.6 29.5 30.6 104787700. 30.6
#> 7 FB 2013-01-10 30.6 31.5 30.3 31.3 95316400. 31.3
#> 8 FB 2013-01-11 31.3 32.0 31.1 31.7 89598000. 31.7
#> 9 FB 2013-01-14 32.1 32.2 30.6 31.0 98892800. 31.0
#> 10 FB 2013-01-15 30.6 31.7 29.9 30.1 173242600. 30.1
#> # ... with 998 more rows
# Can't find the "date" index
collapse_by(FB_time2)
#> Error in mutate_impl(.data, dots): Evaluation error: cannot coerce type 'closure' to vector of type 'double'.
# This is your problem. NA values mess up the collapse_by()
FB_time$date[1] <- NA
collapse_by(FB_time)
#> Error in mutate_impl(.data, dots): Evaluation error: missing value where TRUE/FALSE needed. Created on 2018-03-26 by the reprex package (v0.2.0). |
For the first problem of changing the name of the index, I think that every function that could run into this problem calls |
Additionally, I am going to support |
Is renaming the index column name itself bad? If we can detect it is a rename, would it be possible to honour the renamed index variable instead? |
The only way to possibly detect it is if it came from a |
…names. This catches manual renaming or removal of an index and throws an error when the user next uses a tibbletime function requiring the index. Also refactor getter assertions. See #56
That’s reasonable. Is it worth /possible to warn the user that performing dplyr rename on a tbl_time outputs a tibble not a tbl_time? Doing an explicit as_tibble forced downcast would be a lot better than coming back to the code later and trying to guess why the index doesn’t work! |
I can see the value in both, and I kind of changed my mind after a bit of thinking. I added a branch that implements
|
If it seems like it works nicely, I'll use it. If not, I'll revert back to returning a |
Hmm it might be worth warning if the output will be downcast just to avoid difficult-to-spot errors later. If you are re-reading the code later seeing code which reads: Dplyr::select(as_tibble(myTblTime) , foo, bar) gives you a hint you’re about to just get a tibble back, Whereas: Dplyr::select(myTblTime, foo, bar) Doesn’t intuitively tell you that you’re about to lose time indexing. Just a “warning - dplyr::select will downcast ‘myTblTime’ to a tibble. If you intend to do this, force downcast myTblTime using as_tibble” would suffice as a reminder. |
(Only saying this having had a really fun session debugging my code on Sunday as you can guess!) Dplyr code feels really obvious at the time. But things like tidyeval can make an innocuous-looking dplyr statement very tricky indeed, so might be worth just enforcing good practice. |
One thing i worry about is being too noisy, especially when programming with this. I do the select() thing all the time inside package functions, knowing that it will drop the tbl_time class, but i dont want the user to see that because i am just doing it for some intermediate calculation. I could obviously opt for some kind of |
I think I'd compare this to doing a dplyr::join where if you fail to specify all the join parameters - you get a message which tells you what else has been used to make the join. I think if you can downcast the object type to a tibble before using with dplyr this is a sensible way to omit the message whilst acknowledging that you know you're only going to get a tibble and not a tbl_time. I wouldn't be keen on the verbose = TRUE because it doesn't cleanly indicate what it's hiding and I agree it would be clumsy. as_tibble(tbl_time) for the input feels cleaner and makes object type in = object type out, which reads more logically. Perhaps the message should be cleaner: 'dplyr::select downcasts the tbl_time to a tibble. See X for more information' where X is a help page which advises which operations would take in a tbl_time and output a tibble, and advise the coding acknowledgement. |
Tested with the new version this evening. We get a warning that the index has gone. I have then just reapplied a new index to the renamed column. All looks good so far :-). |
Hi,
This is really feedback from an 'I shot myself in the foot and spent ages working out why/how' thing. I don't have a reprex per se - I suspect one could be made if I think about it for long enough...Apologies that I haven't had time to explore the issue further (yet).
I managed to produce a broken tibbletime index. The way I think I did this was by left joining two tibbletime tables. Because the tables started out with the same name, I renamed the index on the LHS before joining (so far so good - so our intended index is on the LHS with a new name, and the old index on the RHS has the original name and is still present but now may contain NULLs).
However I then tried to call dplyr::mutate with collapse_index on the old index name to make the collapsed index field by mistake rather than the new name.
The error you get is: Error in mutate_impl(.data, dots) :
Evaluation error: missing value where TRUE/FALSE needed.
Clearly that is a long way away from where the error truly lies (ie. using an invalid index).
My thoughts:
Hope this makes sense. If I get the time I'll try to make a reprex for you of this is in action if it doesn't make sense from the above!
The text was updated successfully, but these errors were encountered: