New force_identical_write arg #735
Tested and works as expected.
Thank you so much @kmasiello! 🙌
This is a good point. Right now, … Do we think that now in this PR …
cli::cli_warn(
  "The hash of pin {.val {name}} has not changed and will not be stored."
)
return(invisible(name))
We could return something different here, maybe `NULL`.
I think that's correct; it should return what `pin_write()` usually returns.
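To make the behavior under discussion concrete, here is a minimal base-R sketch, not the pins implementation. `new_board()`, `hash_object()`, and `write_pin()` are hypothetical helpers, the "board" is just an environment holding stored versions, and the MD5-of-RDS hash is an illustrative stand-in that is not necessarily how pins computes `pin_hash`. When `check_hash = TRUE` and the new hash matches the latest stored one, the sketch warns and returns the name invisibly, the same value a successful write returns.

```r
# Hypothetical sketch only; not the pins API. A "board" is an environment that
# maps each pin name to a list of stored versions (hash + value).
new_board <- function() new.env(parent = emptyenv())

# Illustrative hash: serialize the object uncompressed to a temporary RDS file
# and take the file's MD5 digest (base R only).
hash_object <- function(x) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path))
  saveRDS(x, path, compress = FALSE)
  unname(tools::md5sum(path))
}

write_pin <- function(board, x, name, check_hash = FALSE) {
  new_hash <- hash_object(x)
  has_versions <- exists(name, envir = board, inherits = FALSE)
  versions <- if (has_versions) get(name, envir = board, inherits = FALSE) else list()

  # Only compare when asked to and when there is a previous version to compare with.
  if (check_hash && has_versions) {
    old_hash <- versions[[length(versions)]]$hash
    if (identical(old_hash, new_hash)) {
      warning(
        sprintf('The hash of pin "%s" has not changed; it will not be stored.', name),
        call. = FALSE
      )
      # Same return value as a real write, so callers see no difference.
      return(invisible(name))
    }
  }

  # Store a new version and return the name invisibly.
  versions[[length(versions) + 1L]] <- list(hash = new_hash, value = x)
  assign(name, versions, envir = board)
  invisible(name)
}
```

Returning the same value either way keeps existing pipelines working; in this sketch, a caller who needs to know whether anything was stored would have to watch for the warning or count the stored versions.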
Refined the warning message a smidge:

library(pins)
b <- board_connect()
#> Connecting to Posit Connect 2023.03.0 at <https://colorado.posit.co/rsc>
b %>% pin_write(1:10, "julia.silge/really-nice-numbers", check_hash = TRUE)
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/really-nice-numbers'
b %>% pin_write(1:10, "julia.silge/really-nice-numbers", check_hash = TRUE)
#> Guessing `type = 'rds'`
#> Warning:
#> ! The hash of pin "julia.silge/really-nice-numbers" has not changed.
#> • Your pin will not be stored.
b %>% pin_write(1:11, "julia.silge/really-nice-numbers", check_hash = TRUE)
#> Guessing `type = 'rds'`
#> Writing to pin 'julia.silge/really-nice-numbers'

Created on 2023-05-01 with reprex v2.0.2
My gut says this should be handled with the broader considerations of what we return. I don't want to see a bespoke solution here that isn't compatible with future work.
Additional suggestions and comments after reviewing with users: checking metadata for changes is an outstanding ask. A sample scenario is using …
Can I make sure I am understanding this? It sounds like users are expressing a desire to:
This would require a more extensive change than this PR, will be pretty tough in the near term, and would likely require special handling for different backends (S3 vs. Connect vs. folders vs. ...). As of now, updating the metadata but not the pin contents isn't straightforward to implement. We could consider how to approach this problem in the future. I tend to think we should still go with the change as scoped here because it lets us offer a solution to the storing-many-copies problem. It does mean folks have to choose between having the metadata reflect the last time the pin contents really changed or the last time a write was attempted; this still seems better than the current situation.
Agreed. As currently scoped, this helps the main use case tremendously.
This seems so simple — I love it 😄
I also refreshed my memory of `pin_hash()`, and I'm confident it's OK to use it this way. There is some chance that you might get the same hash value for different data, but it's extremely small. (Or maybe this is better: if you hashed a dataset every day for two years, you're more likely to get hit by a meteor than get a hash collision.)
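A rough back-of-the-envelope bound behind the meteor comparison, offered as a sketch under assumptions: it treats the hash as a uniformly random b-bit value, and the 128-bit width used below is an assumption for illustration, not something stated in this thread. Because each write is only compared with the previous version, a wrong "unchanged" verdict requires the new hash to equal the one immediately before it, so over n daily writes (two years is about 730 days):

```latex
P(\text{false match in } n \text{ writes}) \le (n-1)\,2^{-b},
\qquad n = 730,\ b = 128:\quad
\frac{729}{2^{128}} \approx \frac{729}{3.40 \times 10^{38}} \approx 2.1 \times 10^{-36}.
```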
@@ -111,6 +116,17 @@ pin_write <- function(board, x,
)
meta$user <- metadata

if (check_hash) {
old_hash <- possibly_pin_meta(board, name)$pin_hash
Why do we think this might fail?
This is to let someone use `check_hash = TRUE` when they write a pin for the first time, before there is any metadata. `old_hash` is `NULL` here when there isn't any metadata.
Hmmm, I think I'd prefer an explicit `pin_exists()` check. Is there some reason not to do that?
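The two approaches in this thread can be contrasted with a small hypothetical sketch in base R. An environment stands in for the board, and `possibly_meta()` and `old_hash_explicit()` are illustrative names, not pins functions.

```r
# Hypothetical in-memory "board": pin name -> metadata list.
board <- new.env(parent = emptyenv())
assign("x", list(pin_hash = "abc123"), envir = board)

# Style 1: swallow the lookup error and get NULL back
# (in the spirit of `possibly_pin_meta()`).
possibly_meta <- function(board, name) {
  tryCatch(get(name, envir = board, inherits = FALSE), error = function(e) NULL)
}

# Style 2: check existence explicitly first (the `pin_exists()` suggestion).
old_hash_explicit <- function(board, name) {
  if (!exists(name, envir = board, inherits = FALSE)) {
    return(NULL)  # first write: nothing to compare against
  }
  get(name, envir = board, inherits = FALSE)$pin_hash
}

# `NULL$pin_hash` is itself NULL, so both styles yield NULL for a new pin.
possibly_meta(board, "y")$pin_hash
old_hash_explicit(board, "y")
```

One possible argument for the explicit check: a blanket `tryCatch()` turns every error, such as a transient network failure on a remote board, into `NULL`, which then silently reads as "first write", whereas the explicit existence check returns `NULL` only when the pin is genuinely absent.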
R/pin-read-write.R
@@ -64,6 +64,10 @@ pin_read <- function(board, name, version = NULL, hash = NULL, ...) {
#' use the default for `board`
#' @param tags A character vector of tags for the pin; most important for
#' discoverability on shared boards.
#' @param check_hash Check whether the pin contents are identical to the last
Should we consider defaulting this to `TRUE`?
Yeah, let's try it. A release is still a little ways out so we can see if it is a problem in an expected way.
Oh, actually, @hadley, the other reason I had not set it to `TRUE` originally is that it adds one additional call to `pin_meta()`. Do we think this is OK? All the API calls can be especially painful on Connect, but maybe adding one more isn't much of a marginal difference.
Hmmmm, I think it's probably still worth it. It definitely saves you a lot of time/space in the case where the data hasn't changed. We'll definitely need to promote that change heavily.
R/pin-read-write.R
@@ -64,6 +64,10 @@ pin_read <- function(board, name, version = NULL, hash = NULL, ...) {
#' use the default for `board`
#' @param tags A character vector of tags for the pin; most important for
#' discoverability on shared boards.
#' @param check_hash Check whether the pin contents are identical to the last
#' version if one exists (using the hash), and then **do not store** the pin
#' again. This argument does not check the pin metadata, only the pin
The wording isn't quite right because the argument isn't checking anything. But I couldn't think of better wording off the top of my head.
What if I change the whole arg to `compare_hash`?
This would actually be better because there is already an internal function called `check_hash()`. Take a look and see if you have feedback on the new name/docs.
Co-authored-by: Hadley Wickham <[email protected]>
pin_write(b, 1:5, "x", type = "rds")
pin_write(b, 1:6, "x", type = "rds")
pin_write(b, 1:7, "x", type = "rds")
I did have to change a couple of tests now that the default is `compare_hash = TRUE`.
R/pin-read-write.R
@@ -64,6 +64,10 @@ pin_read <- function(board, name, version = NULL, hash = NULL, ...) {
#' use the default for `board`
#' @param tags A character vector of tags for the pin; most important for
#' discoverability on shared boards.
#' @param compare_hash Compare the pin contents to the last version if one
I think this is an improvement, but I wondered about something even more direct, like `write_if_different`? Or `write_unchanged_data`? What do you think?
What about `force_identical_write`? This one would default to `FALSE`.
None of these ideas really speak to me (including mine), so just go with your heart 😄
Addresses #589
This does not solve all our hashing issues (for example, the `pin_hash` still only looks at the pin contents and not the metadata), but it is a small step toward a better hashing situation.

After talking with users, I think that an opt-in new argument is the best way forward (rather than, say, a new function or new default behavior). This new `check_hash` arg comes after the dots and defaults to `FALSE`. It does require another read of `pin_meta()` to get the old hash.

Created on 2023-04-26 with reprex v2.0.2
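Because the new argument comes after the dots, R matches it only by its exact full name: it cannot be supplied positionally or by a partial name, since anything unmatched falls into `...`. A small hypothetical stand-in shows this; `write_like()` is illustrative and is not the real `pin_write()` signature.

```r
# Hypothetical stand-in with an argument placed after `...`.
write_like <- function(board, x, name = NULL, ..., check_hash = FALSE) {
  list(dots = list(...), check_hash = check_hash)
}

# A fourth positional value lands in `...`, not in `check_hash`.
write_like("b", 1, "n", TRUE)$check_hash

# A partial name does not match an argument after `...` either.
write_like("b", 1, "n", check = TRUE)$dots

# Only the full name sets it.
write_like("b", 1, "n", check_hash = TRUE)$check_hash
```

Placing the argument after the dots means no existing positional call can accidentally switch it on, which fits the opt-in intent described above.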
With this PR as it is now, `pin_write()` will still return the name invisibly, but it does not store the pin and generates a warning. Thoughts?