Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add custom type for pin_write() #649

Closed
wants to merge 9 commits into from

Conversation

ijlyttle
Copy link
Contributor

@ijlyttle ijlyttle commented Sep 6, 2022

fix #631

Sorry for delay, I finally got enough things pushed to the side of my "plate" so that I can devote some time to this - this draft is just to indicate I have not forgotten this or #627 .

@ijlyttle
Copy link
Contributor Author

ijlyttle commented Sep 6, 2022

I played around with a few ideas today, hoping each time to arrive at something simpler.

I still need to add tests, and more-complete documentation, but I wanted to run this proposed approach past you:

I am impressed with what you can do using rlang::call_modify(), but (at least to my eye) a "plain" function seems more familiar. You could call using:

  • a bare-function: .f = readr::write_csv
  • a purrr-style anonymous function: .f = ~arrow::write_feather(.x, .y, compression = "uncompressed")

It has to be a function that takes the object first, then the path second.

The larger question seems to be the interaction between type and .f; I propose these possibilities for each:

type:

  • NULL
  • one of the "standard" types (e.g. "csv", "json")
  • a string that begins with "custom-".

.f:

  • NULL
  • a function

Here's what I propose for behavior:

type .f behavior
NULL NULL no change, use guess_type()
NULL function use guess_type(), inform that custom function is used, write using custom function
standard NULL no change, write using object_write()
standard function inform that custom function is used, write using custom function
custom NULL abort in object_write()
custom function write using custom function (don't inform)

Notes:

  • This would let a user write type = "arrow" uncompressed, or type = "csv" using readr or vroom. It becomes the users' responsibility ensure their custom funciton comports with the type. We would inform that a standard type is using a custom function.
  • If you are specifying a custom type, it has to begin with "custom-". This seems like a way to avoid collisions with future "standard" types.

I'll start working on tests, etc., but wanted to give you an early chance to react. Thanks!

@juliasilge
Copy link
Member

I think this is looking promising, but that table you made had me realize that we are probably bumping up against argument dependencies/interdependence. This has absolutely been a headache for me before 😩 so we definitely want to avoid this pattern if possible.

What do you think about adding support for passing a function to type, for folks to do something more custom? This is how unnest_tokens() works and it has been pretty decent for maintenance and flexibility. (Take a look at the find_function() function.)

@ijlyttle
Copy link
Contributor Author

ijlyttle commented Sep 9, 2022

I am with you. I now appreciate the danger of argument dependencies, thanks for helping me learn new things!

I will be happy to revise things to use the type argument. There are a couple things I would like to think about, in the fullness of time:

  • If a function is passed to type, it seems the only possible value for meta$type is something like "custom". This would seem to preclude being able to assert that something is a "csv", but written using a different function. To be clear, I don't think this merits discarding the approach of overloading type, but it remains something I'd like to ponder the possibility to address.

  • pin_read() could use an argument for a custom-reader function; presumably, this would be a separate issue.

@ijlyttle
Copy link
Contributor Author

ijlyttle commented Sep 13, 2022

Pondering this a bit more, there already exists the "file" type used with pin_upload() and pin_download().

Overloading the type argument could provide an easier way to use pin_upload(), but instead of modifying pin_read(), one could:

very_important_data <- 
  pin_download(my_board, "my-custom-pin") |>
  custom_reader_function()

@ijlyttle
Copy link
Contributor Author

ijlyttle commented Sep 13, 2022

I have run into another problem: how to specify the file extension. In pin_write(), this is handled by the type; if type is a function, we can't deduce the extension.

One possible solution might be to offer a helper function, pin_file_writer():

#' @param f `function` that takes the object and a path, then writes the object to the path.
#' @param extension `character` extension for the file to be written.
#'
pin_file_writer <- function(.f, extension) {

  .f <- rlang::as_function(f)

  function(x, name = NULL) {

    # determine name, if needed
    if (is.null(name)) {
      name <- enexpr(x)
      if (is_symbol(name)) {
        name <- as.character(name)
        pins_inform("Using `name = '{name}'`")
      } else {
        abort("Must supply `name` when `x` is an expression")
      }
    }

    # determine path for file
    path <- fs::path_temp(fs::path_ext_set(fs::path_file(name), extension))

    # write object to file
    .f(x, path)

    # delete file when environment that called writer is destroyed
    withr::defer(fs::file_delete(path), envir = parent.frame())

    invisible(path)
  }
}
arrow_pin_writer <- pin_file_writer(
  \(x, path) arrow::write_feather(x, path, compression = "uncompressed"), 
  extension = "arrow"
)

pin_write(my_board, x = mtcars, type = arrow_pin_writer)

Alternatively (if I'm understanding the code correctly):

pin_upload(my_board, paths = arrow_pin_writer(mtcars))

If using pin_upload() works, would it be necessary to mess with pin_write()? It may be possible to get everything done using pin_upload() and pin_download().

I am sorry that I have turned this into a ride on the "API merry-go-round". While I enjoy an API discussion as much as (perhaps more than) the next person, I know you all have other things to do.

I'll start coding in this direction unless or until I hear otherwise :)

@ijlyttle
Copy link
Contributor Author

ijlyttle commented Sep 14, 2022

I think I have something minimally working; here's what I get:

library("pins")

arrow_pin_writer <- pin_file_writer(
  ~arrow::write_feather(.x, .y, compression = "uncompressed"),
  extension = "arrow"
)

board <- board_temp()

pin_write(board, mtcars, type = arrow_pin_writer)
#> Using `name = 'mtcars'`
#> Creating new version '20220914T202009Z-8a125'
#> Writing to pin 'mtcars'

pin_upload(board, arrow_pin_writer(mtcars))
#> Guessing `name = 'mtcars.arrow'`
#> Creating new version '20220914T202009Z-8a125'

pin_download(board, "mtcars") |> arrow::read_feather() |> head()
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

pin_download(board, "mtcars.arrow") |> arrow::read_feather() |> head()
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Created on 2022-09-14 by the reprex package (v2.0.1)

One thing that does not appear in the reprex, but does appear interactively:

pin_upload(board, arrow_pin_writer(mtcars))
Setting global deferred event(s).
i These will be run:
  * Automatically, when the R session ends.
  * On demand, if you call `withr::deferred_run()`.
i Use `withr::deferred_clear()` to clear them without executing.
Guessing `name = 'mtcars.arrow'`
Creating new version '20220914T214753Z-8a125'

I think this is happening because withr::defer() is using the global environment. I don't know if this is a problem or not, but I could note it in the function documentation.

Assuming things are OK, some questions:

  • do you wish to support using both pin_write() via type and pin_upload()?
  • using pin_upload(), the filename is used as the default name, whereas pin_write() uses the filename sans extension. Is this difference intended, and if not, is it worth harmonizing (presumably, using a separate issue)?
  • using pin_upload(), we don't have the message Writing to pin '...' - same question as above, is this worth harmonizing?

Thanks!

@juliasilge
Copy link
Member

juliasilge commented Sep 21, 2022

So.... I am not feeling great about the pin_file_writer() approach. 😔 I think that is going to be confusing for many users. Maybe the nice interface of pin_write() isn't easily extensible to this kind of custom writing, because of needing to know how to write, plus what to write (i.e. the extension), etc.

HOWEVER, I am really excited about making this work better via pin_upload()! 🤩 This may be the right place to provide more functionality/documentation for folks who want to pin using more custom options.

This fills the need, if I am understanding correctly?

library(pins)

b <- board_temp()
pin_name <- "my-mtcars"
path <- fs::path_temp(fs::path_ext_set(pin_name, "arrow"))
arrow::write_feather(mtcars, path, compression = "uncompressed")
b %>% pin_upload(path, pin_name, title = "My very excellent cars")
#> Creating new version '20220921T174004Z-8a125'

Created on 2022-09-21 with reprex v2.0.2

What do you think we could do to make this a nicer user experience?

@ijlyttle
Copy link
Contributor Author

ijlyttle commented Sep 21, 2022

I think the pin_upload() approach is the simplest way to go. As you point out in the example, which does fill the need, I think it becomes a matter of documentation, rather than adding a new function. That being the case, I'm thinking to close this PR, then create another PR for documentation.

@juliasilge
Copy link
Member

That would be perfect @ijlyttle! Thank you for being willing to work through this in the way we did.

@ijlyttle ijlyttle closed this Sep 22, 2022
@ijlyttle ijlyttle deleted the feature-631-type branch September 24, 2022 14:06
@ijlyttle ijlyttle restored the feature-631-type branch September 24, 2022 14:06
@github-actions
Copy link

github-actions bot commented Oct 9, 2022

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

could pin_write() support type = "arrow" uncompressed?
2 participants