Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User-defined file formats #736

Closed
4 tasks done
wlandau opened this issue Dec 22, 2021 · 5 comments
Closed
4 tasks done

User-defined file formats #736

wlandau opened this issue Dec 22, 2021 · 5 comments
Assignees

Comments

@wlandau
Copy link
Member

wlandau commented Dec 22, 2021

Prework

  • Read and agree to the code of conduct and contributing guidelines.
  • If there is already a relevant issue, whether open or closed, comment on the existing thread instead of posting a new issue.
  • New features take time and effort to create, and they take even more effort to maintain. So if the purpose of the feature is to resolve a struggle you are encountering personally, please consider first posting a "trouble" or "other" issue so we can discuss your use case and search for existing solutions first.
  • Format your code according to the tidyverse style guide.

Proposal

I am not sure this is possible, but if it is, it could catch edge cases like #599. At the interface level, it would look something like this for local Keras:

tar_target(
  name,
  command,
  format = tar_format(
    read = function(path) keras::load_model_hdf5(path),
    write = function(object, path) keras::save_model_hdf5(object = object, filepath = path),
    marshal = function(object) keras::serialize_model(object),
    unmarshal = function(object) keras::unserialize_model(object),
    cloud = "none" # or "aws" or "gcp"
  )
)

tar_format() could deparse the arguments and turn it into a character vector with a class name and encoded strings:

library(targets)

tar_format <- function(read, write, serialize, unserialize, cloud = c("none", "aws", "gcp")) {
  c(
    "custom",
    tar_format_field("read", read),
    tar_format_field("write", write),
    tar_format_field("marshal", marshal),
    tar_format_field("unmarshal", marshal),
    paste0("cloud=", match.arg(cloud))
  )
}

tar_format_field <- function(key, value) {
  paste0(key, "=", base64url::base64_urlencode(tar_deparse_safe(value)))
}

tar_format(
  read = function(path) keras::load_model_hdf5(path),
  write = function(object, path) keras::save_model_hdf5(object = object, filepath = path),
  marshal = function(object) keras::serialize_model(object),
  unmarshal = function(object) keras::unserialize_model(object),
  cloud = "none"
)
#> [1] "custom"                                                                                                            
#> [2] "read=ZnVuY3Rpb24gKHBhdGgpIAprZXJhczo6bG9hZF9tb2RlbF9oZGY1KHBhdGgp"                                                 
#> [3] "write=ZnVuY3Rpb24gKG9iamVjdCwgcGF0aCkgCmtlcmFzOjpzYXZlX21vZGVsX2hkZjUob2JqZWN0ID0gb2JqZWN0LCBmaWxlcGF0aCA9IHBhdGgp"
#> [4] "marshal=ZnVuY3Rpb24gKG9iamVjdCkgCmtlcmFzOjpzZXJpYWxpemVfbW9kZWwob2JqZWN0KQ"                                      
#> [5] "unmarshal=ZnVuY3Rpb24gKG9iamVjdCkgCmtlcmFzOjp1bnNlcmlhbGl6ZV9tb2RlbChvYmplY3Qp"                                  
#> [6] "cloud=none" 

Plan:

@wlandau wlandau self-assigned this Dec 22, 2021
@wlandau
Copy link
Member Author

wlandau commented Dec 23, 2021

On second thought, better to use the resources interface instead.

@wlandau
Copy link
Member Author

wlandau commented Dec 23, 2021

I just tried to implement this, and creating a new "store_custom" class modeled after local keras is not a good fit for the design. Another idea is to dynamically create the S3 methods and assign them to an environment that the package can access at runtime. But what environment? assignInNamespace() cannot create new bindings, and tar_option_get("envir") is no automatically reachable from the package unless it is the global environment, and it is not good practice to use the global environment like this.

@wlandau
Copy link
Member Author

wlandau commented Dec 24, 2021

Yeah, any storage methods should really be part of the package if they exist. Not a good fit.

@wlandau wlandau closed this as completed Dec 24, 2021
@wlandau
Copy link
Member Author

wlandau commented Dec 24, 2021

So it wasn't working because resources can't be saved to metadata (they are a nested list with non-exportable objects). So it really does have to be part of the format string.

@wlandau wlandau reopened this Dec 24, 2021
@wlandau
Copy link
Member Author

wlandau commented Dec 24, 2021

Conceptually makes sense because marshal/unmarshal is part of the format definition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant