Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp board_url() to handle manifest files #681

Merged
merged 25 commits into from
Dec 16, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
7588461
Stash work on versioned board_url
juliasilge Oct 26, 2022
96b6f33
Merge from main
juliasilge Dec 3, 2022
a048f2e
Get started on `board_url()` changes after manifest work
juliasilge Dec 3, 2022
200a2ad
Fix `pin_meta` method
juliasilge Dec 12, 2022
0ea5571
Update test to accommodate `is_testing()`
juliasilge Dec 12, 2022
fd4ba93
Update `pin_meta` to check for versioning
juliasilge Dec 12, 2022
eee8ac3
Update NEWS
juliasilge Dec 12, 2022
c2babe9
Merged origin/main into board-url-versioned
juliasilge Dec 12, 2022
4cad0a9
Get rid of "No encoding supplied: defaulting to UTF-8."
juliasilge Dec 13, 2022
0d0f7e2
Rename to `end_with_slash()`
juliasilge Dec 13, 2022
c2ebee4
Pass through other board args correctly
juliasilge Dec 13, 2022
415c1e8
Update R/board_url.R
juliasilge Dec 14, 2022
6b392e0
Feedback from code review
juliasilge Dec 14, 2022
39861d6
Test error messages better
juliasilge Dec 14, 2022
c0bba6d
Update from branch
juliasilge Dec 14, 2022
7ec357c
Update test for new error message
juliasilge Dec 14, 2022
3990c06
simplify url-patching for manifest (#686)
ijlyttle Dec 15, 2022
b780993
Merged origin/main into board-url-versioned
juliasilge Dec 15, 2022
14873da
Add example for manifest file
juliasilge Dec 15, 2022
8453c72
Update from master to main
juliasilge Dec 15, 2022
90ffc31
Update R/board_url.R
juliasilge Dec 15, 2022
b84aa87
Update R/board_url.R
juliasilge Dec 15, 2022
77e1aff
Fix up error message handling
juliasilge Dec 15, 2022
5dec63c
Remember that paste0 is already vectorized
juliasilge Dec 15, 2022
832836f
Remove unnecessary `which()`
juliasilge Dec 16, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -114,13 +114,15 @@ S3method(pin_version_delete,pins_board_folder)
S3method(pin_version_delete,pins_board_ms365)
S3method(pin_version_delete,pins_board_rsconnect)
S3method(pin_version_delete,pins_board_s3)
S3method(pin_version_delete,pins_board_url)
S3method(pin_versions,pins_board)
S3method(pin_versions,pins_board_azure)
S3method(pin_versions,pins_board_folder)
S3method(pin_versions,pins_board_kaggle_dataset)
S3method(pin_versions,pins_board_ms365)
S3method(pin_versions,pins_board_rsconnect)
S3method(pin_versions,pins_board_s3)
S3method(pin_versions,pins_board_url)
S3method(print,pin_info)
S3method(print,pins_board)
S3method(print,pins_hidden)
Expand Down
3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,9 @@
recording all pins and their versions to the board's root directory.
This function only works for boards that are not read-only
(#661, based on work of @ijlyttle).

* Updated `board_url()` to handle versions recorded via a manifest file
(#681, based on work of @ijlyttle).


# pins 1.0.3
Expand Down
2 changes: 1 addition & 1 deletion R/board.R
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,7 @@ make_manifest <- function(board) {
result <- map(
pin_names,
~fs::path(.x, pin_versions(board, name = .x)$version) %>%
append_slash() %>% # versions usually don't include slash
end_with_slash() %>% # versions usually don't include slash
as.list()
)
names(result) <- pin_names
Expand Down
204 changes: 173 additions & 31 deletions R/board_url.R
Original file line number Diff line number Diff line change
@@ -1,52 +1,95 @@
#' Use a vector of URLs as a board
#'
#' @description
#' `board_url()` lets you build up a board from individual urls. This is
#' useful because [pin_download()] and [pin_read()] will be cached - they'll
#' only re-download the data if it's changed from the last time you downloaded
#' it (using the tools of
#' [HTTP caching](https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)).
#' You'll also be protected from the vagaries of the internet; if a fresh
#' download fails, you'll get the previously cached result with a warning.
#' `board_url()` lets you build up a board from individual urls or a [manifest
#' file][write_board_manifest()].
#'
#' `board_url()` is read only and does not currently support versions.
#' `board_url()` is read only.
#'
#' @param urls A named character vector of URLs If the URL ends in a `/`,
#' `board_url` will look for a `data.txt` that provides metadata. The
#' easiest way to generate this file is to upload a pin directory created by
#' [board_folder()].
#' @param urls Identify available pins being served at a URL or set of URLs (see details):
#' - Unnamed string: URL to a [manifest file][write_board_manifest()].
#' - Named character vector: URLs to specific pins (does not support versioning).
#' - Named list: URLs to pin version directories (supports versioning).
#' @param use_cache_on_failure If the pin fails to download, is it ok to
#' use the last cached version? Defaults to `is_interactive()` so you'll
#' be robust to poor internet connectivity when exploring interactively,
#' but you'll get clear errors when the code is deployed.
#' @family boards
#' @inheritParams new_board
#' @details
#' The way `board_url()` works depends on the type of the `urls` argument:
#' - Unnamed character scalar, i.e. **a single URL** to a
#' [manifest file][write_board_manifest()]: If the URL ends in a `/`,
#' `board_url()` will look for a `_pins.yaml` manifest. If the manifest
#' file parses to a named list, versioning is supported. If it parses to a
#' named character vector, the board will not support versioning.
Comment on lines +21 to +25
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would you point it to something other than a directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible to pass in a URL to a file that is a manifest file but not named _pins.yaml, like this:

library(pins)
b1 <- board_folder(withr::local_tempdir(), versioned = TRUE)
b1 %>% pin_write(1:10, "x", type = "json")
#> Creating new version '20221215T205631Z-c3943'
#> Writing to pin 'x'
b1 %>% pin_write(11:20, "y", type = "json")
#> Creating new version '20221215T205631Z-cba09'
#> Writing to pin 'y'
b1 %>% pin_write(1:20, "y", type = "csv")
#> Creating new version '20221215T205631Z-5026d'
#> Writing to pin 'y'
write_board_manifest(b1)
#> Manifest file written to root folder of board, as `_pins.yaml`
fs::file_move(fs::path(b1$path, "_pins.yaml"), fs::path(b1$path, "_different.yaml"))
b1_path <- fs::path(b1$path)

b1_server <- webfakes::new_app()
b1_server$use(webfakes::mw_static(root = b1_path))
b1_process <- webfakes::new_app_process(b1_server)

b2 <- board_url(fs::path(b1_process$url(), "_different.yaml"))
b2 %>%
  pin_read("x") 
#> → <http:/127.0.0.1:64054/x/20221215T205631Z-c3943/data.txt> is not cacheable
#> → <http:/127.0.0.1:64054/x/20221215T205631Z-c3943/x.json> is not cacheable
#>  [1]  1  2  3  4  5  6  7  8  9 10

Created on 2022-12-15 with reprex v2.0.2

I'm sure it won't be the most common use case but seems like some nice flexibility to offer.

#' - **Named character vector of URLs**: If the URLs end in a `/`,
#' `board_url()` will look for a `data.txt` that provides metadata for the
#' associated pin. The easiest way to generate this file is to upload a pin
#' version directory created by [board_folder()]. Versioning is not supported.
#' - **Named list**, where the values are character vectors of URLs and each
#' element of the vector refers to a version of the particular pin: If a
#' URL ends in a `/`, `board_url()` will look for a `data.txt` that
#' provides metadata. Versioning is supported.
#'
#' Using a vector of URLs can be useful because [pin_download()] and
#' [pin_read()] will be cached; they'll only re-download the data if it's
#' changed from the last time you downloaded it (using the tools of
#' [HTTP caching](https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching)).
#' You'll also be protected from the vagaries of the internet; if a fresh
#' download fails, you'll get the previously cached result with a warning.
#'
#' Using a [manifest file][write_board_manifest()] can be useful because you
#' can serve a board of pins and allow collaborators to access the board
#' straight from a URL, without worrying about board-level storage details.
#'
#' @export
#' @examples
#' github_raw <- "https://raw.githubusercontent.com/"
#' board <- board_url(c(
#' files = paste0(github_raw, "rstudio/pins-r/master/tests/testthat/pin-files/"),
#' rds = paste0(github_raw, "rstudio/pins-r/master/tests/testthat/pin-rds/"),
#' raw = paste0(github_raw, "rstudio/pins-r/master/tests/testthat/pin-files/first.txt")
#' github_raw <- function(x) paste0("https://raw.githubusercontent.com/", x)
#'
#' ## with a named vector of URLs to specific pins:
#' b1 <- board_url(c(
#' files = github_raw("rstudio/pins-r/main/tests/testthat/pin-files/"),
#' rds = github_raw("rstudio/pins-r/main/tests/testthat/pin-rds/"),
#' raw = github_raw("rstudio/pins-r/main/tests/testthat/pin-files/first.txt")
#' ))
#'
#' board %>% pin_read("rds")
#' board %>% pin_browse("rds", local = TRUE)
#' b1 %>% pin_read("rds")
#' b1 %>% pin_browse("rds", local = TRUE)
#'
#' board %>% pin_download("files")
#' board %>% pin_download("raw")
board_url <- function(urls, cache = NULL, use_cache_on_failure = is_interactive()) {
if (!is.character(urls) || !is_named(urls)) {
abort("`urls` must be a named character vector")
#' b1 %>% pin_download("files")
#' b1 %>% pin_download("raw")
#'
#' ## with a manifest file:
#' b2 <- board_url(github_raw("rstudio/pins-r/main/tests/testthat/pin-board/"))
#' b2 %>% pin_list()
#' b2 %>% pin_versions("y")
#'
board_url <- function(urls,
cache = NULL,
use_cache_on_failure = is_interactive()) {

url_format <- get_url_format(urls)
if (url_format == "pins_yaml") {
manifest <- get_manifest(urls)
board <- board_url(
manifest,
cache = cache,
use_cache_on_failure = use_cache_on_failure
)
return(board)
}
versioned <- url_format == "manifest_content"

# Share cache across all instances of board_url(); pins are stored in
# directories based on the hash of the URL to avoid cache collisions.
cache <- cache %||% board_cache_path("url")

new_board_v1("pins_board_url",
new_board_v1(
"pins_board_url",
urls = urls,
cache = cache,
versioned = versioned,
use_cache_on_failure = use_cache_on_failure
)
}
Expand All @@ -69,14 +112,19 @@ pin_exists.pins_board_url <- function(board, name, ...) {
pin_meta.pins_board_url <- function(board, name, version = NULL, ...) {
check_name(name)
check_pin_exists(board, name)

if (!is.null(version)) {
abort("board_url() doesn't support versions")
if (!is.null(version) && !board$versioned) {
abort_board_not_versioned("board_url")
}

url <- board$urls[[name]]
is_dir <- grepl("/$", url)

if (board$versioned) {
versions <- pin_versions(board, name)$version
version <- check_pin_version(board, name, version)
url <- board$urls[[name]][[versions == version]]
}

is_dir <- grepl("/$", url)
cache_dir <- fs::path(board$cache, hash(url))
fs::dir_create(cache_dir)

Expand All @@ -89,7 +137,8 @@ pin_meta.pins_board_url <- function(board, name, version = NULL, ...) {
use_cache_on_failure = board$use_cache_on_failure
)
meta <- read_meta(cache_dir)
local_meta(meta,
local_meta(
meta,
name = name,
dir = cache_dir,
url = url,
Expand All @@ -102,7 +151,8 @@ pin_meta.pins_board_url <- function(board, name, version = NULL, ...) {
file = fs::path_file(url),
api_version = 1
)
local_meta(meta,
local_meta(
meta,
name = name,
dir = cache_dir,
url = url,
Expand All @@ -111,6 +161,20 @@ pin_meta.pins_board_url <- function(board, name, version = NULL, ...) {
}
}

#' @export
pin_versions.pins_board_url <- function(board, name, ...) {

if (!board$versioned) {
abort_board_not_versioned("board_url")
}

check_name(name)
check_pin_exists(board, name)

paths <- board$urls[[name]]
version_from_path(fs::path_file(paths))
}

#' @export
pin_fetch.pins_board_url <- function(board, name, version = NULL, ...) {
meta <- pin_meta(board, name, version = version)
Expand Down Expand Up @@ -141,12 +205,90 @@ pin_store.pins_board_url <- function(board, name, paths, metadata,
abort_board_read_only("board_url")
}

#' @export
pin_version_delete.pins_board_url <- function(board, name, version, ...) {
abort_board_read_only("board_url")
}

#' @export
write_board_manifest_yaml.pins_board_url <- function(board, manifest, ...) {
abort_board_read_only("board_url")
}

# Helpers ------------------------------------------------------------------

get_url_format <- function(urls) {
if (is_scalar_character(urls) && !is_named(urls)) {
juliasilge marked this conversation as resolved.
Show resolved Hide resolved
"pins_yaml"
} else if (is_list(urls) && is_named(urls) && all(map_lgl(urls, is_character))) {
"manifest_content"
} else if (is.character(urls) && is_named(urls)) {
"vector_of_urls"
} else {
cli::cli_abort(
c(
"{.var urls} must resolve to either:",
"*" = "an unnamed character scalar, i.e. a single URL",
"*" = "a named character vector",
"*" = "a named list, where all elements are character scalars or vectors"
),
class = "pins_error_board_url_argument",
urls = urls
)
}
}

get_manifest <- function(url, call = rlang::caller_env()) {
# if ends with "/", look for manifest
if (grepl("/$", url)) {
url <- paste0(url, manifest_pin_yaml_filename)
}

# if request fails or returns with error code
tryCatch(
{
resp <- httr::GET(url)
httr::stop_for_status(resp)
},
error = function(e) {
cli::cli_abort(
message = "Failed to access manifest file at {.url {url}}:",
class = "pins_error_board_url_request",
parent = e,
url = url,
call = call
)
}
)

# if file is not parsable
tryCatch(
{
text <- httr::content(resp, as = "text", encoding = "UTF-8")
manifest <- yaml::yaml.load(text)
},
error = function(e) {
cli::cli_abort(
message = c(
juliasilge marked this conversation as resolved.
Show resolved Hide resolved
"Failed to parse manifest file at URL {.url {url}}:",
" " = "{e$message}",
"i" = "Manifest file must be text and parsable as YAML."
),
class = "pins_error_board_url_parse",
parent = e,
resp = resp,
call = call
)
}
)

# url_root is directory containing manifest-file
url_root <- sub("[^/]*$", "", url)
# for each manifest entry, prepend url_root to each path entry
manifest <- map(manifest, ~ paste0(url_root, .x))
manifest
}

http_download <- function(url, path_dir, path_file, ...,
use_cache_on_failure = FALSE,
on_failure = NULL) {
Expand Down
5 changes: 5 additions & 0 deletions R/testthat.R
Original file line number Diff line number Diff line change
Expand Up @@ -215,3 +215,8 @@ abort_pin_versioned <- function() {
abort_board_read_only <- function(board) {
abort(glue("{board}() is read only"), class = "pins_board_read_only")
}

abort_board_not_versioned <- function(board) {
cli::cli_abort(glue("This {board}() is not versioned"),
class = "pins_board_not_versioned")
}
2 changes: 1 addition & 1 deletion R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ is_url <- function(x) {
grepl("^http://|^https://", x)
}

append_slash <- function(x) {
end_with_slash <- function(x) {
has_slash <- grepl("/$", x)
x[!has_slash] <- paste0(x[!has_slash], "/")
x
Expand Down
Loading