Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new write_board_manifest() function #661

Merged
merged 40 commits into from
Nov 18, 2022
Merged

Add new write_board_manifest() function #661

merged 40 commits into from
Nov 18, 2022

Conversation

juliasilge
Copy link
Member

Related to #650

That PR is a bit sprawling so I think this will be better if we can break it up into some pieces. This PR implements part of that change, adding the new function board_manifest() and the internal generic to call for various boards.

I have not implemented this for board_rsconnect(), because it's going to be pretty complicated. (Maybe look up all pins that the specific user can access?)

I changed the function name since this operates on a board and returns the board (it doesn't act on a pin).

@juliasilge
Copy link
Member Author

Currently one failure because we can't build the brand-new version of arrow on Windows.

@ijlyttle do you mind taking a look at this PR and seeing if you have feedback on the changes I made relative to your original work?

@juliasilge juliasilge requested a review from machow October 26, 2022 19:39
@ijlyttle
Copy link
Contributor

I think this looks great; the naming makes perfect sense!

Sorry I let this get away from me a bit. I agree that reducing the scope of each bit will make it much easier to manage.

Thanks!

Merge commit '4439a8f174f245305045b5da6f0ee7a5bce0472a'

#Conflicts:
#	.github/workflows/R-CMD-check.yaml
Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Producing a manifest seems super useful! I noticed we are writing a top-level file to a board, so mostly discuss that here, but wonder if it's worth separating out the creating of a manifest file, and this new behavior (a top-level file on a board) into separate PRs?

I'm a bit concerned that writing files (or even worse folders) that aren't pins to a board may produce surprising behaviors.

From a quick test, pins-python will see pins.txt as a pin (as will an R pins azure board?). This can be patched pretty quickly, but may take a bit of syncing across libraries / users sharing a board (OTOH it may be pretty innocuous). Weird things might also happen with the pins-python cache here, but I'd have to check.

For the new behavior of adding a top-level file, can we test the following across all boards?:

  • writing a pins.txt will not cause it to show up in pin_list
  • writing a pins.txt will produce the correct result for pin_{exists,meta}("pins.txt")
  • writing a pins.txt will not cause pins::cache_{info,prune} errors / surprising behavior
  • explicate what will happen in pins legacy functions (e.g. could be explicitly unspecified)
  • explicate / test what happens if a user attempts to create a pin named "pins.txt"
  • explicate / test what happens if a user has created a pin named "pins.txt" and runs board_manifest()

It seems like these could go in test_api_basic() or a similarly named function.

Additional notes:

When I try it on s3, I get the warning below -- do you mind double checking? (happy to pair to reproduce if it's easier -- I only know how to create temporary boards using pins python, so I create them there, then connect using R pins 😬):

image

R/board_manifest.R Outdated Show resolved Hide resolved
@juliasilge juliasilge changed the title Add new board_manifest() function Add new write_board_manifest() function Nov 3, 2022
@juliasilge juliasilge marked this pull request as ready for review November 4, 2022 16:07
@juliasilge juliasilge requested a review from machow November 4, 2022 16:08
@juliasilge
Copy link
Member Author

I think I got all the tests and additional documentation added here.

I don't think I'll open an issue about doing this for Connect until someone asks for it, as it is going to be a bear (just leave it non-implemented so default method gets called). As a reminder, this is the first ~half of what #650 aims for, and there will still be more to do for the overall workflow to be possible.

Merge commit 'c408dc74b0e0cdef2249592f4e68a7701f8b9453'

#Conflicts:
#	R/board_azure.R
@ijlyttle
Copy link
Contributor

ijlyttle commented Nov 7, 2022

Hi @juliasilge - just a quick note, I have updated my workflows to use this branch, everything is working great with _pins.yaml and write_board_manifest().

This should not be in the least bit surprising, but figured you might like to know.

Thanks!

@machow
Copy link
Collaborator

machow commented Nov 8, 2022

Sorry for the wait -- I'm going to double check quickly against pins python boards, but expect everything should work fine!

@machow
Copy link
Collaborator

machow commented Nov 8, 2022

@juliasilge it appears adding a pins.txt file as a manifest breaks pin_search on python pins. The reason is that in the python version pin_list() returns the filename, and pin_search attempts to retrieve metadata for every pin as part of search.

Here's the relevant R pins pins_search code fetching metadata (which works fine with the manifest file).

wdyt is a good move here? I can modify pins-python to ignore pins.txt files, but I'm wondering what the right way sequence is for releasing..

edit: how I tested

Using an .env file with credentials filled in from .env.dev.

In python:

from dotenv import load_dotenv
load_dotenv()
from pins import board_s3
from pins.tests.helpers import BoardBuilder

bb = BoardBuilder("s3")
board = bb.create_tmp_board("pins/tests/pins-compat")

# list path to temporary pins bucket, for copying into R
board.board

Switching to R:

board <- board_s3(<bucket_from_above>, prefix="<prefix_from_above>/")

# note that this also prints a warning
board %>% board_manifest()

Back in python:

board.pin_search("abc")

@juliasilge
Copy link
Member Author

juliasilge commented Nov 9, 2022

@machow In R, what worked best was making sure that pin_list() only ever returns directories. This is helpful in general in case people are putting other files in, say, an S3 bucket that holds pins. (Of course, doesn't help if there are also directories there, but at least a little better.)

What do you think about making the needed change in Python (whether that is special-casing _pins.yaml or only listing dirs) and releasing to PyPI before the next CRAN release for R? I think we could still merge here because the likelihood of someone using the development version of pins for R with pins for Python and doing pin_list() in Python seems low. Merging here let's us move forward with other related next steps.

Also, I just want to make sure you are working with the current head of this branch. You wrote board %>% board_manifest() but there hopefully is no function by that name exported with this branch currently (it's write_board_manifest().

Merge commit 'd40ac680563fca00718996d4bd890e075ba59aa8'

#Conflicts:
#	R/board_azure.R
#	R/board_ms365.R
#	R/board_s3.R
@machow
Copy link
Collaborator

machow commented Nov 15, 2022

This is helpful in general in case people are putting other files in, say, an S3 bucket that holds pins.

I wonder if we should discourage people from putting other files/folders into a pin board? Ignoring files will work okay, but if people put folders in they'll get listed as pins (afaict the exact behavior varies from board to board).

For example, if a folder with a file in it (abc/xyz.txt) is inside of an S3 board, then pin_search will raise an error (because it will try to pull the metadata for a pin named abc):

image

@machow
Copy link
Collaborator

machow commented Nov 15, 2022

What do you think about making the needed change in Python (whether that is special-casing _pins.yaml or only listing dirs) and releasing to PyPI before the next CRAN release for R?

I agree merging here seems okay -- since the odds someone uses this and is also using python boards seems low! Maybe we could include a warning when they run write_board_manifest (or the relevant command)?

In any event, I'm on board with merging as is, and then removing _pins.yaml from python board results!

@juliasilge juliasilge requested a review from hadley November 15, 2022 20:46
NEWS.md Outdated Show resolved Hide resolved
R/board.R Outdated Show resolved Hide resolved
R/board.R Outdated Show resolved Hide resolved
R/board.R Outdated Show resolved Hide resolved
R/board.R Show resolved Hide resolved
@@ -226,13 +245,11 @@ azure_ls <- function(board, dir = "") {
paths <- AzureStor::list_storage_files(
board$container,
dir = dir,
recursive = FALSE,
info = "name"
recursive = FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need to know isdir, not only the name; see line 250 ⤵️

R/board_ms365.R Outdated Show resolved Hide resolved
R/board.R Outdated Show resolved Hide resolved
# values (relative paths to versions) are correct
imap(
manifest,
~ expect_identical(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way you could construct this expected values so the structure is obvious at a glance? And maybe worthwhile to create a pin with more than one version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added another version to one of the test pins 👍

I thought a while about a way to construct the expected values, but it is not obvious to me with the versions. We currently handle that with is_testing() in version_name() but that has caused problems as outlined in #640. I wonder if maybe the way to go is snapshot tests with a function to redact versions. I'll add that in the existing issue.

tests/testthat/test-utils.R Outdated Show resolved Hide resolved
@github-actions
Copy link

github-actions bot commented Dec 3, 2022

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants