Update docs, especially about concurrent writes (#817)
* Remove pins 1.0 language

* Update main vignette with more caveats
juliasilge authored Jan 8, 2024
1 parent aea52de commit 6dbad60
Showing 2 changed files with 16 additions and 13 deletions.
4 changes: 0 additions & 4 deletions README.Rmd
@@ -29,10 +29,6 @@ The pins package publishes data, models, and other R objects, making it easy to
You can pin objects to a variety of pin *boards*, including folders (to share on a networked drive or with services like Dropbox), Posit Connect, Amazon S3, Google Cloud Storage, Azure storage, and Microsoft 365 (OneDrive and SharePoint).
Pins can be automatically versioned, making it straightforward to track changes, re-run analyses on historical data, and undo mistakes.
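As an illustration of versioning, here is a minimal sketch assuming the pins R package (1.x) is installed. `board_temp()` creates a throwaway board, and the pin name `"numbers"` is only illustrative. Two writes to the same name on a versioned board keep both copies, and each can be read back by its version id.

```r
library(pins)

# A throwaway board in a temporary directory, with versioning switched on
board <- board_temp(versioned = TRUE)

# Writing twice to the same name keeps both versions instead of overwriting
pin_write(board, 1:3, "numbers", type = "rds")
pin_write(board, 1:5, "numbers", type = "rds")

# One row per stored version; any of them can be read back by its `version` id
versions <- pin_versions(board, "numbers")
contents <- lapply(versions$version, function(v) pin_read(board, "numbers", version = v))
```

The sketch reads every version rather than assuming a particular row order, because two writes made within the same second can be hard to tell apart by timestamp alone.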

pins 1.0.0 includes a new more explicit API and greater support for versioning.
The legacy API (`pin()`, `pin_get()`, and `board_register()`) will continue to work, but new features will only be implemented with the new API, so we encourage you to switch to the modern API as quickly as possible.
Learn more in `vignette("pins-update")`.

You can use pins from Python as well as R. For example, you can use one language to read a pin created with the other. Learn more about [pins for Python](https://rstudio.github.io/pins-python/).

## Installation
25 changes: 16 additions & 9 deletions vignettes/pins.Rmd
@@ -58,7 +58,17 @@ The first argument is the object to save (usually a data frame, but it can be an
The name is basically equivalent to a file name: you'll use it when you later want to read the data from the pin.
The only rule for a pin name is that it can't contain slashes.

As you can see from the output, pins has chosen to save this data to an `.rds` file.
After you've pinned an object, you can read it back with `pin_read()`:

```{r}
board %>% pin_read("mtcars")
```

You don't need to supply the file type when reading data from a pin because pins automatically stores the file type in the [metadata](#Metadata).
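The round trip can be reproduced in isolation. This is a minimal sketch assuming pins 1.x is installed. It uses a fresh `board_temp()` rather than the `board` created earlier in the vignette, and it calls the functions directly instead of through `%>%`.

```r
library(pins)

# A throwaway board in a temporary directory
board <- board_temp()

# No `type` given: pins guesses one for a data frame and records it
pin_write(board, mtcars, "mtcars")

# No `type` needed when reading; pins looks it up in the pin's metadata
cars <- pin_read(board, "mtcars")

# The file type that pins recorded for this pin
pin_meta(board, "mtcars")$type
```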

## How and what to store as a pin

As you can see from the output in the previous section, pins has chosen to save this example data to an `.rds` file.
But you can choose another option depending on your goals:

- `type = "rds"` uses `saveRDS()` to create a binary R data file. It can save any R object (including trained models), but it's only readable from R, not other languages.
@@ -68,19 +78,16 @@ But you can choose another option depending on your goals:
- `type = "json"` uses `jsonlite::write_json()` to create a JSON file. Pretty much every programming language can read JSON files, but they only work well for nested lists.
- `type = "qs"` uses `qs::qsave()` to create a binary R data file, like `saveRDS()`. This format achieves faster read/write speeds than RDS and compresses data more efficiently, making it a good choice for larger objects. Read more on the [qs package](https://github.com/traversc/qs).
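To choose a format explicitly, pass `type` to `pin_write()`. The following minimal sketch assumes pins 1.x (and its `jsonlite` dependency) is installed. The pin names and the `settings` list are hypothetical examples. The sketch sticks to `"rds"` and `"json"` so that it does not need the optional `qs` package.

```r
library(pins)

board <- board_temp()

# Binary R data file: can hold any R object, but only R can read it back
pin_write(board, mtcars, "mtcars-rds", type = "rds")

# A nested list is a natural fit for JSON, which most other languages can read
settings <- list(model = "lm", params = list(alpha = 0.5, features = c("wt", "hp")))
pin_write(board, settings, "settings", type = "json")

# Each pin records its own type in its metadata
pin_meta(board, "mtcars-rds")$type
pin_meta(board, "settings")$type
```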

After you've pinned an object, you can read it back with `pin_read()`:

```{r}
board %>% pin_read("mtcars")
```

You don't need to supply the file type when reading data from a pin because pins automatically stores the file type in the metadata, the topic of the next section.

Note that when the data lives elsewhere, pins takes care of downloading and caching so that it's only re-downloaded when needed.
That said, most boards transmit pins over HTTP, and this is going to be slow and possibly unreliable for very large pins.
As a general rule of thumb, we don't recommend using pins with files over 500 MB.
If you find yourself routinely pinning data larger than this, you might need to reconsider your data engineering pipeline.
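One rough way to check an object against that rule of thumb before pinning it is to serialize it to a temporary `.rds` file and then inspect the file size. This is a minimal base R sketch. `pin_size_ok()` is a hypothetical helper, not part of pins. The limit is taken as 500 × 10^6 bytes, and the measured size only approximates a `type = "rds"` pin, because other formats compress differently.

```r
# Hypothetical helper, not part of pins: measures an object's size once
# serialized with saveRDS(), as an approximation of a `type = "rds"` pin.
pin_size_ok <- function(x, limit_bytes = 500e6) {
  path <- tempfile(fileext = ".rds")
  on.exit(unlink(path), add = TRUE)  # remove the temporary file afterwards
  saveRDS(x, path)
  size <- file.size(path)
  if (size > limit_bytes) {
    warning(sprintf(
      "Serialized object is %.1f MB; consider storage other than a pin.",
      size / 1e6
    ))
  }
  size <= limit_bytes
}

pin_size_ok(mtcars)  # a small data frame is well under the limit
```

Writing a very large object to disk just to measure it has its own cost, so treat this as a spot check rather than something to run on every write.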

Storing your data/object as a pin works well when you write from a single source or process. It is _not_ appropriate when multiple sources or processes need to write to the same pin; since the pins package reads and writes files, it cannot manage concurrent writes.

- **Good** use for pins: an ETL pipeline that stores a model or summarized dataset once a day
- **Bad** use for pins: a Shiny app that collects data from users, who may be using the app at the same time
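To see why concurrent writes go wrong, here is a simplified, single-session simulation of the Shiny case. To keep it self-contained, it uses plain base R files rather than pins. The failure is the same whenever writers read a whole file, modify their copy, and write the whole file back. Two hypothetical "sessions" each read the current data, append a response, and save, so whichever saves last silently discards the other's row.

```r
# Shared storage: an empty table of responses saved to a file
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(user = character(), response = character()), path)

# Both sessions read the current contents before either one writes
seen_by_a <- readRDS(path)
seen_by_b <- readRDS(path)

# Each session appends its own row to the copy it read...
seen_by_a <- rbind(seen_by_a, data.frame(user = "a", response = "yes"))
seen_by_b <- rbind(seen_by_b, data.frame(user = "b", response = "no"))

# ...and writes its whole copy back; the last write wins
saveRDS(seen_by_a, path)
saveRDS(seen_by_b, path)

final <- readRDS(path)
nrow(final)  # 1: session "a"'s response was silently lost
```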

## Metadata

Every pin is accompanied by some metadata that you can access with `pin_meta()`: