feat(pins): add support for custom storage options in board_s3 (#237)
* feat(pins): add support for custom storage options in `board_s3`

- Allow arbitrary storage options to be passed through `board_s3` to the underlying fsspec S3FileSystem.
- This makes it possible to connect to S3-compatible services by specifying custom credentials, endpoints, and other S3FileSystem parameters.
- Add documentation for the new `storage_options` parameter, including an example for connecting to Backblaze B2.

This change makes it easier to use pins with S3-compatible storage services other than AWS.

* fix(pins): correct kwargs reference in board_s3 constructor

- Replace `**kwargs` with `**storage_options` in the `board_s3` function to match the intended parameter name, so that storage options are handled correctly.

* docs(pins): add missing import statement in board_s3 example

- Add an import statement for the `pins` module to the docstring example of the `board_s3` function, so that the example no longer depends on a prior `import pins` and is easier for new users to run.

* style(pins): format board_s3 function definition for better readability

- Adjusted the function definition of `board_s3` to span multiple lines, improving code readability and maintainability.
- Ensured consistency with project coding standards for function definitions.

* feat(pins): add warning for non-zero listings_expiry_time in board_s3

- Warn users who set `listings_expiry_time` to a non-zero value in the `board_s3` function, since this may cause unexpected cache behaviour.
- Refactor the handling of `storage_options` so that `listings_expiry_time` is always set explicitly, either to the value given by the user or to the default of 0.

This change steers users towards the recommended setting of 0 and makes cache operations with S3 boards more predictable.
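
The option handling described in the last item can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code from the commit: `resolve_s3_options` is a hypothetical helper name, and `https://s3.example.com` is a placeholder endpoint.

```python
import warnings


def resolve_s3_options(**storage_options):
    """Hypothetical helper mirroring the option handling described above."""
    # Use the caller's value if one was given, otherwise default to 0.
    listings_expiry_time = storage_options.get("listings_expiry_time", 0)
    if listings_expiry_time != 0:
        warnings.warn(
            "Non-zero `listings_expiry_time` may lead to unexpected "
            "behaviour with cache operations."
        )
    # Start from the user's options, then set listings_expiry_time explicitly.
    return {**storage_options, "listings_expiry_time": listings_expiry_time}


# No listings_expiry_time given: the default of 0 is filled in silently.
default_opts = resolve_s3_options(endpoint_url="https://s3.example.com")

# A non-zero value is kept, but a warning is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    custom_opts = resolve_s3_options(listings_expiry_time=60)
```

A user-supplied value is never overridden here; the warning only flags it.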
ericmjl authored Jul 19, 2024
1 parent 2c53d3c commit 5c929ce
Showing 1 changed file with 36 additions and 5 deletions.
41 changes: 36 additions & 5 deletions pins/constructors.py
@@ -1,6 +1,7 @@
import fsspec
import os
import tempfile
+import warnings

from .boards import BaseBoard, BoardRsConnect, BoardManual, board_deparse
from .cache import PinsCache, PinsRscCacheMapper, PinsAccessTimeCache, prefix_cache
@@ -432,7 +433,9 @@ def board_connect(
board_rsconnect = board_connect


-def board_s3(path, versioned=True, cache=DEFAULT, allow_pickle_read=None):
+def board_s3(
+    path, versioned=True, cache=DEFAULT, allow_pickle_read=None, **storage_options
+):
"""Create a board to read and write pins from an AWS S3 bucket folder.
Parameters
@@ -453,19 +456,47 @@ def board_s3(path, versioned=True, cache=DEFAULT, allow_pickle_read=None):
You can enable reading pickles by setting this to `True`, or by setting the
environment variable `PINS_ALLOW_PICKLE_READ`. If both are set, this argument
takes precedence.
+    storage_options:
+        Additional keyword arguments to be passed to the underlying fsspec S3FileSystem.
Notes
-----
The s3 board uses the fsspec library (s3fs) to handle interacting with AWS S3.
In order to authenticate, set the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`,
-    and (optionally) `AWS_REGION` environment variables.
+    and (optionally) `AWS_REGION` environment variables. If you are using an
+    s3-compatible storage service that is not from AWS, you can pass in the necessary
+    credentials to the `storage_options` dictionary, such as `endpoint_url`, `key`, and
+    `secret`. We recommend setting these as environment variables. An example using
+    Backblaze B2 would look like:
+    Examples
+    --------
+    >>> import pins
+    >>> board = pins.board_s3(
+    ...     "pins-test",
+    ...     endpoint_url=os.getenv("FSSPEC_S3_ENDPOINT_URL"),
+    ...     key=os.getenv("FSSPEC_S3_KEY"),
+    ...     secret=os.getenv("FSSPEC_S3_SECRET"),
+    ... )
See <https://github.com/fsspec/s3fs>
"""
-    # TODO: user should be able to specify storage options here?
-
-    opts = {"listings_expiry_time": 0}
+    # Warn user about the use of non-zero listings_expiry_time
+    listings_expiry_time = storage_options.get("listings_expiry_time", 0)
+    if listings_expiry_time != 0:
+        warning_msg = """
+        Non-zero `listings_expiry_time` may lead to unexpected behaviour with cache operations.
+        We're not discouraging you from setting it to be a non-zero value,
+        but we strongly recommend setting it to 0 for optimal performance.
+        """
+        warnings.warn(warning_msg)
+
+    # Set options to pass in. Start with storage options provided by user.
+    opts = {**storage_options}
+    # Set listings_expiry_time based on what's provided by user
+    # or the default value of 0.
+    opts.update({"listings_expiry_time": listings_expiry_time})
     return board("s3", path, versioned, cache, allow_pickle_read, storage_options=opts)

