Feature/rclone export support s3 compatible #8564
base: master
Conversation
Hi @jameshod5,
Thanks for your contribution to lakeFS! It certainly seems like a worthwhile and useful improvement. To be clear: I want to pull it.
The inline comments are about two important aspects:
- Backwards compatibility: We don't want to break usage for existing users. We can of course give them additional ways to configure, just please don't break their working setups. This will be blocking.
- Naming: I know that many tools accept access keys using the AWS variables... even if they end up using them on a different system. Are the new names used by other tools? If there is a common case that users know about, I would rather go for it than require new names.
THANKS!
S3_SECRET_KEY = os.getenv('S3_SECRET_ACCESS_KEY')
S3_ENDPOINT = os.getenv('S3_ENDPOINT_URL', default="")
Unfortunately this breaks backwards compatibility: someone using the script will need to change their environment variables in order to keep on using it. The new script version must accept the old values if the new ones are missing.
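Something along these lines would keep the old variables working; this is a minimal sketch that assumes the variable names from the diff, not the code as actually merged:

```python
import os

# Sketch only: prefer the new names, but fall back to the AWS_* variables
# that existing setups already export, so old configurations keep working.
S3_ACCESS_KEY = os.getenv('S3_ACCESS_KEY_ID') or os.getenv('AWS_ACCESS_KEY_ID')
S3_SECRET_KEY = os.getenv('S3_SECRET_ACCESS_KEY') or os.getenv('AWS_SECRET_ACCESS_KEY')
S3_PROVIDER = os.getenv('S3_PROVIDER', default="AWS")
S3_ENDPOINT = os.getenv('S3_ENDPOINT_URL', default="")
```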
S3_PROVIDER = os.getenv('S3_PROVIDER', default="AWS")
S3_ACCESS_KEY = os.getenv('S3_ACCESS_KEY_ID')
S3_SECRET_KEY = os.getenv('S3_SECRET_ACCESS_KEY')
S3_ENDPOINT = os.getenv('S3_ENDPOINT_URL', default="")
Additionally, we need to decide on appropriate environment variables to use for this. The easiest way to justify names for these variables ("S3_ACCESS_KEY_ID", "S3_SECRET_ACCESS_KEY", "S3_ENDPOINT_URL") is if they are already used by other tools. If not -- are there perhaps other environment variables which are commonly in use?
Thank you for your comment; I think I understand what you mean here. I will keep them as AWS_ACCESS_KEY... etc. (like they were), since that lines up with what most people will already have set, and it is less confusing for users than having two different env variables for the same thing.
I will keep the provider env variable the same. Does this sound sensible?
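Concretely, the env handling would then look roughly like this (just a sketch; the endpoint variable name is carried over from the diff above and may still change):

```python
import os

# Credentials keep their original AWS_* names so existing setups keep working.
S3_ACCESS_KEY = os.getenv('AWS_ACCESS_KEY_ID')
S3_SECRET_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')

# New optional variables, defaulting to AWS with no custom endpoint.
S3_PROVIDER = os.getenv('S3_PROVIDER', default="AWS")
S3_ENDPOINT = os.getenv('S3_ENDPOINT_URL', default="")
```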
Closes #8512
Change Description
Added env variables to the lakefs_export tool for the S3 provider and S3 endpoint, and added the corresponding fields to rclone.conf.temp (a rough sketch of the template section is shown below).
Changed the names of the env variables to improve readability.
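For illustration, the S3 remote section of rclone.conf.temp gains provider and endpoint fields along these lines. This is a hand-written sketch rather than the actual template: the section name and the envsubst-style substitution are assumptions, although provider and endpoint are real options of rclone's s3 backend.

```ini
[s3]
type = s3
provider = ${S3_PROVIDER}
access_key_id = ${AWS_ACCESS_KEY_ID}
secret_access_key = ${AWS_SECRET_ACCESS_KEY}
endpoint = ${S3_ENDPOINT_URL}
```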
Background
I raised the original issue on Slack (I thought it was a bug at first), but quickly noticed it was because our S3 storage is not AWS S3, and the config only allowed the AWS provider.
New Feature
Previously, the export tool was hardcoded to export only to AWS S3 storage. This change adds the ability to set the provider of the S3 storage, along with the endpoint, so that the export can work with non-AWS storage.
Testing Details
I built the exporter image locally with my changes.
Using the following command, I was able to export my lakeFS repository successfully to our S3 storage:
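(The exact command is not preserved here; the sketch below is a reconstruction for illustration only, assuming the rclone-export usage described in the lakeFS export docs, a locally built image tag, the new variables from this PR, and placeholder values.)

```sh
# Hypothetical invocation; image tag, flags, and values are illustrative.
docker run \
  -e LAKEFS_ACCESS_KEY_ID='<lakefs-key>' \
  -e LAKEFS_SECRET_ACCESS_KEY='<lakefs-secret>' \
  -e LAKEFS_ENDPOINT='https://lakefs.example.com' \
  -e AWS_ACCESS_KEY_ID='<s3-key>' \
  -e AWS_SECRET_ACCESS_KEY='<s3-secret>' \
  -e S3_PROVIDER='Minio' \
  -e S3_ENDPOINT_URL='https://s3.example.com' \
  lakefs-rclone-export:local \
  example-repo s3://destination-bucket/prefix/ --branch=main
```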
I then checked the endpoint storage and confirmed the exported objects were there.
Additional info
I do not have AWS storage to test this function with, but the current changes should(!) default to AWS and no specified endpoint if none are supplied. I would appreciate it if this could be tested.
This is my first PR for an open-source project like this; I have followed the contribution guide, but do let me know if I have missed anything. I will try to make any requested changes ASAP.
Contact Details
[email protected]