Use Scylla API for backup #4169

Draft: Michal-Leszczynski wants to merge 9 commits into base: master

Michal-Leszczynski
Collaborator

No description provided.

@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/backup-scylla-api branch 3 times, most recently from abbfaed to b51c8b6 Compare December 17, 2024 11:12
For Scylla to access object storage, it needs to be configured
in the 'object_storage.yaml' config file.
A separate column for the Scylla task ID is needed because:
- it has a different type from the agent job ID
- it makes it clear which API was used
Those methods consist of both:
- direct Scylla backup API call
- helper Scylla Task Manager API calls
When working with Rclone, SM specifies just the provider name,
and Rclone (with agent config) resolves it internally to the correct endpoint.
This meant the user didn't need to specify the exact endpoint when running SM backup/restore tasks.

When working with Scylla, SM needs to specify the resolved host name on its own.
This should be the same name as specified in 'object_storage.yaml'
(See https://github.com/scylladb/scylladb/blob/92db2eca0b8ab0a4fa2571666a7fe2d2b07c697b/docs/dev/object_storage.md?plain=1#L29-L39).

In order to maximize compatibility and UX, we still want it to be possible
to specify just the provider name when running backup/restore.
In such a case, SM sends the provider name as the "endpoint" query param,
which the agent resolves to the proper host name when forwarding the request to Scylla.
Any other "endpoint" query param value is forwarded unchanged.

Note that resolving the "endpoint" query param in the proxy is just for the UX,
so it might not work correctly in all cases.
In order to ensure correctness, "endpoint" should be specified directly by the SM user
so that no resolving is needed.
The Scylla backup API can be used when:
- the node exposes the Scylla backup API
- S3 is the provider in use
- the backup won't create versioned files
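The three conditions above could be combined into a single predicate along these lines. This is only a sketch: the `backupTarget` struct, its fields, and `useScyllaBackupAPI` are hypothetical names, and how each condition is actually detected is outside its scope.

```go
package main

import "fmt"

// backupTarget holds the facts needed to choose an API.
// All fields are hypothetical and exist only for this illustration.
type backupTarget struct {
	NodeExposesBackupAPI  bool   // node exposes the Scylla backup API
	Provider              string // storage provider name, e.g. "s3"
	CreatesVersionedFiles bool   // backup would create versioned files
}

// useScyllaBackupAPI reports whether all three conditions hold.
// When it returns false, the caller would fall back to the Rclone API.
func useScyllaBackupAPI(t backupTarget) bool {
	return t.NodeExposesBackupAPI &&
		t.Provider == "s3" &&
		!t.CreatesVersionedFiles
}

func main() {
	ok := backupTarget{NodeExposesBackupAPI: true, Provider: "s3"}
	fmt.Println(useScyllaBackupAPI(ok))

	versioned := backupTarget{NodeExposesBackupAPI: true, Provider: "s3", CreatesVersionedFiles: true}
	fmt.Println(useScyllaBackupAPI(versioned))
}
```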
This commit adds code for using Scylla backup API.
Luckily for us, handling pause/resume and progress
is analogous to the Rclone API handling.

Fixes #4143
Fixes #4138
Fixes #4141
Some tests used interceptors on specific paths
in order to wait for, block, or check certain API calls.
Those interceptors were updated to also look
for Scylla backup API paths.
Using the Scylla backup API does not change
Rclone transfers, rate limiting, or CPU pinning,
so those shouldn't be checked as part of the restore test.
This is a simple test for checking whether the correct API
is used during the backup.