
Unify file structure in repos and workspaces #342

Open
sergiimk opened this issue Dec 13, 2023 · 0 comments

We have too many repository structures:

  • Single-tenant alias-based
  • Multi-tenant alias-based
  • Multi-tenant DID-based

We should unify our repositories under the DID-based structure.

Additional Considerations:

  1. Consider where to store dataset aliases in different types of repositories (including the local CLI workspace). Ideally we want ID -> Alias and Alias -> ID resolution to be O(1), but at the same time:
  • guarantee that aliases are unique
  • ensure our "index" is resistant to concurrent updates
  2. Storing datasets by DID means it will no longer be possible to have the same dataset under different aliases in the CLI (or any other repo). This is currently allowed, but consider how to handle this after the change:

```
kamu pull s3://datasets.kamu.dev/odf/v2/contrib/com.cryptocompare.ohlcv.eth-usd --as a
kamu pull s3://datasets.kamu.dev/odf/v2/contrib/com.cryptocompare.ohlcv.eth-usd --as b
```

  3. Consider what to do when we push datasets to basic repositories (plain S3, local FS), e.g. what the following operation does now and how it should work after this change:

```
kamu repo add myrepo /some/path
kamu push com.cryptocompare.ohlcv.eth-usd --to myrepo/com.cryptocompare.ohlcv.eth-usd
```

  4. SearchServiceImpl currently makes hard assumptions about the remote repository structure and will need to be updated. See "Support ODF repositories in kamu search command" #508 for details.

  5. We have some Python scripts that push datasets from the local (multi-tenant) workspace directly to S3. We should update them after these changes.

