
Unify file structure in repos and workspaces #342

Open
sergiimk opened this issue Dec 13, 2023 · 0 comments

We have too many repository structures:

  • Single-tenant alias-based
  • Multi-tenant alias-based
  • Multi-tenant DID-based

We should unify our repositories under the DID-based structure.

Additional Considerations:

  1. Consider where to store dataset aliases in different types of repositories (including the local CLI workspace). Ideally we want ID -> Alias and Alias -> ID resolution to be O(1), but at the same time:
  • guarantee that aliases are unique
  • ensure our "index" is resistant to concurrent updates
  2. Storing datasets by DID means it will no longer be possible to have the same dataset under different aliases in the CLI (or any other repo). This is currently allowed, but consider how to handle this after the change:

```
kamu pull s3://datasets.kamu.dev/odf/v2/contrib/com.cryptocompare.ohlcv.eth-usd --as a
kamu pull s3://datasets.kamu.dev/odf/v2/contrib/com.cryptocompare.ohlcv.eth-usd --as b
```

  3. Consider what to do when we push datasets to basic repositories (plain S3, local FS), e.g. what the following operation does now and how it should work after this change:

```
kamu repo add myrepo /some/path
kamu push com.cryptocompare.ohlcv.eth-usd --to myrepo/com.cryptocompare.ohlcv.eth-usd
```

  4. SearchServiceImpl currently makes hard assumptions about the remote repository structure and will need to be updated. See "Support ODF repositories in kamu search command" #508 for details.

  5. We have some Python scripts that push datasets from the local (multi-tenant) workspace directly to S3. We should update them after these changes.

