Introduced DatasetRegistry abstraction, encapsulating listing and resolution of datasets #941

Merged

zaychenko-sergei

@zaychenko-sergei zaychenko-sergei commented Nov 7, 2024

Description

Related to #857.

Introduced DatasetRegistry abstraction, encapsulating listing and resolution of datasets.

Key changes:

  • Registry is backed by database-stored dataset entries, which are automatically maintained
  • The scope of DatasetRepository is now limited to supporting DatasetRegistry and the in-memory dataset dependency graph
  • New concept of ResolvedDataset: a wrapper around Arc<dyn Dataset> that is aware of dataset identity
  • DatasetRegistryRepoBridge utility connects both abstractions in a simple way for testing needs
  • Query and Dataset Search functions now consider only the datasets accessible to the current user
  • Core services now explicitly separate planning (transactional) and execution (non-transactional) processing phases
  • Similar decomposition introduced in task system execution logic
  • Revised implementation of core commands and services: pull, push, reset, verify, compact, setting watermark
  • All commands except kamu ui and kamu system api-server must now be backed by a database transaction
  • More parallelism in the pull command: ingest/sync/transform operations of the same depth level can now be mixed
  • Optimized pull flow for the case when a single non-recursive dataset is sent for processing
  • Batched form for dataset authorization checks
  • Ensured correct transactionality for dataset lookup and authorization checks throughout the code base
  • Passing multi/single tenancy as an enum configuration instead of boolean
  • Renamed outbox "durability" term to "delivery mechanism" to clarify the design intent
  • Greatly reduced complexity and code duplication in many use case and service tests by using the oop macro for harness inheritance
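
To illustrate two of the items above (ResolvedDataset and batched authorization checks), here is a minimal sketch. It is not the code from this PR: the `Dataset` trait is reduced to one method, and `DatasetHandle`, `InMemoryDataset`, `ClassifiedHandles` and `classify_by_access` are hypothetical names made up for the example.

```rust
use std::collections::HashSet;
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical, heavily reduced stand-in for the real `Dataset` trait.
pub trait Dataset: Send + Sync {
    fn num_blocks(&self) -> usize;
}

// Hypothetical identity type used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DatasetHandle {
    pub id: String,
    pub alias: String,
}

// Hypothetical dataset implementation, present only to make the sketch usable.
pub struct InMemoryDataset {
    pub blocks: usize,
}

impl Dataset for InMemoryDataset {
    fn num_blocks(&self) -> usize {
        self.blocks
    }
}

/// Pairs the dataset object with its identity, so callers do not have to
/// pass the handle around separately from the `Arc<dyn Dataset>`.
#[derive(Clone)]
pub struct ResolvedDataset {
    dataset: Arc<dyn Dataset>,
    handle: DatasetHandle,
}

impl ResolvedDataset {
    pub fn new(dataset: Arc<dyn Dataset>, handle: DatasetHandle) -> Self {
        Self { dataset, handle }
    }

    pub fn handle(&self) -> &DatasetHandle {
        &self.handle
    }
}

// Dereferencing exposes the wrapped dataset's methods directly.
impl Deref for ResolvedDataset {
    type Target = dyn Dataset;

    fn deref(&self) -> &Self::Target {
        &*self.dataset
    }
}

/// Result of a batched check: all handles are classified in one call
/// instead of being checked one at a time.
pub struct ClassifiedHandles {
    pub authorized: Vec<DatasetHandle>,
    pub unauthorized: Vec<DatasetHandle>,
}

/// Assumption: the set of readable dataset ids was fetched up front,
/// for example with a single database query for the current user.
pub fn classify_by_access(
    handles: Vec<DatasetHandle>,
    readable_ids: &HashSet<String>,
) -> ClassifiedHandles {
    let (authorized, unauthorized): (Vec<_>, Vec<_>) = handles
        .into_iter()
        .partition(|h| readable_ids.contains(&h.id));
    ClassifiedHandles { authorized, unauthorized }
}
```

With this shape, a caller can write `resolved.num_blocks()` and `resolved.handle()` on the same value, and an access-aware listing can filter many handles in one pass.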

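The split between a transactional planning phase and a non-transactional execution phase could look roughly like the sketch below. All names here (`Transaction`, `Plan`, `plan`, `execute`) are hypothetical and chosen for illustration; the real services are asynchronous and considerably richer.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a database transaction that can look up
// dataset entries (alias -> dataset id).
pub struct Transaction {
    entries: HashMap<String, String>,
}

impl Transaction {
    pub fn new(entries: HashMap<String, String>) -> Self {
        Self { entries }
    }

    pub fn resolve(&self, alias: &str) -> Option<&String> {
        self.entries.get(alias)
    }
}

/// A plan owns everything the execution phase needs, so no database
/// access is required once planning has finished.
#[derive(Debug, PartialEq)]
pub struct Plan {
    pub dataset_ids: Vec<String>,
}

/// Planning phase (transactional): resolves every alias while the
/// transaction is open and fails early if any dataset is missing.
pub fn plan(txn: &Transaction, aliases: &[&str]) -> Result<Plan, String> {
    let mut dataset_ids = Vec::with_capacity(aliases.len());
    for alias in aliases {
        let id = txn
            .resolve(alias)
            .ok_or_else(|| format!("dataset not found: {alias}"))?;
        dataset_ids.push(id.clone());
    }
    Ok(Plan { dataset_ids })
}

/// Execution phase (non-transactional): the signature takes only the
/// plan, so long-running work cannot hold a transaction open.
pub fn execute(plan: Plan) -> Vec<String> {
    plan.dataset_ids
        .into_iter()
        .map(|id| format!("processed {id}"))
        .collect()
}
```

The point of the sketch is the signatures: `plan` borrows the transaction, while `execute` cannot reach it at all, so the transaction can be closed before any slow work starts.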
@zaychenko-sergei zaychenko-sergei linked an issue Nov 7, 2024 that may be closed by this pull request
@zaychenko-sergei zaychenko-sergei force-pushed the 857-show-tables-query-times-out-after-a-minute branch 3 times, most recently from a5e494f to e7e895e Compare November 14, 2024 14:06

@sergiimk sergiimk left a comment

Reviewed changes in domain/ up until transform_elaboration_service - will continue tomorrow.

@zaychenko-sergei zaychenko-sergei force-pushed the 857-show-tables-query-times-out-after-a-minute branch 5 times, most recently from c701dd7 to 112638d Compare November 21, 2024 07:48
@zaychenko-sergei zaychenko-sergei changed the title 857 show tables query times out after a minute Introduced DatasetRegistry abstraction, encapsulating listing and resolution of datasets Nov 21, 2024
@zaychenko-sergei zaychenko-sergei marked this pull request as ready for review November 21, 2024 07:50
@zaychenko-sergei zaychenko-sergei force-pushed the 857-show-tables-query-times-out-after-a-minute branch from 112638d to 02d8c7c Compare November 21, 2024 17:24
@zaychenko-sergei zaychenko-sergei merged commit ccfaccf into master Nov 21, 2024
6 checks passed
@zaychenko-sergei zaychenko-sergei deleted the 857-show-tables-query-times-out-after-a-minute branch November 21, 2024 21:59
Development

Successfully merging this pull request may close these issues.

SHOW TABLES query times out after a minute
2 participants