Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for saga idempotency safety #2094

Open
4 of 13 tasks
smklein opened this issue Dec 26, 2022 · 1 comment
Open
4 of 13 tasks

Tracking issue for saga idempotency safety #2094

smklein opened this issue Dec 26, 2022 · 1 comment
Labels
nexus Related to nexus Testing & Analysis Tests & Analyzers

Comments

@smklein
Copy link
Collaborator

smklein commented Dec 26, 2022

This issue tracks testing that invoking all actions / undo actions within a saga "twice" is safe, and results in the same output as being called once.

Related to #2052, #1799

@smklein
Copy link
Collaborator Author

smklein commented Dec 26, 2022

Blocked on oxidecomputer/steno#88 (Edit: No longer blocked)

@smklein smklein added Testing & Analysis Tests & Analyzers nexus Related to nexus labels Dec 28, 2022
smklein added a commit that referenced this issue Dec 29, 2022
- Makes use of oxidecomputer/steno#88 to test
idempotency of instance creation actions + undo actions
- Fixes all the ways in which it was *not* idempotent, including...
- (Action) Inserting the same network interface failed with a VPC subnet
error, instead of an "already exists" error.
- (Action) Inserting the instance record twice fell victim to
#1168
- (Undo) Deleting the instance record failed to execute twice, as it
could not find the instance record for the second invocation.

Part of #2094
smklein added a commit that referenced this issue Dec 29, 2022
- Makes use of oxidecomputer/steno#88 to test
idempotency of instance deletion actions
- Fixes the way in which it was not idempotent: deleting the instance
record

Part of #2094
smklein added a commit that referenced this issue Jan 6, 2023
- Makes use of oxidecomputer/steno#88 to test
idempotency of disk creation actions
- Fixes the way in which it was not idempotent: creating the disk record

Part of #2094
smklein added a commit that referenced this issue Jan 9, 2023
- Makes use of oxidecomputer/steno#88 to test
idempotency of disk deletion actions

Part of #2094
smklein added a commit that referenced this issue Jan 18, 2023
Part of #1734 ,
specifically [this
bit](#1734 (comment)).

This PR adds a table called `resource_usage`, which exists for silos,
organizations, and projects. Currently, it only contains information
about each collection's disk usage.

- [x] API exposure
- [x] Emit this information to Clickhouse (metrics are passed to the
producer on every modification)
- [x] Add a metrics-based API for querying such historical info (done,
under `/system/metrics/resource-utilization`. Happy to update this API
as it's useful, but I went with something minimal for expediency).
- [x] Correctness
  - [x] Add CTE to update all collections up to the root
  - [x] Ensure each query avoid full-table scans
- [x] Ensure that each update of `resource_usage` is atomic (part of a
transaction, saga, or CTE)
- [x] Ensure that the disk usage accounting is accurate. Currently, we
only consider region allocations / deallocations; accurately accounting
for snapshots will require incorporating
#1752.
  - [x] Add integration tests

After merging, I'd like to do the following:
- [ ] Emit some amount of "total capacity" info, to contextualize the
currently used amount. This makes much more sense at a physical view
(sled, rack, fleet) than user view.
- [x] Expand the "collections" to include a "fleet" object. This will be
particularly useful for operators.
- [ ] Make the accounting of "utilization" more accurate. It currently
is not accounting for: Metadata (e.g., crucible's sqlite dbs), system
usage (CRDB, Clickhouse, the OS itself, etc).
- [x] Expand the usage information to account for CPU usage, RAM, and
other globally-shared resources.
- [x] Ensuring idempotency has been punted to
#2094 , though this
*should* work with the new CTEs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nexus Related to nexus Testing & Analysis Tests & Analyzers
Projects
None yet
Development

No branches or pull requests

1 participant