-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking issue for saga idempotency safety #2094
Labels
Comments
Blocked on oxidecomputer/steno#88 (Edit: No longer blocked) |
This was referenced Dec 26, 2022
Merged
This was referenced Dec 28, 2022
smklein
added a commit
that referenced
this issue
Dec 29, 2022
- Makes use of oxidecomputer/steno#88 to test idempotency of instance creation actions + undo actions - Fixes all the ways in which it was *not* idempotent, including... - (Action) Inserting the same network interface failed with a VPC subnet error, instead of an "already exists" error. - (Action) Inserting the instance record twice fell victim to #1168 - (Undo) Deleting the instance record failed to execute twice, as it could not find the instance record for the second invocation. Part of #2094
smklein
added a commit
that referenced
this issue
Dec 29, 2022
- Makes use of oxidecomputer/steno#88 to test idempotency of instance deletion actions - Fixes the way in which it was not idempotent: deleting the instance record Part of #2094
smklein
added a commit
that referenced
this issue
Jan 6, 2023
- Makes use of oxidecomputer/steno#88 to test idempotency of disk creation actions - Fixes the way in which it was not idempotent: creating the disk record Part of #2094
smklein
added a commit
that referenced
this issue
Jan 9, 2023
- Makes use of oxidecomputer/steno#88 to test idempotency of disk deletion actions Part of #2094
smklein
added a commit
that referenced
this issue
Jan 18, 2023
Part of #1734 , specifically [this bit](#1734 (comment)). This PR adds a table called `resource_usage`, which exists for silos, organizations, and projects. Currently, it only contains information about each collection's disk usage. - [x] API exposure - [x] Emit this information to Clickhouse (metrics are passed to the producer on every modification) - [x] Add a metrics-based API for querying such historical info (done, under `/system/metrics/resource-utilization`. Happy to update this API as it's useful, but I went with something minimal for expediency). - [x] Correctness - [x] Add CTE to update all collections up to the root - [x] Ensure each query avoid full-table scans - [x] Ensure that each update of `resource_usage` is atomic (part of a transaction, saga, or CTE) - [x] Ensure that the disk usage accounting is accurate. Currently, we only consider region allocations / deallocations; accurately accounting for snapshots will require incorporating #1752. - [x] Add integration tests After merging, I'd like to do the following: - [ ] Emit some amount of "total capacity" info, to contextualize the currently used amount. This makes much more sense at a physical view (sled, rack, fleet) than user view. - [x] Expand the "collections" to include a "fleet" object. This will be particularly useful for operators. - [ ] Make the accounting of "utilization" more accurate. It currently is not accounting for: Metadata (e.g., crucible's sqlite dbs), system usage (CRDB, Clickhouse, the OS itself, etc). - [x] Expand the usage information to account for CPU usage, RAM, and other globally-shared resources. - [x] Ensuring idempotency has been punted to #2094 , though this *should* work with the new CTEs
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This issue tracks testing that invoking all actions / undo actions within a saga "twice" is safe, and results in the same output as being called once.
Related to #2052, #1799
The text was updated successfully, but these errors were encountered: