Crucible datasets remain after disks are deleted. #1313

Open
leftwo opened this issue Jun 29, 2022 · 3 comments
Labels: Sled Agent (Related to the Per-Sled Configuration and Management), storage (Related to storage.)


leftwo commented Jun 29, 2022

Crucible appears to suffer the same fate.

Originally posted by @leftwo in #1119 (comment)


leftwo commented Jul 19, 2022

On sock, after an uninstall, there are still Crucible regions left over.
You can see these with zfs list:

alan@sock:crucible$ zfs list -o name | grep crucible
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions/439a26ff-e6dc-4794-a8c8-2ace6d175616
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions/12ac64e4-c489-4e4b-8272-5a40847bdb28
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03/crucible/regions/e0f5e6a3-80f5-4892-9bfa-16fd09ea1e7c
rpool/data/crucible
rpool/zone/oxz_crucible_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
rpool/zone/oxz_crucible_oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03
rpool/zone/oxz_crucible_oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03

(The leftover regions are the datasets below oxp_.../crucible/regions/.)
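To pick the leftover regions out of a listing like the one above, one can keep only datasets that sit directly below a `<pool>/crucible/regions/` dataset. The Rust sketch below does that on the text output of `zfs list -o name`. `leftover_regions` is a hypothetical helper written for illustration, not a function in omicron or Crucible, and it assumes one dataset name per line.

```rust
/// Hypothetical helper (not part of omicron or Crucible): given the output of
/// `zfs list -o name`, return `(pool, region_id)` for every dataset that sits
/// directly below `<pool>/crucible/regions/`.
fn leftover_regions(listing: &str) -> Vec<(&str, &str)> {
    const MARKER: &str = "/crucible/regions/";
    listing
        .lines()
        .map(str::trim)
        .filter_map(|name| {
            // The parent `<pool>/crucible/regions` has no trailing slash, so it
            // (and unrelated datasets such as rpool/data/crucible) never match.
            let (pool, region) = name.split_once(MARKER)?;
            // Keep only direct children of the regions dataset.
            if region.is_empty() || region.contains('/') {
                None
            } else {
                Some((pool, region))
            }
        })
        .collect()
}

fn main() {
    // A few lines of the `zfs list -o name | grep crucible` output above.
    let listing = "\
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions/439a26ff-e6dc-4794-a8c8-2ace6d175616
rpool/data/crucible
rpool/zone/oxz_crucible_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b";
    for (pool, region) in leftover_regions(listing) {
        println!("{pool}: leftover region {region}");
    }
}
```

Matching on the `/crucible/regions/` component rather than grepping for `crucible` is what keeps the zone datasets and `rpool/data/crucible` out of the result.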

After ./tools/create_virtual_hardware.sh is run, the crucible zpools are created:

alan@sock:omicron$ zpool list -o name,size,alloc,free
NAME                                       SIZE  ALLOC   FREE
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b  49.5G  9.79G  39.7G
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03  49.5G  8.39G  41.1G
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03  49.5G  8.39G  41.1G
rpool                                      952G   452G   500G

However, if ./tools/destroy_virtual_hardware.sh is not run, any regions (disks) created inside these pools will remain, even across install/uninstall of omicron.

I believe this is the desired behavior (to allow upgrades), but if you are doing development and testing, be aware that old regions may come back to life even if there is no upstairs for them to connect to. This can result in crucible-downstairs processes being started, and any disk space these regions are using will not be available.
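A rough way to see how much space the Crucible pools are holding is to sum the ALLOC column of the `zpool list` output shown earlier. This is a sketch under two assumptions: sizes use the human-readable suffixes seen above (K/M/G/T/P taken as powers of 1024), and ALLOC covers everything allocated in the pool, not just regions, so the total is an upper bound on what leftover regions occupy. `parse_size` and `crucible_alloc_bytes` are hypothetical helpers.

```rust
/// Hypothetical helper: convert a human-readable ZFS size such as "9.79G"
/// into bytes, treating K/M/G/T/P as powers of 1024.
fn parse_size(s: &str) -> Option<f64> {
    let unit = s.chars().last()?;
    let shift = match unit {
        'K' => 1,
        'M' => 2,
        'G' => 3,
        'T' => 4,
        'P' => 5,
        // No suffix: the value is already in bytes.
        c if c.is_ascii_digit() => return s.parse().ok(),
        _ => return None,
    };
    let number: f64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    Some(number * 1024f64.powi(shift))
}

/// Sum the ALLOC column of `zpool list -o name,size,alloc,free` for the
/// Crucible pools (names starting with `oxp_`), skipping the header and rpool.
fn crucible_alloc_bytes(listing: &str) -> f64 {
    listing
        .lines()
        .skip(1) // header row: NAME SIZE ALLOC FREE
        .filter_map(|line| {
            let cols: Vec<&str> = line.split_whitespace().collect();
            match cols.as_slice() {
                [name, _size, alloc, _free] if name.starts_with("oxp_") => parse_size(alloc),
                _ => None,
            }
        })
        .sum()
}

fn main() {
    let listing = "\
NAME                                       SIZE  ALLOC   FREE
oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b  49.5G  9.79G  39.7G
oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03  49.5G  8.39G  41.1G
oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03  49.5G  8.39G  41.1G
rpool                                      952G   452G   500G";
    let gib = crucible_alloc_bytes(listing) / 1024f64.powi(3);
    // prints "allocated in Crucible pools: 26.57 GiB"
    println!("allocated in Crucible pools: {gib:.2} GiB");
}
```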

smklein added a commit that referenced this issue Oct 30, 2022
Major changes:
- Uninstallation now collects all Zpool and Zone based datasets,
prompts the user, and destroys them during the uninstall process.
- Adds a "-f / --force" option to "omicron-package", allowing
callers to skip the new confirmation prompt.
- Adds a "deactivate" command to "omicron-package". This allows
callers to remove Zones and disable services, but does not delete
durable configurations and storage. A caller should be able to
call "deactivate" -> "activate" -> "deactivate" repeatedly without
losing durable state.

Minor changes:
- Updates documentation for omicron-package
- Improves handling of addresses deleted from
  `cleanup_networking_resources`, especially in cases of duplicates
- Rename "filesystem" to "dataset" in functions where it's more
  appropriate to be generic in the context of ZFS.

Fixes #1884

Part of #1119
Part of #1313
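The split between uninstall and deactivate described in this commit can be modeled as a small decision function. This is a sketch, not the omicron-package code: `Command`, `datasets_to_destroy`, and `confirm` are hypothetical names, and the rule for which datasets count as "Zpool and Zone based" (child datasets of `oxp_*` pools, plus `rpool/zone/oxz_*`) is an assumption drawn from the dataset names listed earlier in this issue.

```rust
use std::io::{self, BufRead, Write};

/// The two teardown commands described in the commit message.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Command {
    /// Remove zones and disable services, but keep durable storage.
    Deactivate,
    /// Also destroy Zpool- and Zone-based datasets.
    Uninstall,
}

/// Hypothetical selection rule: child datasets of `oxp_*` pools plus zone
/// datasets under `rpool/zone/oxz_*`.
fn datasets_to_destroy<'a>(cmd: Command, all: &[&'a str]) -> Vec<&'a str> {
    match cmd {
        // Deactivate never touches durable state, so
        // deactivate -> activate -> deactivate can be repeated safely.
        Command::Deactivate => Vec::new(),
        Command::Uninstall => all
            .iter()
            .copied()
            .filter(|n| {
                (n.starts_with("oxp_") && n.contains('/'))
                    || n.starts_with("rpool/zone/oxz_")
            })
            .collect(),
    }
}

/// Prompt before destroying anything unless `force` (-f / --force) is set.
fn confirm<R: BufRead, W: Write>(
    force: bool,
    targets: &[&str],
    input: &mut R,
    out: &mut W,
) -> io::Result<bool> {
    if force || targets.is_empty() {
        return Ok(true);
    }
    writeln!(out, "About to destroy {} dataset(s):", targets.len())?;
    for t in targets {
        writeln!(out, "  {t}")?;
    }
    write!(out, "Continue? [y/N] ")?;
    out.flush()?;
    let mut answer = String::new();
    input.read_line(&mut answer)?;
    let answer = answer.trim().to_ascii_lowercase();
    Ok(answer == "y" || answer == "yes")
}

fn main() -> io::Result<()> {
    let all = [
        "oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible",
        "oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b/crucible/regions",
        "rpool/zone/oxz_crucible_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b",
        "rpool/data/crucible",
    ];
    assert!(datasets_to_destroy(Command::Deactivate, &all).is_empty());
    let targets = datasets_to_destroy(Command::Uninstall, &all);
    // A canned "y" stands in for the operator typing at the prompt.
    let proceed = confirm(false, &targets, &mut "y\n".as_bytes(), &mut io::stdout())?;
    println!();
    println!("proceed with destroy: {proceed}");
    Ok(())
}
```

Passing `force = true` mirrors `-f / --force` skipping the prompt, and keeping `Deactivate` free of dataset deletion is what lets it be called repeatedly without losing durable state.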
smklein added a commit that referenced this issue Nov 1, 2022
* [package] Destroy datasets as a part of uninstallation

Major changes:
- Uninstallation now collects all Zpool and Zone based datasets,
prompts the user, and destroys them during the uninstall process.
- Adds a "-f / --force" option to "omicron-package", allowing
callers to skip the new confirmation prompt.
- Adds a "deactivate" command to "omicron-package". This allows
callers to remove Zones and disable services, but does not delete
durable configurations and storage. A caller should be able to
call "deactivate" -> "activate" -> "deactivate" repeatedly without
losing durable state.

Minor changes:
- Updates documentation for omicron-package
- Improves handling of addresses deleted from
  `cleanup_networking_resources`, especially in cases of duplicates
- Rename "filesystem" to "dataset" in functions where it's more
  appropriate to be generic in the context of ZFS.

Fixes #1884

Part of #1119
Part of #1313

* Fix typo
smklein added the Sled Agent and storage labels Nov 15, 2022
leftwo self-assigned this Nov 18, 2022

leftwo commented Dec 13, 2022

I believe the issue described here is fixed, but there is one remaining issue I want to resolve before
I'm ready to close this one: oxidecomputer/crucible#542

There is a small leak in the Crucible agent. It takes roughly 4000 disk add/delete cycles before
it uses up the ramdisk on sn21, but it's still a leak that needs to be fixed.

morlandi7 added this to the MVP milestone Jan 27, 2023

leftwo commented Feb 22, 2023

Confirmed that regions are deleted when a disk is deleted.

Here is before:

gimlet-sn21 # zfs list | grep regions/
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G  2.81T     4.84G  /data/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/f2da8fe9-9a4d-4e34-a308-37392c39ce77  86.6M  2.81T     86.6M  /data/regions/f2da8fe9-9a4d-4e34-a308-37392c39ce77
oxp_1bdae8d1-acde-4f44-bc9c-5b657e6f01d3/crucible/regions/f1408d10-e646-4b64-bff3-dc18e07e1cbc  86.6M  2.82T     86.6M  /data/regions/f1408d10-e646-4b64-bff3-dc18e07e1cbc
oxp_2ec1c158-3535-43c1-aca3-6d186487bbbc/crucible/regions/7bddd111-c6af-466f-9427-5e9a0151e76c  4.84G  2.81T     4.84G  /data/regions/7bddd111-c6af-466f-9427-5e9a0151e76c
oxp_4a2245f9-4f54-4a3d-86ae-103ae196959a/crucible/regions/20f4c642-85aa-455d-9033-aa881f495a94  34.3G  2.78T     34.3G  /data/regions/20f4c642-85aa-455d-9033-aa881f495a94
oxp_627cda87-085b-44af-a70e-d067599c3fe2/crucible/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61  34.3G  2.78T     34.3G  /data/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61
oxp_9f5a50c7-08ce-41f7-8efd-5d1323a1f070/crucible/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f  34.3G  2.78T     34.3G  /data/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f
oxp_b67d5f84-5b06-4e36-bc9a-88269ca74414/crucible/regions/405213cc-3e0e-42e7-a078-4e75a6475915  86.6M  2.81T     86.6M  /data/regions/405213cc-3e0e-42e7-a078-4e75a6475915
oxp_b67d5f84-5b06-4e36-bc9a-88269ca74414/crucible/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f  4.84G  2.81T     4.84G  /data/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f

Then after a delete:

gimlet-sn21 # zfs list | grep regions/        
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G  2.81T     4.84G  /data/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543
oxp_2ec1c158-3535-43c1-aca3-6d186487bbbc/crucible/regions/7bddd111-c6af-466f-9427-5e9a0151e76c  4.84G  2.81T     4.84G  /data/regions/7bddd111-c6af-466f-9427-5e9a0151e76c
oxp_4a2245f9-4f54-4a3d-86ae-103ae196959a/crucible/regions/20f4c642-85aa-455d-9033-aa881f495a94  34.3G  2.78T     34.3G  /data/regions/20f4c642-85aa-455d-9033-aa881f495a94
oxp_627cda87-085b-44af-a70e-d067599c3fe2/crucible/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61  34.3G  2.78T     34.3G  /data/regions/bff2606c-c8bc-4ef9-bfe0-2bac04058c61
oxp_9f5a50c7-08ce-41f7-8efd-5d1323a1f070/crucible/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f  34.3G  2.78T     34.3G  /data/regions/5144c005-7ddd-4e69-a54c-cdb953a3fa5f
oxp_b67d5f84-5b06-4e36-bc9a-88269ca74414/crucible/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f  4.84G  2.81T     4.84G  /data/regions/7e99daca-5dbd-4c83-bba6-09a200849a0f
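Comparing the two listings shows three 86.6M regions gone (f2da8fe9..., f1408d10..., 405213cc...), one on each of three different pools, which is consistent with a single disk's regions being removed. A sketch of that before/after comparison in Rust, keyed on the region ID after `crucible/regions/`. `regions` and `removed_regions` are hypothetical helpers, and the input is assumed to be whitespace-separated `zfs list` output with NAME and USED as the first two columns.

```rust
use std::collections::BTreeMap;

/// Map each region ID to its USED column. Lines are assumed to look like
/// `<pool>/crucible/regions/<region-id>  USED  [AVAIL  REFER  MOUNTPOINT]`.
fn regions(listing: &str) -> BTreeMap<&str, &str> {
    listing
        .lines()
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            let name = cols.next()?;
            let used = cols.next()?;
            // Lines that are not region datasets are skipped.
            let (_pool, region) = name.split_once("/crucible/regions/")?;
            Some((region, used))
        })
        .collect()
}

/// Regions (with their USED size) present in `before` but missing from
/// `after`, sorted by region ID.
fn removed_regions<'a>(before: &'a str, after: &str) -> Vec<(&'a str, &'a str)> {
    let after_map = regions(after);
    regions(before)
        .into_iter()
        .filter(|(id, _)| !after_map.contains_key(*id))
        .collect()
}

fn main() {
    // A subset of the rows above, trimmed to the NAME and USED columns.
    let before = "\
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/f2da8fe9-9a4d-4e34-a308-37392c39ce77  86.6M
oxp_1bdae8d1-acde-4f44-bc9c-5b657e6f01d3/crucible/regions/f1408d10-e646-4b64-bff3-dc18e07e1cbc  86.6M
oxp_b67d5f84-5b06-4e36-bc9a-88269ca74414/crucible/regions/405213cc-3e0e-42e7-a078-4e75a6475915  86.6M";
    let after = "\
oxp_0ca797a6-f467-4296-bc27-e7590c8330c2/crucible/regions/d758e5fe-24bd-4e9a-8857-dbe0cd7ee543  4.84G";
    for (id, used) in removed_regions(before, after) {
        println!("deleted region {id} ({used})");
    }
}
```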

leftwo pushed a commit that referenced this issue Jun 26, 2024
Added a new package, crucible-dtrace, that pulls a package containing a set of
DTrace scripts from buildomat. These scripts are extracted into the global zone
at /opt/oxide/crucible_dtrace/

Updating Crucible to the latest version includes these updates:
Clean up dependency checking, fixing space leak (#1372)
Make a DTrace package (#1367)
Use a single context in all messages (#1363)
Remove `DownstairsWork`, because it's redundant (#1371)
Remove `WorkState`, because it's implicit (#1370)
Do work immediately upon receipt of a job, if possible (#1366)
Move 'do work for one job' into a helper function (#1365)
Remove `DownstairsWork` from map when handling it (#1361)
Using `block_in_place` for IO operations (#1357)
update omicron deps; use re-exported dropshot types in oximeter-producer configuration (#1369)
Parameterize more tests (#1364)
Misc cleanup, remove sqlite references. (#1360)
Fix `Extent::close` docstring (#1359)
Make many `Region` functions synchronous (#1356)
Remove `Workstate::Done` (unused) (#1355)
Return a sorted `VecDeque` directly (#1354)
Combine `proc_frame` and `do_work_for` (#1351)
Move `do_work_for` and `do_work` into `ActiveConnection` (#1350)
Support arbitrary Volumes during replace compare (#1349)
Remove the SQLite backend (#1352)
Add a custom timeout for buildomat tests (#1344)
Move `proc_frame` into `ActiveConnection` (#1348)
Remove `UpstairsConnection` from `DownstairsWork` (#1341)
Move Work into ConnectionState (#1340)
Make `ConnectionState` an enum type (#1339)
Parameterize `test_repair.sh` directories (#1345)
Remove `Arc<Mutex<Downstairs>>` (#1338)
Send message to Downstairs directly (#1336)
Consolidate `on_disconnected` and `remove_connection` (#1333)
Move disconnect logic to the Downstairs (#1332)
Remove invalid DTrace probes. (#1335)
Fix outdated comments (#1331)
Use message passing when a new connection starts (#1330)
Move cancellation into Downstairs, using a token to kill IO tasks (#1329)
Make the Downstairs own per-connection state (#1328)
Move remaining local state into a `struct ConnectionState` (#1327)
Consolidate negotiation + IO operations into one loop (#1322)
Allow replacement of a target in a read_only_parent (#1281)
Do all IO through IO tasks (#1321)
Make `reqwest_client` only present if it's used (#1326)
Move negotiation into Downstairs as well (#1320)
Update Rust crate clap to v4.5.4 (#1301)
Reuse a reqwest client when creating Nexus clients (#1317)
Reuse a reqwest client when creating repair client (#1324)
Add % to keep buildomat happy (#1323)
Downstairs task cleanup (#1313)
Update crutest replace test, and mismatch printing. (#1314)
Added more DTrace scripts. (#1309)
Update Rust crate async-trait to 0.1.80 (#1298)
leftwo added a commit that referenced this issue Jun 26, 2024
Update Crucible and Propolis to the latest

Propolis just has this one update:
Allow boot order config in propolis-standalone
---------

Co-authored-by: Alan Hanson <[email protected]>