Update fatimage base to RL8.9 with robust volume mounts #341

Merged Dec 14, 2023 (5 commits)

sjpb (Collaborator) commented Dec 8, 2023

  • Use the OpenStack volume ID to find the appropriate block device when labeling filesystem volumes. The default/skeleton Terraform variables state_volume_device_path and home_volume_device_path are no longer required and have been deleted.
  • Update base image for fatimage build to RockyLinux 8.9. Note this does not itself change the RockyLinux version in the fat image, as dnf update is run during build anyway.

Fixes #327 and #343.

Details

Previously, labeling relied on the device ordering in /dev matching the order of the block device definitions in the Terraform instance resource. This held for RockyLinux 8.8 images but does not appear to hold for RockyLinux 8.9 images. Note that:

  • The filesystem volume label is taken from the first word of the OpenStack volume description.
  • cloud-init's runcmd happens after mounts, so bootcmd must be used to create filesystems.
  • If the image has the hw_disk_bus='scsi' and hw_scsi_model='virtio-scsi' properties set, entries in /dev/disk/by-id contain the full OpenStack volume ID (plus a prefix). Without these properties, the entries contain only the first 20 characters of the OpenStack volume ID. The updated approach copes with both.
  • The x-systemd.required-by=nfs-server.service dependency in the home volume mount has been removed, as required-by is not supported for RL8. See systemd/systemd#30246 ("Mount unit inactive for RockyLinux 9.3, worked for RockyLinux 8.9"). This could be reinstated for an RL9 image.
  • A test set for multiple images is provided at https://github.com/sjpb/os-volume-tests
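
The matching rules in the notes above can be sketched in a few lines of Python. This is an illustration only, not the code merged in this PR: the function names, the example volume ID, and the `virtio-` / `scsi-0QEMU_QEMU_HARDDISK_` link prefixes are assumptions chosen for the example. It shows one way to cope with both the full and the 20-character truncated volume ID, and how a label could be taken from the first word of the volume description.

```python
from typing import Iterable, Optional

# Per the note above: without the virtio-scsi image properties, only the
# first 20 characters of the OpenStack volume ID appear in the entry name.
TRUNCATED_ID_LEN = 20


def find_device_link(volume_id: str, by_id_entries: Iterable[str]) -> Optional[str]:
    """Return the by-id entry name that refers to volume_id, or None.

    Hypothetical helper: an entry matches if its name ends with the full
    volume ID or with the truncated ID. Partition links end in '-partN',
    so they never match. A full-ID match is preferred over a truncated one.
    """
    short_id = volume_id[:TRUNCATED_ID_LEN]
    short_match = None
    for name in by_id_entries:
        if name.endswith(volume_id):
            return name
        if short_match is None and name.endswith(short_id):
            short_match = name
    return short_match


def label_from_description(description: str) -> str:
    """Return the filesystem label: the first word of the volume description."""
    words = description.split()
    if not words:
        raise ValueError("volume description is empty, cannot derive a label")
    return words[0]


if __name__ == "__main__":
    # Made-up volume ID and entry names, for illustration only.
    vid = "2f1c9e4a-7b3d-4c58-9a10-5e6f7a8b9c0d"
    entries = ["virtio-" + vid[:TRUNCATED_ID_LEN], "virtio-0123456789abcdef0123"]
    print(find_device_link(vid, entries))
    print(label_from_description("home volume for the cluster"))
```

On a real node the entry names would come from listing the by-id directory (for example with `os.listdir`), and the returned name would be joined to that directory to get the device path. Keeping the matching as a pure function over a list of names makes it easy to test against entry names captured from different images.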

Steps

  • Document volume description
  • Build and test new image
  • Implement same for ansible-templated-tf in caas environment
  • Build and CI test final image: not required, no other updates since build

sjpb (Collaborator, Author) commented Dec 8, 2023

Image build (for testing, needs updating to main and rebuilding before merge): https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/7141102424

@sjpb sjpb mentioned this pull request Dec 8, 2023
sjpb (Collaborator, Author) commented Dec 8, 2023

Manual checks on Arcus cluster at 6d632ea:

  • monitoring: OK
  • OOD:
    • desktop: OK (starts, terminal commands OK)
    • jupyter: OK
    • shell: OK
    • jobs: OK

sjpb (Collaborator, Author) commented Dec 8, 2023

Checks using Azimuth at 6d632ea:

  • DONE OK: deploying azimuth: http://portal.apps.128-232-226-79.sslip.io/
    • checked OK: that updated image was deployed
  • DONE OK: deploy slurm w/ hpctests
  • DONE OK: manual tests:
    • ssh: ok
    • OOD:
      • shell: ok
      • desktop: ok
      • jupyter: ok
      • monitoring: ok

@sjpb sjpb marked this pull request as ready for review December 8, 2023 14:11
@sjpb sjpb requested a review from a team as a code owner December 8, 2023 14:11
@sjpb sjpb marked this pull request as draft December 8, 2023 14:11
sjpb (Collaborator, Author) commented Dec 8, 2023

Tests on Azimuth/CaaS at 9b34524:

  • DONE: create cluster sb-test-2 w/ hpctests
  • checked OK: deleted task not in ARA
  • checked OK: control has volumes mounted
  • OOD:
    • shell: ok
    • desktop: ok
    • jupyter: OK
  • monitoring: ok, slurm jobs has data
  • DONE: delete

@sjpb sjpb marked this pull request as ready for review December 8, 2023 15:34
m-bull (Collaborator) left a comment

LGTM - thanks for the comprehensive testing!

@sjpb sjpb merged commit 1e68a91 into main Dec 14, 2023
1 check passed
@sjpb sjpb deleted the feat/base-RL89 branch December 14, 2023 13:46
Linked issue: Should fix dependency on volume order