[21.05] restore-single-files: reduce chance of UUID collision #1083

Merged 2 commits on Aug 26, 2024

@ctheune commented on Aug 14, 2024

@flyingcircusio/release-managers

Release process

Impact:

none

Changelog:

  • Our restore-single-file utility now better handles situations where multiple disks have the same filesystem UUID, which can happen when disks originate from the same image or from a clone. (PL-132849)

  • To avoid duplicate filesystem UUIDs (which occur when VMs are or have been bootstrapped from the same base image), we now regenerate the XFS UUID upon every cold boot. (PL-132849)

  • Full KVM host migrations now use shorter lock retry intervals to reduce long-tail completion times. The maximum delay has been reduced from 80 seconds to 20 seconds; the average delay is expected to be around 10 seconds.
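
The expected average of around 10 seconds follows if the retry delay is drawn uniformly between zero and the maximum. That distribution is an assumption here; the changelog only states the maximum and the expected average. A minimal sketch under that assumption, with hypothetical names (`retry_delay`, `expected_delay`) that are not taken from fc.qemu:

```python
import random

# Values from the changelog entry above.
OLD_MAX_DELAY = 80.0  # seconds
NEW_MAX_DELAY = 20.0  # seconds


def retry_delay(max_delay, rng=random):
    """Pick a wait time before the next lock attempt.

    Assumption: the delay is uniform in [0, max_delay].
    """
    return rng.uniform(0.0, max_delay)


def expected_delay(max_delay):
    """Mean of a uniform distribution on [0, max_delay]."""
    return max_delay / 2.0


def empirical_mean(max_delay, samples=10_000, seed=0):
    """Estimate the mean delay by sampling with a seeded generator."""
    rng = random.Random(seed)
    return sum(retry_delay(max_delay, rng) for _ in range(samples)) / samples
```

Under the same assumption, the old maximum of 80 seconds would correspond to an average wait of 40 seconds per retry, which is where the long-tail improvement comes from.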

PR release workflow (internal)

  • PR has internal ticket
  • internal issue ID (PL-…) part of branch name
  • internal issue ID mentioned in PR description text
  • ticket is on Platform agile board
  • ticket state set to Pull request ready
  • if ticket is more urgent than within the next few days, directly contact a member of the Platform team

Design notes

  • Provide a feature toggle if the change might need to be adjusted/reverted quickly depending on context. Consider whether the default should be on or off. Example: rate limiting.
  • All customer-facing features and (NixOS) options need to be discoverable from documentation. Add or update relevant documentation such that hosted and guided customers can understand it as well.

Security implications

This affects restore reliability and improves our ability to restore backups without encountering roadblocks.

  • Security requirements tested? (EVIDENCE)

restore script: manually tested
fc.qemu integration: covered with automated tests and manually tested

XFS can't mount multiple images that happen to have the same UUID.
This can happen either due to PL-132849 or if you'd like to restore
from multiple revisions of the same disk.

This change reduces the chance of a collision by changing the UUID
during restore. A short mount/umount cycle with the old UUID remains,
because XFS needs a clean log before it can change the UUID, and a
collision is still possible during that cycle.
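
The sequence described above can be sketched roughly as follows. This is not the actual restore script: the function names, device path, and mountpoint are hypothetical. The only tool-specific call used is `xfs_admin -U generate`, which assigns a new random UUID to an unmounted XFS filesystem.

```python
import subprocess


def uuid_change_commands(device, mountpoint):
    """Return the commands for the UUID change, in order.

    Hypothetical helper for illustration; `device` and `mountpoint`
    are placeholders supplied by the caller.
    """
    return [
        # Mount and unmount once so the XFS log is replayed and clean.
        # This step still uses the old UUID and can collide.
        ["mount", device, mountpoint],
        ["umount", mountpoint],
        # With a clean log, assign a freshly generated UUID.
        ["xfs_admin", "-U", "generate", device],
        # Mount again, now under the new UUID.
        ["mount", device, mountpoint],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling `run_all(uuid_change_commands("/dev/example", "/mnt/restore"))` only prints the four commands; executing them for real requires root and an actual XFS image.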

Re PL-132849
- introduce XFS UUID regeneration upon cold reboot
- reduce max random delay time for migration locks
@ctheune force-pushed the PL-132849-tmp-uuids-during-restore branch from 7d2cd32 to 1c54f6c on August 18, 2024 15:44
@ctheune requested review from osnyx and dpausp and removed request for osnyx on August 18, 2024 15:59
@dpausp changed the title from "restore-single-files: reduce chance of UUID collision" to "[21.05] restore-single-files: reduce chance of UUID collision" on Aug 21, 2024
@ctheune merged commit 2fd6063 into fc-21.05-dev on Aug 26, 2024
2 checks passed
@ctheune deleted the PL-132849-tmp-uuids-during-restore branch on August 26, 2024 09:34