-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
r/virtual_machine: New disk device naming and tracking behaviour #363
Conversation
This shifts the disk virtual device logic over to do the following: * The configuration key for virtual disks is now "label" an arbitrary name local to Terraform that only identifies the disk in configuration, and allows it to freely move around in terms of location. Unlike "name", which was the previous identifier, label does not influence the actual VMDK filename. * A new attribute, "path", has been added - this either 1) is a computed attribute that gives the path to the disk for unattached disks, or 2) defines the path for an external virtual disk when "attach" is set. This is derived from "name" when attaching without setting path. * UUID is now the source of truth once disks have been created and committed to state. This replaces device address, which now is only the fallback when no UUID exists in state yet, such as right after a VM is created or a disk is added during an update operation. This helps to account for if someone manually changes the position of a disk on the SCSI bus, which can be a serious failure scenario if the new position is one that is currently occupied by another disk in state. This allows us to lift a number of storage vMotion restrictions and possibly avoid any other issues that may arise from depending on file name so much. Tests are still pending, and we need to review the state migration code, and also check to determine if a new schema version will be necessary to ensure that short-name disks are matched up properly with long-name disks that may have been committed to state before.
* Looks like we will need to grab the disk UUIDs for this to work properly on the next read operation that happens after upgrade. Hence we've bumped the schema version, with the V2 -> V3 migration path populating the UUIDs for all present disks. * diskLabelOrName now truncates paths to their base. This is what we have been doing anyway for lining up disks in state, so there should be no risk of regressions. Path calculations are handled via another path, diskPathOrName, which is unaffected by this. * Restore the behaviour of carrying forward the name attribute during diffs, but *only* if label has not been defined in the new diff. * Made the deprecation notice for the name field a bit better to read by making it a pre-formatted string, also added in a link to the label attribute so that we can refer people to it when this goes live. * A few other updates to remove some more dead code.
This also fixes import in general (it had not been checked yet and would have more than likely been broken). Import functionality has now significantly changed and will need to be documented. In addition to validating SCSI types and what not, We now partially populate state on import. This is so that the next refresh (which happens after an import) can properly populate the rest of the state for a disk. The following fields are populated: * device_address: With the proper device address of the disk. * key: With (disknum + 1) * -1. This is to ensure DiskRefreshOperation sees the device as a new disk and will not attempt to match it with a UUID in state. * label: With a generic label that needs to be specified in config for the import to work properly. This cannot be enforced as the process does not have access to config, however keep_on_remove is set to ensure that a disk only gets detached on errors (although this could still lead to service interruptions and data corruption, depending on the situation). The convention is diskN, starting with disk0. The V1 to V3 state migration leans on this import functionality, and ultimately yields a V3 state with *no* UUID populated, unlike V2 -> V3. This is mitigated by the fact that thanks to the import functionality, the disks show up as newly created, which allows the next DiskRefreshOperation call (which happens as part of the next refresh) to pick up the UUID, along with the rest of the attributes that should make it to state.
* Updated most of the tests to use the new label convention over the name attribute. There are a few tests that have been added that specifically address the transition that will be removed in later releases. * Updated the tests for V2 and V3 state migration to reflect the new state of things. * Exported SelectDisks() from the virtual disk sub-resource helpers to assist with testing the migrations. Some tests are failing and will need to be addressed, but the vast majority of them are passing.
This allows you to run something like: make debugacc \ TEST=./vsphere \ TESTARGS=-test.run='TestAccSomeThing' To run an acceptance test in the Delve debugger. This has actually been pretty useful and has allowed me to easily confirm a couple of bugs so far. Also, given that Terraform forks out to a plugin and communicates with it over RPC, this is the only way you can reliably test providers using delve (source files and LOCs won't be available for you to set breakpoints on by running dlv exec terraform.) This, of course, requires that you have Delve installed: https://github.com/derekparker/delve
This fixes a couple of issues. It appears some changes in virtual disk management (quite a while ago) had actually broken virtual device deletion, at the very least when removing devices from the middle of a Terraform state (ie: disk 2 of 3). The effect was that no virtual disks were actually being deleted when they should have been. However, the current logic (which had been updated as part of the transition to UUIDs and also due to this problem in general) had an error in the deletion conditional and was deleting *all* disks. This has now been fixed to correct both the old and new issues. The original error was due to a duplication of the logic that we use in network interfaces, where removing a device in the middle of the stack means that the new set will be smaller than the old one, with the last device removed and the middle one reconfigured. This is not the case with virtual disks - we do extra diff customization to provide extra verbosity during the device management process to tell you exactly what is happening to your disks. The new, shrunk set is only read in after the process is completed, during the post-update read. The question remains as to how this passed acceptance tests for so long. I will be investigating the acceptance test a bit more - I already debugged the check and it seems to be working, but something is obviously amiss that is keeping it from functioning correctly.
This fixes the cloning w/extra disks test, which was failing due to a bad config.
During post-clone, we do a read-in and a diff of all cloned virtual disks to check to make sure they don't need changing (ie: growing disks that aren't linked clones). During this situation, some keys are ignored as they will always be empty or contain bad data that needs to be populated after committing the diff to state post-apply. "path" was a missed field here, and it has been added now. We don't bother skipping this if attach is set to true, as since this is during the evaluation of the template's disks specifically, there is no chance these disks will be externally attached.
Was still depending on the name field exclusively.
Update docs to reflect the new state of affairs with the name -> label transition.
This is to ensure that someone doesn't, for some reason, name their disks "orphaned_disk_x" and collide with the namespace we use for assigning orphaned disks.
The vMotion in this operation really messes with the filename (the main reason we needed to block linked clones in the first place), so it breaks our tests when using a static value. Now, we just get the new name of the disk from the path in state instead.
If I clone a template to a new VM, and then replace the template (ie via packer), can it preserve the VM, or must it destroy it? |
if err != nil { | ||
return fmt.Errorf("corrupt state: strconv.Atoi error on disk.%d.key: %s", i, err) | ||
} | ||
if key < 1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be <=
the comment seems to indicate it should include v1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sorry i misunderstood what key represented, ignore
LGTM, but maybe @mbfrahry or @tombuildsstuff can weigh in as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
All tests (state migration, VM snapshot, and VM) tests have passed, so merging this now. Took a while to get these to behave (the suite now takes over 3 hours to run) which was the source of the delay.
|
The VM tests are now taking over 3 hours to run. Parallelization of these resources is now sorely needed, but this will ensure they actually complete until that is taken care of.
We just compiled master branch with this change, tested it and we confirm it fixes 308 |
This resource implements the work discussed in #295, which significantly changes the way that we work with virtual disks in the
vsphere_virtual_machine
resource.The changes are:
label
attribute in configuration, rather than thename
attribute, which previously dictated both the name of the virtual disk in configuration, and the actual VMDK filename on the virtual machine. This was problematic for several reasons that are discussed in r/virtual_machine: Current limitations and roadmap for disk device management and vMotion #295.path
attribute has been implemented as a dual-purpose attribute that, depending on the setting ofattach
, either controls the path of the externally attached disk, or is a read-only computed attribute that gives the current path to the specific virtual disk (in the event of non-attached virtual disks).uuid
has been added to track the lifecycle of a disk via the UUID of its VMDK backing. This now makes locating a virtual disk that has had its position on the SCSI bus modified much more stable, and should survive such destructive operations like swapping two disks that Terraform tracks around so that disk A is at disk B's position according to the state, and vice versa. Terraform should correctly change this during the next apply operating, versus the old behaviour of detaching the disks (a safeguard against the previous behaviour of complete deletion).name
has been kept, with a deprecation notice noting its planned removal (which will probably be in version 2.0 of the provider, which should be happening sometime in the early-mid spring). It currently acts as an fallback alias of sorts tolabel
andpath
, with its old behaviour of controlling the name of the VMDK file on disk removed.This means that we have been able to lift the following restrictions:
There is still the stipulation that if you import, you must choose a format on labels that matches the convention
diskN
, where N is the disk number, ordered by the disk's position on the SCSI bus. This however is a lot more stable and generic than the previous behaviour which required knowledge of the disk's VMDK filename.Waiting on tests and will post the results when they are complete.