Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

r/virtual_machine: New disk device naming and tracking behaviour #363

Merged
merged 14 commits into from
Jan 23, 2018

Conversation

vancluever
Copy link
Contributor

This resource implements the work discussed in #295, which significantly changes the way that we work with virtual disks in the vsphere_virtual_machine resource.

The changes are:

  • Virtual disks are now addressed via the label attribute in configuration, rather than the name attribute, which previously dictated both the name of the virtual disk in configuration, and the actual VMDK filename on the virtual machine. This was problematic for several reasons that are discussed in r/virtual_machine: Current limitations and roadmap for disk device management and vMotion #295.
  • The path attribute has been implemented as a dual-purpose attribute that, depending on the setting of attach, either controls the path of the externally attached disk, or is a read-only computed attribute that gives the current path to the specific virtual disk (in the event of non-attached virtual disks).
  • uuid has been added to track the lifecycle of a disk via the UUID of its VMDK backing. This now makes locating a virtual disk that has had its position on the SCSI bus modified much more stable, and should survive such destructive operations like swapping two disks that Terraform tracks around so that disk A is at disk B's position according to the state, and vice versa. Terraform should correctly change this during the next apply operating, versus the old behaviour of detaching the disks (a safeguard against the previous behaviour of complete deletion).

name has been kept, with a deprecation notice noting its planned removal (which will probably be in version 2.0 of the provider, which should be happening sometime in the early-mid spring). It currently acts as an fallback alias of sorts to label and path, with its old behaviour of controlling the name of the VMDK file on disk removed.

This means that we have been able to lift the following restrictions:

  • Cloning no longer requires you to choose a disk label (name) that matches the name of the VM.
  • Storage vMotion can now be performed on renamed virtual machines.
  • Storage vMotion no longer cares what your disks are labeled (named).
  • Storage vMotion now works on linked clones.

There is still the stipulation that if you import, you must choose a format on labels that matches the convention diskN, where N is the disk number, ordered by the disk's position on the SCSI bus. This however is a lot more stable and generic than the previous behaviour which required knowledge of the disk's VMDK filename.

Waiting on tests and will post the results when they are complete.

This shifts the disk virtual device logic over to do the following:

* The configuration key for virtual disks is now "label" an arbitrary
name local to Terraform that only identifies the disk in configuration,
and allows it to freely move around in terms of location. Unlike "name",
which was the previous identifier, label does not influence the actual
VMDK filename.
* A new attribute, "path", has been added - this either 1) is a computed
attribute that gives the path to the disk for unattached disks, or 2)
defines the path for an external virtual disk when "attach" is set. This
is derived from "name" when attaching without setting path.
* UUID is now the source of truth once disks have been created and
committed to state. This replaces device address, which now is only the
fallback when no UUID exists in state yet, such as right after a VM is
created or a disk is added during an update operation. This helps to
account for if someone manually changes the position of a disk on the
SCSI bus, which can be a serious failure scenario if the new position is
one that is currently occupied by another disk in state.

This allows us to lift a number of storage vMotion restrictions and
possibly avoid any other issues that may arise from depending on file
name so much.

Tests are still pending, and we need to review the state migration code,
and also check to determine if a new schema version will be necessary to
ensure that short-name disks are matched up properly with long-name
disks that may have been committed to state before.
* Looks like we will need to grab the disk UUIDs for this to work
properly on the next read operation that happens after upgrade. Hence
we've bumped the schema version, with the V2 -> V3 migration path
populating the UUIDs for all present disks.
* diskLabelOrName now truncates paths to their base. This is what we
have been doing anyway for lining up disks in state, so there should be
no risk of regressions. Path calculations are handled via another path,
diskPathOrName, which is unaffected by this.
* Restore the behaviour of carrying forward the name attribute during
diffs, but *only* if label has not been defined in the new diff.
* Made the deprecation notice for the name field a bit better to read by
making it a pre-formatted string, also added in a link to the label
attribute so that we can refer people to it when this goes live.
* A few other updates to remove some more dead code.
This also fixes import in general (it had not been checked yet and would
have more than likely been broken).

Import functionality has now significantly changed and will need to be
documented.

In addition to validating SCSI types and what not, We now partially
populate state on import. This is so that the next refresh (which
happens after an import) can properly populate the rest of the state for
a disk.

The following fields are populated:
* device_address: With the proper device address of the disk.
* key: With (disknum + 1) * -1. This is to ensure DiskRefreshOperation
sees the device as a new disk and will not attempt to match it with a
UUID in state.
* label:  With a generic label that needs to be specified in config for
the import to work properly. This cannot be enforced as the process does
not have access to config, however keep_on_remove is set to ensure that
a disk only gets detached on errors (although this could still lead to
service interruptions and data corruption, depending on the situation).
The convention is diskN, starting with disk0.

The V1 to V3 state migration leans on this import functionality, and
ultimately yields a V3 state with *no* UUID populated, unlike V2 -> V3.
This is mitigated by the fact that thanks to the import functionality,
the disks show up as newly created, which allows the next
DiskRefreshOperation call (which happens as part of the next refresh) to
pick up the UUID, along with the rest of the attributes that should make
it to state.
* Updated most of the tests to use the new label convention over the
name attribute. There are a few tests that have been added that
specifically address the transition that will be removed in later
releases.
* Updated the tests for V2 and V3 state migration to reflect the new
state of things.
* Exported SelectDisks() from the virtual disk sub-resource helpers to
assist with testing the migrations.

Some tests are failing and will need to be addressed, but the vast
majority of them are passing.
This allows you to run something like:

make debugacc \
 TEST=./vsphere \
 TESTARGS=-test.run='TestAccSomeThing'

To run an acceptance test in the Delve debugger. This has actually been
pretty useful and has allowed me to easily confirm a couple of bugs so
far. Also, given that Terraform forks out to a plugin and communicates
with it over RPC, this is the only way you can reliably test providers
using delve (source files and LOCs won't be available for you to set
breakpoints on by running dlv exec terraform.)

This, of course, requires that you have Delve installed:

https://github.com/derekparker/delve
This fixes a couple of issues.

It appears some changes in virtual disk management (quite a while ago)
had actually broken virtual device deletion, at the very least when
removing devices from the middle of a Terraform state (ie: disk 2 of 3).
The effect was that no virtual disks were actually being deleted when
they should have been. However, the current logic (which had been
updated as part of the transition to UUIDs and also due to this problem
in general) had an error in the deletion conditional and was deleting
*all* disks.

This has now been fixed to correct both the old and new issues.

The original error was due to a duplication of the logic that we use in
network interfaces, where removing a device in the middle of the stack
means that the new set will be smaller than the old one, with the last
device removed and the middle one reconfigured. This is not the case
with virtual disks - we do extra diff customization to provide extra
verbosity during the device management process to tell you exactly what
is happening to your disks. The new, shrunk set is only read in after
the process is completed, during the post-update read.

The question remains as to how this passed acceptance tests for so long.
I will be investigating the acceptance test a bit more - I already
debugged the check and it seems to be working, but something is
obviously amiss that is keeping it from functioning correctly.
This fixes the cloning w/extra disks test, which was failing due to a
bad config.
During post-clone, we do a read-in and a diff of all cloned virtual
disks to check to make sure they don't need changing (ie: growing
disks that aren't linked clones). During this situation, some keys are
ignored as they will always be empty or contain bad data that needs to
be populated after committing the diff to state post-apply.

"path" was a missed field here, and it has been added now. We don't
bother skipping this if attach is set to true, as since this is during
the evaluation of the template's disks specifically, there is no chance
these disks will be externally attached.
Was still depending on the name field exclusively.
Update docs to reflect the new state of affairs with the name -> label
transition.
This is to ensure that someone doesn't, for some reason, name their
disks "orphaned_disk_x" and collide with the namespace we use for
assigning orphaned disks.
The vMotion in this operation really messes with the filename (the main
reason we needed to block linked clones in the first place), so it
breaks our tests when using a static value. Now, we just get the new
name of the disk from the path in state instead.
@rismoney
Copy link

If I clone a template to a new VM, and then replace the template (ie via packer), can it preserve the VM, or must it destroy it?

if err != nil {
return fmt.Errorf("corrupt state: strconv.Atoi error on disk.%d.key: %s", i, err)
}
if key < 1 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be <= the comment seems to indicate it should include v1?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh sorry i misunderstood what key represented, ignore

@paultyng
Copy link

LGTM, but maybe @mbfrahry or @tombuildsstuff can weigh in as well?

Copy link
Member

@mbfrahry mbfrahry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vancluever
Copy link
Contributor Author

Hey @rismoney, just FYI I answered your question in detail in #353 I think.

This PR here will be merged in a few hours.

@vancluever
Copy link
Contributor Author

All tests (state migration, VM snapshot, and VM) tests have passed, so merging this now.

Took a while to get these to behave (the suite now takes over 3 hours to run) which was the source of the delay.

=== RUN   TestAccResourceVSphereVirtualMachineMigrateStateV3_fromV2
--- PASS: TestAccResourceVSphereVirtualMachineMigrateStateV3_fromV2 (1.41s)
=== RUN   TestAccResourceVSphereVirtualMachineMigrateStateV3_fromV1
--- PASS: TestAccResourceVSphereVirtualMachineMigrateStateV3_fromV1 (1.48s)
=== RUN   TestAccResourceVSphereVirtualMachineMigrateStateV2
--- PASS: TestAccResourceVSphereVirtualMachineMigrateStateV2 (1.37s)
=== RUN   TestAccResourceVSphereVirtualMachineSnapshot_Basic
--- PASS: TestAccResourceVSphereVirtualMachineSnapshot_Basic (144.65s)
=== RUN   TestAccResourceVSphereVirtualMachine
=== RUN   TestAccResourceVSphereVirtualMachine/basic
=== RUN   TestAccResourceVSphereVirtualMachine/ESXi-only_test
=== RUN   TestAccResourceVSphereVirtualMachine/shutdown_OK
=== RUN   TestAccResourceVSphereVirtualMachine/re-create_on_deletion
=== RUN   TestAccResourceVSphereVirtualMachine/multi-device
=== RUN   TestAccResourceVSphereVirtualMachine/add_devices
=== RUN   TestAccResourceVSphereVirtualMachine/remove_middle_devices
=== RUN   TestAccResourceVSphereVirtualMachine/remove_middle_devices,_change_disk_unit
=== RUN   TestAccResourceVSphereVirtualMachine/high_disk_unit_numbers
=== RUN   TestAccResourceVSphereVirtualMachine/high_disk_units_to_regular_single_controller
=== RUN   TestAccResourceVSphereVirtualMachine/cdrom
=== RUN   TestAccResourceVSphereVirtualMachine/maximum_number_of_nics
=== RUN   TestAccResourceVSphereVirtualMachine/upgrade_cpu_and_ram
=== RUN   TestAccResourceVSphereVirtualMachine/modify_annotation
=== RUN   TestAccResourceVSphereVirtualMachine/grow_disk
=== RUN   TestAccResourceVSphereVirtualMachine/swap_scsi_bus
=== RUN   TestAccResourceVSphereVirtualMachine/extraconfig
=== RUN   TestAccResourceVSphereVirtualMachine/extraconfig_swap_keys
=== RUN   TestAccResourceVSphereVirtualMachine/attach_existing_vmdk
=== RUN   TestAccResourceVSphereVirtualMachine/in_folder
=== RUN   TestAccResourceVSphereVirtualMachine/move_to_folder
=== RUN   TestAccResourceVSphereVirtualMachine/static_mac
=== RUN   TestAccResourceVSphereVirtualMachine/single_tag
=== RUN   TestAccResourceVSphereVirtualMachine/multiple_tags
=== RUN   TestAccResourceVSphereVirtualMachine/switch_tags
=== RUN   TestAccResourceVSphereVirtualMachine/orphaned_(renamed)_disk_in_place_of_existing
=== RUN   TestAccResourceVSphereVirtualMachine/block_computed_disk_name
=== RUN   TestAccResourceVSphereVirtualMachine/block_vapp_settings_on_non-clones
=== RUN   TestAccResourceVSphereVirtualMachine/block_vapp_settings_on_non-clones_after_creation
=== RUN   TestAccResourceVSphereVirtualMachine/block_disk_label_starting_with_orphaned_prefix
=== RUN   TestAccResourceVSphereVirtualMachine/clone_from_template
=== RUN   TestAccResourceVSphereVirtualMachine/clone,_multi-nic_(template_should_have_one)
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_different_timezone
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_bad_timezone
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_bad_eagerly_scrub_with_linked_clone
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_bad_thin_provisioned_with_linked_clone
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_bad_size_with_linked_clone
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_bad_size_without_linked_clone
=== RUN   TestAccResourceVSphereVirtualMachine/clone_-_attempt_to_add_vapp_properties_to_a_VM_that_does_not_support_them_-_create
=== RUN   TestAccResourceVSphereVirtualMachine/clone_-_attempt_to_add_vapp_properties_to_a_VM_that_does_not_support_them_-_update
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_different_hostname
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_extra_disks
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_cdrom
=== RUN   TestAccResourceVSphereVirtualMachine/clone_with_vapp_properties
=== RUN   TestAccResourceVSphereVirtualMachine/bad_vapp_property_-_clone_create
=== RUN   TestAccResourceVSphereVirtualMachine/bad_vapp_property_-_clone_update
=== RUN   TestAccResourceVSphereVirtualMachine/cpu_hot_add
=== RUN   TestAccResourceVSphereVirtualMachine/memory_hot_add
=== RUN   TestAccResourceVSphereVirtualMachine/dual-stack_ipv4_and_ipv6
=== RUN   TestAccResourceVSphereVirtualMachine/ipv6_only
=== RUN   TestAccResourceVSphereVirtualMachine/windows_template,_customization_events_and_proper_IP
=== RUN   TestAccResourceVSphereVirtualMachine/host_vmotion
=== RUN   TestAccResourceVSphereVirtualMachine/resource_pool_vmotion
=== RUN   TestAccResourceVSphereVirtualMachine/storage_vmotion_-_global_setting
=== RUN   TestAccResourceVSphereVirtualMachine/storage_vmotion_-_single_disk
=== RUN   TestAccResourceVSphereVirtualMachine/storage_vmotion_-_pin_datastore
=== RUN   TestAccResourceVSphereVirtualMachine/storage_vmotion_-_renamed_virtual_machine
=== RUN   TestAccResourceVSphereVirtualMachine/storage_vmotion_-_linked_clones
=== RUN   TestAccResourceVSphereVirtualMachine/storage_vmotion_-_block_externally_attached_disks
=== RUN   TestAccResourceVSphereVirtualMachine/single_custom_attribute
=== RUN   TestAccResourceVSphereVirtualMachine/multi_custom_attribute
=== RUN   TestAccResourceVSphereVirtualMachine/switch_custom_attribute
=== RUN   TestAccResourceVSphereVirtualMachine/transition_to_label
=== RUN   TestAccResourceVSphereVirtualMachine/prevent_revert_to_name
=== RUN   TestAccResourceVSphereVirtualMachine/transition_to_label_-_attached_disk
=== RUN   TestAccResourceVSphereVirtualMachine/import
=== RUN   TestAccResourceVSphereVirtualMachine/import_with_multiple_disks_at_different_SCSI_slots
--- PASS: TestAccResourceVSphereVirtualMachine (11289.18s)
    --- PASS: TestAccResourceVSphereVirtualMachine/basic (158.02s)
    --- SKIP: TestAccResourceVSphereVirtualMachine/ESXi-only_test (0.00s)
    	helper_test.go:88: set VSPHERE_TEST_ESXI to run ESXi-specific acceptance tests
    --- PASS: TestAccResourceVSphereVirtualMachine/shutdown_OK (158.24s)
    --- PASS: TestAccResourceVSphereVirtualMachine/re-create_on_deletion (305.02s)
    --- PASS: TestAccResourceVSphereVirtualMachine/multi-device (168.14s)
    --- PASS: TestAccResourceVSphereVirtualMachine/add_devices (170.21s)
    --- PASS: TestAccResourceVSphereVirtualMachine/remove_middle_devices (180.21s)
    --- PASS: TestAccResourceVSphereVirtualMachine/remove_middle_devices,_change_disk_unit (290.56s)
    --- PASS: TestAccResourceVSphereVirtualMachine/high_disk_unit_numbers (162.69s)
    --- PASS: TestAccResourceVSphereVirtualMachine/high_disk_units_to_regular_single_controller (300.38s)
    --- PASS: TestAccResourceVSphereVirtualMachine/cdrom (18.31s)
    --- PASS: TestAccResourceVSphereVirtualMachine/maximum_number_of_nics (181.99s)
    --- PASS: TestAccResourceVSphereVirtualMachine/upgrade_cpu_and_ram (298.06s)
    --- PASS: TestAccResourceVSphereVirtualMachine/modify_annotation (162.59s)
    --- PASS: TestAccResourceVSphereVirtualMachine/grow_disk (178.08s)
    --- PASS: TestAccResourceVSphereVirtualMachine/swap_scsi_bus (282.77s)
    --- PASS: TestAccResourceVSphereVirtualMachine/extraconfig (157.87s)
    --- PASS: TestAccResourceVSphereVirtualMachine/extraconfig_swap_keys (280.38s)
    --- PASS: TestAccResourceVSphereVirtualMachine/attach_existing_vmdk (161.56s)
    --- PASS: TestAccResourceVSphereVirtualMachine/in_folder (159.23s)
    --- PASS: TestAccResourceVSphereVirtualMachine/move_to_folder (162.68s)
    --- PASS: TestAccResourceVSphereVirtualMachine/static_mac (157.81s)
    --- PASS: TestAccResourceVSphereVirtualMachine/single_tag (160.46s)
    --- PASS: TestAccResourceVSphereVirtualMachine/multiple_tags (162.50s)
    --- PASS: TestAccResourceVSphereVirtualMachine/switch_tags (170.22s)
    --- PASS: TestAccResourceVSphereVirtualMachine/orphaned_(renamed)_disk_in_place_of_existing (370.65s)
    --- PASS: TestAccResourceVSphereVirtualMachine/block_computed_disk_name (3.49s)
    --- PASS: TestAccResourceVSphereVirtualMachine/block_vapp_settings_on_non-clones (4.68s)
    --- PASS: TestAccResourceVSphereVirtualMachine/block_vapp_settings_on_non-clones_after_creation (162.00s)
    --- PASS: TestAccResourceVSphereVirtualMachine/block_disk_label_starting_with_orphaned_prefix (0.01s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_from_template (127.69s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone,_multi-nic_(template_should_have_one) (122.49s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_different_timezone (127.88s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_bad_timezone (0.01s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_bad_eagerly_scrub_with_linked_clone (3.70s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_bad_thin_provisioned_with_linked_clone (3.68s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_bad_size_with_linked_clone (3.87s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_bad_size_without_linked_clone (3.75s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_-_attempt_to_add_vapp_properties_to_a_VM_that_does_not_support_them_-_create (13.73s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_-_attempt_to_add_vapp_properties_to_a_VM_that_does_not_support_them_-_update (123.85s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_different_hostname (127.78s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_extra_disks (296.54s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_cdrom (116.67s)
    --- PASS: TestAccResourceVSphereVirtualMachine/clone_with_vapp_properties (250.43s)
    --- PASS: TestAccResourceVSphereVirtualMachine/bad_vapp_property_-_clone_create (7.96s)
    --- PASS: TestAccResourceVSphereVirtualMachine/bad_vapp_property_-_clone_update (92.35s)
    --- PASS: TestAccResourceVSphereVirtualMachine/cpu_hot_add (137.74s)
    --- PASS: TestAccResourceVSphereVirtualMachine/memory_hot_add (132.55s)
    --- PASS: TestAccResourceVSphereVirtualMachine/dual-stack_ipv4_and_ipv6 (127.81s)
    --- PASS: TestAccResourceVSphereVirtualMachine/ipv6_only (421.13s)
    --- PASS: TestAccResourceVSphereVirtualMachine/windows_template,_customization_events_and_proper_IP (302.66s)
    --- PASS: TestAccResourceVSphereVirtualMachine/host_vmotion (147.72s)
    --- PASS: TestAccResourceVSphereVirtualMachine/resource_pool_vmotion (130.14s)
    --- PASS: TestAccResourceVSphereVirtualMachine/storage_vmotion_-_global_setting (430.50s)
    --- PASS: TestAccResourceVSphereVirtualMachine/storage_vmotion_-_single_disk (290.76s)
    --- PASS: TestAccResourceVSphereVirtualMachine/storage_vmotion_-_pin_datastore (431.02s)
    --- PASS: TestAccResourceVSphereVirtualMachine/storage_vmotion_-_renamed_virtual_machine (440.66s)
    --- PASS: TestAccResourceVSphereVirtualMachine/storage_vmotion_-_linked_clones (252.72s)
    --- PASS: TestAccResourceVSphereVirtualMachine/storage_vmotion_-_block_externally_attached_disks (279.28s)
    --- PASS: TestAccResourceVSphereVirtualMachine/single_custom_attribute (131.51s)
    --- PASS: TestAccResourceVSphereVirtualMachine/multi_custom_attribute (127.73s)
    --- PASS: TestAccResourceVSphereVirtualMachine/switch_custom_attribute (140.27s)
    --- PASS: TestAccResourceVSphereVirtualMachine/transition_to_label (162.95s)
    --- PASS: TestAccResourceVSphereVirtualMachine/prevent_revert_to_name (170.04s)
    --- PASS: TestAccResourceVSphereVirtualMachine/transition_to_label_-_attached_disk (161.43s)
    --- PASS: TestAccResourceVSphereVirtualMachine/import (158.99s)
    --- PASS: TestAccResourceVSphereVirtualMachine/import_with_multiple_disks_at_different_SCSI_slots (160.37s)
PASS

The VM tests are now taking over 3 hours to run. Parallelization of
these resources is now sorely needed, but this will ensure they actually
complete until that is taken care of.
@vancluever vancluever merged commit ba30c39 into master Jan 23, 2018
@ryancurrah
Copy link

We just compiled master branch with this change, tested it and we confirm it fixes 308

@vancluever vancluever deleted the f-disk-vmdk-naming-decouple branch February 13, 2018 15:57
@ghost ghost locked and limited conversation to collaborators Apr 19, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants