
Azure Unmanaged Disk support #421

Merged: 2 commits into release/0.5.0 on Feb 21, 2017

Conversation

codenrhoden
Contributor

This patchset builds off of work from @Andrey-mp in #372, and supersedes that PR.

Two main things missing right now are:

  • re-verifying tests (they were written, but I haven't run/updated them yet)
  • documentation

The documentation for this will be important. Setting up the required accounts and permissions is not trivial, and I found it complicated myself.

Another issue that we may run into is that Azure is very slow when it comes to disk operations. Just throwing raw API requests (via curl) at libStorage, I often saw attach/detach requests take 50-80 seconds. The task timeout in libStorage seems to be 60 seconds, at which point a task ID is returned that you can check the status of. Not sure how this will play with REX-Ray.

This patchset adds support for the Azure cloud. Several operations are implemented in the patchset: create, attach, detach, and remove (all base volume operations).
@codecov-io

codecov-io commented Feb 17, 2017

Codecov Report

Merging #421 into release/0.5.0 will decrease coverage by 0.06%.
The diff coverage is n/a.

@@                Coverage Diff                @@
##           release/0.5.0     #421      +/-   ##
=================================================
- Coverage           31.1%   31.05%   -0.06%     
=================================================
  Files                 29       29              
  Lines               1752     1752              
=================================================
- Hits                 545      544       -1     
- Misses              1149     1150       +1     
  Partials              58       58
Impacted Files Coverage Δ
api/types/types_localdevices.go 77.35% <ø> (-1.89%)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb80e3e...a337287. Read the comment docs.

@codenrhoden
Contributor Author

Tests pass now! I did have to run it multiple times to avoid the 60sec timeout issue for volume attach/detach tests... finally got it.

[trhoden@test1 ~]$ ./azuredisk.test | tee testoutput.txt
PASS
coverage: 21.2% of statements in github.com/codedellemc/libstorage/drivers/storage/azuredisk, github.com/codedellemc/libstorage/drivers/storage/azuredisk/executor

@codenrhoden codenrhoden force-pushed the feature/azure branch 3 times, most recently from c47ac32 to ceb80f9 Compare February 20, 2017 22:42
@codenrhoden
Contributor Author

Okay, still missing docs, but I think this is good to go for a review, to see if it can be included in -RC2.

@codenrhoden codenrhoden requested a review from akutz February 20, 2017 22:44
This patchset renames the driver to `azureud`, to signify Azure
Unmanaged Disk driver. This set of enhancements builds off of previous
work, and completes the driver for inclusion into libStorage.

Use metadata server to detect azure instance

Don't return '.vhd' as part of volume name

Don't return OS disks

Proper attachment logic, remove instance filtering

improve login/init logic

Combine toTypeVolume and toTypesVolume

No need for separate functions here. This is in preparation for a call
to get VM details when getting attached device info. When we receive a
list of several volumes, and attachments.Devices() is true, we don't
want to have to query the VM status for every volume. By combining these
functions we can ensure that we only make one additional API call,
instead of making 1 call for every volume.
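The one-call-instead-of-N idea from that commit message can be sketched as follows. The types here (`azureDisk`, `volume`, and the `lunToDevice` map) are simplified stand-ins for the driver's real Azure SDK and libStorage types; the point is only that the VM's LUN-to-device map is fetched once and reused for every volume in the list.

```go
package main

import "fmt"

// azureDisk is a simplified stand-in for an Azure data disk record.
type azureDisk struct {
	Name string
	LUN  int32
}

// volume is a simplified stand-in for a libStorage volume.
type volume struct {
	Name   string
	Device string // device path, filled in when attachment info is requested
}

// toTypesVolume converts a list of disks to volumes. The key point of the
// combined function: lunToDevice comes from ONE extra API call for the VM,
// rather than one VM query per volume.
func toTypesVolume(disks []azureDisk, lunToDevice map[int32]string) []*volume {
	vols := make([]*volume, 0, len(disks))
	for _, d := range disks {
		v := &volume{Name: d.Name}
		if dev, ok := lunToDevice[d.LUN]; ok {
			v.Device = dev
		}
		vols = append(vols, v)
	}
	return vols
}

func main() {
	// One VM query yields the LUN-to-device map for all attached disks.
	lunMap := map[int32]string{0: "/dev/sdc", 1: "/dev/sdd"}
	disks := []azureDisk{{Name: "data1", LUN: 0}, {Name: "data2", LUN: 1}}
	for _, v := range toTypesVolume(disks, lunMap) {
		fmt.Println(v.Name, v.Device)
	}
}
```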

make executor localDevices return LUN number

The only way we have to match up an attached disk on a VM is via its LUN number. When we query a disk, we get the name of the VM it's attached to, and that's it. When we query the VM, we get a list of data disks attached to it, but not their device paths, only their LUNs. Therefore, in order to match up a volume with a device, we have to match on the LUN.

The easiest way to get the LUN is with the `lsscsi` utility. This introduces a new, external dependency. It's technically possible to parse through /sys/bus/scsi/devices/*/block/sd* to get this without an external utility, but that is more effort than I am willing to put in at this point.

This is the last hurdle needed to match up a volume with a device when
attachments.Devices() is requested.
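For illustration, here is a small Go sketch of the LUN matching described above. `lsscsi` prints one line per SCSI device whose first column is the `[host:channel:target:lun]` tuple and whose last column is the device node; the parser below builds a LUN-to-device map from that output. This is a simplified example, not the driver's actual executor code: in particular it does not filter out the OS disk's controller, which real code would need to do.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLSSCSI maps LUN -> device path from `lsscsi` output. Each line
// looks like: [5:0:0:1]  disk  Msft  Virtual Disk  1.0  /dev/sdd
func parseLSSCSI(out string) map[int32]string {
	m := map[int32]string{}
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 || !strings.HasPrefix(fields[0], "[") {
			continue
		}
		// The first field is the [host:channel:target:lun] tuple.
		parts := strings.Split(strings.Trim(fields[0], "[]"), ":")
		if len(parts) != 4 {
			continue
		}
		lun, err := strconv.ParseInt(parts[3], 10, 32)
		if err != nil {
			continue
		}
		// The last field is the device node.
		dev := fields[len(fields)-1]
		if strings.HasPrefix(dev, "/dev/") {
			m[int32(lun)] = dev
		}
	}
	return m
}

func main() {
	sample := `[5:0:0:0]    disk    Msft     Virtual Disk     1.0   /dev/sdc
[5:0:0:1]    disk    Msft     Virtual Disk     1.0   /dev/sdd`
	m := parseLSSCSI(sample)
	fmt.Println(m[0], m[1])
}
```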

Return attachment device info when requested

Remove IID restriction from VolumeCreate

VolumeCreate was requiring an IID, which is not necessary.
@akutz akutz changed the title WIP: Introduce MS Azure driver for unmanaged disks Azure Unmanaged Disk support Feb 21, 2017
@akutz akutz merged commit 80a2257 into thecodeteam:release/0.5.0 Feb 21, 2017
@akutz akutz removed the in progress label Feb 21, 2017
@gpaul

gpaul commented Apr 6, 2017

I often saw attach/detach requests take 50-80 seconds. The task timeout in libS seems to be 60 seconds, at which point a task ID is returned that you can check the status of. Not sure how this will play with REX-Ray.

Any update on this? Specifically: I saw REX-Ray updated to v0.5.2 of libstorage recently but I couldn't find any such async retry logic.

@codenrhoden codenrhoden deleted the feature/azure branch April 6, 2017 15:43
@codenrhoden
Contributor Author

@gpaul The documentation for the Azure driver tells you how to work around this.

In short, bump that default 60 second timeout to something greater, like 120.
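The workaround would look something like the following in the REX-Ray/libStorage config file. The key name shown (`libstorage.server.tasks.exeTimeout`) is recalled from the libStorage configuration docs, not confirmed here; verify it against the Azure driver documentation linked above before relying on it.

```yaml
libstorage:
  server:
    tasks:
      # Default is 60s; Azure attach/detach often takes 50-80s,
      # so raise this to avoid falling back to async task polling.
      exeTimeout: 120s
```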

@gpaul

gpaul commented Apr 7, 2017

@codenrhoden Thanks for the link to the docs. Does Azure provide a <120s SLA? It seems like this will just make failures less common and therefore more devastating. It seems like libstorage does exactly the right thing by allowing async operation; the only question is whether, e.g., REX-Ray takes proper advantage of that. I'll open a RR issue, thanks.
