vic-machine should test connectivity and name resolution on install of VCH and provide clear feedback and fail install if it doesn't work. #3066

Closed
mlh78750 opened this issue Nov 8, 2016 · 10 comments
Assignees
Labels
area/appliance area/diagnostics Utilities, procedures, and output to help to identify errors component/install source/customer Reported by a customer, directly or via an intermediary

Comments

@mlh78750
Contributor

mlh78750 commented Nov 8, 2016

See: #3031 (comment)

If a customer uses an FQDN for install that is only resolvable by the client (and possibly VC), and the VCH doesn't get configured with the same resolver, then we will fail in very non-obvious ways. vic-machine should test the ability to resolve any passed-in FQDN. It should also get the list of hosts from the VC cluster/resource pool and, for any that are FQDNs rather than IPs, make sure we can resolve them as well.

For example, #3031 involved a cluster of one host with local storage. When installed, it looks like the VC names the host with an FQDN, but the VCH cannot resolve it.

vic-machine should test these conditions and fail with a clear error if name resolution on the VCH doesn't work.

@mlh78750 mlh78750 added kind/quality area/appliance component/install source/customer Reported by a customer, directly or via an intermediary labels Nov 8, 2016
@mlh78750
Contributor Author

mlh78750 commented Nov 8, 2016

Not sure if I'd hold a hard line on this making it for 0.8.0, but I want us to try.

@mlh78750 mlh78750 added this to the VIC GA Release milestone Nov 8, 2016
@mlh78750
Contributor Author

mlh78750 commented Nov 9, 2016

Ran into this exact issue with a high-touch beta customer today. We need this in GA for sure.

@mdubya66
Contributor

mdubya66 commented Nov 9, 2016

Just in case, is there a workaround?

@mlh78750
Contributor Author

mlh78750 commented Nov 9, 2016

It's a quality issue. The error message is non-existent and there is no way to get info from the appliance to diagnose. Ben and I literally just spent 2 hours on this because we didn't have it.

@mdubya66
Contributor

mdubya66 commented Nov 9, 2016

Understood. Until we have it, we should document the workaround here.

@hickeng
Member

hickeng commented Nov 9, 2016

#2811 addresses the purely functional aspect of this (protecting vicadmin against the case where vSphere isn't available) but doesn't touch on presenting data necessary to diagnose the problem.

In the case that a vSphere connection is unavailable, we need to:
a. present a checklist for a diagnostics workflow, with expanded information for the first failing step

  • we will prompt for credentials, then fail to connect, so these basic vSphere connectivity diagnostics need to be presented even if auth fails
  • logs should not be available for download without auth
  • fail over to using local system auth to gain access to log bundles
  • the root user should have a shell configured, not /bin/false. This will need to be set via vic-machine debug, or the VCH must have been deployed with a debug level sufficient to enable the local console

A basic diagnostics workflow for why vSphere connection is unavailable:

  1. Has a connection previously been made to vSphere using the configured target?
    • yes - likely network outage.
      • report traceroute vsphere.fqdn.tld output
      • try accessing proxy if configured
      • try accessing configured registry
  2. Can the target be resolved to an IP address?
    • yes
      • report the resolved IP, interface IP addresses and the route table on the VCH.
    • no
      • report the configured DNS server and the response of the lookup
  3. Can a connection be made to vsphere? (HTTP HEAD request for example)
    • yes - report successful connection
    • no - report the routing table and traceroute vsphere.fqdn.tld output
  4. Can we authenticate with vsphere?
    • yes - report successful authentication
    • no - report unable to authenticate error, but do not report credentials used
    • if vsphere returns an error other than not authorized then log and display that
  5. Can we access the endpointVM via the VIM API?
    • yes - report success
    • no - need to investigate what could go wrong on this path

@mlh78750 If you notice omissions in either the steps or the data to be presented to aid in diagnostics (bear in mind this is presented without authentication), please edit this comment to add them. --done (mlh)

@anchal-agrawal anchal-agrawal added the area/diagnostics Utilities, procedures, and output to help to identify errors label Nov 9, 2016
@mlh78750
Contributor Author

mlh78750 commented Nov 9, 2016

@mdubya66 The best I can do for a workaround that would work for this is #3079 or the workaround listed in #3079.

@vburenin
Contributor

vburenin commented Nov 14, 2016

I am going to run several tests on VCH side:

  1. Ping www.vmware.com
  2. ping www.emc.com
  3. ping www.dell.com
  4. ping VC
  5. ping default gw
  6. Access VC via HTTP/HTTPS - HEAD request
  7. Access VC via HTTP + Proxy if proxy is defined.

The report will show what was tested and the outcome of each test.
It will be available right after deployment and via the VIC Admin page.

I am not clear about endpointVM - is it VCH?

@mlh78750
Contributor Author

I think you only need 4 through 6. (and you have two 5's).

This was more about making sure that the networking configuration that they provided will allow the VCH to reach the --target.

@hickeng
Member

hickeng commented Nov 17, 2016

@vburenin the VCH is the composite structure of resource pool, distributed switch, datastores and endpointVM (aka applianceVM).
I'm not sure what you mean by "report will be available right after deployment" - do you mean via vic-machine, via the VM console or some other medium?

I ran through some of this with @anchal-agrawal while he was here in SF, so please make sure you're not duplicating work.


5 participants