Connectivity diagnostic with VC/ESXi from appliance #3210

Merged
merged 15 commits into from
Nov 19, 2016

Conversation

vburenin
Contributor

INFO[2016-11-17T09:21:25-08:00] Waiting for IP information
INFO[2016-11-17T09:21:31-08:00] Waiting for major appliance components to launch

INFO[2016-11-17T09:21:31-08:00] Checking connectivity with the target vCenter/ESXi
INFO[2016-11-17T09:21:31-08:00] VC/ESXi API Ping Test: https://10.0.1.152 is reacheable and responds to ICMP ping request
INFO[2016-11-17T09:21:31-08:00] VC/ESXi API Test: https://10.0.1.152 API responds as expected

INFO[2016-11-17T09:21:34-08:00] Initialization of appliance successful
INFO[2016-11-17T09:21:34-08:00]
INFO[2016-11-17T09:21:34-08:00] VCH Admin Portal:
INFO[2016-11-17T09:21:34-08:00] https://10.0.1.156:2378
INFO[2016-11-17T09:21:34-08:00]
INFO[2016-11-17T09:21:34-08:00] Published ports can be reached at:
INFO[2016-11-17T09:21:34-08:00] 10.0.1.156
INFO[2016-11-17T09:21:34-08:00]
INFO[2016-11-17T09:21:34-08:00] Docker environment variables:
INFO[2016-11-17T09:21:34-08:00] DOCKER_HOST=10.0.1.156:2375
INFO[2016-11-17T09:21:34-08:00]
INFO[2016-11-17T09:21:34-08:00] Environment saved in vic-docker/vic-docker.env
INFO[2016-11-17T09:21:34-08:00]
INFO[2016-11-17T09:21:34-08:00] Connect to docker:
INFO[2016-11-17T09:21:34-08:00] docker -H 10.0.1.156:2375 info
INFO[2016-11-17T09:21:34-08:00] Installer completed successfully
INFO[2016-11-17T09:19:08-08:00] Waiting for IP information
INFO[2016-11-17T09:19:13-08:00] Waiting for major appliance components to launch
INFO[2016-11-17T09:19:13-08:00] Checking connectivity with the target vCenter/ESXi
ERRO[2016-11-17T09:19:14-08:00] VC/ESXi API Ping Test: https://brokenesx is unknown host name
INFO[2016-11-17T09:19:14-08:00] Collecting ha-host hostd.log
ERRO[2016-11-17T09:19:14-08:00] --------------------
ERRO[2016-11-17T09:19:14-08:00] vic-machine-linux failed: Access to VC/ESXi failed a ping test

Fixes #3066

@cgtexmex
Contributor

It would be nice if the error message vic-machine-linux failed: Access to VC/ESXi failed a ping test indicated that the issue is between the appliance and VC/ESXi and not somewhere else, potentially even listing the source and target.

@vburenin
Contributor Author

@cgtexmex Good point! I would really appreciate it if people could look at the error texts and rephrase them in a more human-friendly way.

Contributor

@caglar10ur caglar10ur left a comment

This is not portable: there is no guarantee that a ping binary will be available. Also, parsing command output is a show stopper for me. Please consider using golang.org/x/net/icmp. E.g., bonneville-appliance (internal reference): https://enatai-git.eng.vmware.com/bonneville/bonneville-appliance/blob/master/setup.go#L74

@vburenin
Contributor Author

@caglar10ur golang.org/x/net/icmp is a new vendored dependency. I can create an issue to use ICMP once vendor is unlocked.

@vburenin
Contributor Author

@caglar10ur #3219

@vburenin vburenin dismissed caglar10ur’s stale review November 17, 2016 20:46

It is a hard piece for this PR. Here is a new one specifically for your request: #3219

// such as connections, opened files, etc. Thus, ControlWaitGroup fits well to limit the number
// of running goroutines.
type ControlWaitGroup struct {
sem *Semaphore
Contributor

Where is this defined?

// dynamic so it only makes sense to use it for logging purposes.
func (cwg *ControlWaitGroup) Waiting() int {
cwg.mu.Lock()
w := cwg.waiting
Contributor

If Semaphore is a buffered channel where the channel size is the value of the semaphore, then waiting just turns into value - len(semaphore).

Contributor

Also using a buffered channel means you don't need mu.

Contributor Author

Oh, it accidentally sneaked in. I will remove this file.
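
The buffered-channel semaphore suggested in this thread can be sketched as follows. This is a minimal illustration, not the Semaphore type from the PR (that file was removed); the names NewSemaphore, Acquire, Release, InUse and Free are hypothetical. The counts are snapshots that can change immediately, so, as the original code comment says, they only make sense for logging.

```go
package main

import "fmt"

// Semaphore is a counting semaphore backed by a buffered channel.
// The channel capacity is the maximum number of concurrent holders,
// so no separate mutex or counter is needed.
type Semaphore chan struct{}

// NewSemaphore returns a semaphore that admits up to n holders.
func NewSemaphore(n int) Semaphore { return make(Semaphore, n) }

// Acquire blocks until a slot is free, then takes it.
func (s Semaphore) Acquire() { s <- struct{}{} }

// Release frees a slot previously taken by Acquire.
func (s Semaphore) Release() { <-s }

// InUse reports how many slots are held right now (a racy snapshot).
func (s Semaphore) InUse() int { return len(s) }

// Free reports how many slots are available right now (a racy snapshot).
func (s Semaphore) Free() int { return cap(s) - len(s) }

func main() {
	sem := NewSemaphore(3)
	sem.Acquire()
	sem.Acquire()
	fmt.Println("held:", sem.InUse(), "free:", sem.Free())
	sem.Release()
	fmt.Println("held:", sem.InUse(), "free:", sem.Free())
}
```

Note that len on the channel gives the number of holders, not the number of goroutines blocked in Acquire; counting blocked goroutines would still need extra bookkeeping.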

@caglar10ur
Contributor

Using the icmp pkg produces way less code than this diag pkg (and is much easier to work with IMHO), so I'm still not sure why we prefer that. From the link above:

// PrivilegedPing sends an ipv4.ICMPTypeEcho message to the given IP and returns true if it responds with an ipv4.ICMPTypeEchoReply
// Terminates if an error occurs
func PrivilegedPing(ip net.IP, seq int) bool {
    c, err := icmp.ListenPacket("ip4:icmp", "0.0.0.0")
    if err != nil {
        GuestRPCErrorf("Unable to listen: %s", err)
        os.Exit(ERR)
    }
    defer c.Close()

    // craft the icmp message
    wm := icmp.Message{
        Type: ipv4.ICMPTypeEcho,
        Code: 0,
        Body: &icmp.Echo{
            ID:   os.Getpid() & 0xffff,
            Seq:  1 << uint(seq),
            Data: []byte("KNOCK-IF-YOU-CAN-HEAR-ME"),
        },
    }

    // marshal it to write buffer
    wb, err := wm.Marshal(nil)
    if err != nil {
        GuestRPCErrorf("Unable to marshal the icmp message: %s", err)
        os.Exit(ERR)
    }

    dst := &net.IPAddr{IP: ip}
    // write it to wire
    if n, err := c.WriteTo(wb, dst); err != nil {
        GuestRPCErrorf("Unable to write: %s", err)
        os.Exit(ERR)
    } else if n != len(wb) {
        GuestRPCErrorf("Unable to write: got %v; want %v", n, len(wb))
        os.Exit(ERR)
    }

    // create read buffer
    rb := make([]byte, 1500)

    // Set read deadline to 3 sec.
    if err := c.SetReadDeadline(time.Now().Add(3 * time.Second)); err != nil {
        GuestRPCErrorf("Unable to set deadline: %s", err)
        os.Exit(ERR)
    }

    // read it from wire to read buffer
    n, _, err := c.ReadFrom(rb)
    if err != nil {
        GuestRPCErrorf("Unable to read: %s", err)
        os.Exit(ERR)
    }

    // parse the icmp message
    rm, err := icmp.ParseMessage(iana.ProtocolICMP, rb[:n])
    if err != nil {
        GuestRPCErrorf("Unable to parse: %s", err)
        os.Exit(ERR)
    }

    // check the message type to see whether we got an echo reply
    switch rm.Type {
    case ipv4.ICMPTypeEchoReply:
        return true
    default:
        return false
    }
}

but I'm fine with this if others are OK with it, so I'll let them decide

@vburenin
Contributor Author

@caglar10ur It is just ping for IPv4. I would need to take care of name resolution, IPv6, etc. I agree that this needs to be done, but not today. An issue has been filed, so I will take care of this after this PR, for the next release.

@caglar10ur
Contributor

we don't support anything other than ipv4 at this time and we have net.LookupAddr for name resolution, just saying...

return fmt.Errorf("Could not run network diagnostic on appliance")
}

// In case of fatal error, log error and exist.
Contributor

s/exist/exit

}

// Checking if VC/ESXi can respond to ping request.
const pingTestTxt = "VC/ESXi API Ping Test:"
Contributor

I don't think we need to log that we're running a ping test. We could just report that we can't reach the VCH if the ping test fails.

Contributor Author

well, I am on the other side about it ;)


const (
// PingStatusOk Host name was resolved and all pings went through.
PingStatusOk = 0
Contributor

I am not sure we need such detailed information about the ping output. How about just the output and exit code from the command, and the client can figure out what went wrong? In our case, we just need to know if the exit code != 0 to report bad connectivity.

Contributor Author

Giving more detailed diagnostics is better I think.
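
For comparison, the exit-code-only approach suggested in this thread could look roughly like the sketch below. The helper name runForExitCode is hypothetical and not part of the PR, and the example assumes a ping binary is on the PATH, which is exactly the portability concern raised earlier in the review.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// runForExitCode runs a command and returns its combined output and exit code.
// The returned error is non-nil only when the command could not be run at all
// (for example, the binary is missing); a non-zero exit is not an error here.
func runForExitCode(name string, args ...string) (string, int, error) {
	out, err := exec.Command(name, args...).CombinedOutput()
	if err == nil {
		return string(out), 0, nil
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return string(out), exitErr.ExitCode(), nil
	}
	return string(out), -1, err
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pingcheck <host>")
		return
	}
	out, code, err := runForExitCode("ping", "-c", "4", "-W", "3", os.Args[1])
	if err != nil {
		fmt.Println("could not run ping:", err)
		return
	}
	if code != 0 {
		fmt.Printf("bad connectivity (exit code %d):\n%s", code, out)
		return
	}
	fmt.Println("target responds to ping")
}
```

The trade-off discussed above still applies: the exit code says that something failed, while the raw output has to be read by a person to learn what.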

@vburenin
Contributor Author

@caglar10ur This code example is not a good one. It leaves out many things: it doesn't check the response ID or the sequence number, doesn't track which sequences have been sent and received, and doesn't account for possible intermittent delays or packets being delivered out of order, etc. I am working on my own code for this, but it is already a large one. Definitely not for this PR.

net.LookupAddr -> net.LookupIP
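
The correction above matters because net.LookupAddr performs a reverse lookup (address to names), while net.LookupIP resolves a host name to addresses. A minimal sketch of IPv4-only resolution with net.LookupIP follows; the helper name resolveIPv4 is hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// resolveIPv4 returns the IPv4 addresses that host resolves to.
// An IP literal is returned as-is without a DNS query.
func resolveIPv4(host string) ([]net.IP, error) {
	ips, err := net.LookupIP(host)
	if err != nil {
		return nil, err
	}
	var v4 []net.IP
	for _, ip := range ips {
		// To4 returns nil for addresses that are not IPv4.
		if ip4 := ip.To4(); ip4 != nil {
			v4 = append(v4, ip4)
		}
	}
	if len(v4) == 0 {
		return nil, fmt.Errorf("no IPv4 address found for %q", host)
	}
	return v4, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: resolve <host>")
		return
	}
	ips, err := resolveIPv4(os.Args[1])
	if err != nil {
		fmt.Println("resolution failed:", err)
		return
	}
	fmt.Println(ips)
}
```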

@@ -1160,9 +1157,66 @@ func (c *Create) Run(cliContext *cli.Context) (err error) {
return err
}

// vic-init will try to reach out to the vCenter/ESXi host.
log.Info("Checking connectivity with the target vCenter/ESXi")
Member

This message is unclear:
Checking VCH connectivity with vSphere target


// Checking if VC/ESXi can respond to ping request.
const pingTestTxt = "VC/ESXi API Ping Test:"
cd, err := executor.CheckVCPingFromAppliance(ctx, vch, vchConfig.Target)
Member

Don't put a ping test in here - there are plenty of people who will be blocking ICMP packets.
The only thing we care about is whether the endpointVM can talk to vSphere API - just the HEAD request will suffice.

Ping is solely for diagnostics in the failure case.

Contributor Author

No ICMP response is a totally OK test result. It is not in the fatal category; it will just be logged at warning level, saying that ping is not working.
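
The HEAD-request check proposed in this thread can be sketched with the standard library alone. This is an illustration under assumptions, not the code merged in the PR: the name checkAPIReachable and the 5-second timeout are made up, and certificate verification is skipped on the assumption that the target may present a self-signed certificate.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
	"time"
)

// probeTimeout bounds the whole request (assumed value, not from the PR).
const probeTimeout = 5 * time.Second

// checkAPIReachable sends a single HEAD request to target. Any HTTP response,
// whatever its status code, counts as reachable; only transport-level
// failures (DNS, connect, TLS, timeout) are reported as errors.
func checkAPIReachable(target string) error {
	client := &http.Client{
		Timeout: probeTimeout,
		Transport: &http.Transport{
			DisableKeepAlives: true, // one-shot probe, do not pool connections
			// Assumption: the target may use a self-signed certificate.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Head(target)
	if err != nil {
		return fmt.Errorf("no HTTP response from %s: %w", target, err)
	}
	resp.Body.Close()
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: probe <url>")
		return
	}
	if err := checkAPIReachable(os.Args[1]); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("API endpoint responds")
}
```

Unlike ping, this check is unaffected by networks that block ICMP, which is the concern raised in the review.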

@hickeng
Member

hickeng commented Nov 18, 2016

@vburenin
This issue is about providing a means to determine:
a. can the target be resolved?
b. can we get an HTTP response from the target?

Additional diagnostics to aid in troubleshooting don't belong in vic-machine but in vic-admin.
If we wish to be able to report health status via vic-machine in general, that's a completely different PR.

I'd much prefer we add a structured mechanism to allow the components to report status, rather than add narrow checks like this into vic-machine. Done properly this can be leveraged to report health via vSphere alerts and in vic-admin as well as in all the other vic-machine operations.
Until we have a structured mechanism my requests are:

  1. drop/icebox this PR in its current form
  2. modify components to update the vchconfig "status" field for the session with an appropriate (concise) error message if connection to vSphere fails (and potentially other fatal paths), and another message e.g. "initialization successful", when the portlayer is fully up
  3. modify ensureApplianceInitializes to recognize the success message and wait for that in vic-machine, reporting an error and exiting if an unexpected status is found.

This allows for reporting a MUCH broader range of errors, with the errors determined by inline product code rather than sideline diagnostics. This status can also be checked by vic-machine ls and vic-machine inspect to provide a quick gauge of health.
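
The status-based wait described in the requests above could be sketched as a simple polling loop. Everything here is hypothetical: the function waitForInit, the getStatus callback standing in for a read of the vchconfig "status" field, and the exact success string are illustrations of the proposal, not code from vic-machine.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// statusInitOK is a hypothetical success marker; the proposal only gives
// "initialization successful" as an example message.
const statusInitOK = "initialization successful"

// waitForInit polls getStatus up to attempts times, pausing interval between
// polls while no status is reported. It returns nil on the success marker and
// an error on a read failure, on any other non-empty status, or when the
// attempts run out.
func waitForInit(getStatus func() (string, error), attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		status, err := getStatus()
		if err != nil {
			return fmt.Errorf("reading appliance status: %w", err)
		}
		switch status {
		case statusInitOK:
			return nil
		case "":
			// Nothing reported yet; poll again after a pause.
			time.Sleep(interval)
		default:
			// Any other message is treated as a fatal status from the appliance.
			return fmt.Errorf("appliance reported: %s", status)
		}
	}
	return errors.New("timed out waiting for appliance status")
}

func main() {
	// Fake status source: empty twice, then success.
	calls := 0
	get := func() (string, error) {
		calls++
		if calls < 3 {
			return "", nil
		}
		return statusInitOK, nil
	}
	fmt.Println(waitForInit(get, 10, 10*time.Millisecond))
}
```

With this shape the error text comes from the component that failed, which is the point of the proposal: vic-machine only relays it.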

@@ -0,0 +1,229 @@
// Copyright 2016 VMware, Inc. All Rights Reserved.
Member

This should be under pkg/vsphere - it's specifically checking the vsphere API

// CheckPing runs ping to check if vSphere target is resolvable and pingable.
func CheckPing(hn string) int {
cmd := exec.Command("ping", "-c", "4", "-W", "3", "-i", "0.1", hn)
return runPing(cmd.CombinedOutput())
Member

This is likely to fail intermittently as it's in a race with the childReaper.

Contributor Author

@vburenin vburenin Nov 18, 2016

@hickeng hm. Any suggestion on that?

@vburenin
Contributor Author

@hickeng I reduced the scope of the vSphere API test to focus purely on responses from client.Get.
The rest we can do next time.

if err != nil {
errTxt := err.Error()
op.Errorf("Query error: %s", err)
if strings.Contains(errTxt, "no such host") {
Contributor

Is it possible to actually check for errno's here instead of comparing error strings?

Contributor Author

Nope. I wish it was the case.
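
For readers on newer toolchains: since Go 1.13 the standard library error types can be unwrapped with errors.As, and net.DNSError carries an IsNotFound field, so string matching is not the only option when the error chain is preserved. The sketch below assumes such a chain; whether the vSphere client used in this PR keeps the wrapped error intact is an assumption, and the helpers isUnknownHost and sampleLookupError are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// isUnknownHost reports whether err wraps a DNS "host not found" failure.
func isUnknownHost(err error) bool {
	var dnsErr *net.DNSError
	return errors.As(err, &dnsErr) && dnsErr.IsNotFound
}

// sampleLookupError builds, by hand, an error chain shaped like the one
// net/http returns when the host name cannot be resolved.
func sampleLookupError() error {
	return &url.Error{
		Op:  "Get",
		URL: "https://brokenesx",
		Err: &net.OpError{
			Op:  "dial",
			Net: "tcp",
			Err: &net.DNSError{Err: "no such host", Name: "brokenesx", IsNotFound: true},
		},
	}
}

func main() {
	fmt.Println(isUnknownHost(sampleLookupError()))
	fmt.Println(isUnknownHost(errors.New("connection refused")))
}
```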

@hmahmood
Contributor

lgtm

@andrewtchin
Contributor

lgtm

8 participants