-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connectivity diagnostic with VC/ESXi from appliance #3210
Conversation
# Conflicts: # cmd/vic-machine/create/create.go
It would be nice if the error message |
@cgtexmex Good point! I would really appreciate if people could take a look at error texts to rephrase them in a more human friendly way. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not portable, there is no guarantee that ping binary will be there to use. Also parsing a command output is show stopper for me. Please consider using golang.org/x/net/icmp. Eg; bonneville-appliance (internal reference) https://enatai-git.eng.vmware.com/bonneville/bonneville-appliance/blob/master/setup.go#L74
@caglar10ur golang.org/x/net/icmp is a new vendored dependency. I can create an issue to ICMP when vendor is unlocked. |
it a hard piece for this PR. Here is a new one specifically for your request: #3219
// such as connections, opened files, etc. Thus, ControlWaitGroup fits well to limit the number | ||
// of running goroutines. | ||
type ControlWaitGroup struct { | ||
sem *Semaphore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this defined?
// dynamic so it only make sense to use it for logging purposes. | ||
func (cwg *ControlWaitGroup) Waiting() int { | ||
cwg.mu.Lock() | ||
w := cwg.waiting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Semaphore
is a buffered channel where the channel size is the value of the semaphore, then waiting
just turns into value - len(semaphor)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also using a buffered channel means you don't need mu
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, it's accidentally sneaked in. I will remove this file.
using icmp pkg produces way less code than this diag pkg (and much easier to wortk with IMHO) so I'm still not sure why we prefer that. From that link above
but I'm fine with this if others are OK with it so I'll let them to decide |
@caglar10ur it is just PING for IPv4. I will need to take care about name resolution, IPv6, etc. I agreed that this needs to be done, but not today. Issue has been filed, so I will take care about this after this PR for the next release. |
we don't support anything other than ipv4 at this time and we have net.LookupAddr for name resolution, just saying... |
return fmt.Errorf("Could not run network diagnostic on appliance") | ||
} | ||
|
||
// In case of fatal error, log error and exist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/exist/exit
} | ||
|
||
// Checking if VC/ESXi can respond to ping request. | ||
const pingTestTxt = "VC/ESXi API Ping Test:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to log that we're running a ping test. We could just report that we can't reach the VCH if the ping test fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I am on the other side about it ;)
|
||
const ( | ||
// PingStatusOk Host name was resolved and all pings went through. | ||
PingStatusOk = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we need such detailed information about the ping output. How about just the output and exit code from the command, and the client can figure out what went wrong? In our case, we just need to know if the exit code != 0 to report bad connectivity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giving more detailed diagnostics is better I think.
@caglar10ur This code example is not the good one. It doesn't take into account so many things: It doesn't check reponse ID, doesn't check sequence, doesn't track sequences that are being sent and received, it doesn't take into account possible intermitent delays and that packets may be delivered in different order, etc. I am working on my own code for this - but it is already a large one. Definitely not for this PR. net.LookupAddr -> net.LookupIP |
@@ -1160,9 +1157,66 @@ func (c *Create) Run(cliContext *cli.Context) (err error) { | |||
return err | |||
} | |||
|
|||
// vic-init will try to reach out to the vCenter/ESXi host. | |||
log.Info("Checking connectivity with the target vCenter/ESXi") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is unclear:
Checking VCH connectivity with vSphere target
|
||
// Checking if VC/ESXi can respond to ping request. | ||
const pingTestTxt = "VC/ESXi API Ping Test:" | ||
cd, err := executor.CheckVCPingFromAppliance(ctx, vch, vchConfig.Target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't put a ping test in here - there are plenty of people who will be blocking ICMP packets.
The only thing we care about is whether the endpointVM can talk to vSphere API - just the HEAD request will suffice.
Ping is solely for diagnostics in the failure case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No ICMP responses is totally ok test result. It is not in the fatal category, it will be just logged on warning level saying ping is not working.
@vburenin Additional diagnostics to aid in troubleshooting don't belong in vic-machine but in vic-admin. I'd much prefer we add a structured mechanism to allow the components to report status, rather than add narrow checks like this into vic-machine. Done properly this can be leveraged to report health via vSphere alerts and in vic-admine as well as in all the other vic-machine operations.
This allows for reporting a MUCH broader range of errors, with the errors determined by inline product code rather than sideline diagnostics. This status can also be checked by |
@@ -0,0 +1,229 @@ | |||
// Copyright 2016 VMware, Inc. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be under pkg/vsphere
- it's specifically checking the vsphere API
// CheckPing runs ping to check if vSphere target is resolvable and pingable. | ||
func CheckPing(hn string) int { | ||
cmd := exec.Command("ping", "-c", "4", "-W", "3", "-i", "0.1", hn) | ||
return runPing(cmd.CombinedOutput()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is likely to fail intermittently as it's in a race with the childReaper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hickeng hm. Any suggestion on that?
@hickeng I reduced scope of vSphere API test to focus purely on responses from client.Get |
if err != nil { | ||
errTxt := err.Error() | ||
op.Errorf("Query error: %s", err) | ||
if strings.Contains(errTxt, "no such host") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to actually check for errno's here instead of comparing error strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. I wish it was the case.
lgtm |
lgtm |
Fixes #3066