[v2.2] Stream Envoy metrics to the cloud #4053
Signed-off-by: Douglas Camata <[email protected]>
…amata/agent-metrics-stream
Signed-off-by: Flynn <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>
LGTM
Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Flynn <[email protected]>
…amata/agent-metrics-stream
…gress/emissary into dcamata/agent-metrics-stream
This looks very cool! We need to change the port number to 8006, sadly: sorry about that! Looking forward to getting this in. 🙂
Also add it to the table of ports being used in python/README.md
Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>
…amata/agent-metrics-stream
Signed-off-by: Douglas Camata <[email protected]>
I think this looks good! Thanks! 🙂
Oh whoops -- that gotest failure looks like it might be real? 😐
OK, I'll approve this again with the understanding that we must track down the race in the tests.
pkg/agent/envoy_metrics_server.go
Outdated
	dlog.Infof(ctx, "metrics service listening on %s", listener.Addr().String())
	s.logCtx = ctx
	return grpcServer.Serve(listener)
This should use github.com/datawire/dlib/dhttp rather than the google.golang.org/grpc HTTP server. (For an example of how to do this, see cmd/example-envoy-metrics-sink/, which this PR also edits.)
        port = int(parts[1])
    else:
        raise ValueError("too many colons")
    return host, port
This will definitely raise an exception for IPv6.
Also there should be type annotations on the signature.
The simplest way to do this is probably:

    from urllib.parse import urlparse
    from typing import Tuple

    def split_host_port(value: str) -> Tuple[str, int]:
        parsed = urlparse("//"+value)
        return parsed.hostname, int(parsed.port or 80)
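To make the IPv6 point concrete, here is a minimal, self-contained sketch of the urlparse-based approach suggested above, run against a plain hostname, a bare IPv4 address, and a bracketed IPv6 literal. It is an illustration rather than the code that landed in the PR: the `default_port` parameter, the explicit `ValueError` for a missing host, and the sample addresses are assumptions added for the example.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_host_port(value: str, default_port: int = 80) -> Tuple[str, int]:
    """Split "host[:port]" into (host, port); IPv6 literals must be bracketed."""
    # Prefixing "//" makes urlparse treat the whole value as a network location.
    parsed = urlparse("//" + value)
    if parsed.hostname is None:
        # Assumption for this sketch: an empty or unparsable host is an error.
        raise ValueError(f"no host found in {value!r}")
    # Accessing parsed.port raises ValueError itself if the port is not numeric.
    port = parsed.port if parsed.port is not None else default_port
    return parsed.hostname, port


if __name__ == "__main__":
    # Hypothetical sample addresses, chosen only to exercise each case.
    for sample in ("localhost:8006", "127.0.0.1", "[::1]:8006"):
        print(sample, "->", split_host_port(sample))
```

Because the parsing is delegated to the standard library, the colons inside a bracketed IPv6 literal are not confused with the host/port separator, which is exactly where a hand-rolled split on ":" trips up.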
cmd/agent/main.go
Outdated
		if err := metricsServer.StartServer(ctx); err != nil {
			dlog.Error(ctx, err)
		}
	}()
Don't ever launch a goroutine without having a way to wait for it to shut down. github.com/datawire/dlib/dgroup can help with this, but you are free to use other solutions as well.
OK, 3 concerns:
- the split_host_port routine
- not keeping track of goroutines
- using the google.golang.org/grpc HTTP server
I'm not super-opposed to merging this for -rc.0 and then fixing those later. But if we weren't pushing for an RC ASAP, this'd be a "request changes".
Add some type checking on top
Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Flynn <[email protected]>
…gress/emissary into dcamata/agent-metrics-stream
…amata/agent-metrics-stream
Signed-off-by: Douglas Camata <[email protected]>
Signed-off-by: Flynn <[email protected]>
Reapproving after fixing merge conflicts and pinning pytest to version 6.2.5 -- let's land this thing!!
Description
This pull request streams metrics from Envoy to Ambassador's cloud. We're interested in the following metrics for both stable and canary clusters:
The agent currently drops all other metrics. In the future, this filtering might be done in Envoy's configuration instead, to avoid the extra network traffic between the agent and Emissary Ingress pods entirely.
The flow of the metrics is:
Related Issues
This is based on the work done in #3657.
Testing
I was able to test this manually using a local cluster and a server that behaves like Ambassador's cloud.
Checklist
I made sure to update CHANGELOG.md.
Remember, the CHANGELOG needs to mention:
This is unlikely to impact how Ambassador performs at scale.
Remember, things that might have an impact at scale include:
My change is adequately tested.
Remember when considering testing:
I updated DEVELOPING.md with any special dev tricks I had to use to work on this code efficiently.