Skip to content

Commit

Permalink
fix grpc_stats default behavior with envoy upgrade
Browse files Browse the repository at this point in the history
The existing behavior in Envoy 1.17 was to produce grpc_stats with
the following naming schemes `envoy_cluster_grpc_{ServiceName}_*`
when setting `all_methods` or `services` within the grpc_stats config
on a `Module`. When neither were set it would aggregate it to
`envoy_cluster_grpc_*`.

The new behavior since Envoy 1.18 will produce metrics with the following
naming scheme `envoy_cluster_grpc_{MethodName}_*` and `envoy_cluster_grpc_*`.

Based on great investigation from Luke Shumaker, both versions seems to be
improperly parsing the ServiceName. Especially, when a more complex ServiceName
such as `cncf.telepresence.Manager/Status` is used. This parsing logic will
need to be fixed in upstream Envoy.

This commit also adds unit test for how the `grpc_stats` are parsed from the
Module and used to configure the filter. If a user provided an empty set:

```yaml
apiVersion: getambassador.io/v3alpha1
kind:  Module
metadata:
  name: ambassador
spec:
  config:
    grpc_stats: {}
```

Then this would be equivalent to setting `grpc_stats.all_methods=false`. When
setting it with missing fields like such:

```yaml
apiVersion: getambassador.io/v3alpha1
kind:  Module
metadata:
  name: ambassador
spec:
  config:
    grpc_stats:
```

This will be equivalent to not being set at all and `grpc_stats` will be disabled.

Co-authored-by: Luke Shumaker <[email protected]>
Signed-off-by: Lance Austin <[email protected]>
  • Loading branch information
Lance Austin and LukeShu committed Jun 22, 2022
1 parent 01a93be commit c2933af
Show file tree
Hide file tree
Showing 5 changed files with 212 additions and 32 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,21 @@ it will be removed; but as it won't be user-visible this isn't considered a brea
migration instructions, and while the `*-agent.yaml` files remained part of the instructions they
were actually unnescessary.

- Change: The previous version of Emissary-ingress was based on Envoy 1.17 and when using grpc_stats
with `all_methods` or `services` set, it would output metrics in the following format
`envoy_cluster_grpc_{ServiceName}_{statname}`. When neither of these fields are set it would be
aggregated to `envoy_cluster_grpc_{statname}`.
The new behavior since Envoy 1.18 will produce
metrics in the following format `envoy_cluster_grpc_{MethodName}_statsname` and
`envoy_cluster_grpc_statsname`.
After further investigation we found that Envoy doesn't properly
parse service names such as `cncf.telepresence.Manager/Status`. In the future, we will work
upstream Envoy to get this parsing logic fixed to ensure consistent metric naming.

- Bugfix: Previously setting `grpc_stats` in the `ambassador` `Module` without setting either
`grpc_stats.services` or `grpc_stats.all_methods` would result in crashing. Now it behaves as if
`grpc_stats.all_methods=false`.

## [2.3.1] June 09, 2022
[2.3.1]: https://github.com/emissary-ingress/emissary/compare/v2.3.0...v2.3.1

Expand Down
20 changes: 20 additions & 0 deletions docs/releaseNotes.yml
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,26 @@ items:
<code>*-migration.yaml</code> files have not been part of the migration instructions, and
while the <code>*-agent.yaml</code> files remained part of the instructions they were
actually unnescessary.
- title: Metric naming change for grpc_stats
type: change
body: >-
The previous version of $productName$ was based on Envoy 1.17 and when using grpc_stats
with <code>all_methods</code> or <code>services</code> set, it would output metrics in
the following format <code>envoy_cluster_grpc_{ServiceName}_{statname}</code>. When
neither of these fields are set it would be aggregated to <code>envoy_cluster_grpc_{statname}</code>.
The new behavior since Envoy 1.18 will produce metrics in the following format
<code>envoy_cluster_grpc_{MethodName}_statsname</code> and <code>envoy_cluster_grpc_statsname</code>.
After further investigation we found that Envoy doesn't properly parse service
names such as <code>cncf.telepresence.Manager/Status</code>. In the future, we will work
upstream Envoy to get this parsing logic fixed to ensure consistent metric naming.
- title: Default behavior for empty grpc_stats changed
type: bugfix
body: >-
Previously setting <code>grpc_stats</code> in the <code>ambassador</code> <code>Module</code>
without setting either <code>grpc_stats.services</code> or <code>grpc_stats.all_methods</code>
would result in crashing. Now it behaves as if <code>grpc_stats.all_methods=false</code>.
- version: 2.3.1
date: '2022-06-09'
notes:
Expand Down
27 changes: 8 additions & 19 deletions python/ambassador/ir/irambassador.py
Original file line number Diff line number Diff line change
Expand Up @@ -251,32 +251,21 @@ def finalize(self, ir: 'IR', aconf: Config) -> bool:
self.grpc_web.sourced_by(amod)
ir.save_filter(self.grpc_web)

if amod and ('grpc_stats' in amod):
grpc_stats = amod.grpc_stats

if amod and (grpc_stats := amod.get('grpc_stats')) is not None:
# grpc_stats = { 'all_methods': False} if amod.grpc_stats is None else amod.grpc_stats
# default config with safe values
config = {
'individual_method_stats_allowlist': {
'services': []
},
'stats_for_all_methods': False,
config: Dict[str, Any] = {
'enable_upstream_stats': False
}

if ('services' in grpc_stats):
# Only one of config['individual_method_stats_allowlist'] or
# config['stats_for_all_methods'] can be set.
if 'services' in grpc_stats:
config['individual_method_stats_allowlist'] = {
'services': grpc_stats['services']
}
# remove stats_for_all_methods key from config. only one of individual_method_stats_allowlist or
# stats_for_all_methods can be set
config.pop('stats_for_all_methods')

# if 'services' is present, ignore 'all_methods'
if ('all_methods' in grpc_stats) and ('services' not in grpc_stats):
config['stats_for_all_methods'] = bool(grpc_stats['all_methods'])
# remove individual_method_stats_allowlist key from config. only one of individual_method_stats_allowlist or
# stats_for_all_methods can be set
config.pop('individual_method_stats_allowlist')
else:
config['stats_for_all_methods'] = bool(grpc_stats.get('all_methods', False))

if ('upstream_stats' in grpc_stats):
config['enable_upstream_stats'] = bool(grpc_stats['upstream_stats'])
Expand Down
26 changes: 13 additions & 13 deletions python/tests/kat/t_grpc_stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,14 +66,14 @@ def check(self):
stats = self.results[-1].text

metrics = [
'envoy_cluster_grpc_EchoService_0',
'envoy_cluster_grpc_EchoService_13',
'envoy_cluster_grpc_EchoService_request_message_count',
'envoy_cluster_grpc_EchoService_response_message_count',
'envoy_cluster_grpc_EchoService_success',
'envoy_cluster_grpc_EchoService_total',
'envoy_cluster_grpc_Echo_0',
'envoy_cluster_grpc_Echo_13',
'envoy_cluster_grpc_Echo_request_message_count',
'envoy_cluster_grpc_Echo_response_message_count',
'envoy_cluster_grpc_Echo_success',
'envoy_cluster_grpc_Echo_total',
# present only when enable_upstream_stats is true
'envoy_cluster_grpc_EchoService_upstream_rq_time'
'envoy_cluster_grpc_Echo_upstream_rq_time'
]

# these metrics SHOULD NOT be there based on the filter config
Expand Down Expand Up @@ -164,12 +164,12 @@ def check(self):
stats = self.results[-1].text

metrics = [
'envoy_cluster_grpc_EchoService_0',
'envoy_cluster_grpc_EchoService_13',
'envoy_cluster_grpc_EchoService_request_message_count',
'envoy_cluster_grpc_EchoService_response_message_count',
'envoy_cluster_grpc_EchoService_success',
'envoy_cluster_grpc_EchoService_total',
'envoy_cluster_grpc_Echo_0',
'envoy_cluster_grpc_Echo_13',
'envoy_cluster_grpc_Echo_request_message_count',
'envoy_cluster_grpc_Echo_response_message_count',
'envoy_cluster_grpc_Echo_success',
'envoy_cluster_grpc_Echo_total',
]

# these metrics SHOULD NOT be there based on the filter config
Expand Down
156 changes: 156 additions & 0 deletions python/tests/unit/test_ambassador_module_validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -153,3 +153,159 @@ def test_invalid_set_current_client_cert_details_value():
require_errors(r2["ir"], [
( "ambassador.default.1", "'set_current_client_cert_details' value for key 'subject' may only be 'true' or 'false', not 'invalid'")
])

@pytest.mark.compilertest
def test_valid_grpc_stats_all_methods():
yaml = """
---
apiVersion: getambassador.io/v3alpha1
kind: Module
metadata:
name: ambassador
namespace: default
spec:
config:
grpc_stats:
all_methods: true
"""

cache = Cache(logger)
r1 = Compile(logger, yaml, k8s=True)
r2 = Compile(logger, yaml, k8s=True, cache=cache)

require_no_errors(r1["ir"])
require_no_errors(r2["ir"])

ir = r1["ir"].as_dict()
stats_filters = [f for f in ir["filters"] if f["name"] == "grpc_stats"]
assert len(stats_filters) == 1
assert stats_filters[0]["config"] == {
'enable_upstream_stats': False,
'stats_for_all_methods': True
}


@pytest.mark.compilertest
def test_valid_grpc_stats_services():
yaml = """
---
apiVersion: getambassador.io/v3alpha1
kind: Module
metadata:
name: ambassador
namespace: default
spec:
config:
grpc_stats:
services:
- name: echo.EchoService
method_names: [Echo]
"""

cache = Cache(logger)
r1 = Compile(logger, yaml, k8s=True)
r2 = Compile(logger, yaml, k8s=True, cache=cache)

require_no_errors(r1["ir"])
require_no_errors(r2["ir"])

ir = r1["ir"].as_dict()
stats_filters = [f for f in ir["filters"] if f["name"] == "grpc_stats"]
assert len(stats_filters) == 1
assert stats_filters[0]["config"] == {
'enable_upstream_stats': False,
'individual_method_stats_allowlist': {
'services': [
{
'name': 'echo.EchoService',
'method_names': ['Echo']

}
]
}
}

@pytest.mark.compilertest
def test_valid_grpc_stats_upstream():
yaml = """
---
apiVersion: getambassador.io/v3alpha1
kind: Module
metadata:
name: ambassador
namespace: default
spec:
config:
grpc_stats:
upstream_stats: true
"""

cache = Cache(logger)
r1 = Compile(logger, yaml, k8s=True)
r2 = Compile(logger, yaml, k8s=True, cache=cache)

require_no_errors(r1["ir"])
require_no_errors(r2["ir"])

ir = r1["ir"].as_dict()
stats_filters = [f for f in ir["filters"] if f["name"] == "grpc_stats"]
assert len(stats_filters) == 1
assert stats_filters[0]["config"] == {
'enable_upstream_stats': True,
'stats_for_all_methods': False
}

@pytest.mark.compilertest
def test_invalid_grpc_stats():
yaml = """
---
apiVersion: getambassador.io/v3alpha1
kind: Module
metadata:
name: ambassador
namespace: default
spec:
config:
grpc_stats:
"""

cache = Cache(logger)
r1 = Compile(logger, yaml, k8s=True)
r2 = Compile(logger, yaml, k8s=True, cache=cache)

require_no_errors(r1["ir"])
require_no_errors(r2["ir"])

ir = r1["ir"].as_dict()
stats_filters = [f for f in ir["filters"] if f["name"] == "grpc_stats"]
assert len(stats_filters) == 0


@pytest.mark.compilertest
def test_valid_grpc_stats_empty():
yaml = """
---
apiVersion: getambassador.io/v3alpha1
kind: Module
metadata:
name: ambassador
namespace: default
spec:
config:
grpc_stats: {}
"""

cache = Cache(logger)
r1 = Compile(logger, yaml, k8s=True)
r2 = Compile(logger, yaml, k8s=True, cache=cache)

require_no_errors(r1["ir"])
require_no_errors(r2["ir"])

ir = r1["ir"].as_dict()
stats_filters = [f for f in ir["filters"] if f["name"] == "grpc_stats"]
assert len(stats_filters) == 1
assert stats_filters[0]["config"] == {
'enable_upstream_stats': False,
'stats_for_all_methods': False
}

0 comments on commit c2933af

Please sign in to comment.