Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

examples: add an example for tls inspector #14665

Merged
merged 1 commit into from
Apr 1, 2021

Conversation

LuyaoZhong
Copy link
Contributor

Provide a sandboxe using Docker Compose that set up environments to
test out Envoy’s tls inspector feature and show sample configurations.

Signed-off-by: Luyao Zhong [email protected]

Commit Message: Provide a sandboxe using Docker Compose that set up environments to test out Envoy’s tls inspector feature and show sample configurations.
Additional Description: N/A
Risk Level: Low
Testing: manual testing
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

@repokitteh-read-only
Copy link

Hi @LuyaoZhong, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #14665 was opened by LuyaoZhong.

see: more, trace.

@htuch htuch assigned phlax and lizan Jan 12, 2021
@phlax
Copy link
Member

phlax commented Jan 13, 2021

hi @LuyaoZhong reading through the documentation here - https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/tls_inspector - im thinking that for this sandbox to be useful it requires some documentation showing what the tls inspector adds.

is that something you would be able to do ?

@LuyaoZhong
Copy link
Contributor Author

LuyaoZhong commented Jan 14, 2021

@phlax Sure, I can add some documentation. Do you mean updating the doc you mentioned is enough? I just noticed other docs https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/.

Besides, I found there is short of sandboxes for other listener filters. Is there any special reasons? If no, I can help add some of them after I figure out how they work.(I am a newbie in Envoy). :D

@phlax
Copy link
Member

phlax commented Jan 14, 2021

Do you mean updating the doc you mentioned is enough?

no, i more meant that looking at this page, i could see that the inspector has some features that could be documented in a sandbox

Besides, I found there is short of sandboxes for other listener filters. Is there any special reasons?

other than having time to implement/maintain, no.

the only other constraint is the amount of time the sandboxes take in CI

I can help add some of them after I figure out how they work

sandboxes demonstrating other filters or techniques would definitely be welcome where they are not covered already.

in terms of progressing this PR take a look at the existing sandbox docs (docs/root/start/sandboxes in repo)

i think in this case its important to showcase the admin interface (iiuc that is what the inspector adds)

the docs are written in .rst format, and you can view the docs rendered in CI (for this PR) with the URL:

https://storage.googleapis.com/envoy-pr/14665/docs/index.html

the other thing to work on is the verify script (which you added already)

see here for info on how that should work:

https://github.com/envoyproxy/envoy/blob/master/examples/DEVELOPER.md

Base automatically changed from master to main January 15, 2021 23:02
@repokitteh-read-only
Copy link

🙀 Error while processing event:

evaluation error
error: context deadline exceeded
🐱

Caused by: #14665 was edited by mattklein123.

see: more, trace.

lizan pushed a commit that referenced this pull request Jan 27, 2021
…tions and timeouts (#14731)

Update the flaky test script to print more details and detect flaky, unexpected test errors like exceptions and timeouts, with the goal of making the notifications more actionable.

Signed-off-by: Randy Miller <[email protected]>

Additional Description:
The flaky test script, ci/flaky/process_xml.py, is executed on every CI run, delivering a notification to the Slack channel "test-flaky" if there are any flaky test failures.  Those notifications aren't as useful as they could be though, for a number of reasons:
 1) Direct links to the CI run, the related commit, and the related PR are not included.
 2) There's no indication of which stage or job experienced the flake.
 3) The notifications are not uniformly formatted, so they can be a bit hard to read.
 4) Some notifications do not include any information about the flake(s).

The goal of this PR is to make these flaky test notifications more actionable by addressing the 4 bullets above.  Below is what a notification would look like should this PR get merged.  The last 2 flakes are not captured at all today by the current state of the script, as those flakes are unexpected test "errors" (eg, exceptions or timeouts) rather than test "failures" (eg, test assert failed).

```
Target:         bazel.release
Stage:          Windows release
Pull request:   #14665
Commmit:        f1184f2
CI results:     https://dev.azure.com/cncf/envoy/_build/results?buildId=63454

Origin:         https://github.com/rmiller14/envoy
Upstream:       https://github.com/envoyproxy/envoy
Latest ref:     heads/flaky_test_script

Last commit:
        commit f1184f2d74d052942f7484beecf98d7cfde137e0
        Author: Randy Miller <[email protected]>
        Date:   Fri Jan 15 00:58:51 2021 -0800

            Update flaky test script to print more actionable details as well as detect flaky, unexpected test errors like exceptions and timeouts.

            Signed-off-by: Randy Miller <[email protected]>

---------------------------------------------------------------------------------------------------

Test flake details:
- Test suite:   IpVersions/DnsImplTest
- Test case:    LocalLookup/IPv4
- Log path:     C:/_eb/_bazel_LocalAdmin/sonr4fdz/external/envoy/bazel-testlogs/test/common/network/dns_impl_test/test_attempts/attempt_1.log
- Details:
        test/common/network/dns_impl_test.cc:609
        Expected equality of these values:
          nullptr
            Which is: NULL
          resolveWithExpectations("localhost", DnsLookupFamily::V4Only, DnsResolver::ResolutionStatus::Success, {"127.0.0.1"}, {"::1"}, absl::nullopt)
            Which is: 0000017500F82190
        Stack trace:
          00007FF69688B586: (unknown)
          00007FF6968A4468: (unknown)
          00007FF6968A462D: (unknown)
          00007FF6968A508D: (unknown)
        ... Google Test internal frames ...

---------------------------------------------------------------------------------------------------

Test flake details:
- Test suite:   ThriftConnManagerIntegrationTest
- Test case:    IDLException/HeaderCompact
- Log path:     C:/_eb/_bazel_LocalAdmin/sonr4fdz/external/envoy/bazel-testlogs/test/extensions/filters/network/thrift_proxy/integration_test/shard_1_of_4/test_attempts/attempt_1.log
- Error:        Exited with error code 3 (No such process)
- Relevant snippet:
        Traceback (most recent call last):
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\envoy\test\extensions\filters\network\thrift_proxy\driver\server.py", line 232, in <module>
            main(cfg)
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\envoy\test\extensions\filters\network\thrift_proxy\driver\server.py", line 175, in main
            server.serve()
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\thrift_pip3_pypi__thrift_0_13_0\thrift\server\TServer.py", line 121, in serve
            self.serverTransport.listen()
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\thrift_pip3_pypi__thrift_0_13_0\thrift\transport\TSocket.py", line 208, in listen
            self.handle.bind(res[4])
        OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions
        Could not connect to any of [('127.0.0.1', 50670)]
        Unhandled Thrift Exception: Could not connect to any of [('127.0.0.1', 50670)]
        C:/envoy/test/extensions/filters/network/thrift_proxy/driver/generate_fixture.sh: line 1: kill: (1819) - No such process
        Failed bash -c "PYTHONPATH=$(dirname C:/envoy/test/extensions/filters/network/thrift_proxy/driver/generate_fixture.sh) C:/envoy/test/extensions/filters/network/thrift_proxy/driver/generate_fixture.sh idl-exception header compact -H x-header-1=x-value-1,x-header-2=0.6,x-header-3=150,x-header-4=user_id:10,x-header-5=garbage_asdf -T C:/_eb/execroot/envoy/_tmp/2540819d34883b5a5d1e62d549fbcdeb execute "
        [2021-01-15 11:35:27.958][5624][critical][assert] [test/test_common/environment.cc:414] assert failure: false.

---------------------------------------------------------------------------------------------------

Test flake details:
- Test suite:   TcpProxyIntegrationTest
- Test case:    TestCloseOnHealthFailure/IPv6_OriginalConnPool
- Log path:     C:/_eb/_bazel_LocalAdmin/sonr4fdz/external/envoy/bazel-testlogs/test/integration/tcp_proxy_integration_test/shard_1_of_2/test_attempts/attempt_1.log
- Error:        Exited with error code 142 (Unknown error)
- Note:         This error is likely a timeout (test duration == 300, a well known timeout value).
- Last 1 line(s):
        [ RUN      ] TcpProxyIntegrationTestParams/TcpProxyIntegrationTest.TestCloseOnHealthFailure/IPv6_OriginalConnPool

---------------------------------------------------------------------------------------------------
```


Risk Level: N/A for code/test, low for the flaky test script due to the amount of churn.

Testing: Ran locally many, many times.  For a portion of those runs, I treated normal failures as flakes to get better coverage on the parsing helpers.  Not sure how to test the changes to bazel.yml though.

Signed-off-by: Randy Miller <[email protected]>
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Feb 22, 2021
@phlax
Copy link
Member

phlax commented Feb 27, 2021

@LuyaoZhong did you manage to get anywhere with docs ?

I can possibly help out on that if necessary

checking the existing docs for the inspector

https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/tls_inspector

...there is not much in the way of example configuration, so this is something which might be useful to update too.

there is an example of this filter being used in the existing tls-sni sandbox tho

@github-actions github-actions bot removed the stale stalebot believes this issue/PR has not been touched recently label Feb 27, 2021
@LuyaoZhong
Copy link
Contributor Author

@phlax Sorry for the delay response. I had some emergent work. I'll update soon, possibly in next few days.

@phlax
Copy link
Member

phlax commented Feb 27, 2021

cool - checking further - it seems like this filter doesnt have any config;

https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/listener/tls_inspector/v3/tls_inspector.proto

mostly i guess we want to just demonstrate how it can be used with a filter chain match and also the admin stats that are exposed

@LuyaoZhong
Copy link
Contributor Author

@phlax yeah, I check the tls inspector usage, besides sni , it can also be used to select filter chain via application_protocols and transport_protocol, I'm thinking of demonstrating them in the single sandbox. And for admin stats do you think we need another example sandbox?

@phlax
Copy link
Member

phlax commented Mar 2, 2021

I'm thinking of demonstrating them in the single sandbox. And for admin stats do you think we need another example sandbox?

all in one sandbox should be good - with the admin stats it will be enough to show how they are affected by the actions in the sandbox i think

Copy link
Member

@phlax phlax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @LuyaoZhong thanks for updating this

ive left a couple of small nits, and i think it should be ready to land

docs/root/start/sandboxes/tls-inspector.rst Outdated Show resolved Hide resolved
docs/root/start/sandboxes/tls-inspector.rst Outdated Show resolved Hide resolved
Copy link
Member

@phlax phlax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of small nits and then i think it really is ready @LuyaoZhong

examples/tls-inspector/verify.sh Outdated Show resolved Hide resolved
docs/root/start/sandboxes/tls-inspector.rst Outdated Show resolved Hide resolved
docs/root/start/sandboxes/tls-inspector.rst Outdated Show resolved Hide resolved
docs/root/start/sandboxes/tls-inspector.rst Outdated Show resolved Hide resolved
Copy link
Member

@phlax phlax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant work @LuyaoZhong - thank you so much

ive left one small nit - apologies for not thinking of it before

cc @mattklein123 this should be ready for final review i think

# shellcheck source=examples/verify-common.sh
. "$(dirname "${BASH_SOURCE[0]}")/../verify-common.sh"

run_log "Test tls inspector"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change this to something like

 run_log "Curl tls inspector: HTTPS -> HTTP/1.1"

and then also do similar before each of the following commands

it means that if the test fails in CI its much easier to find exactly which command failed

it also makes it more useful running and seeing what is being tested

(apologies for last minute nits!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, don't worry about that. I'm glad to make it looks better, thanks for your comments! 😄

Provide a sandboxe using Docker Compose that set up environments to
test out Envoy’s tls inspector feature and show sample configurations.

Signed-off-by: Luyao Zhong <[email protected]>
Copy link
Member

@phlax phlax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, great work, thanks again @LuyaoZhong

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making our documentation better!

@mattklein123 mattklein123 merged commit 29a042d into envoyproxy:main Apr 1, 2021
@LuyaoZhong LuyaoZhong deleted the tls-inspector branch April 2, 2021 01:08
rexengineering pushed a commit to rexengineering/istio-envoy that referenced this pull request Oct 15, 2021
…tions and timeouts (#14731)

Update the flaky test script to print more details and detect flaky, unexpected test errors like exceptions and timeouts, with the goal of making the notifications more actionable.

Signed-off-by: Randy Miller <[email protected]>

Additional Description:
The flaky test script, ci/flaky/process_xml.py, is executed on every CI run, delivering a notification to the Slack channel "test-flaky" if there are any flaky test failures.  Those notifications aren't as useful as they could be though, for a number of reasons:
 1) Direct links to the CI run, the related commit, and the related PR are not included.
 2) There's no indication of which stage or job experienced the flake.
 3) The notifications are not uniformly formatted, so they can be a bit hard to read.
 4) Some notifications do not include any information about the flake(s).

The goal of this PR is to make these flaky test notifications more actionable by addressing the 4 bullets above.  Below is what a notification would look like should this PR get merged.  The last 2 flakes are not captured at all today by the current state of the script, as those flakes are unexpected test "errors" (eg, exceptions or timeouts) rather than test "failures" (eg, test assert failed).

```
Target:         bazel.release
Stage:          Windows release
Pull request:   envoyproxy/envoy#14665
Commmit:        envoyproxy/envoy@f1184f2
CI results:     https://dev.azure.com/cncf/envoy/_build/results?buildId=63454

Origin:         https://github.com/rmiller14/envoy
Upstream:       https://github.com/envoyproxy/envoy
Latest ref:     heads/flaky_test_script

Last commit:
        commit f1184f2d74d052942f7484beecf98d7cfde137e0
        Author: Randy Miller <[email protected]>
        Date:   Fri Jan 15 00:58:51 2021 -0800

            Update flaky test script to print more actionable details as well as detect flaky, unexpected test errors like exceptions and timeouts.

            Signed-off-by: Randy Miller <[email protected]>

---------------------------------------------------------------------------------------------------

Test flake details:
- Test suite:   IpVersions/DnsImplTest
- Test case:    LocalLookup/IPv4
- Log path:     C:/_eb/_bazel_LocalAdmin/sonr4fdz/external/envoy/bazel-testlogs/test/common/network/dns_impl_test/test_attempts/attempt_1.log
- Details:
        test/common/network/dns_impl_test.cc:609
        Expected equality of these values:
          nullptr
            Which is: NULL
          resolveWithExpectations("localhost", DnsLookupFamily::V4Only, DnsResolver::ResolutionStatus::Success, {"127.0.0.1"}, {"::1"}, absl::nullopt)
            Which is: 0000017500F82190
        Stack trace:
          00007FF69688B586: (unknown)
          00007FF6968A4468: (unknown)
          00007FF6968A462D: (unknown)
          00007FF6968A508D: (unknown)
        ... Google Test internal frames ...

---------------------------------------------------------------------------------------------------

Test flake details:
- Test suite:   ThriftConnManagerIntegrationTest
- Test case:    IDLException/HeaderCompact
- Log path:     C:/_eb/_bazel_LocalAdmin/sonr4fdz/external/envoy/bazel-testlogs/test/extensions/filters/network/thrift_proxy/integration_test/shard_1_of_4/test_attempts/attempt_1.log
- Error:        Exited with error code 3 (No such process)
- Relevant snippet:
        Traceback (most recent call last):
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\envoy\test\extensions\filters\network\thrift_proxy\driver\server.py", line 232, in <module>
            main(cfg)
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\envoy\test\extensions\filters\network\thrift_proxy\driver\server.py", line 175, in main
            server.serve()
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\thrift_pip3_pypi__thrift_0_13_0\thrift\server\TServer.py", line 121, in serve
            self.serverTransport.listen()
          File "\\?\C:\Windows\TEMP\Bazel.runfiles_3v_lc0rq\runfiles\thrift_pip3_pypi__thrift_0_13_0\thrift\transport\TSocket.py", line 208, in listen
            self.handle.bind(res[4])
        OSError: [WinError 10013] An attempt was made to access a socket in a way forbidden by its access permissions
        Could not connect to any of [('127.0.0.1', 50670)]
        Unhandled Thrift Exception: Could not connect to any of [('127.0.0.1', 50670)]
        C:/envoy/test/extensions/filters/network/thrift_proxy/driver/generate_fixture.sh: line 1: kill: (1819) - No such process
        Failed bash -c "PYTHONPATH=$(dirname C:/envoy/test/extensions/filters/network/thrift_proxy/driver/generate_fixture.sh) C:/envoy/test/extensions/filters/network/thrift_proxy/driver/generate_fixture.sh idl-exception header compact -H x-header-1=x-value-1,x-header-2=0.6,x-header-3=150,x-header-4=user_id:10,x-header-5=garbage_asdf -T C:/_eb/execroot/envoy/_tmp/2540819d34883b5a5d1e62d549fbcdeb execute "
        [2021-01-15 11:35:27.958][5624][critical][assert] [test/test_common/environment.cc:414] assert failure: false.

---------------------------------------------------------------------------------------------------

Test flake details:
- Test suite:   TcpProxyIntegrationTest
- Test case:    TestCloseOnHealthFailure/IPv6_OriginalConnPool
- Log path:     C:/_eb/_bazel_LocalAdmin/sonr4fdz/external/envoy/bazel-testlogs/test/integration/tcp_proxy_integration_test/shard_1_of_2/test_attempts/attempt_1.log
- Error:        Exited with error code 142 (Unknown error)
- Note:         This error is likely a timeout (test duration == 300, a well known timeout value).
- Last 1 line(s):
        [ RUN      ] TcpProxyIntegrationTestParams/TcpProxyIntegrationTest.TestCloseOnHealthFailure/IPv6_OriginalConnPool

---------------------------------------------------------------------------------------------------
```


Risk Level: N/A for code/test, low for the flaky test script due to the amount of churn.

Testing: Ran locally many, many times.  For a portion of those runs, I treated normal failures as flakes to get better coverage on the parsing helpers.  Not sure how to test the changes to bazel.yml though.

Signed-off-by: Randy Miller <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants