Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: Fix quiche test compilation #12898

Merged
merged 16 commits into from
Sep 25, 2020
Merged

Windows: Fix quiche test compilation #12898

merged 16 commits into from
Sep 25, 2020

Conversation

sunjayBhatia
Copy link
Member

Commit Message:
Windows: Fix quiche test compilation

  • Add patch to fix various quiche compilation issues with MSVC
  • Skip uses of GSO as it is not available on Windows
  • Fix syscall result types
  • Fix usage of Edge event trigger
  • move ifdef that avoids a test on non-Linux to ensure test doesn't
    appear to pass in test results
  • quiche integration test is still failing, requires further
    investigation

Additional Description:
Regarding quiche patch:

Risk Level: Low
Testing: Enables quiche tests on Windows, verified locally on Windows
Docs Changes: N/A
Release Notes: N/A
Fixes #10420

++consecutive_pto_count_;
pending_timer_transmission_count_ = max_probe_packets_per_pto_;
return PTO_MODE;
+ default:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the patch can be revised if there is a quiche codebase convention for aborting/panicking etc., this is related to the compilation error:

Execution platform: @local_config_platform//:host
C:\_eb\execroot\envoy\bazel-out\x64_windows-opt\bin\external\com_googlesource_quiche\quiche\quic\core\quic_sent_packet_manager.cc(770) : error C2220: the following warning is treated as an error
C:\_eb\execroot\envoy\bazel-out\x64_windows-opt\bin\external\com_googlesource_quiche\quiche\quic\core\quic_sent_packet_manager.cc(770) : warning C4715: 'quic::QuicSentPacketManager::OnRetransmissionTimeout': not all control paths return a value
Target //test/extensions/quic_listeners/quiche:envoy_quic_utils_test failed to build

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing it in #13066

@mattklein123
Copy link
Member

Assigning over to @danzh2010. Can we potentially just fix these issues upstream?

@danzh2010
Copy link
Contributor

Assigning over to @danzh2010. Can we potentially just fix these issues upstream?

yes, I can fix it in QUICHE

@sunjayBhatia
Copy link
Member Author

Assigning over to @danzh2010. Can we potentially just fix these issues upstream?

yes, I can fix it in QUICHE

Thanks @danzh2010 whenever your changes are ready we can drop the quiche patch and bump the dependency

@wrowe
Copy link
Contributor

wrowe commented Sep 8, 2020

Hope you had a good holiday weekend @danzh2010 - just a weekly ping to check in on this issue and unblock us. (Let us know if it will take a few weeks, and we can always patch and revert later once a fix lands upstream.)

# GSO not supported on Windows
"//bazel:windows_x86_64": [],
"//conditions:default": [":udp_gso_batch_writer_lib"],
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danzh2010 it isn't at all apparent to me why this is still required after the last quiche refactoring merged to this patch, but dropping it does break compilation on linux release.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the select is still needed.

deps = [
":quic_test_utils_for_envoy_lib",
":test_utils_lib",
"//source/extensions/quic_listeners/quiche:active_quic_listener_config_lib",
"//source/extensions/quic_listeners/quiche:active_quic_listener_lib",
"//source/extensions/quic_listeners/quiche:envoy_quic_utils_lib",
"//source/extensions/quic_listeners/quiche:udp_gso_batch_writer_lib",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danzh2010 should I try reverting the block above in source/e/q/q/BUILD and also guard this in a select? Or is there a better component to target now in this deps list?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try reverting the changes in active_quic_listener_lib and also remove this line here?

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope you had a good holiday weekend @danzh2010 - just a weekly ping to check in on this issue and unblock us. (Let us know if it will take a few weeks, and we can always patch and revert later once a fix lands upstream.)

Sorry for late reply, I was OOO till today. I'm working on the upstream fix now. It's not gonna take a few weeks.

# GSO not supported on Windows
"//bazel:windows_x86_64": [],
"//conditions:default": [":udp_gso_batch_writer_lib"],
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the select is still needed.

deps = [
":quic_test_utils_for_envoy_lib",
":test_utils_lib",
"//source/extensions/quic_listeners/quiche:active_quic_listener_config_lib",
"//source/extensions/quic_listeners/quiche:active_quic_listener_lib",
"//source/extensions/quic_listeners/quiche:envoy_quic_utils_lib",
"//source/extensions/quic_listeners/quiche:udp_gso_batch_writer_lib",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try reverting the changes in active_quic_listener_lib and also remove this line here?

@wrowe
Copy link
Contributor

wrowe commented Sep 10, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Cannot retry non-completed check: envoy-presubmit (Linux-x64 compile_time_options), please wait.

🐱

Caused by: a #12898 (comment) was created by @wrowe.

see: more, trace.

@wrowe
Copy link
Contributor

wrowe commented Sep 10, 2020

@danzh2010 Looks like this patch is back in shape with master, if you could take a look at what remains of the initial delta. The net effect is

Tests bypassed as skip_on_windows: 18 known unbuildable or inapplicable tests (Previously 29)
Tests bypassed as fails_on_windows: 13 known incompatible tests (Addition of 1 formerly unbuildable, now failing quic test)
Tests bypassed as flaky_on_windows: 20 known unstable tests

@wrowe
Copy link
Contributor

wrowe commented Sep 11, 2020

@danzh2010 my efforts to pick up the new work at quiche are proving fruitless, I expect we need you to bump and then we can clean up this PR and get the tests running. Dinking with bazel/repository_locations.bzl;

     com_googlesource_quiche = dict(

...

+        # 2020-09-09: Pick up fix for Windows compilation errors
+        version = "7f547143b82d4ea8fc34161a0c014f655449c1f5",
+        sha256 = "f8e7ee8d31f952f709785a9c9f5921fd2efa7f6e64ad4ae0565df51f50a91dee",

I experimented with multiple hacks to test this.

Of the list below, the version isn't recognized at storage.googleapis.com, on quiche.googlesource.com the sha256 changes every blinking time it gets downloaded (generated on the fly with the current timestamp perhaps?) and finally, from the third mirror/fork below...

+        urls = [
+            # "https://storage.googleapis.com/quiche-envoy-integration/{version}.tar.gz",
+            # "https://quiche.googlesource.com/quiche/+archive/{version}.tar.gz",
+            "https://github.com/flyaways/quiche/archive/{version}.tar.gz",
+        ],

the branch's quiche.BUILD appears broken or out-of-sync;

ERROR: C:/_eb/external/com_googlesource_quiche/BUILD.bazel:1277:14: @com_googlesource_quiche//:quic_core_proto_source_address_token_proto: missing input file 'external/com_googlesource_quiche/quiche/quic/core/proto/cached_network_parameters.proto', owner: '@com_googlesource_quiche//:quiche/quic/core/proto/cached_network_parameters.proto'
ERROR: C:/_eb/external/com_googlesource_quiche/BUILD.bazel:1877:17 1 input file(s) do not exist

and haven't been able to work out precisely what broke here.

@wrowe
Copy link
Contributor

wrowe commented Sep 15, 2020

We are on hold for #13066 to be fixed in gcc and merged, at which point this patch should become simpler.

/wait

- Skip uses of GSO as it is not available on Windows
- Fix syscall result types
- Fix usage of Edge event trigger
- move ifdef that avoids a test on non-Linux to ensure test doesn't
  appear to pass in test results
- quiche integration test is still failing, requires further
  investigation

Signed-off-by: Sunjay Bhatia <[email protected]>
Signed-off-by: William A Rowe Jr <[email protected]>
@wrowe
Copy link
Contributor

wrowe commented Sep 19, 2020

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Check envoy-presubmit didn't fail.

🐱

Caused by: a #12898 (comment) was created by @wrowe.

see: more, trace.

@wrowe
Copy link
Contributor

wrowe commented Sep 19, 2020

/retest-circle

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: api (failed build)

🐱

Caused by: a #12898 (comment) was created by @wrowe.

see: more, trace.

@wrowe wrowe requested a review from danzh2010 September 19, 2020 01:46
@wrowe
Copy link
Contributor

wrowe commented Sep 19, 2020

@danzh2010 after all of your efforts, the proposal is reduced to this to bring two handfuls of skipped tests back to action on Windows. I'd like you to do a first-pass over the remaining bits, I've force-merged a rebase to avoid the confusion of all the small tweaks we had had to apply. If you can give it a pass we can drop the skipped on windows by a third.

wrowe
wrowe previously approved these changes Sep 19, 2020
Copy link
Contributor

@wrowe wrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I did work on this personally with sunjay so we would both appreciate more eyeballs, the resulting state of this PR looks great to me.

sunjayBhatia and others added 2 commits September 22, 2020 15:54
Signed-off-by: Sunjay Bhatia <[email protected]>
Co-authored-by: William A Rowe Jr <[email protected]>
…tests-compilation

Co-authored-by: Sunjay Bhatia <[email protected]>
Signed-off-by: William A Rowe Jr <[email protected]>
@wrowe
Copy link
Contributor

wrowe commented Sep 22, 2020

@ggreenway and @danzh2010 can you review and confirm our approach for getting quiche up on Windows, LGTM? Would like to accelerate this particular patch since it sits ahead of toggling clang-cl (we've already validated this quiche logic against that bump.)

Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have an alternate suggestion for all the pragmas to disable warnings: Could we create quiche_include_pre.h and quiche_include_post.h, and bracket those around quiche includes, and put all the warning-disable flags, and compiler ifdefs for warning-disable flags, into the pre and post, so we don't duplicate all that junk in every file?

Network::UdpPacketWriterPtr udp_packet_writer =
std::make_unique<Quic::UdpGsoBatchWriter>(io_handle, scope);
std::make_unique<Network::UdpDefaultWriter>(io_handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for this change will go away after #13086 merges.

@danzh2010 what was the reason for using the GSO batch writer in this test instead of the default writer? Coverage?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait for the other PR to merge and then come back to this. It should merge soon.

/wait

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for this change will go away after #13086 merges.

@danzh2010 what was the reason for using the GSO batch writer in this test instead of the default writer? Coverage?

Just to test more the interaction between the batch writer and QUIC listener because mostly likely the batch writer will be used in production.

@wrowe
Copy link
Contributor

wrowe commented Sep 22, 2020

I have an alternate suggestion for all the pragmas to disable warnings: Could we create quiche_include_pre.h and quiche_include_post.h, and bracket those around quiche includes, and put all the warning-disable flags, and compiler ifdefs for warning-disable flags, into the pre and post, so we don't duplicate all that junk in every file?

I would not do that. precisely.... rather have a quiche_core_includes.h header that handles the 80/20 of the most commonly needed quiche headers by many sources, and handles the pragmas and inclusions.

The other rare exceptions can be handled by the sources which consume them.

The actual pragmas warnings ignored required do vary by included header and compiler (including clang/gcc differences which are glossed over at present.)

@ggreenway
Copy link
Contributor

I have an alternate suggestion for all the pragmas to disable warnings: Could we create quiche_include_pre.h and quiche_include_post.h, and bracket those around quiche includes, and put all the warning-disable flags, and compiler ifdefs for warning-disable flags, into the pre and post, so we don't duplicate all that junk in every file?

I would not do that. precisely.... rather have a quiche_core_includes.h header that handles the 80/20 of the most commonly needed quiche headers by many sources, and handles the pragmas and inclusions.

The other rare exceptions can be handled by the sources which consume them.

The actual pragmas warnings ignored required do vary by included header and compiler (including clang/gcc differences which are glossed over at present.)

For compiling external-to-envoy source, I'd say we can just disable all the warnings that occur in any quiche header, for every quiche header.

Copy link
Contributor

@danzh2010 danzh2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danzh2010 I'm not sure where you would like these notes recorded, but this is feedback of the latest effort to upstream problems we are hitting with quiche;

We still have to disable -wd4068, because of unguarded #pragma GCC (originally, clang) occurring in quiche external prototype headers. But that isn't a big problem if we modify quiche.genrule_cmd to do the right thing with respect to clang, vs gcc, vs msvc, and I'm adding a note for bazel_internal.bzl to that effect.

This patch now includes many commits just to undo the #pragma GCC headache that existed at envoy. The guards could be more carefully focused as needed, by checking the GCC version, use clang or msvc equivalents where appropriate, and even be fancier about it if we want; https://stackoverflow.com/questions/3378560/how-to-disable-gcc-warnings-for-a-few-lines-of-code - this is just a straightforward stopgap.

Globally from bazel/external/quiche.BUILD, this is a newer addition to gcc since (Ubuntu 9.3.0-11ubuntu0~18.04.1) 9.3.0;;

cc1plus: error: unrecognized command line option '-Wno-range-loop-analysis' [-Werror]
cc1plus: all warnings being treated as errors

It seems we need a feature detection here and dodge specific gcc flavors.

We have this compilation failure on gcc (Ubuntu 9.3.0-11ubuntu0~18.04.1) 9.3.0;

bazel-out/k8-fastbuild/bin/external/com_googlesource_quiche/quiche/quic/core/quic_framer.cc: In member function 'bool quic::QuicFramer::AppendIetfAckFrameAndTypeByte(const quic::QuicAckFrame&, quic::QuicDataWriter*)':
bazel-out/k8-fastbuild/bin/external/com_googlesource_quiche/quiche/quic/core/quic_framer.cc:5546:61: error: comparison of integer expressions of different signedness: 'size_t' {aka 'long unsigned int'} and 'int' [-Werror=sign-compare]
 5546 |         static_cast<size_t>(writer->remaining() - ecn_size) <
      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 5547 |             QuicDataWriter::GetVarInt62Len(gap) +
      |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 5548 |                 QuicDataWriter::GetVarInt62Len(ack_range)) {
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Perhaps a judicious -Wno-sign-compare would work around this, but that's just a hack. What about ssize_t rather than of size_t? I'm curious why only we see it on gcc default bazel build, and it isn't observed in CI.

Fastbuild does not set _DEBUG. It also does not set NDEBUG; examining this error on windows msvc-cl;

ERROR: C:/_eb/external/com_googlesource_quiche/BUILD.bazel:2192:17: C++ compilation of rule '@com_googlesource_quiche//:quic_core_frames_frames_lib' failed (Exit 2)
C:\_eb\execroot\envoy\bazel-out\x64_windows-fastbuild\bin\external\com_googlesource_quiche\quiche/quic/core/frames/quic_frame.h(99): error C2926: 'quic::QuicFrame::<unnamed-tag>::delete_forbidden': a default member initializer is not allowed for a member of an anonymous struct within a union

It isn't clear if this defect is specific to Windows (due to the previous error on gcc), but by papering over this in quiche.BUILD with a feature define to force QUIC_FRAME_DEBUG=0 got us past this new problem.

I will fix these 3 issues in upstream. As to #pragma GCC override, I don't have strong preference in either way, they all seems to be one time change and eventually, QUICHE will be in a state that we don't need them.

tags = [
"fails_on_windows",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious which test fails on Windows. Can you file an issue about the test failure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup (actually, now two failures since the last merge) #13273

Network::UdpPacketWriterPtr udp_packet_writer =
std::make_unique<Quic::UdpGsoBatchWriter>(io_handle, scope);
std::make_unique<Network::UdpDefaultWriter>(io_handle);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The need for this change will go away after #13086 merges.

@danzh2010 what was the reason for using the GSO batch writer in this test instead of the default writer? Coverage?

Just to test more the interaction between the batch writer and QUIC listener because mostly likely the batch writer will be used in production.

@wrowe
Copy link
Contributor

wrowe commented Sep 24, 2020

I will fix these 3 issues in upstream. As to #pragma GCC override, I don't have strong preference in either way, they all seems to be one time change and eventually, QUICHE will be in a state that we don't need them.

That's awesome news, thank you. Lets please still land this patch in the interim? I'd rather we didn't miss the 1.16 window due to upstream/elsewhere issues.

@ggreenway I think you might agree that there is zero return for investing more cycles at refactoring the existing pragma GCC mess, if we expect to eliminate them wholesale as the quiche project catches up with the language standards?

@ggreenway
Copy link
Contributor

I will fix these 3 issues in upstream. As to #pragma GCC override, I don't have strong preference in either way, they all seems to be one time change and eventually, QUICHE will be in a state that we don't need them.

That's awesome news, thank you. Lets please still land this patch in the interim? I'd rather we didn't miss the 1.16 window due to upstream/elsewhere issues.

@ggreenway I think you might agree that there is zero return for investing more cycles at refactoring the existing pragma GCC mess, if we expect to eliminate them wholesale as the quiche project catches up with the language standards?

Makes sense. Let's go with the pragma changes in this PR; we can revisit later if the situation changes.

Signed-off-by: William A Rowe Jr <[email protected]>
wrowe and others added 4 commits September 25, 2020 09:25
Signed-off-by: Sunjay Bhatia <[email protected]>
Signed-off-by: William A Rowe Jr <[email protected]>
Co-authored-by: Sunjay Bhatia <[email protected]>
Co-authored-by: William A Rowe Jr <[email protected]>
Signed-off-by: William A Rowe Jr <[email protected]>
Co-authored-by: Sunjay Bhatia <[email protected]>
- Mark 2 tests failing
  - one due to expecting buffered Gso style support
  - expect that the integration test is failing for a common defect
    between other failing integration tests
  - not regressions of this PR
- explicitly discard result in test

Signed-off-by: Sunjay Bhatia <[email protected]>
Co-authored-by: William A Rowe Jr <[email protected]>
@wrowe
Copy link
Contributor

wrowe commented Sep 25, 2020

@mattklein123 I think this satisfies all your questions and concerns; the ASAN failure is our shared CI issue, not a regression.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Let's ship and iterate to avoid future breakage.

@mattklein123 mattklein123 merged commit 5a87f1e into envoyproxy:master Sep 25, 2020
@wrowe wrowe deleted the windows-fix-quiche-tests-compilation branch September 26, 2020 04:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MSVC Compilation errors with current quiche implementation
5 participants