-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
qaenv_test.go:169: Get "https://www.example.com/": generic_timeout_error #2527
Comments
Uh, I did not notice it initially, but the logs also contain this error:
So, another QUIC based test ( |
So, apparently I need to disable all the new tests I have added. One strange "thing" is that I can also make tests work if I am using different IP addresses for different tests (see ooni/probe-cli@3145e55). I think I should discuss this issue with @kelmenhorst. For now, I think it's ~fine to merge the code with the ASCII ART "WTF" and tests working as intended notwithstanding the need to use different IP addresses to make it work. The issue should still be open, though. |
This diff continues improving and refactoring netemx with the objective of unifying how we create all kind of servers. Here, specifically, we modify the HTTPS server implementing NetStackServerFactory implemented in the previous commit and obtain an HTTP3 server honouring NetStackServerFactory. Crucially, this diff also adds support for overriding the TLS config passed to the server, which enables us to test for expired certificates, self-signed certificates, and so forth. While working on this diff, I noticed a weird behavior with HTTP/3 tests using the same address, which is documented at ooni/probe#2527. I modified the tests to make them pass. To this end, I changed the IP addresses used by HTTP/3 tests to avoid reusing www.example.com's IP address. It seems fine, for now, to merge this code, because HTTP/3 is not a cornerstone of how we measure, for now. But we should investigate further in the future! ## Checklist - [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md) - [x] reference issue for this pull request: ooni/probe#1803 - [x] if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: N/A - [x] if you changed code inside an experiment, make sure you bump its version number: N/A
This diff converts the ScenarioRoleWebServer case to using QAEnvOptionNetStack. While there, recognize that ooni/probe#2527 is really making all QUIC tests fragile, and scale them down a bit. Part of ooni/probe#1803
This diff converts the ScenarioRoleWebServer case to using QAEnvOptionNetStack. While there, recognize that ooni/probe#2527 is really making all QUIC tests fragile, and scale them down a bit. ## Checklist - [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md) - [x] reference issue for this pull request: ooni/probe#1803 - [x] if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: N/A - [x] if you changed code inside an experiment, make sure you bump its version number: N/A
This diff fixes ooni/probe#2527. The comment I originally wrote in the code to explain the issue read: > I am wondering whether I am not fully understanding how quic-go/quic-go works. It turns out this comment was true. I was assuming http3.Server was closing the listener, however, that was not the case. I noticed this issue when reading again netemx/qaenv.go. So, it all boiled down to PEBCAK 😬.
This diff fixes ooni/probe#2527. The comment I originally wrote in the code to explain the issue read: > I am wondering whether I am not fully understanding how quic-go/quic-go works. It turns out this comment was true. I was assuming http3.Server was closing the listener, however, that was not the case. I noticed this issue when reading again netemx/qaenv.go. So, it all boiled down to PEBCAK 😬. ## Checklist - [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md) - [x] reference issue for this pull request: ooni/probe#2527 - [x] if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: N/A - [x] if you changed code inside an experiment, make sure you bump its version number: N/A
Upgrade to [email protected] and adapt closing policy after [changes in [email protected]](https://github.com/quic-go/quic-go/releases/tag/v0.40.0) and quic-go/quic-go#4072 in particular. We're creating an UDP conn and [passing it to server.Serve](https://github.com/ooni/probe-cli/pull/1428/files#diff-efda3daa51e9aed0b3444a327e64b7e5c412938a1fe894a3c850d533179c2425R105), which [calls serveConn](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L242), which [calls quicListen](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L316) and then [ServeListener](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L321). In turn, ServeListener [is interrupted by ErrServerClosed](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L268), which seems to be generated by [server.Close calling Close for each listener](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L657). The following is what happened before updating the shutdown protocol: ``` goroutine 247 [sync.Mutex.Lock]: [...] github.com/quic-go/quic-go.(*Transport).closeServer(0x1400054a000?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:298 +0x90 github.com/quic-go/quic-go.(*baseServer).close.func1() /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:344 +0x84 [...] github.com/quic-go/quic-go.(*baseServer).close(0x14000cc1c10?, {0x102faa7a0?, 0x140002557b0?}, 0xec?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:338 +0x64 github.com/quic-go/quic-go.(*baseServer).Close(...) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:333 github.com/quic-go/quic-go.(*EarlyListener).Close(0x14000120700?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:165 +0x34 github.com/quic-go/quic-go/http3.(*Server).Close(0x140007344d0) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/http3/server.go:657 +0xe8 github.com/ooni/probe-cli/v3/internal/netemx.(*http3Server).Close(0x140001345a0) /Users/sbs/src/github.com/ooni/probe-cli/internal/netemx/http3.go:66 +0xe4 github.com/ooni/probe-cli/v3/internal/netemx.(*QAEnv).Close.func1() /Users/sbs/src/github.com/ooni/probe-cli/internal/netemx/qaenv.go:291 +0x4c [...] github.com/ooni/probe-cli/v3/internal/netemx.(*QAEnv).Close(0x140001e5570?) /Users/sbs/src/github.com/ooni/probe-cli/internal/netemx/qaenv.go:288 +0x48 goroutine 136 [sync.Mutex.Lock]: [...] github.com/quic-go/quic-go.(*baseServer).close(0x14000127200?, {0x102faa7a0?, 0x140001920f0?}, 0xd8?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:338 +0x64 github.com/quic-go/quic-go.(*Transport).close(0x1400054a000, {0x102faa7a0, 0x140001920f0}) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:325 +0x110 github.com/quic-go/quic-go.(*Transport).listen(0x1400054a000, {0x102fbd3a8, 0x14000526108}) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:358 +0x364 created by github.com/quic-go/quic-go.(*Transport).init.func1 /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:242 +0x45c ``` Looking at the above traces, both end up at server.go:338 locking the same sync.Once. It seems the structures are such that attempting to close both the server and the listener leads to self locking in v0.40.{0,1}. The original expectation was that the server closed the connection used for listening anyway, and ooni/probe#2527 documented how that was not the case. It seems that now this is the case, so we can comment out the original ooni/probe#2527 fix without any test hangs. Also, if the original bug was indeed that the server did not own the listener, and considering that now it seems the server owns the listener, it makes sense that the fix for v0.40.1 is to revert the ooni/probe#2527 fix. See ooni/probe#2556
) This diff continues improving and refactoring netemx with the objective of unifying how we create all kind of servers. Here, specifically, we modify the HTTPS server implementing NetStackServerFactory implemented in the previous commit and obtain an HTTP3 server honouring NetStackServerFactory. Crucially, this diff also adds support for overriding the TLS config passed to the server, which enables us to test for expired certificates, self-signed certificates, and so forth. While working on this diff, I noticed a weird behavior with HTTP/3 tests using the same address, which is documented at ooni/probe#2527. I modified the tests to make them pass. To this end, I changed the IP addresses used by HTTP/3 tests to avoid reusing www.example.com's IP address. It seems fine, for now, to merge this code, because HTTP/3 is not a cornerstone of how we measure, for now. But we should investigate further in the future! ## Checklist - [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md) - [x] reference issue for this pull request: ooni/probe#1803 - [x] if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: N/A - [x] if you changed code inside an experiment, make sure you bump its version number: N/A
) This diff converts the ScenarioRoleWebServer case to using QAEnvOptionNetStack. While there, recognize that ooni/probe#2527 is really making all QUIC tests fragile, and scale them down a bit. ## Checklist - [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md) - [x] reference issue for this pull request: ooni/probe#1803 - [x] if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: N/A - [x] if you changed code inside an experiment, make sure you bump its version number: N/A
This diff fixes ooni/probe#2527. The comment I originally wrote in the code to explain the issue read: > I am wondering whether I am not fully understanding how quic-go/quic-go works. It turns out this comment was true. I was assuming http3.Server was closing the listener, however, that was not the case. I noticed this issue when reading again netemx/qaenv.go. So, it all boiled down to PEBCAK 😬. ## Checklist - [x] I have read the [contribution guidelines](https://github.com/ooni/probe-cli/blob/master/CONTRIBUTING.md) - [x] reference issue for this pull request: ooni/probe#2527 - [x] if you changed anything related to how experiments work and you need to reflect these changes in the ooni/spec repository, please link to the related ooni/spec pull request: N/A - [x] if you changed code inside an experiment, make sure you bump its version number: N/A
Upgrade to [email protected] and adapt closing policy after [changes in [email protected]](https://github.com/quic-go/quic-go/releases/tag/v0.40.0) and quic-go/quic-go#4072 in particular. We're creating an UDP conn and [passing it to server.Serve](https://github.com/ooni/probe-cli/pull/1428/files#diff-efda3daa51e9aed0b3444a327e64b7e5c412938a1fe894a3c850d533179c2425R105), which [calls serveConn](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L242), which [calls quicListen](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L316) and then [ServeListener](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L321). In turn, ServeListener [is interrupted by ErrServerClosed](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L268), which seems to be generated by [server.Close calling Close for each listener](https://github.com/quic-go/quic-go/blob/v0.40.1/http3/server.go#L657). The following is what happened before updating the shutdown protocol: ``` goroutine 247 [sync.Mutex.Lock]: [...] github.com/quic-go/quic-go.(*Transport).closeServer(0x1400054a000?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:298 +0x90 github.com/quic-go/quic-go.(*baseServer).close.func1() /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:344 +0x84 [...] github.com/quic-go/quic-go.(*baseServer).close(0x14000cc1c10?, {0x102faa7a0?, 0x140002557b0?}, 0xec?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:338 +0x64 github.com/quic-go/quic-go.(*baseServer).Close(...) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:333 github.com/quic-go/quic-go.(*EarlyListener).Close(0x14000120700?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:165 +0x34 github.com/quic-go/quic-go/http3.(*Server).Close(0x140007344d0) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/http3/server.go:657 +0xe8 github.com/ooni/probe-cli/v3/internal/netemx.(*http3Server).Close(0x140001345a0) /Users/sbs/src/github.com/ooni/probe-cli/internal/netemx/http3.go:66 +0xe4 github.com/ooni/probe-cli/v3/internal/netemx.(*QAEnv).Close.func1() /Users/sbs/src/github.com/ooni/probe-cli/internal/netemx/qaenv.go:291 +0x4c [...] github.com/ooni/probe-cli/v3/internal/netemx.(*QAEnv).Close(0x140001e5570?) /Users/sbs/src/github.com/ooni/probe-cli/internal/netemx/qaenv.go:288 +0x48 goroutine 136 [sync.Mutex.Lock]: [...] github.com/quic-go/quic-go.(*baseServer).close(0x14000127200?, {0x102faa7a0?, 0x140001920f0?}, 0xd8?) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/server.go:338 +0x64 github.com/quic-go/quic-go.(*Transport).close(0x1400054a000, {0x102faa7a0, 0x140001920f0}) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:325 +0x110 github.com/quic-go/quic-go.(*Transport).listen(0x1400054a000, {0x102fbd3a8, 0x14000526108}) /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:358 +0x364 created by github.com/quic-go/quic-go.(*Transport).init.func1 /Users/sbs/go/pkg/mod/github.com/quic-go/[email protected]/transport.go:242 +0x45c ``` Looking at the above traces, both end up at server.go:338 locking the same sync.Once. It seems the structures are such that attempting to close both the server and the listener leads to self locking in v0.40.{0,1}. The original expectation was that the server closed the connection used for listening anyway, and ooni/probe#2527 documented how that was not the case. It seems that now this is the case, so we can comment out the original ooni/probe#2527 fix without any test hangs. Also, if the original bug was indeed that the server did not own the listener, and considering that now it seems the server owns the listener, it makes sense that the fix for v0.40.1 is to revert the ooni/probe#2527 fix. See ooni/probe#2556
In ooni/probe-cli#1224, there is a test (
TestQAEnv/we_can_hijack_HTTP3_requests
) that fails withgeneric_timeout_error
when executed along with other tests, but works when run alone. I have read the diff a couple of times and I do not fully understand why this error happens only when run along with other tests. Ideally, one would say that there is some state across calls that create issues. However, I do not see such a state in our code. So, I think probably the error we have here is something subtle. Because having this test WAI is not a priority now, I am going to document the bug and mark the test as one that should be skipped. I will eventually figure out what is wrong here.The text was updated successfully, but these errors were encountered: