flaky test-http2-reset-flood #29802

Closed
sam-github opened this issue Oct 1, 2019 · 12 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. http2 Issues or PRs related to the http2 subsystem.

Comments

@sam-github
Contributor

  • Version: master
  • Platform: freebsd
  • Subsystem: http2
Stacktrace
<--- Last few GCs --->

[26169:0x80448acc0]    25362 ms: Scavenge 989.7 (993.9) -> 988.7 (998.4) MB, 5.8 / 0.0 ms  (average mu = 0.162, current mu = 0.042) allocation failure
[26169:0x80448acc0]    25385 ms: Scavenge 992.9 (999.2) -> 990.7 (1000.2) MB, 7.9 / 0.0 ms  (average mu = 0.162, current mu = 0.042) allocation failure
[26169:0x80448acc0]    25410 ms: Scavenge 994.1 (1000.2) -> 992.2 (1001.7) MB, 11.1 / 0.0 ms  (average mu = 0.162, current mu = 0.042) allocation failure


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1281d39]
Security context: 0x2031aba008a1 <JSObject>
    1: valueOf [0x2031aba02591](this=0x16fe6ced18a9 <JSArray[1]>)
    2: writeRequests(aka writeRequests) [0x29a6852d0881] [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/test/parallel/test-http2-reset-flood.js:~63] [pc=0x3b2f916da4bd](this=0x3d037bd404a9 <undefined>)
    3: processTicksAndRejections [0x130a019c4e21] [internal/process...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x9a7e30 node::Abort(void) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 2: 0x9a7fa2 node::OnFatalError(char const*, char const*) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 3: 0xad76a7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 4: 0xc4edd5 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 5: 0xc50613 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 6: 0xc4d16c v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 7: 0xc4b01e v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 8: 0xc4a00e v8::internal::Heap::HandleGCRequest(void) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
 9: 0xc11275 v8::internal::StackGuard::HandleInterrupts(void) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]
10: 0xf52f28 v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) [/usr/home/iojs/build/workspace/node-test-commit-freebsd/nodes/freebsd11-x64/out/Release/node]

From #29727, which seems wholly unrelated, since it changes link flags on Linux, and this is a FreeBSD failure.

@nodejs/platform-freebsd

@sam-github sam-github added freebsd Issues and PRs related to the FreeBSD platform. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Oct 1, 2019
MylesBorins added a commit to MylesBorins/node that referenced this issue Apr 1, 2020
This test is somewhat regularly flaky on 12.x, and it appears to be flaky on master as
well. There appears to be an issue tracking flakes for 6+ months. Seems
reasonable to mark it flaky to keep us from having red CI.

Refs: nodejs#29802
MylesBorins added a commit that referenced this issue Apr 1, 2020
This test is somewhat regularly flaky on 12.x, and it appears to be flaky on master as
well. There appears to be an issue tracking flakes for 6+ months. Seems
reasonable to mark it flaky to keep us from having red CI.

Refs: #29802

PR-URL: #32595
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@MylesBorins
Contributor

MylesBorins commented Apr 1, 2020

We are seeing this as flaky on 12.x on a bunch of other Linux distros as well.

https://ci.nodejs.org/job/node-test-commit/37126/

Not sure what we should do about this

@MylesBorins
Contributor

@addaleax also seeing this fail on Windows now (all on 12.x).

Do you think this is a real failure? Should we mark this flaky across all release lines? Expand scope of this issue?

@MylesBorins
Contributor

Here is an example of it failing across almost every platform

https://ci.nodejs.org/job/node-test-commit/37128/

@addaleax
Member

addaleax commented Apr 1, 2020

@MylesBorins The v12.x failures seem very different from the failure on master. I could reproduce the v12.x failures locally on Linux. This patch fixes them for me and shouldn’t affect the test’s functionality:

diff --git a/test/parallel/test-http2-reset-flood.js b/test/parallel/test-http2-reset-flood.js
index 7ee534cf4faa..d721384ab374 100644
--- a/test/parallel/test-http2-reset-flood.js
+++ b/test/parallel/test-http2-reset-flood.js
@@ -69,9 +69,10 @@ const worker = new Worker(__filename).on('message', common.mustCall((port) => {
       h2header.writeIntBE(streamId, 5, 4);  // Stream ID
       streamId += 2;
       // 0x88 = :status: 200
-      conn.write(Buffer.concat([h2header, Buffer.from([0x88])]));
+      if (conn.writable)
+        conn.write(Buffer.concat([h2header, Buffer.from([0x88])]));
     }
-    if (!gotError)
+    if (conn.writable && !gotError)
       setImmediate(writeRequests);
   }
 

I’m not sure why it’s not needed on master, but it’s probably because @ronag put a lot of work into getting streams to finally make sense.

Either way, feel free to take that patch and open a PR with it 🙂
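
For context, a minimal standalone sketch of the guard that patch adds (hypothetical code, not the actual test; the port and payload below are placeholders): stop writing, and stop rescheduling the loop, once the socket is no longer writable.

'use strict';
const net = require('net');

// Hypothetical illustration of the conn.writable guard from the patch above.
// The port and the 9-byte payload are placeholders, not values from the test.
const conn = net.connect({ port: 8080 });

function writeLoop() {
  if (!conn.writable)            // bail out once the socket can no longer accept writes
    return;
  conn.write(Buffer.alloc(9));   // stand-in for a raw HTTP/2 frame
  setImmediate(writeLoop);       // reschedule only while the socket is still writable
}

conn.on('connect', writeLoop);
conn.on('error', () => {});      // in the real test, an error is what ends the flood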

MylesBorins pushed a commit to MylesBorins/node that referenced this issue Apr 2, 2020
MylesBorins pushed a commit that referenced this issue Apr 2, 2020
BethGriggs pushed a commit that referenced this issue Apr 7, 2020
@mhdawson
Member

mhdawson commented Apr 8, 2020

targos pushed a commit that referenced this issue Apr 12, 2020
@ronag
Member

ronag commented Apr 13, 2020

Still flaky on various platforms, LinuxONE in this case: https://ci.nodejs.org/job/node-test-pull-request/30678/

@sam-github
Contributor Author

Failed in https://ci.nodejs.org/job/node-test-commit-linux-containered/19466/ (linux-containered, x86).

@sam-github sam-github removed the freebsd Issues and PRs related to the FreeBSD platform. label Apr 13, 2020
sam-github added a commit to sam-github/node that referenced this issue Apr 13, 2020
@Trott Trott changed the title flaky test-http2-reset-flood on freebsd flaky test-http2-reset-flood Apr 14, 2020
sam-github added a commit that referenced this issue Apr 14, 2020
Refs: #29802
Refs: #32595

PR-URL: #32825
Reviewed-By: Gerhard Stöbich <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Myles Borins <[email protected]>
MylesBorins pushed a commit that referenced this issue Apr 17, 2020
targos pushed a commit to targos/node that referenced this issue Apr 25, 2020
BridgeAR pushed a commit that referenced this issue Apr 28, 2020
targos pushed a commit that referenced this issue Apr 28, 2020
@Trott
Member

Trott commented Jul 9, 2020

Timed out at https://ci.nodejs.org/job/node-test-commit-linux/36018/nodes=debian9-64/console on test-digitalocean-debian9-x64-1:

00:12:17 not ok 1363 parallel/test-http2-reset-flood # TODO : Fix flaky test
00:12:17   ---
00:12:17   duration_ms: 120.14
00:12:17   severity: fail
00:12:17   exitcode: -15
00:12:17   stack: |-
00:12:17     timeout
00:12:17   ...

Trott added a commit to Trott/io.js that referenced this issue Jul 9, 2020
Move boolean check to start of writeRequest() to avoid race condition.

Fixes: nodejs#29802
@Trott
Member

Trott commented Jul 9, 2020

I changed the test to track calls and to time out after two seconds (because it normally takes about 100 milliseconds on the debian9 machine in CI), and it is showing things like 167450 calls to the test's writeRequests() function. So the flood is not causing the expected error. This seems like a bug in http2, but I'm not sure what to do next. @nodejs/http2

Here's the modified test I ran: https://github.com/nodejs/node/blob/09713d420bfd65ab550840eb55e10e4996bbcba2/test/parallel/test-http2-reset-flood.js

And here's a sample failure in the stress test on CI:

1 1 167450 0
assert.js:143
  throw err;
  ^

AssertionError [ERR_ASSERTION]: timed out
    at Timeout._onTimeout (/home/iojs/build/workspace/node-stress-single-test/nodes/debian9-64/test/parallel/test-http2-reset-flood.js:21:21)
    at listOnTimeout (internal/timers.js:555:17)
    at processTimers (internal/timers.js:498:7) {
  generatedMessage: false,
  code: 'ERR_ASSERTION',
  actual: undefined,
  expected: undefined,
  operator: 'fail'
}

The key part is the first line, which shows that writeRequests() was called 167450 times.

I had previously inserted a bunch of console.log() statements to see what was going on, but that made the test much more reliable, so it seems like a timing/race condition somewhere.
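
For reference, a rough sketch of the kind of instrumentation described above (counting the writeRequests() calls and bailing out after two seconds), with illustrative names like calls and watchdog that are not taken from the modified test linked earlier:

'use strict';
const assert = require('assert');

// Rough sketch of the instrumentation described above, not the actual test:
// count how many times the flood loop runs and fail loudly after two seconds,
// roughly 20x the ~100 ms the test normally takes on the debian9 CI machine.
let calls = 0;
let gotError = false;            // the real test sets this from an 'error' handler

const watchdog = setTimeout(() => {
  console.log(calls);            // e.g. 167450 in the failing stress-test run
  assert.fail('timed out');      // matches the ERR_ASSERTION in the output above
}, 2000);

function writeRequests() {
  calls++;
  // ...the real test writes another batch of raw HTTP/2 HEADERS frames here...
  if (!gotError)
    setImmediate(writeRequests);
}
writeRequests();

// On success, the expected server error arrives quickly, gotError flips to true,
// and the test would clearTimeout(watchdog) instead of hitting the timeout.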

@Trott Trott added the http2 Issues or PRs related to the http2 subsystem. label Jul 9, 2020
@Trott
Member

Trott commented Jul 11, 2020

The timeouts we're seeing now are different from the issue that was first reported (which appears to have been fixed). I did a bisect using CI today (since it's easy to make this happen in the stress tests) and came up with efefdd6 as the first commit where the problem was introduced. I'll do some more tests to confirm, but if someone knowledgeable wants to take a look there, that would be great.

@Trott
Member

Trott commented Jul 12, 2020

@ronag In #31806, you wrote:

During this refactoring due to slightly different timing of things other bugs got activated and needed to be resolved.

Does the test timeout problem in this issue seem a likely candidate for a similar bug that may have been missed because it only happens sometimes?

@Trott
Member

Trott commented Jul 12, 2020

Setting allowHalfOpen: true for the client fixes the problem. I don't think that invalidates the test or anything, but I'd like someone else's opinion. I'll open a pull request.
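
For illustration, a hedged sketch of what that fix looks like on the client side (the port handling and frame writing in the real test differ): pass allowHalfOpen: true when opening the raw TCP connection, so the client side stays open after the server half-closes.

'use strict';
const net = require('net');

// Illustrative sketch of the allowHalfOpen fix described above; the port is a
// placeholder (the real test receives the port from its worker thread).
const port = 8080;

const conn = net.connect({ port, allowHalfOpen: true }, () => {
  // ...write the HTTP/2 connection preface and start flooding frames here...
});

conn.on('error', (err) => {
  // In the real test, this is roughly where the expected error from the
  // server's reset-flood protection would show up.
  console.log('client error:', err.code);
});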

Trott added a commit to Trott/io.js that referenced this issue Jul 12, 2020
Set `allowHalfOpen: true` in the client.

Fixes: nodejs#29802
Refs: nodejs#31806
MylesBorins pushed a commit that referenced this issue Jul 14, 2020
Set `allowHalfOpen: true` in the client.

Fixes: #29802
Refs: #31806

PR-URL: #34318
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Robert Nagy <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 16, 2020
cjihrig pushed a commit that referenced this issue Jul 23, 2020
addaleax pushed a commit that referenced this issue Sep 22, 2020