Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky: test-child-process-pipe-dataflow #25988

Closed
refack opened this issue Feb 7, 2019 · 34 comments
Closed

flaky: test-child-process-pipe-dataflow #25988

refack opened this issue Feb 7, 2019 · 34 comments
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform.

Comments

@refack
Copy link
Contributor

refack commented Feb 7, 2019

Job: https://ci.nodejs.org/job/node-test-binary-windows/23616/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=3/testReport/junit/(root)/test/parallel_test_child_process_pipe_dataflow/
Worker: https://ci.nodejs.org/computer/test-azure_msft-win10-x64-3/
Code:
https://github.com/gireeshpunathil/node/blob/b1f82e4342f8a630b1ef83cd33781a725428f569/test/parallel/test-child-process-pipe-dataflow.js#L44-L46
Message:

assert.js:86
  throw new AssertionError(obj);
  ^

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:

1 !== 0

    at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-pipe-dataflow.js:45:12)
    at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\test\common\index.js:367:15)
    at ChildProcess.emit (events.js:197:13)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:254:12)
@refack refack added windows Issues and PRs related to the Windows platform. test Issues and PRs related to the tests. flaky-test Issues and PRs related to the tests with unstable failures on the CI. labels Feb 7, 2019
@gireeshpunathil
Copy link
Member

if grep found less bytes than expected, that should result in another assertion.
if grep did not find any bytes, then it should hang (our pipe is fused into its stdin) leading to timeout
if grep shows up exit code 1 would mean strange: Is it that grep is not available on the system? in which case the failure should be consistent!

Is this system picked part of regular CI? @refack - can you please check if it has grep in the first place? My test semantics expects UNIX simulated grep, not the windows native one (though I am not sure how different they are in terms of arguments). thanks!

@gireeshpunathil
Copy link
Member

ran a 3K times on the failing system but no reproduce.

It turns out that the assumptions about grep were not right - it works like in any other system, and there is nothing special. At the moment the best thing I believe is to catch the standard error streams of the child process as well, so that we know what happened to those.

gireeshpunathil added a commit to gireeshpunathil/node that referenced this issue Feb 11, 2019
If the test fails with errors from the child commands,
there is no debug info. Suppliment the stderr data
so that we know what to look for.

Refs: nodejs#25988
PR-URL: nodejs#26007
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
targos pushed a commit that referenced this issue Feb 11, 2019
If the test fails with errors from the child commands,
there is no debug info. Suppliment the stderr data
so that we know what to look for.

Refs: #25988
PR-URL: #26007
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@Trott
Copy link
Member

Trott commented Mar 16, 2019

https://ci.nodejs.org/job/node-test-binary-windows/24562/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=0/console

test-azure_msft-win10-x64-4

00:22:57 not ok 61 parallel/test-child-process-pipe-dataflow
00:22:57   ---
00:22:57   duration_ms: 0.175
00:22:57   severity: fail
00:22:57   exitcode: 1
00:22:57   stack: |-
00:22:57     got unexpected data from child #0:
00:22:57     cat
00:22:57     got unexpected data from child #0:
00:22:57     :
00:22:57     got unexpected data from child #0:
00:22:57      
00:22:57     got unexpected data from child #0:
00:22:57     write error
00:22:57     got unexpected data from child #0:
00:22:57     : 
00:22:57     got unexpected data from child #0:
00:22:57     No space left on device
00:22:57     got unexpected data from child #0:
00:22:57     
00:22:57     
00:22:57     assert.js:85
00:22:57       throw new AssertionError(obj);
00:22:57       ^
00:22:57     
00:22:57     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
00:22:57     
00:22:57     1 !== 0
00:22:57     
00:22:57         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-pipe-dataflow.js:42:14)
00:22:57         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\test\common\index.js:358:15)
00:22:57         at ChildProcess.emit (events.js:198:13)
00:22:57         at Process.ChildProcess._handle.onexit (internal/child_process.js:254:12)
00:22:57   ...

@refack
Copy link
Contributor Author

refack commented Mar 16, 2019

No space left on device

Is this expected?

[cat, grep, wc].forEach((child, index) => {
child.stderr.on('data', (d) => {
// Don't want to assert here, as we might miss error code info.
console.error(`got unexpected data from child #${index}:\n${d}`);
});
child.on('exit', common.mustCall(function(code) {
assert.strictEqual(code, 0);
}));
});

P.S. Worker had 60GB free disk space.

@refack
Copy link
Contributor Author

refack commented Mar 16, 2019

P.S. I want to add a lint rule that looks for asserts with no message in loops 🤔

@Trott
Copy link
Member

Trott commented Mar 20, 2019

https://ci.nodejs.org/job/node-test-binary-windows/24683/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=1/console

test-azure_msft-win10-x64-5

10:32:10 not ok 62 parallel/test-child-process-pipe-dataflow
10:32:10   ---
10:32:10   duration_ms: 0.212
10:32:10   severity: fail
10:32:10   exitcode: 1
10:32:10   stack: |-
10:32:10     assert.js:85
10:32:10       throw new AssertionError(obj);
10:32:10       ^
10:32:10     
10:32:10     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
10:32:10     
10:32:10     1 !== 0
10:32:10     
10:32:10         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\test\parallel\test-child-process-pipe-dataflow.js:42:14)
10:32:10         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows\test\common\index.js:358:15)
10:32:10         at ChildProcess.emit (events.js:198:13)
10:32:10         at Process.ChildProcess._handle.onexit (internal/child_process.js:254:12)
10:32:10   ...

@gireeshpunathil
Copy link
Member

looking for suggestions here!
the need for data flow integrity check between a -> b -> c is valid, and essential. However, selection of a, b, c and their interplay (being independent OS processes) seem to be proving less robust and less trustworthy?

@refack
Copy link
Contributor Author

refack commented Apr 13, 2019

@Trott
Copy link
Member

Trott commented May 21, 2019

https://ci.nodejs.org/job/node-test-binary-windows-2/890/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=1/console

test-azure_msft-win10-x64-2

18:23:33 not ok 130 parallel/test-child-process-pipe-dataflow
18:23:33   ---
18:23:33   duration_ms: 0.147
18:23:33   severity: fail
18:23:33   exitcode: 1
18:23:33   stack: |-
18:23:33     assert.js:89
18:23:33       throw new AssertionError(obj);
18:23:33       ^
18:23:33     
18:23:33     AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:
18:23:33     
18:23:33     1 !== 0
18:23:33     
18:23:33         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows-2\test\parallel\test-child-process-pipe-dataflow.js:46:14)
18:23:33         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows-2\test\common\index.js:374:15)
18:23:33         at ChildProcess.emit (events.js:200:13)
18:23:33         at Process.ChildProcess._handle.onexit (internal/child_process.js:272:12) {
18:23:33       generatedMessage: true,
18:23:33       code: 'ERR_ASSERTION',
18:23:33       actual: 1,
18:23:33       expected: 0,
18:23:33       operator: 'strictEqual'
18:23:33     }
18:23:33   ...

@gireeshpunathil
Copy link
Member

as there was no associated stderr messages from child, I would assume this is an early exit of one of the child process (with exit code 1) for an unknown reason. I know this info does not help much, but just stating, to localize the problem statement.

@Trott
Copy link
Member

Trott commented May 21, 2019

I'll open a PR to add more debugging data so that if this happens again, at least we'll know which subprocess is failing.

@Trott
Copy link
Member

Trott commented May 21, 2019

I'll open a PR to add more debugging data so that if this happens again, at least we'll know which subprocess is failing.

#27790

@Trott
Copy link
Member

Trott commented Nov 25, 2019

https://ci.nodejs.org/job/node-test-binary-windows-2/4422/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=0/console

test-azure_msft-win10-x64-2

00:11:03 not ok 135 parallel/test-child-process-pipe-dataflow
00:11:03   ---
00:11:03   duration_ms: 0.212
00:11:03   severity: fail
00:11:03   exitcode: 1
00:11:03   stack: |-
00:11:03     Can't clean tmpdir: c:\workspace\node-test-binary-windows-2\test\.tmp.134
00:11:03     Files blocking: [ 'data.txt' ]
00:11:03     
00:11:03     assert.js:93
00:11:03       throw new AssertionError(obj);
00:11:03       ^
00:11:03     
00:11:03     AssertionError [ERR_ASSERTION]: child 1 exited with code 1
00:11:03         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows-2\test\parallel\test-child-process-pipe-dataflow.js:49:14)
00:11:03         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows-2\test\common\index.js:372:15)
00:11:03         at ChildProcess.emit (events.js:223:5)
00:11:03         at Process.ChildProcess._handle.onexit (internal/child_process.js:270:12) {
00:11:03       generatedMessage: false,
00:11:03       code: 'ERR_ASSERTION',
00:11:03       actual: 1,
00:11:03       expected: 0,
00:11:03       operator: 'strictEqual'
00:11:03     }
00:11:03   ...

@gireeshpunathil
Copy link
Member

this does not look like the usual symptom this test was manifesting - cat failure or grep failure. There was an infra failure yesterday (25/11) - could this be an indirect side effect of that?

@Trott
Copy link
Member

Trott commented Dec 2, 2019

this does not look like the usual symptom this test was manifesting - cat failure or grep failure. There was an infra failure yesterday (25/11) - could this be an indirect side effect of that?

It happened again a few hours ago:

https://ci.nodejs.org/job/node-test-binary-windows-2/4716/COMPILED_BY=vs2017,RUNNER=win10,RUN_SUBSET=0/console

test-azure_msft-win10-x64-2

00:09:26 not ok 135 parallel/test-child-process-pipe-dataflow
00:09:26   ---
00:09:26   duration_ms: 0.194
00:09:26   severity: fail
00:09:26   exitcode: 1
00:09:26   stack: |-
00:09:26     assert.js:98
00:09:26       throw new AssertionError(obj);
00:09:26       ^
00:09:26     
00:09:26     AssertionError [ERR_ASSERTION]: child 1 exited with code 1
00:09:26         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows-2\test\parallel\test-child-process-pipe-dataflow.js:49:14)
00:09:26         at ChildProcess.<anonymous> (c:\workspace\node-test-binary-windows-2\test\common\index.js:372:15)
00:09:26         at ChildProcess.emit (events.js:219:5)
00:09:26         at Process.ChildProcess._handle.onexit (internal/child_process.js:274:12) {
00:09:26       generatedMessage: false,
00:09:26       code: 'ERR_ASSERTION',
00:09:26       actual: 1,
00:09:26       expected: 0,
00:09:26       operator: 'strictEqual'
00:09:26     }
00:09:26   ...

@Trott
Copy link
Member

Trott commented Dec 2, 2019

Since stdio can still be open when the subprocess exits, I wonder if we're losing information by asserting so early on the error code?

Trott added a commit to Trott/io.js that referenced this issue Mar 6, 2021
When the test fails, it usually ends up failing on the assertion that
`wc` did not find as many bytes as it expected. That may not be
helpful if it is caused by a failure earlier in the command
pipeline ( cat | grep | wc ). Move the byte check to
`process.on('exit')` so that other error handlers that report the
existence of errors run first.

Refs: nodejs#25988
Trott added a commit to Trott/io.js that referenced this issue Mar 9, 2021
When the test fails, it usually ends up failing on the assertion that
`wc` did not find as many bytes as it expected. That may not be
helpful if it is caused by a failure earlier in the command
pipeline ( cat | grep | wc ). Move the byte check to
`process.on('exit')` so that other error handlers that report the
existence of errors run first.

Refs: nodejs#25988

PR-URL: nodejs#37632
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@Trott
Copy link
Member

Trott commented Mar 14, 2021

Different way of failing than the last time. A timeout this time.

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/8791/RUN_SUBSET=0,nodes=win10-COMPILED_BY-vs2019/console

00:33:15 not ok 781 parallel/test-child-process-pipe-dataflow
00:33:15   ---
00:33:15   duration_ms: 120.257
00:33:15   severity: fail
00:33:15   exitcode: 1
00:33:15   stack: |-
00:33:15     timeout
00:33:15   ...

danielleadams pushed a commit that referenced this issue Mar 16, 2021
When the test fails, it usually ends up failing on the assertion that
`wc` did not find as many bytes as it expected. That may not be
helpful if it is caused by a failure earlier in the command
pipeline ( cat | grep | wc ). Move the byte check to
`process.on('exit')` so that other error handlers that report the
existence of errors run first.

Refs: #25988

PR-URL: #37632
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
targos pushed a commit that referenced this issue May 1, 2021
Make sure all the `wc` process stdout data is received before checking
its validity.

Fixes: #25988

PR-URL: #36366
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: James M Snell <[email protected]>
targos pushed a commit that referenced this issue May 1, 2021
When the test fails, it usually ends up failing on the assertion that
`wc` did not find as many bytes as it expected. That may not be
helpful if it is caused by a failure earlier in the command
pipeline ( cat | grep | wc ). Move the byte check to
`process.on('exit')` so that other error handlers that report the
existence of errors run first.

Refs: #25988

PR-URL: #37632
Reviewed-By: Gireesh Punathil <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@targos targos pinned this issue Nov 17, 2021
@targos
Copy link
Member

targos commented Nov 17, 2021

This test started to fail again. Almost 100% of Windows CI runs are red.

@lpinca
Copy link
Member

lpinca commented Nov 17, 2021

@targos this seems to fix the issue in my Windows machine:

diff --git a/test/parallel/test-child-process-pipe-dataflow.js b/test/parallel/test-child-process-pipe-dataflow.js
index 2e2edc65e9..5b34ea1429 100644
--- a/test/parallel/test-child-process-pipe-dataflow.js
+++ b/test/parallel/test-child-process-pipe-dataflow.js
@@ -27,7 +27,7 @@ const MB = KB * KB;
   // So cut the buffer into lines at some points, forcing
   // data flow to be split in the stream.
   for (let i = 1; i < KB; i++)
-    buf.write(os.EOL, i * KB);
+    buf.write('\n', i * KB);
   fs.writeFileSync(file, buf.toString());

   cat = spawn('cat', [file]);

lpinca added a commit to lpinca/node that referenced this issue Nov 17, 2021
lpinca added a commit to lpinca/node that referenced this issue Nov 17, 2021
lpinca added a commit to lpinca/node that referenced this issue Nov 17, 2021
@Trott Trott closed this as completed in 0c2011c Nov 18, 2021
@targos targos unpinned this issue Nov 18, 2021
targos pushed a commit that referenced this issue Nov 21, 2021
Fixes: #25988

PR-URL: #40838
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
richardlau pushed a commit that referenced this issue Nov 23, 2021
Fixes: #25988

PR-URL: #40838
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
richardlau pushed a commit that referenced this issue Nov 23, 2021
Fixes: #25988

PR-URL: #40838
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
richardlau pushed a commit that referenced this issue Nov 25, 2021
Fixes: #25988

PR-URL: #40838
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
BethGriggs pushed a commit that referenced this issue Nov 29, 2021
Fixes: #25988

PR-URL: #40838
Reviewed-By: Michaël Zasso <[email protected]>
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Michael Dawson <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
@joyeecheung joyeecheung reopened this Aug 18, 2023
@joyeecheung
Copy link
Member

This almost always fail in the Windows CI now

@lpinca
Copy link
Member

lpinca commented Sep 23, 2024

I'm closing this as the test is now skipped (#49563) and the current flakiness is caused by a Cygwin bug. See also #48300.

@lpinca lpinca closed this as completed Sep 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flaky-test Issues and PRs related to the tests with unstable failures on the CI. test Issues and PRs related to the tests. windows Issues and PRs related to the Windows platform.
Projects
None yet
9 participants