Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Fail CDP commands on unhandled exceptions #9639

Merged
merged 3 commits into from
Jul 27, 2018

Conversation

martinpitt
Copy link
Member

@martinpitt martinpitt commented Jul 11, 2018

Install a Chrome Devtools Protocol handler for unhandled exceptions, and
fail the next CDP command with it.

This got lost with the move from PhantomJS to Chrome.

  • adjust naughty override for testSerialConsole
  • investigate crash in testNfsMissingPackages
  • fix failures of tests which expect ph_* rejections
  • fix Cannot read property 'data_types' of null crash in test/verify/check-openshift TestOpenshift.testKubevirtMachinesList

@martinpitt
Copy link
Member Author

Note: I did not include the playground test from #9539 yet, as this needs some more thought on how to turn this into a successful test. Let's first see which other tests cause an unhandled exception - it seems at least the serial console test should fail in a more useful way here.

@@ -72,6 +73,8 @@ function setupLogging(client) {
resolveLogPromise();
});

client.Runtime.exceptionThrown(info => unhandledExceptions.push(info.exceptionDetails));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be written to the console. Is it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could (and indeed I had a local test version that did that). It would replicate the same message as the testlib.Error, though, so it wouldn't tell anything new.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are multiple exceptions, I really want to see them all, not just the last one. Similar to what we do with unexpected journal messages.

@@ -295,7 +298,7 @@ CDP.New(options)
if (unhandledExceptions.length === 0) {
success(reply);
} else {
fail(unhandledExceptions[0].exception.description);
fail(unhandledExceptions[0].exception.message);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you figure it is the line breaks that break the protocol? Maybe we should still escape them here, in case the message also has multiple lines?

@martinpitt
Copy link
Member Author

Still fails; I have an alternative approach locally, but I didn't work it out fully yet (higher priority things to do, sorry).

@stefwalter
Copy link
Contributor

The driver is sending an empty JSON Object {}

@stefwalter
Copy link
Contributor

Did a fixup which should fix the missing cases. Lets let this validate and then squash before merging.

@@ -288,7 +299,19 @@ CDP.New(options)
pageLoadPromise = new Promise((resolve, reject) => { pageLoadResolve = resolve; pageLoadReject = reject; });

// run the command
eval(command).then(success, fail);
eval(command).then(reply => {
var details, message;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be let

@@ -72,6 +77,12 @@ function setupLogging(client) {
resolveLogPromise();
});

client.Runtime.exceptionThrown(info => {
var details = info.exceptionDetails;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be let

@stefwalter
Copy link
Contributor

Fixed for review.

@martinpitt
Copy link
Member Author

This works in general, thanks! I made a few TODOs at the top for particular crashes that this now picks up.

@martinpitt
Copy link
Member Author

Unlike the serial console one, the test/verify/check-storage-nfs TestStoragePackages.testNfsMissingPackages failure reproduces well and reliably locally. Unfortunately the exception has no stack trace, and it does not show up on the JS console when manually clicking the "Install" button in the test case - it just shows the (expected) warning about fake-nfs-utils not being available from any repo.

So I wonder if that is actually from our injected test code:

-> wait: !ph_is_present("#dialog")
{"exceptionId":1,"text":"Uncaught (in promise)","lineNumber":0,"columnNumber":0,"scriptId":"1","url":"http://127.0.0.2:9091/cockpit/$c6f8515ba7f3c89612824b3b8f11b506ea999c43dae4dfd394db0dc4b1e55d82/storage/index.html#/","exception":{"type":"undefined"},"executionContextId":5}
<- raise {"type":"undefined"}

I'll add some debugging to ph_wait_cond().

@martinpitt
Copy link
Member Author

I fixed the two issues above. Next move: bots.

@martinpitt
Copy link
Member Author

There are three (really four, but the check-metrics test failures have the same root cause) remaining test failures. I have a local fix for check-machines, the other two still need investigation.

@martinpitt
Copy link
Member Author

I now did a more general change to avoid the three new test failures, by filtering out unhandled exceptions from our own test suite. My TestMachines change from yesterday is still a nice cleanup, but I'll send it as a separate PR.

Install a Chrome Devtools Protocol handler for unhandled exceptions, and
fail the next CDP command with it.  This got lost with the move from
PhantomJS to Chrome.

Some of our tests rely on failed ph_* function invocations, so we must
not log failures of ph_wait_cond(); these  are also technically
unhandled exceptions. So turn this into a proper `PhWaitCondTimeout`
error class, so that it can be identified reliably.

Handle cases where undefined value is passed to fail/success in
cdp-driver.js.

Adjust the naughty override for the XTerm.js crash (cockpit-project#9641) as this now
shows the actual underlying crash instead of just the followup element
wait timeout.

Closes cockpit-project#9639
@martinpitt
Copy link
Member Author

The crash in testKubevirtMachinesList seems real, adding TODO item.

@martinpitt
Copy link
Member Author

testKubevirtMachinesList crash: This happens while deleting a VM, and crashes in c3/c3. It seems the usage charts need to be stopped right before deleting a VM:

TypeError: Cannot read property 'data_types' of null
    at ChartInternal.c3_chart_internal_fn.isPieType (http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:130108:29)
    at ChartInternal.c3_chart_internal_fn.isArcType (http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:130119:22)
    at ChartInternal.c3_chart_internal_fn.getArc (http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:125358:31)
    at http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:125701:28
    at SVGPathElement.<anonymous> (http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:75525:34)
    at Object.tick [as c] (http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:75706:22)
    at d3_timer_mark (http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:68916:36)
    at d3_timer_step (http://127.0.0.2:9091/cockpit/$a4bd17eb0dd459c340eb353b38f5eca269f0a7e1a87b8fa1362469fedf0dd530/kubernetes/kubernetes.js:68897:16)

@martinpitt
Copy link
Member Author

There are several other reports about similar errors in c3. This seems to be some race condition when switching pages, which I suppose also means "(un)render in React". If I disable the DonutChart in pkg/kubernetes/scripts/virtual-machines/components/VmMetricsTab.jsx, the crash does not occur, so it's definitively this component.

I also logged the podMetrics.cpu.usageNanoCores, used, and available values, and they are never invalid. E. g.

CPUChart usageNanoCores 129560717 used 13 available 87

I even tried to pin down const used = 20 to ensure it's not because of invalid data; the crash still happens. As this is outside of our control, does not really affect the user experience, and is already reported to c3, I think it's best to ignore this particular crash.

@martinpitt
Copy link
Member Author

still fails, this needs to be generalized - it does not only fail in c3:

TypeError: Cannot read property 'data_types' of null
    at r.L.isPieType (http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:56270:42)
    at r.L.isArcType (http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:56278:25)
    at r.L.getArc (http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:54177:30)
    at http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:54303:47
    at SVGPathElement.<anonymous> (http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:32170:42)
    at Object.s [as c] (http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:29308:77)
    at Re (http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:26664:65)
    at Ye (http://127.0.0.2:9091/cockpit/$2cbb8f6e7523f0913b38bae2946d7d4f682ee68a565d8200743cd7ecbf2bcac1/kubernetes/kubernetes.js:26659:21)

This gets triggered by the CPUUsage DonutChart in
pkg/kubernetes/scripts/virtual-machines/components/VmMetricsTab.jsx.

This seems to be some race condition when switching pages, which
supposedly also means "(un)render in React". Cockpit code has no control
over this, and it also does not break the user experience, so ignore the
exception.

See c3js/c3#2187 and simlilar issues
bcbcarl/react-c3js#22 or
c3js/c3#1205.
@martinpitt
Copy link
Member Author

Yay! The remaining two failures on fedora-28 are well-known flakes, so retrying. This is ready for review now.

@martinpitt martinpitt requested review from allisonkarlitskaya and larskarlitski and removed request for allisonkarlitskaya July 24, 2018 07:31
@mareklibra mareklibra mentioned this pull request Jul 26, 2018
44 tasks
Copy link
Contributor

@larskarlitski larskarlitski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

@larskarlitski larskarlitski merged commit 436bfe4 into cockpit-project:master Jul 27, 2018
larskarlitski pushed a commit that referenced this pull request Jul 27, 2018
Install a Chrome Devtools Protocol handler for unhandled exceptions, and
fail the next CDP command with it.  This got lost with the move from
PhantomJS to Chrome.

Some of our tests rely on failed ph_* function invocations, so we must
not log failures of ph_wait_cond(); these  are also technically
unhandled exceptions. So turn this into a proper `PhWaitCondTimeout`
error class, so that it can be identified reliably.

Handle cases where undefined value is passed to fail/success in
cdp-driver.js.

Adjust the naughty override for the XTerm.js crash (#9641) as this now
shows the actual underlying crash instead of just the followup element
wait timeout.

Closes #9639
@martinpitt martinpitt deleted the test-oopses branch July 27, 2018 06:16
martinpitt added a commit to martinpitt/cockpit that referenced this pull request Jul 27, 2018
Install a Chrome Devtools Protocol handler for unhandled exceptions, and
fail the next CDP command with it.  This got lost with the move from
PhantomJS to Chrome.

Some of our tests rely on failed ph_* function invocations, so we must
not log failures of ph_wait_cond(); these  are also technically
unhandled exceptions. So turn this into a proper `PhWaitCondTimeout`
error class, so that it can be identified reliably.

Handle cases where undefined value is passed to fail/success in
cdp-driver.js.

Adjust the naughty override for the XTerm.js crash (cockpit-project#9641) as this now
shows the actual underlying crash instead of just the followup element
wait timeout.

Closes cockpit-project#9639

Cherry-picked from upstream master commit 9d3c17b, so that the
naughty overrides match correctly.
martinpitt added a commit that referenced this pull request Jul 29, 2018
Install a Chrome Devtools Protocol handler for unhandled exceptions, and
fail the next CDP command with it.  This got lost with the move from
PhantomJS to Chrome.

Some of our tests rely on failed ph_* function invocations, so we must
not log failures of ph_wait_cond(); these  are also technically
unhandled exceptions. So turn this into a proper `PhWaitCondTimeout`
error class, so that it can be identified reliably.

Handle cases where undefined value is passed to fail/success in
cdp-driver.js.

Adjust the naughty override for the XTerm.js crash (#9641) as this now
shows the actual underlying crash instead of just the followup element
wait timeout.

Closes #9639

Cherry-picked from upstream master commit 9d3c17b, so that the
naughty overrides match correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants