Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch @hapi/podium errors #84575

Merged
merged 3 commits into from
Dec 3, 2020
Merged

Catch @hapi/podium errors #84575

merged 3 commits into from
Dec 3, 2020

Conversation

watson
Copy link
Contributor

@watson watson commented Nov 30, 2020

If an async error occurred within the call to this.events.emit, the error would be very hard to trace. In my case I would not get any meaningful stack trace and it took me many days to track it down to this location. Adding this catch solves that issue. I think exiting as I do here is ok, as the alternative (not having the catch) would resolve in an unhandled rejection which has the same effect. But please let me know if it can be done better 😃

@watson watson added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes v7.11.0 labels Nov 30, 2020
@watson watson self-assigned this Nov 30, 2020
@watson watson requested a review from a team as a code owner November 30, 2020 19:25
Comment on lines +129 to +130
console.error('An unexpected error occurred while writing to the log:', err.stack);
process.exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is identified, maybe we would like to just log the error and not exit the process? This seems a bit overkill for an error occurring inside logging? @restrry wdyt?

Copy link
Contributor

@mshustov mshustov Dec 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit overkill for an error occurring inside logging?

Maybe it's an overkill, but it's easy to miss that log message in stderr, so debugging a Kibana crash without logs might be a nightmare. The current PR replicates the current behavior, and it doesn't seem we've ever had any complaints about it. CC @joshdover

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit has been in my working directory for about a week, so some of the details about the original behaviour is unfortunately a little bit fuzzy, but I just tried to recreate my original issue which I was trying to debug when discovering this, and found that under normal circumstances, we do get a regular unhandled rejection crash which fairly easily can be traced to the right location. However, my special circumstance was that the server was shut down prior to the unhandled rejection event being fired, and so it was never logged.

I'm fine with either logging an continuing, re-throwing, or using process.exit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's an overkill, but it's easy to miss that log message in stderr, so debugging a Kibana crash without logs might be a nightmare.

I'm probably missing something, but how does crashing the process make debugging this easier for the user?

That said, does the current audit logging implementation go through this mechanism? If so, we may want a crash here if we're unable to write audit log events. Even if not, I also lean towards just leaving this as the current behavior until we remove legacy logging altogether in 8.0 and make the behavior change then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does crashing the process make debugging this easier for the user?

I think it's just meant to make it easier for us as developers 😅 The end-user not so much.

Today the process crashes on unhandled rejections as the process is considered to be in an unknown state and that it's unsafe to continue:

// While the above warning listener would also be called on
// unhandledRejection warnings, we can give a better error message if we
// handle them separately:
process.on('unhandledRejection', function (reason) {
console.error('Unhandled Promise rejection detected:');
console.error();
console.error(reason);
console.error();
console.error('Terminating process...');
process.exit(1);
});

does the current audit logging implementation go through this mechanism?

I'm actually not sure. @thomheymann Do you know if audit logging uses the same logger class as the rest of Kibana?

Copy link
Contributor

@mshustov mshustov Dec 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something, but how does crashing the process make debugging this easier for the user?

Kibana could stop logging messages at any time due to an error in logging system. Should it crash later (for example, due to a lack of disk space), users aren't be able to diagnosis a problem without logs.

That said, does the current audit logging implementation go through this mechanism? If so, we may want a crash here if we're unable to write audit log events.

Yes, it does.

Even if not, I also lean towards just leaving this as the current behavior until we remove legacy logging altogether in 8.0 and make the behavior change then.

Ideally, we should initiate a graceful shutdown for such cases, but right now the core doesn't provide such functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@restrry So does that mean, that until such a time when we have a graceful shutdown mechanism, that the current approach in this PR is ok to merge?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@watson Yes, I think so

@watson
Copy link
Contributor Author

watson commented Dec 2, 2020

@elasticmachine merge upstream

@watson
Copy link
Contributor Author

watson commented Dec 2, 2020

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@watson watson merged commit 770a005 into elastic:master Dec 3, 2020
@watson watson deleted the catch-podium-errors branch December 3, 2020 08:33
gmmorris added a commit to gmmorris/kibana that referenced this pull request Dec 3, 2020
* master: (40 commits)
  fix: 🐛 don't add separator befor group on no main items (elastic#83166)
  [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323)
  [APM] Add APM agent config options (elastic#84678)
  Fixed a11y issue on rollup jobs table selection (elastic#84567)
  [Discover] Refactor getContextUrl to separate file (elastic#84503)
  [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654)
  [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749)
  [Lens] Migrate legacy es client and remove total hits as int (elastic#84340)
  Improve logging pipeline in @kbn/legacy-logging (elastic#84629)
  Catch @hapi/podium errors (elastic#84575)
  [Discover] Unskip date histogram test (elastic#84727)
  Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791)
  [Enterprise Search] Fix schema errors button (elastic#84842)
  [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589)
  [Maps] Always initialize routes on server-startup (elastic#84806)
  [Fleet] EPM support to handle uploaded file paths (elastic#84708)
  [Snapshot Restore] Fix initial policy form state (elastic#83928)
  Upgrade Node.js to version 14 (elastic#83425)
  [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992)
  [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006)
  ...
gmmorris added a commit to gmmorris/kibana that referenced this pull request Dec 3, 2020
* master: (236 commits)
  fix: 🐛 don't add separator befor group on no main items (elastic#83166)
  [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323)
  [APM] Add APM agent config options (elastic#84678)
  Fixed a11y issue on rollup jobs table selection (elastic#84567)
  [Discover] Refactor getContextUrl to separate file (elastic#84503)
  [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654)
  [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749)
  [Lens] Migrate legacy es client and remove total hits as int (elastic#84340)
  Improve logging pipeline in @kbn/legacy-logging (elastic#84629)
  Catch @hapi/podium errors (elastic#84575)
  [Discover] Unskip date histogram test (elastic#84727)
  Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791)
  [Enterprise Search] Fix schema errors button (elastic#84842)
  [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589)
  [Maps] Always initialize routes on server-startup (elastic#84806)
  [Fleet] EPM support to handle uploaded file paths (elastic#84708)
  [Snapshot Restore] Fix initial policy form state (elastic#83928)
  Upgrade Node.js to version 14 (elastic#83425)
  [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992)
  [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006)
  ...
gmmorris added a commit to gmmorris/kibana that referenced this pull request Dec 3, 2020
…overy-action-group

* upstream/master: (48 commits)
  [Lens] accessibility screen reader issues (elastic#84395)
  [Logs UI] Fetch single log entries via a search strategy (elastic#81710)
  fix: 🐛 don't add separator befor group on no main items (elastic#83166)
  [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323)
  [APM] Add APM agent config options (elastic#84678)
  Fixed a11y issue on rollup jobs table selection (elastic#84567)
  [Discover] Refactor getContextUrl to separate file (elastic#84503)
  [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654)
  [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749)
  [Lens] Migrate legacy es client and remove total hits as int (elastic#84340)
  Improve logging pipeline in @kbn/legacy-logging (elastic#84629)
  Catch @hapi/podium errors (elastic#84575)
  [Discover] Unskip date histogram test (elastic#84727)
  Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791)
  [Enterprise Search] Fix schema errors button (elastic#84842)
  [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589)
  [Maps] Always initialize routes on server-startup (elastic#84806)
  [Fleet] EPM support to handle uploaded file paths (elastic#84708)
  [Snapshot Restore] Fix initial policy form state (elastic#83928)
  Upgrade Node.js to version 14 (elastic#83425)
  ...
watson added a commit that referenced this pull request Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants