-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catch @hapi/podium errors #84575
Catch @hapi/podium errors #84575
Conversation
console.error('An unexpected error occurred while writing to the log:', err.stack); | ||
process.exit(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that this is identified, maybe we would like to just log the error and not exit the process? This seems a bit overkill for an error occurring inside logging? @restrry wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit overkill for an error occurring inside logging?
Maybe it's an overkill, but it's easy to miss that log message in stderr
, so debugging a Kibana crash without logs might be a nightmare. The current PR replicates the current behavior, and it doesn't seem we've ever had any complaints about it. CC @joshdover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit has been in my working directory for about a week, so some of the details about the original behaviour is unfortunately a little bit fuzzy, but I just tried to recreate my original issue which I was trying to debug when discovering this, and found that under normal circumstances, we do get a regular unhandled rejection crash which fairly easily can be traced to the right location. However, my special circumstance was that the server was shut down prior to the unhandled rejection event being fired, and so it was never logged.
I'm fine with either logging an continuing, re-throwing, or using process.exit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's an overkill, but it's easy to miss that log message in
stderr
, so debugging a Kibana crash without logs might be a nightmare.
I'm probably missing something, but how does crashing the process make debugging this easier for the user?
That said, does the current audit logging implementation go through this mechanism? If so, we may want a crash here if we're unable to write audit log events. Even if not, I also lean towards just leaving this as the current behavior until we remove legacy logging altogether in 8.0 and make the behavior change then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does crashing the process make debugging this easier for the user?
I think it's just meant to make it easier for us as developers 😅 The end-user not so much.
Today the process crashes on unhandled rejections as the process is considered to be in an unknown state and that it's unsafe to continue:
kibana/src/setup_node_env/exit_on_warning.js
Lines 62 to 72 in 5420177
// While the above warning listener would also be called on | |
// unhandledRejection warnings, we can give a better error message if we | |
// handle them separately: | |
process.on('unhandledRejection', function (reason) { | |
console.error('Unhandled Promise rejection detected:'); | |
console.error(); | |
console.error(reason); | |
console.error(); | |
console.error('Terminating process...'); | |
process.exit(1); | |
}); |
does the current audit logging implementation go through this mechanism?
I'm actually not sure. @thomheymann Do you know if audit logging uses the same logger class as the rest of Kibana?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something, but how does crashing the process make debugging this easier for the user?
Kibana could stop logging messages at any time due to an error in logging system. Should it crash later (for example, due to a lack of disk space), users aren't be able to diagnosis a problem without logs.
That said, does the current audit logging implementation go through this mechanism? If so, we may want a crash here if we're unable to write audit log events.
Yes, it does.
Even if not, I also lean towards just leaving this as the current behavior until we remove legacy logging altogether in 8.0 and make the behavior change then.
Ideally, we should initiate a graceful shutdown for such cases, but right now the core doesn't provide such functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@restrry So does that mean, that until such a time when we have a graceful shutdown mechanism, that the current approach in this PR is ok to merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@watson Yes, I think so
@elasticmachine merge upstream |
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]
History
To update your PR or re-run it, just comment with: |
* master: (40 commits) fix: 🐛 don't add separator befor group on no main items (elastic#83166) [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323) [APM] Add APM agent config options (elastic#84678) Fixed a11y issue on rollup jobs table selection (elastic#84567) [Discover] Refactor getContextUrl to separate file (elastic#84503) [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654) [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749) [Lens] Migrate legacy es client and remove total hits as int (elastic#84340) Improve logging pipeline in @kbn/legacy-logging (elastic#84629) Catch @hapi/podium errors (elastic#84575) [Discover] Unskip date histogram test (elastic#84727) Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791) [Enterprise Search] Fix schema errors button (elastic#84842) [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589) [Maps] Always initialize routes on server-startup (elastic#84806) [Fleet] EPM support to handle uploaded file paths (elastic#84708) [Snapshot Restore] Fix initial policy form state (elastic#83928) Upgrade Node.js to version 14 (elastic#83425) [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992) [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006) ...
* master: (236 commits) fix: 🐛 don't add separator befor group on no main items (elastic#83166) [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323) [APM] Add APM agent config options (elastic#84678) Fixed a11y issue on rollup jobs table selection (elastic#84567) [Discover] Refactor getContextUrl to separate file (elastic#84503) [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654) [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749) [Lens] Migrate legacy es client and remove total hits as int (elastic#84340) Improve logging pipeline in @kbn/legacy-logging (elastic#84629) Catch @hapi/podium errors (elastic#84575) [Discover] Unskip date histogram test (elastic#84727) Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791) [Enterprise Search] Fix schema errors button (elastic#84842) [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589) [Maps] Always initialize routes on server-startup (elastic#84806) [Fleet] EPM support to handle uploaded file paths (elastic#84708) [Snapshot Restore] Fix initial policy form state (elastic#83928) Upgrade Node.js to version 14 (elastic#83425) [Security Solution] Keep Endpoint policies up to date with license changes (elastic#83992) [Security Solution][Exceptions] Implement exceptions for ML rules (elastic#84006) ...
…overy-action-group * upstream/master: (48 commits) [Lens] accessibility screen reader issues (elastic#84395) [Logs UI] Fetch single log entries via a search strategy (elastic#81710) fix: 🐛 don't add separator befor group on no main items (elastic#83166) [Security Solution][Detections] Implements indicator match rule cypress test (elastic#84323) [APM] Add APM agent config options (elastic#84678) Fixed a11y issue on rollup jobs table selection (elastic#84567) [Discover] Refactor getContextUrl to separate file (elastic#84503) [Embeddable] Export CSV action for Lens embeddables in dashboard (elastic#83654) [TSVB] [Cleanup] Remove extra dateFormat props (elastic#84749) [Lens] Migrate legacy es client and remove total hits as int (elastic#84340) Improve logging pipeline in @kbn/legacy-logging (elastic#84629) Catch @hapi/podium errors (elastic#84575) [Discover] Unskip date histogram test (elastic#84727) Rename server.xsrf.whitelist to server.xsrf.allowlist (elastic#84791) [Enterprise Search] Fix schema errors button (elastic#84842) [APM] Removes react-sticky dependency in favor of using CSS (elastic#84589) [Maps] Always initialize routes on server-startup (elastic#84806) [Fleet] EPM support to handle uploaded file paths (elastic#84708) [Snapshot Restore] Fix initial policy form state (elastic#83928) Upgrade Node.js to version 14 (elastic#83425) ...
If an async error occurred within the call to
this.events.emit
, the error would be very hard to trace. In my case I would not get any meaningful stack trace and it took me many days to track it down to this location. Adding thiscatch
solves that issue. I think exiting as I do here is ok, as the alternative (not having thecatch
) would resolve in an unhandled rejection which has the same effect. But please let me know if it can be done better 😃