-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Reporting/Screenshots] Handle page setup errors and capture the page, don't fail the job #58683
[Reporting/Screenshots] Handle page setup errors and capture the page, don't fail the job #58683
Conversation
x-pack/legacy/plugins/reporting/export_types/common/lib/screenshots/check_for_toast.ts
Show resolved
Hide resolved
4026286
to
4b2a977
Compare
14bcff1
to
c96095d
Compare
.default(30000), | ||
renderComplete: Joi.number() | ||
.integer() | ||
.default(30000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: These get passed to Puppeteer waitForSelector
calls. Puppeteer's default for these is 30 seconds internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do call https://pptr.dev/#?product=Puppeteer&version=v2.1.1&show=api-pagesetdefaulttimeouttimeout and pass in the overall job timeout, so we might be overriding the default timeouts for these APIS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, we do it here: https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/reporting/server/browsers/chromium/driver_factory/index.ts#L120. Maybe we just default these to the global job timeout in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ If we're passing the timeout to each of the calls, then setting page.setDefaultTimeout
doesn't make a lot of sense.
I don't think we should use the global job timeout, which is the "queue timeout" because once that timeout hits there is no buffer room to recover a completed screenshot for the job. Hitting the "queue timeout" should be avoided as much as possible.
We need to think about this... Should the page.setDefaultTimeout
be the max of these 3 new timeouts?
At any rate, server/browsers/chromium/driver_factory/index.ts#L120
is no longer valid in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joelgriffith wdyt about the changes in 926b694 ?
specifically 926b694#diff-874db4343510c2d0fb661b14af4a97e9R115
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this makes sense:
// All navigation methods default to 30 seconds,
// which can cause the job to fail even if we bump timeouts in // All waitFor methods have their own timeout config
// the config. Help alleviate errors like page.setDefaultTimeout(this.captureConfig.timeouts.openUrl);```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Individual methods can still override this, but I like setting it to the openUrl
which is where I think the bulk of the operation is for PDF reports
Here is a test plan on how this PR will affect the different ways that jobs have been known to fail:
|
// know how many items to expect since gridster incrementally adds panels | ||
// we have to use this hint to wait for all of them | ||
await browser.waitForSelector( | ||
`${renderCompleteSelector},[${itemsCountAttribute}]`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this block of code was moved over from the scanPage
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, nice logging additions + try/catch blocks. Think this will help us a ton 👍
2ff57cd
to
b41acab
Compare
… with errors shown
Pinging @elastic/kibana-reporting-services (Team:Reporting Services) |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest of the changes LGTM
💚 Build SucceededHistory
To update your PR or re-run it, just comment with: |
…, don't fail the job (elastic#58683) * [Reporting] Handle error if intercepted request could not be continued * [Reporting/Screenshots] Handle page setup errors and capture the page with errors shown * show warnings in UI * i18n todos * Cleanup an old troubleshooting task * set the default for all new timeout settings to 30 seconds * fix some tests * update error strings * Cleanup 2 * fix tests 2 * polish the job info map status items * More error message updating * Log the error that was caught * Oops fix ts * add documentation * fix i18n * fix mocha test * use the openUrl timeout as the default for navigation * fix comment Co-authored-by: Elastic Machine <[email protected]>
…x-closed-index * 'master' of github.com:elastic/kibana: (32 commits) [ML] Use Kibana's HttpHandler for HTTP requests (elastic#59320) [APM] Create settings page to manage Custom Links (elastic#57788) [Upgrade Assistant] Server-side batch reindexing (elastic#58598) completes navigation test (elastic#59141) [SIEM] Fixes dragging entries to the Timeline while data is loading may trigger a partial page reload (elastic#59476) [Reporting/Screenshots] Handle page setup errors and capture the page, don't fail the job (elastic#58683) [SIEM] [CASES] API with io-ts validation (elastic#59265) Use camelCase rather than snakeCase for plugin name (elastic#59461) [Maps] top term percentage field property (elastic#59386) Add custom action to registry and show actions list in siem (elastic#58395) [Search service] Add enhanced ES search strategy (elastic#59224) [Logs UI] Speed up stream rendering using memoization (elastic#59163) expand max-old-space-size for xpack jest tests (elastic#59455) Added possibility to embed connectors create and edit flyouts (elastic#58514) Revert "Temporarily disabling PR project mappings (elastic#59485)" (elastic#59491) Temporarily disabling PR project mappings (elastic#59485) [Endpoint] Fix alert list functional test error (elastic#59357) Rename status_page to statusPage (elastic#59186) Fix visual baseline job (elastic#59348) Extended AlertContextValue with metadata optional property (elastic#59391) ... # Conflicts: # x-pack/plugins/upgrade_assistant/common/types.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_actions.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.test.ts # x-pack/plugins/upgrade_assistant/server/lib/reindexing/reindex_service.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.test.ts # x-pack/plugins/upgrade_assistant/server/routes/reindex_indices/reindex_indices.ts
…, don't fail the job (#58683) (#59519) * [Reporting] Handle error if intercepted request could not be continued * [Reporting/Screenshots] Handle page setup errors and capture the page with errors shown * show warnings in UI * i18n todos * Cleanup an old troubleshooting task * set the default for all new timeout settings to 30 seconds * fix some tests * update error strings * Cleanup 2 * fix tests 2 * polish the job info map status items * More error message updating * Log the error that was caught * Oops fix ts * add documentation * fix i18n * fix mocha test * use the openUrl timeout as the default for navigation * fix comment Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: Elastic Machine <[email protected]>
Closes #57008
Closes #44305
Release note: Kibana Reporting can now tolerate timeouts when trying to read visualization information on the Kibana page, and will capture a screenshot of the page even when such errors happen. If a timeout error happens on a Reporting job, the errors will become part of the job document and will be shown in the listing of Reporting jobs in Management > Kibana > Reporting.
Opening the Kibana URL for screenshot capture and reading the Kibana visualization element info from the page involves a lot of moving parts that could fail the entire reporting job if an error happens somewhere in the flow. This PR changes that by breaking up the flow to setup setups and a capture step. If there are errors in the setup step, they are logged and added to the Reporting job metadata, but the capture step is still allowed to continue.
New configuration settings for Reporting
Each of these settings allows the user to control how much time to allow for each of the 3 main parts of
setup$
.openUrl
: the time allowed for Reporting to wait for the initial data of the Kibana page in the browser (time to resolve the.application
selector)waitForElements
: the time allowed for Reporting to wait for the visualization panels to load (time to resolve thedata-shared-items-container
selector)renderComplete
: the time allowed for Reporting to wait for each visualization to signal that rendering is doneNote that
xpack.reporting.queue.timeout
still exists, and represents the overall time that the Reporting job is allowed to finish, and still has a default of 120 seconds. None of the new timeout settings should exceed thexpack.reporting.queue.timeout
setting.Other changes in this PR:
scanPage
module. It had the job of waiting for visualization selectors while also checking the page for errors. The latter is a troubleshooting step that is no longer needed with this PR.i18n.translate
If a Reporting job completed with "warnings" it gets a new treatment on the Management > Reporting job listing page:
However, there will still be a download available:
Why does the download show an empty Kibana page? Because when the page loaded initially, there was a toast message error saying "Saved Object not found". That toast message dismissed itself after a while, but before Reporting captured the page.
Checklist
Delete any items that are not applicable to this PR.
[] Documentation was added for features that require explanation or tutorials[ ] This was checked for keyboard-only and screenreader accessibility[ ] This renders correctly on smaller devices using a responsive layout. (You can test this in your browser[ ] This was checked for cross-browser compatibility, including a check against IE11For maintainers