[Reporting] Add an error code field to explain the nature of the job failure #125139

Closed
tsullivan opened this issue Feb 9, 2022 · 2 comments
Labels: bug, (Deprecated) Feature:Reporting, impact:high, loe:large

Comments

@tsullivan
Member

tsullivan commented Feb 9, 2022

Part 1: A report job can fail for a number of reasons. The code tries to capture any error that is available, but the nature of some errors makes that infeasible:

  1. User shut down Kibana while a report job was executing.
    • We should try to utilize the stop cycle of Kibana plugins: if there is a job running, then fail the job with a specific error message
    • This may still not be feasible: it's not clear if the job can be updated in ES while Kibana is shutting down
  2. The system ran out of memory / resources, or a worker thread ran out of memory
    • This is only feasible if Kibana is still running after the out-of-memory error
  3. If kibana.yml has xpack.screenshotting.networkPolicy.rules defined, Reporting shuts down the browser when a URL that violates the policy is encountered
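As a rough illustration of the first case, the sketch below fails every running job with a specific code from a plugin's stop hook. All names here (`JobRegistry`, `ReportingPluginSketch`, the error code strings) are hypothetical and not taken from the Reporting code, and the in-memory map stands in for the job documents in ES. It does not address the open question of whether ES can still be updated during shutdown.

```typescript
// Hypothetical error codes: the issue does not name the actual values.
type ReportErrorCode =
  | 'kibana_shutting_down'
  | 'out_of_memory'
  | 'network_policy_violation';

interface ReportJob {
  id: string;
  status: 'processing' | 'failed';
  errorCode?: ReportErrorCode;
  errorMessage?: string;
}

// In-memory stand-in for the report job store (assumption: real jobs live in ES).
class JobRegistry {
  private readonly running = new Map<string, ReportJob>();

  start(id: string): ReportJob {
    const job: ReportJob = { id, status: 'processing' };
    this.running.set(id, job);
    return job;
  }

  // Marks every running job as failed with the given code and returns those jobs.
  failAllRunning(errorCode: ReportErrorCode, errorMessage: string): ReportJob[] {
    const failed: ReportJob[] = [];
    this.running.forEach((job) => {
      job.status = 'failed';
      job.errorCode = errorCode;
      job.errorMessage = errorMessage;
      failed.push(job);
    });
    this.running.clear();
    return failed;
  }
}

// Hypothetical plugin shell: only the stop() hook matters for this sketch.
class ReportingPluginSketch {
  private readonly registry: JobRegistry;

  constructor(registry: JobRegistry) {
    this.registry = registry;
  }

  stop(): ReportJob[] {
    return this.registry.failAllRunning(
      'kibana_shutting_down',
      'Kibana shut down while the report job was executing'
    );
  }
}
```

The same `failAllRunning` call could in principle be reused for the out-of-memory and network policy cases with a different code, provided the process is still alive to make it.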

Part 2: Today, a security error is ignored if encountered during CSV report execution. This creates confusion if authentication worked at the beginning of execution but stopped working partway through, which can happen when using authentication tokens that have a short expiration. We should handle this event specifically with an error code. When this happens, the user should be able to download as much of the CSV as could be compiled, so the report job status should be completed_with_warnings.
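A minimal sketch of the Part 2 behavior, assuming the CSV is built page by page and that an expired token surfaces as a distinguishable error. `AuthenticationExpiredError`, `generateCsv`, `fetchPage`, and the `authentication_expired` code are hypothetical names for illustration; only the `completed_with_warnings` status comes from this issue.

```typescript
// Hypothetical error type raised when the auth token expires mid-export.
class AuthenticationExpiredError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'AuthenticationExpiredError';
  }
}

interface CsvResult {
  status: 'completed' | 'completed_with_warnings';
  csv: string;
  errorCode?: string; // hypothetical field name
}

// fetchPage returns the rows for a page, or null when there is no more data.
function generateCsv(fetchPage: (page: number) => string[] | null): CsvResult {
  const rows: string[] = [];
  for (let page = 0; ; page++) {
    let batch: string[] | null;
    try {
      batch = fetchPage(page);
    } catch (err) {
      // Checking the name avoids instanceof pitfalls with subclassed Error.
      if (err instanceof Error && err.name === 'AuthenticationExpiredError') {
        // Keep whatever was compiled so far and flag the job instead of failing it.
        return {
          status: 'completed_with_warnings',
          csv: rows.join('\n'),
          errorCode: 'authentication_expired',
        };
      }
      throw err; // any other error is still treated as a failure
    }
    if (batch === null) break;
    rows.push(...batch);
  }
  return { status: 'completed', csv: rows.join('\n') };
}
```

With a `fetchPage` that succeeds for two pages and then throws `AuthenticationExpiredError`, the result carries the rows from those two pages together with the warning status and the code.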

In total, this issue covers adding codes for the following events:

The error code will be saved as a new field in the report job mapping. It will be a mapped type, allowing us to aggregate by different types of errors. That becomes useful for telemetry and Monitoring of Reporting.
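To make the aggregation idea concrete, here is a sketch of what the mapping fragment and a matching terms aggregation might look like. The field name `error_code`, its placement at the top level of the document, and the `keyword` type are assumptions, not the final mapping. `countByErrorCode` is a hypothetical local helper that mirrors what the aggregation would compute.

```typescript
// Assumed mapping fragment: a keyword field can be used in terms aggregations.
const errorCodeMapping = {
  properties: {
    error_code: { type: 'keyword' },
  },
} as const;

// Assumed search body: bucket report jobs by error code, returning no hits.
const errorCodeAggregation = {
  size: 0,
  aggs: {
    by_error_code: {
      terms: { field: 'error_code' },
    },
  },
} as const;

// Local equivalent of the aggregation, for illustration only.
function countByErrorCode(
  jobs: Array<{ error_code?: string }>
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const job of jobs) {
    if (job.error_code === undefined) continue; // jobs without an error are skipped
    counts[job.error_code] = (counts[job.error_code] ?? 0) + 1;
  }
  return counts;
}
```

Counts of this shape are what a telemetry or Monitoring payload could report per error type.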

@tsullivan tsullivan added the bug Fixes for quality problems that affect the customer experience label Feb 9, 2022
@botelastic botelastic bot added the needs-team Issues missing a team label label Feb 9, 2022
@tsullivan tsullivan added (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead Team:AppServicesUx labels Feb 9, 2022
@elasticmachine
Contributor

Pinging @elastic/kibana-app-services (Team:AppServicesUx)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Feb 9, 2022
@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:large Large Level of Effort and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Feb 9, 2022
@jloleysens
Contributor

I think the only remaining work is to add these error_codes to our telemetry payloads, then wait until it hits production. Opened an issue for doing so: #127460

3 participants