[Monitoring] De-duplicate pipeline ids based on the ephemeral_id changing #49978
Pinging @elastic/stack-monitoring (Team:Monitoring) |
💚 Build Succeeded
@chrisronline Can you please merge the fix to 7.4 branch as well? As of now, 7.4 is the current branch as per documentation so I would request you to release this fix for 7.4 as well. |
Great job! Works as expected 👍
Thanks for the instructions on reproducing the behavior
Approved so long as @cachedout addresses your concerns in the description and merge upstream passes
This could work but it has an important caveat that I'll describe below. I think that what we may want here is actually to ensure that the
Scenario 1
A simple pipeline definition
Below is the JSON output for two keys:
Scenario 2
Using the same pipeline definition as Scenario 1 and simply restarting Logstash after collecting data from Scenario 1
As we can see, the ephemeral identifier has changed but the hash (and the ID, which is not shown above) both remain the same.
Scenario 3
Changing the pipeline definition and restarting Logstash
Here we change the pipeline definition by altering the sleep timer to 2 seconds.
Summary
In every scenario, the pipeline identifier is the same and is returned as
However, when the pipeline definition is modified it becomes an ostensibly different pipeline, despite having the same ID as a previously defined pipeline. As such, we may wish to consider it as a different entity for the purposes of monitoring.
The counterpoint, of course, is that this may be surprising to the user, especially if they make many changes to the same pipeline definition. So, it may be that using the ID is the right thing to do from the perspective of the user, who might be more inclined to believe that they're working with the same pipeline over time even though Logstash sees these internally as different, but I think this is worthy of further discussion.
I'll leave it there and wait for comments before we come to a decision. :)
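To make the three scenarios concrete, here is a rough sketch of how two snapshots of the same pipeline could be compared. The `PipelineState` shape and the `classifyChange` helper are hypothetical names used only for illustration, and the sketch assumes that the hash changes whenever the pipeline definition changes, which is what the scenarios above suggest.

```typescript
// Hypothetical snapshot of a pipeline as reported by monitoring.
interface PipelineState {
  id: string;
  hash: string;
  ephemeral_id: string;
}

type Change = 'different_pipeline' | 'definition_changed' | 'restarted' | 'unchanged';

// Compare two snapshots and describe what happened between them.
// Assumption: a new hash means the definition was edited (Scenario 3),
// while a new ephemeral_id with the same hash means a plain restart (Scenario 2).
function classifyChange(prev: PipelineState, next: PipelineState): Change {
  if (prev.id !== next.id) {
    return 'different_pipeline';
  }
  if (prev.hash !== next.hash) {
    return 'definition_changed';
  }
  if (prev.ephemeral_id !== next.ephemeral_id) {
    return 'restarted';
  }
  return 'unchanged';
}
```

Under this reading, only `definition_changed` raises the question of whether monitoring should treat the pipeline as a new entity; `restarted` should never produce a second row.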
I left my review notes in comment on the PR itself.
Sure! |
@cachedout As a user, the following matters to me. When a pipeline is defined as follows:
The changes to the pipeline are done in the
Taking a step back, I would like to pose the following situation.
** Box: 1 **
** Box: 2 **
The error above can be deliberate, or it can be a typo by the user. From a design perspective, one option is to say that all pipelines with the same id will be considered together in the aggregation on the monitoring page. It is the easier option to implement and puts the onus of maintaining a correct config on the user. Another option is to aggregate on id only as long as the config matches. That would require a deep comparison of each pipeline. Yet another option would be to disallow a config like the one shown above. In that case, the monitoring page should throw an error when loaded. This, however, rules out the cases where the user is creating such a config deliberately.
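The three options can be sketched roughly as follows. This is an illustration only: `PipelineRecord`, `groupPipelines` and `findConflictingIds` are hypothetical names, and comparing `hash` values is assumed here as a cheap stand-in for the deep config comparison mentioned above.

```typescript
// Hypothetical record: one pipeline as reported by one Logstash box.
interface PipelineRecord {
  id: string;
  hash: string;
  host: string;
}

type Strategy = 'by_id' | 'by_id_and_hash';

// Options 1 and 2: group records either by id alone, or by id plus hash
// (assumed proxy for "the config matches").
function groupPipelines(
  records: PipelineRecord[],
  strategy: Strategy
): { [key: string]: PipelineRecord[] } {
  const groups: { [key: string]: PipelineRecord[] } = {};
  for (const record of records) {
    const key = strategy === 'by_id' ? record.id : record.id + ':' + record.hash;
    if (!groups[key]) {
      groups[key] = [];
    }
    groups[key].push(record);
  }
  return groups;
}

// Option 3: report ids that appear with more than one hash, so the UI
// could surface an error instead of silently aggregating them.
function findConflictingIds(records: PipelineRecord[]): string[] {
  const hashesById: { [id: string]: string[] } = {};
  for (const record of records) {
    const hashes = hashesById[record.id] || (hashesById[record.id] = []);
    if (hashes.indexOf(record.hash) === -1) {
      hashes.push(record.hash);
    }
  }
  return Object.keys(hashesById).filter((id) => hashesById[id].length > 1);
}
```

Note that the third option cannot tell a typo from a deliberate choice, and it would also flag a pipeline whose definition was simply edited over time, so it is the most restrictive of the three.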
Thanks all for the thoughts!
I ran some tests locally and I don't think this has much merit. For the listing page (versus the actual pipeline page), we are showing aggregated data from the last two buckets in the sequence (two so we can perform any necessary derivatives). It's unlikely that the hash/ephemeral_id change so frequently that in a
Assuming everyone is okay with it, I think we ignore the concern and move on with the PR.
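For anyone wondering why two buckets are needed: a rate is the difference between two consecutive counter readings divided by the time between them. A minimal sketch, assuming a hypothetical `Bucket` shape with a cumulative `eventsOut` counter (not the actual monitoring document format):

```typescript
// Hypothetical bucket: a timestamp plus a cumulative counter value.
interface Bucket {
  timestampMs: number;
  eventsOut: number;
}

// Derive an events-per-second rate from the last two buckets.
// Returns null when there are fewer than two buckets, when the buckets
// are not in increasing time order, or when the counter went backwards
// (assumed here to mean the counter was reset, e.g. by a restart).
function latestRate(buckets: Bucket[]): number | null {
  if (buckets.length < 2) {
    return null;
  }
  const prev = buckets[buckets.length - 2];
  const last = buckets[buckets.length - 1];
  const seconds = (last.timestampMs - prev.timestampMs) / 1000;
  const delta = last.eventsOut - prev.eventsOut;
  if (seconds <= 0 || delta < 0) {
    return null;
  }
  return delta / seconds;
}
```

Since only the last two buckets feed the listing page, a hash or `ephemeral_id` change would have to land inside that short window to affect the numbers shown.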
@elasticmachine merge upstream |
💚 Build Succeeded
|
💔 Build Failed |
💚 Build Succeeded |
…ging (elastic#49978) * De-duplicate pipeline ids based on the ephemeral_id changing * Add tests
Resolves #49462
This PR fixes an issue where users saw duplicate pipeline ids in their pipeline listing page because they had restarted their pipelines (which causes the `ephemeral_id` to change).
In our logic to get the entire list of pipeline ids, we are doing a composite agg across `id`, `hash` and `ephemeral_id` because we need those lists in order to reduce the number of buckets created in the subsequent queries (see here for more information). If the `ephemeral_id` changes over time for a given pipeline, two results will show for the single `id`, resulting in the behavior described in #49462.
To reproduce, simply set up Logstash with multiple pipelines and enable monitoring. Navigate to the pipelines listing page to verify they show up there, then restart Logstash. Before this fix, duplicates should show up.
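The de-duplication idea can be sketched roughly like this. It is not the code from this PR: `PipelineKey`, `DedupedPipeline` and `dedupePipelines` are hypothetical names, and the input is assumed to be the list of composite-agg bucket keys already pulled out of the Elasticsearch response.

```typescript
// Hypothetical shape of one composite-agg bucket key.
interface PipelineKey {
  id: string;
  hash: string;
  ephemeral_id: string;
}

// One entry per pipeline id, keeping every hash and ephemeral_id seen
// for it so later queries can still filter on them.
interface DedupedPipeline {
  id: string;
  hashes: string[];
  ephemeralIds: string[];
}

function dedupePipelines(keys: PipelineKey[]): DedupedPipeline[] {
  const byId: { [id: string]: DedupedPipeline } = {};
  const result: DedupedPipeline[] = [];
  for (const key of keys) {
    let entry = byId[key.id];
    if (!entry) {
      // First time we see this id: create its single listing entry.
      entry = { id: key.id, hashes: [], ephemeralIds: [] };
      byId[key.id] = entry;
      result.push(entry);
    }
    if (entry.hashes.indexOf(key.hash) === -1) {
      entry.hashes.push(key.hash);
    }
    if (entry.ephemeralIds.indexOf(key.ephemeral_id) === -1) {
      entry.ephemeralIds.push(key.ephemeral_id);
    }
  }
  return result;
}
```

With this shape, a pipeline restarted once yields a single entry with two `ephemeralIds` instead of two rows in the listing.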
Notes
I'm fairly sure we're unlikely to show incorrect data - by de-duplicating, we're only fetching data for each `id` once, even if there are multiple `ephemeral_id` values for that `id`. Maybe @cachedout can help educate me a bit here.
TODO
Add tests