-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of autodiscover #28524
Conversation
This pull request does not have a backport label. Could you fix it @vjsamuel? 🙏
NOTE: |
💔 Tests Failed
Expand to view the summary
Build stats
Test stats 🧪
Test errorsExpand to view the tests failures
|
/test |
/package |
Pinging @elastic/integrations (Team:Integrations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks @vjsamuel!
return | ||
} | ||
// accumulate updates for every event that comes in so that we dont overwhelm | ||
// the reloading engine with too many updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea for high loads 👍 but maybe 10 seconds is too much wait for normal use cases?
Maybe an option is to look for an algorithm that waits till a maximum of 10 (or more seconds) if there is a continuous stream of events, but continues earlier if there are isolated events. It could be basically the same as now, but with a case branch with a timer that is reset every time an event arrives (or a time.After()
).
var burstPeriod = 1 * time.Second
...
burstTicker := time.NewTicker(burstPeriod)
...
for {
burstTicker.Reset(burstPeriod)
select{
case event ...:
...
case <- ticker.C:
doBreak = true
case <- burstTicker.C:
doBreak = true
}
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just my 2 cents, should hash result be cached? the reason for this ask is in autodiscover flow, config hash is calculated twice, firstly during processing, then in reload, and reload actually reloads all the configs that are passed to the reload function, in autodiscover case, it's all the "discovered" configs, which can be a huge list of configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, #29048 can certainly reduce the number of configurations reloaded, but it doesn't do anything to reuse hash calculations. It can be a good idea to calculate them only once for each config, but I think this can be done as a different change.
go wait.Until(func() { | ||
for w.process(w.ctx) { | ||
} | ||
}, time.Second*1, w.ctx.Done()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this have the risk of out of order processing updates and deletes of the same resource? This could have the risk of updating, and then adding again a resource that has been previously deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vjsamuel any follow-up on this?
@vjsamuel several tests related to autodiscover are failing, could you take a look? |
This pull request is now in conflicts. Could you fix it? 🙏
|
Thank you for pushing these @vjsamuel! Any update about the status of this PR? |
No updates since 2 months, conflicts in PR, closing. |
Ok to closing, but I think it would be worth looking at the ideas here, specially accumulating events can help a lot on startup on big clusters. @ChrsMark can you keep track of this? |
Enhancement
What does this PR do?
This PR does the following:
Why is it important?
On startup the creation of metricbeat/filebeat modules can take too long without these changes.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Author's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs