
Reconsider the timeout value of connection_idle_timeout / idle_connection_timeout #40749

Open
toby-sutor opened this issue Sep 11, 2024 · 3 comments
Labels: discuss, enhancement, Team:Elastic-Agent-Data-Plane

Comments

@toby-sutor

Describe the enhancement:
With version 8.12, we changed the default idle connection timeout from Beats/Agent to Elasticsearch from 60 to 3 seconds. As a result, Agents have to reconnect to Elasticsearch more often, which can flood DNS servers with lookups.
In most cases this does not seem to be an issue when a local DNS server is installed on the OS. However, in some environments a local DNS server is not present by default or is not wanted, which leads to an unexpectedly high volume of network requests.
It is therefore questionable whether closing connections after three seconds is reasonable, given that these are HTTP connections where users would expect keep-alive to be active. A more balanced value of 10 to 30 seconds might be a better default.
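To make the effect concrete, here is a minimal sketch of a deliberately simplified model, not the actual Beats transport logic. It assumes that a single connection is reused only if the gap since the previous request is at most the idle timeout, and that every new connection costs one DNS lookup because there is no local DNS cache. The 5-second request interval and the helper name `count_new_connections` are hypothetical.

```python
from typing import Iterable, Optional


def count_new_connections(request_times: Iterable[float], idle_timeout: float) -> int:
    """Count how many connections one client opens (hypothetical model).

    Assumptions: one connection at a time, reused only when the gap since
    the previous request is <= idle_timeout; request durations are ignored.
    Without a local DNS cache, each new connection implies one DNS lookup.
    """
    opened = 0
    last: Optional[float] = None
    for t in sorted(request_times):
        if last is None or t - last > idle_timeout:
            # The idle connection was closed (or none exists yet): dial and resolve again.
            opened += 1
        last = t
    return opened


# Hypothetical agent sending one bulk request every 5 seconds for one hour.
requests = [i * 5.0 for i in range(720)]

for timeout in (3.0, 30.0, 60.0):
    n = count_new_connections(requests, timeout)
    print(f"idle_timeout={timeout:g}s -> {n} new connections (DNS lookups) per hour")
```

Under these assumptions, a request interval just above the idle timeout is the worst case: every request reconnects. Once the timeout exceeds the interval, the connection stays open for the whole hour.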


@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 11, 2024
@ycombinator ycombinator added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Sep 12, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 12, 2024
@ycombinator ycombinator added the discuss Issue needs further discussion. label Sep 12, 2024
@cmacknz
Member

cmacknz commented Sep 13, 2024

Thanks. The defaults are a compromise across many use cases. If we get more reports about this, we can re-evaluate the value.

If you haven't already, you can change this: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html#_preset

The latency preset is approximately what the old defaults were. Depending on your data volume, you may be better off with the throughput preset. Alternatively, you can change just idle_connection_timeout on its own (this requires either not setting a preset or using preset: custom).

@strawgate
Contributor

strawgate commented Sep 13, 2024

The balanced preset intentionally avoids keep-alive, because otherwise each Agent would hold 6-8 active connections to Elasticsearch in normal use. As @cmacknz mentioned, this is a trade-off across use cases. For low-throughput Agents, the balanced preset should result in fewer than one DNS query every 2 seconds in most cases, which I wouldn't consider excessive.

If the customer has a large number of low-throughput Agents, they may find the scale preset more appropriate: Agents send data less often and therefore perform fewer requests (even though the idle timeout is lower), and by extension fewer DNS lookups. Alternatively, they can follow @cmacknz's recommendation and customize the settings as needed.

Similarly, for medium- to high-throughput clients, the throughput preset keeps the connection alive in most cases (though it also increases the worker count and maximum memory consumption).
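The argument about the scale preset can be checked with rough, fleet-wide arithmetic. This is a hedged sketch under stated assumptions: each low-throughput Agent sends exactly one request per flush interval, there is no DNS cache between the Agents and the DNS server, and worker counts and connection pooling are ignored. The fleet size, flush intervals, and the helper `fleet_dns_lookups_per_second` are hypothetical and are not the documented preset values (see the linked preset documentation for those).

```python
def fleet_dns_lookups_per_second(agents: int, flush_interval_s: float, idle_timeout_s: float) -> float:
    """Rough steady-state DNS lookup rate for a fleet (hypothetical model).

    Assumes one request per Agent per flush interval and no DNS caching.
    """
    if flush_interval_s <= idle_timeout_s:
        # The connection survives between flushes, so steady-state lookups are ~0.
        return 0.0
    # The connection has timed out before every flush: one lookup per flush per Agent.
    return agents / flush_interval_s


fleet = 5000  # hypothetical number of low-throughput Agents
for label, flush, timeout in [
    ("flush every 10s, 3s idle timeout", 10.0, 3.0),
    ("flush every 20s, 1s idle timeout", 20.0, 1.0),
    ("flush every 5s, 15s idle timeout", 5.0, 15.0),
]:
    rate = fleet_dns_lookups_per_second(fleet, flush, timeout)
    print(f"{label}: ~{rate:.0f} lookups/s across {fleet} Agents")
```

In this model, sending less often halves the lookup rate even though the idle timeout is shorter. An idle timeout longer than the flush interval removes steady-state lookups entirely, at the cost of keeping connections open.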
