
Add docs for gomaxprops option #416

Merged: 14 commits into elastic:main, Aug 25, 2023

Conversation

@kilfoyle (Contributor, PR author) commented Aug 17, 2023:

This adds documentation for the GOMAXPROCS option to limit CPU usage by Elastic Agent, implemented via elastic/elastic-agent#3179

See docs preview

@nimarezainia @rdner This seemed to me like the best spot in the docs for this, right below the minimum requirements for installing agent. Please let me know if I've missed anything or if the API call may need some fixing up.

[discrete]
=== Limiting {agent} resources

If you need to limit the amount of resources consumed by {agent}, you can use the `agent.limits.go_max_procs` configuration option. This option sets the maximum number of CPUs that can be executing simultaneously.
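
For context, the nested form of this setting in a standalone {agent} configuration file would look roughly like the following (a hypothetical sketch; the exact file and placement are assumptions, not taken from this PR):

    # Hypothetical standalone configuration snippet (assumption), expanding
    # the dotted option name agent.limits.go_max_procs into nested YAML.
    agent:
      limits:
        go_max_procs: 2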
@nimarezainia (Contributor) commented:

@kilfoyle let's rewrite this section please. this configuration is not limiting CPU used by the agent, it's "limiting the CPU used by the underlying beats that are supervised by agent." So for example, setting agent.limits.go_max_procs to 1, would mean that only the beats supervised by the agent will be limited to 1 vCPU. "elastic-agent status" would show the user how many beats are being supervised by the agent.

@rdner does this value have to be an integer or are decimals also acceptable? (like half a vcpu)

@kilfoyle (Contributor, PR author) commented:

Thanks for the clarification Nima. I've updated the section as shown. Let me know if I've missed anything:

[Screenshot of the updated section, Aug 21, 2023]

@rdner I'll add a note about whether the value has to be an integer once you have a chance to confirm.

@rdner (Member) commented:

this configuration is not limiting CPU used by the agent, it's "limiting the CPU used by the underlying beats that are supervised by agent." So for example, setting agent.limits.go_max_procs to 1, would mean that only the beats supervised by the agent will be limited to 1 vCPU. "elastic-agent status" would show the user how many beats are being supervised by the agent.

This is not accurate.

This parameter sets how many CPUs the Go runtime can schedule goroutines on. It does not guarantee that the given CPU count is used. It might use more for internal purposes of the runtime.

This is what the official documentation says:

The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. There is no limit to the number of threads that can be blocked in system calls on behalf of Go code; those do not count against the GOMAXPROCS limit. This package's GOMAXPROCS function queries and changes the limit.

https://pkg.go.dev/runtime

The value is set for both the agent and the underlying Beats it runs; it does not affect Endpoint because Endpoint is not written in Go.

@rdner does this value have to be an integer or are decimals also acceptable? (like half a vcpu)

it's only integer, not decimal.
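
For reference, the runtime function linked above can be exercised with a minimal Go sketch like this (illustrative only, not agent code):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        // Passing a value below 1 queries the current limit without changing it.
        fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
        // NumCPU reports the number of logical CPUs usable by this process.
        fmt.Println("NumCPU:", runtime.NumCPU())
    }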

@rdner (Member) commented:

We already have this parameter exposed in the Beats configuration, by the way. It might make sense to make references and updates to that too https://www.elastic.co/guide/en/beats/filebeat/current/configuration-general-options.html#_max_procs

@nimarezainia (Contributor) commented:

@rdner thanks. The "threads that can execute user-level Go code simultaneously" part confuses me a bit. Can you explain what happens when each Beat under the agent has GOMAXPROCS set to 1? Let's say we have 4 Beats (ignoring that the agent is also written in Go). My understanding is that each Beat would then be limited to a maximum of 1 thread. So for the agent as a whole (again ignoring the agent itself) we are limited to 4 CPUs (of course, each Beat is limited to only 1).

@cmacknz can I get your eyes on this docs change also? Thank you.

@rdner (Member) commented:

@nimarezainia there is no simple explanation, unfortunately. If we want to tell our customers the truth, it has to be very complicated. The Go runtime runs its own OS threads on any number of CPUs it has access to; the garbage collector, for example. It's a runtime implementation detail that might change in a later Go release.

What GOMAXPROCS=1 is limiting is what's running on that runtime, so the code that we build and run on it. For example, all the Filebeat code running in parallel will be scheduled only using a single CPU. However, the second CPU might be used by the garbage collector or any other runtime thread. So, it's not accurate to say that we limit everything to a single CPU.

@cmacknz might have a better explanation than I do, but it does not change the facts.
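
To make the distinction concrete, here is a small illustrative Go sketch (not agent code): with GOMAXPROCS set to 1, all goroutines still run, but only one OS thread executes user-level Go code at a time, while runtime-internal threads such as the garbage collector are not counted against the limit.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        // Limit user-level Go code to a single OS thread; the previous value is returned.
        prev := runtime.GOMAXPROCS(1)
        fmt.Println("previous GOMAXPROCS:", prev)

        // All four goroutines still complete; they are time-sliced onto one
        // executing thread instead of running in parallel.
        var wg sync.WaitGroup
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                fmt.Println("goroutine", id, "ran")
            }(i)
        }
        wg.Wait()
    }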

@cmacknz (Member) commented:

The simplest explanation for GOMAXPROCS is that it limits the number of operating system threads that can be executing Go code simultaneously. The majority of users will not care about these low-level details, as long as our explanation is approximately correct.

GOMAXPROCS accounts for all user-level Go code as far as I can tell, that is, code that executes in user space. We can link back to the definition of GOMAXPROCS in the Go runtime and interested readers can dig as far as they want to. https://pkg.go.dev/runtime#GOMAXPROCS

For completeness, the GCCPUFraction parameter describes GOMAXPROCS in a way that indicates that it does include CPU time spent in the garbage collector.

    // GCCPUFraction is the fraction of this program's available
    // CPU time used by the GC since the program started.
    //
    // GCCPUFraction is expressed as a number between 0 and 1,
    // where 0 means GC has consumed none of this program's CPU. A
    // program's available CPU time is defined as the integral of
    // GOMAXPROCS since the program started. That is, if
    // GOMAXPROCS is 2 and a program has been running for 10
    // seconds, its "available CPU" is 20 seconds. GCCPUFraction
    // does not include CPU time used for write barrier activity.
    //
    // This is the same as the fraction of CPU reported by
    // GODEBUG=gctrace=1.
    GCCPUFraction float64

@kilfoyle (Contributor, PR author) commented Aug 22, 2023:

Thanks @cmacknz, that seems very clear! Here's some proposed text:

If you need to limit the amount of CPU consumption you can use the agent.limits.go_max_procs configuration option. This parameter limits the number of operating system threads that can be executing Go code simultaneously, thereby limiting the CPU used by both the agent and the underlying {beats} that it supervises. The agent.limits.go_max_procs option accepts an integer value of 0 or greater; the default is 0, which means "use all available CPUs".

The agent.limits.go_max_procs configuration option is similar to the {beats} {filebeat-ref}/configuration-general-options.html#_max_procs[max_procs] setting. For more detail about the option, refer to the link:https://pkg.go.dev/runtime#GOMAXPROCS[GOMAXPROCS] function in the Go runtime documentation.

To enable the option, run a <<fleet-api-docs,{fleet} API>> request from the {kib} {kibana-ref}/console-kibana.html[Dev Tools console] to override your current {agent} policy and add the go_max_procs parameter. For example, to limit Go code to two operating system threads, run:

...API example...
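
For illustration only, such a request in the Dev Tools console might look roughly like the following; the policy ID, policy name, and exact body fields here are assumptions, not the example from this PR:

    # Hypothetical request; replace <policy_id> and <policy_name> with real values.
    PUT kbn:/api/fleet/agent_policies/<policy_id>
    {
      "name": "<policy_name>",
      "namespace": "default",
      "overrides": {
        "agent": {
          "limits": {
            "go_max_procs": 2
          }
        }
      }
    }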

@kilfoyle (Contributor, PR author) commented:

@rdner based on your correction I've updated the text to the following, with a link to the Beats page. How does this look?

[Screenshot of the revised section, Aug 21, 2023]

BTW, the Beats docs are owned by the dev team, but I'm happy to review any changes.

@cmacknz (Member) left a review:

I think it needs to be clearer that the limit here applies independently to each process started by the agent. I added an example to hopefully clarify this. Feel free to wordsmith my suggestions if needed; the way this works is a bit awkward. It is more of a stopgap limit until we can build a more intuitive one.

@kilfoyle (Contributor, PR author) commented:

Thanks @cmacknz! I've added the suggestions and made a couple of minor cosmetic changes as well.

I'm still a bit confused though: Does running two Beats result in three processes? Or should the example be that agent is supervising three Beats?

[Screenshot of the section with the suggested changes applied, Aug 23, 2023]

@cmacknz (Member) commented Aug 23, 2023:

I'm still a bit confused though: Does running two Beats result in three processes? Or should the example be that agent is supervising three Beats?

There are three total processes: Elastic Agent itself and the two Beats. Agent is one of the processes here.
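
As a hypothetical worked example of how the per-process limit adds up: with agent.limits.go_max_procs set to 2 and an agent supervising two Beats, each of the three processes (the agent plus the two Beats) can execute its Go code on at most 2 operating system threads, so up to 6 threads of user-level Go code could be running across the deployment at once.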

@kilfoyle (Contributor, PR author) commented:

Here's our latest version:

[Screenshot of the latest version of the section, Aug 23, 2023]

@cmacknz (Member) left a review:

LGTM, thanks!

@nimarezainia (Contributor) commented:

Thanks for the further clarifications here. It looks good to me. It is a complex thing to have to explain, and the answer always depends on what the user has deployed, so I think this explanation takes us a long way.

kilfoyle merged commit f279aaa into elastic:main on Aug 25, 2023
mergify bot pushed a commit that referenced this pull request Aug 25, 2023
* Add docs for gomaxprops option

* Update command description wrt Beats CPU

* Fixup

* Link to Filebeat max_procs; clarify must be integer

* Update setting description

* Update docs/en/ingest-management/elastic-agent/install-elastic-agent.asciidoc

Co-authored-by: Denis <[email protected]>

* Update GOMAXPROCS description

* touchup

* touchup

* Update docs/en/ingest-management/elastic-agent/install-elastic-agent.asciidoc

Co-authored-by: Craig MacKenzie <[email protected]>

* Update docs/en/ingest-management/elastic-agent/install-elastic-agent.asciidoc

Co-authored-by: Craig MacKenzie <[email protected]>

* Update docs/en/ingest-management/elastic-agent/install-elastic-agent.asciidoc

* touchup

* touchup

---------

Co-authored-by: Denis <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
(cherry picked from commit f279aaa)
kilfoyle added a commit that referenced this pull request Aug 25, 2023

(Backport with the same commit message as above; cherry picked from commit f279aaa)

Co-authored-by: Denis <[email protected]>
Co-authored-by: Craig MacKenzie <[email protected]>
Co-authored-by: David Kilfoyle <[email protected]>