Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(inputs.procstat): Allow Multi filter #14815

Closed
wants to merge 18 commits into from
Closed

feat(inputs.procstat): Allow Multi filter #14815

wants to merge 18 commits into from

Conversation

tguenneguez
Copy link
Contributor

Summary

Add new option "multi_criterias" to be able to search process with multiples criterias.
For exemple :

  • user and exe
  • exe and pattern

It's possible with somethink like :

metricpass = "tags.process_name == "httpd""
But in this case, telegraf collect lot of information to put them in garbage can

Checklist

  • No AI generated code was used in this PR

Related issues

#6174

@tguenneguez tguenneguez changed the title Multi filter feat(inputs.procstat): Allow Multi filter Feb 14, 2024
@telegraf-tiger telegraf-tiger bot added area/procstat feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Feb 14, 2024
@telegraf-tiger
Copy link
Contributor

@srebhan
Copy link
Member

srebhan commented Feb 14, 2024

@tguenneguez thanks for looking into this and thanks for your contribution. I think using statefull PID finders (i.e. internal variables) is not a good solution because this has proven to get complicated over time. Instead I would propose to add one or multiple filter section(s) with the following form

type Filter struct {
    Exec string
    ...
    Recursive bool
}

and then change the PID finders to process those filters. For pgrep this would mostly relate to combining the command-line arguments. For native this would mean you get the process list once and then filter each process according to the filter settings.

So this is a larger restructuring of the plugin. My proposal is to implement this for pgrep-only first and check if that works as I think the combination of command-line args should be easier. Duplicate code where necessary. Afterwards we can, in a second PR, also do it for the native finder...

What do you think?

@tguenneguez
Copy link
Contributor Author

Good morning,

I understand your thinking, but I see 2 objections:

  1. This new good generates a compatibility break.
  2. It will not be possible to set actions for certain filters (tags, metricpass, etc.)

On the other hand, with pgrep, we do not deport the process search to the arguments provided so the filtering is carried out by pgrep directly.
In "native" mode, we actually list all the processes and then look at those that match the criteria, which can be cumbersome on a machine with a lot of processes, but that's roughly what "pgrep" does when the we look at the sources.

Finally, if we go in this direction, the "exec" filtering is not fair.
For linux with pgrep, it filters on the process name while natively, the filtering is on the binary.
For this point, I was thinking of making another PR by adding "name" filtering. This would allow in "native" mode to search on the process name (/proc/(pid)/stat). In "pgrep" mode, this would be equivalent to "exec" for compatibility.

Good reflection
Thomas

@srebhan
Copy link
Member

srebhan commented Feb 23, 2024

@tguenneguez thanks for your thoughts! Regarding the naming etc, this should really be an independent PR!

Regarding your concerns/objections:

This new good generates a compatibility break.

The idea is to keep the old way around and add this feature without breaking the existing output if you use the old way of configuration. I'm emphasizing this as changing current code-paths has the risk of unexpected side-effects. If for example now people specified multiple criteria, only one of those "wins" and is used, all others are silently ignored. If you now change that, people will suddenly get different results.

Furthermore, with the current "flat" configuration it is kind of unclear what the emitted "aggregation metrics" really aggregate...

It will not be possible to set actions for certain filters (tags, metricpass, etc.)

Well that's also not possible today. :-) You can set tags per plugin-instance which is what most people want. You can still extend this per emitted metric using processors (e.g. the lookup processor).

@tguenneguez
Copy link
Contributor Author

Hello @srebhan

The idea is to keep the old way around and add this feature without breaking the existing output if you use the old way of configuration. I'm emphasizing this as changing current code-paths has the risk of unexpected side-effects. If for example now people specified multiple criteria, only one of those "wins" and is used, all others are silently ignored. If you now change that, people will suddenly get different results.
That's wy a had the "multi_criterias" option.
false (default), the first filter criterias is used.
true, all filter criterias are used.
With your method, the user will understant that without filter, only the first is used, if

Furthermore, with the current "flat" configuration it is kind of unclear what the emitted "aggregation metrics" really aggregate...
It's simple for the agregation, it's for each per plugin-instance

Other think, about using pgrep first. I think this is not good for some case :
Windows, you will install pgrep.
Linux, pgrep search in the 4096 characters and with some process (ie java), you will have more than 4096 char of args and the search doesn't work.
I think methode pgrep will be implement first and stay available for compatibilite, but I think it's not good idea.
In my use case I don't use pgrep methode ;-)

On the other hand, your proposal requires (it seems to me a very big development) that I could not take care of...
When we see that the initial request dates from Jul 26, 2019, it would be interesting to propose something ?

@srebhan
Copy link
Member

srebhan commented Feb 26, 2024

I didn't mean to suggest pgrep first. But no matter if you use pgrep or the native method there are two ways to so multi-criterion filtering:

  1. Iterate over the criteria. Get the set of PIDs for the criterion and the intersect it with the set of remaining PIDs that matched the previous criteria. For example if you want to get all processes of user jo with the pattern myapp.* you can first get all processes for user jo and then all processes with pattern myapp.* and then determine the PIDs which are available in both lists.
  2. Combine the criteria and match each process against this combined criterion. In pgrep this is simply done by combining the command-line options, for the native finder this would mean get the process-list, iterate over the processes and for each process iterate over the criteria until one doesn't match.

The second approach is more consistent and less expensive and would be my preferred one, i.e. the one I suggest.

@tguenneguez
Copy link
Contributor Author

Good morning,
I understand your logic and I am absolutely aware of it. But if you look at the logic of searching for processes whether in the pgrep source code or in the promitives of github.com/shirou/gopsutil/v3/process on "linux" ;-)
pgrep:
use of the procps_pids_get system call which lists all the PIDs
https://gitlab.com/procps-ng/procps/-/blob/master/src/pgrep.c?ref_type=heads#L744
native:
use of Processes which lists all PIDs
https://pkg.go.dev/github.com/shirou/gopsutil/v3/process#Processes
Secondly, the two solutions perform filtering

If you look at the code is developed in this direction.
The only difference between your approach and the current approach is to only make one call to retrieve the list of PIDs regardless of the number of simple or multiple filters made in a second step.
But that will pose another problem, the list of PIDs varies very quickly and by the time you get to the last filter criterion, the processes have changed and the filtering no longer works.
I think we need to limit as much as possible the moment when we retrieve the list of PIDs and the moment when we do the filtering.

Thanks
Thomas

@srebhan
Copy link
Member

srebhan commented Mar 6, 2024

@tguenneguez I drafted a PoC of my idea in #14948. What do you think?

# pid_finder = "pgrep"

## allow to search processus by multiple criterias (only exe, pattern, user)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? s/processus/processes/

## Method to use when finding process IDs. Can be one of 'pgrep', or
## 'native'. The pgrep finder calls the pgrep executable in the PATH while
## the native finder performs the search directly in a manor dependent on the
## platform. Default is 'pgrep'
# pid_finder = "pgrep"

## allow to search processus by multiple criterias (only exe, pattern, user)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo? s/processus/processes/

@powersj
Copy link
Contributor

powersj commented Apr 4, 2024

I'm going to close this PR while we continue to work through Sven's in #14948. We need to come to an agreement on the expected behavior and desire for these filters over there.

@powersj powersj closed this Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/procstat feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants