Parallelize stat calls for filesystem collector #1760
Comments
Thanks for the idea. In general, I'm open to this. We have done similar things in other collectors where metric retrieval is slow.
Slightly off-topic: "Should we filter network filesystems from the node_exporter by default?" has long been debated. In most cases, it doesn't make sense to monitor network filesystems from nodes. But it's not always as easy to monitor the filesystem on the fileserver itself, which is why we've left this enabled by default. Unlike the other items in the ignored-fs-types list, network filesystems are "real", even if they are not local to the node_exporter itself. (Where I work, we drop NFS from our node_exporter.)
Not opposed, but not convinced either. I'd like this to stay as simple as possible. It's probably not an issue, but this introduces some complexity in the code and raises questions: how many calls do we want to run in parallel? Could running many of them in parallel cause contention in the kernel?
@SuperQ Do you have an example of another collector you've done this for? I'm looking at possibly taking on this issue.
The systemd collector.
@discordianfish Can we get this one into the next release? It's very important for production environments where one might have a couple dozen NFS mounts, and missed scrapes add up quickly when, say, two of them are slow.
Looks like this can be closed once #1772, which has been merged to master, is included in a release.
Host operating system: output of
uname -a
node_exporter version: output of
node_exporter --version
node_exporter command line flags
Are you running node_exporter in Docker?
Nope
This is meant to bring up a discussion about parallelizing the stat calls done in the filesystem collector. For a couple of our filers we've got several mountpoints from the same filer. This means that when the remote filer is overloaded, the stat call for each of those mountpoints takes a long time. In a few cases this has led to scrape timeouts. This is especially problematic when the mount is just on the edge of becoming stale: it is never marked as stale, so subsequent scrapes continue to time out.
Any opinions on doing this? Thanks :)
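To make the idea concrete, here is a rough sketch of what bounded, parallel stat calls with a per-mount timeout could look like. It is not the code merged in #1772 and not the collector's actual implementation. `statFunc`, `statResult`, `statAll`, and `errStatTimeout` are hypothetical names made up for illustration, and the stat function is a stand-in for whatever call the collector really makes (probably something like `unix.Statfs`, which is an assumption here).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// statFunc is a hypothetical stand-in for the per-mountpoint stat call.
type statFunc func(mountPoint string) (sizeBytes uint64, err error)

// statResult holds the outcome for one mountpoint (hypothetical type).
type statResult struct {
	MountPoint string
	SizeBytes  uint64
	Err        error
}

// errStatTimeout marks a mountpoint whose stat call did not return in time.
var errStatTimeout = errors.New("stat timed out")

// statAll stats every mountpoint concurrently, waiting on at most
// maxParallel calls at a time and giving up on each after timeout.
func statAll(mounts []string, stat statFunc, maxParallel int, timeout time.Duration) []statResult {
	if maxParallel < 1 {
		maxParallel = 1
	}
	results := make([]statResult, len(mounts))
	sem := make(chan struct{}, maxParallel) // counting semaphore
	var wg sync.WaitGroup

	for i, m := range mounts {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when we stop waiting

			// Buffered so the inner goroutine can finish even after a timeout.
			done := make(chan statResult, 1)
			go func() {
				size, err := stat(m)
				done <- statResult{MountPoint: m, SizeBytes: size, Err: err}
			}()

			select {
			case r := <-done:
				results[i] = r
			case <-time.After(timeout):
				// The blocked call itself cannot be cancelled; its goroutine
				// stays parked until the underlying call returns.
				results[i] = statResult{MountPoint: m, Err: errStatTimeout}
			}
		}(i, m)
	}
	wg.Wait()
	return results
}

func main() {
	// Fake stat function: one mountpoint simulates an overloaded filer.
	fake := func(m string) (uint64, error) {
		if m == "/mnt/slow" {
			time.Sleep(500 * time.Millisecond)
		}
		return 1 << 30, nil
	}
	mounts := []string{"/", "/mnt/slow", "/home"}
	for _, r := range statAll(mounts, fake, 2, 100*time.Millisecond) {
		fmt.Printf("%s size=%d err=%v\n", r.MountPoint, r.SizeBytes, r.Err)
	}
}
```

With this shape, one slow mount costs at most one timeout rather than stalling every mount behind it. The semaphore only bounds how many calls are being waited on: a call that has timed out keeps its goroutine parked until the kernel returns, so tracking such "stuck" mounts across scrapes would still be needed to avoid piling up goroutines.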