-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add detailed docker network summary stats #25354
Add detailed docker network summary stats #25354
Conversation
This pull request is now in conflicts. Could you fix it? 🙏
|
💚 Build Succeeded
Expand to view the summary
Build stats
Trends 🧪❕ Flaky test reportNo test was executed to be analysed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this added, this is going to give more consistency when monitoring at the host and the container level.
I think we are going to need to do this as a new metricset as we did in the system module. As in there, this network
metricset reports events per interface, and these new counters are per container. This new metricset could be disabled by default, and then we wouldn't need the new configuration flag.
I also wonder if we really need to sum all the counters of all the processes, see my comment about this.
This PR adds detailed network stats to the
docker/memory
metricset.
By the way, this first sentence of the description confused me a little bit 😅
return &sysinfotypes.NetworkCountersInfo{}, errors.Wrapf(err, "error fetching network counters for PID %d", intPID) | ||
} | ||
|
||
summedMetrics = sumCounter(summedMetrics, counters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this correctly I wonder if we are summing here several times the same data. I think that all processes in the same network namespace report the same network counters.
For example if I start two listening netcats in a container, I see the two sockets under both pids:
$ for pid in $(pidof nc);do echo $pid; cat /proc/$pid/net/tcp; done
3119046
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000:1F9B 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36489370 1 0000000000000000 100 0 0 10 0
1: 00000000:1F9C 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36489816 1 0000000000000000 100 0 0 10 0
3118539
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000:1F9B 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36489370 1 0000000000000000 100 0 0 10 0
1: 00000000:1F9C 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36489816 1 0000000000000000 100 0 0 10 0
I also see them if I start a cat inside the container:
$ for pid in $(pidof cat);do echo $pid; cat /proc/$pid/net/tcp; done
3115294
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
0: 00000000:1F9B 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36489370 1 0000000000000000 100 0 0 10 0
1: 00000000:1F9C 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 36489816 1 0000000000000000 100 0 0 10 0
This also happens if I look to other files such as /proc/$pid/net/netstat
or /proc/$pid/net/snmp
, the content is the same for all the processes in the same namespace.
Not sure about the API call, but docker container top
lists all the processes in the container, so here we may be summing several times the same information if we are looping over all the processes in the container.
I think that we would only need to report the counters of only one pid in the container. Or to be more correct, given that inside a container there can be processes of multiple network namespaces (dind case for example), for correctness we would need to sum the counters once per namespace found, that's it to sum the counters once per different /proc/<pid>/ns/net
found.
@jsoriano No idea how I overlooked the namespacing, you're definitely right, the counters share a namespace. I was sort of going back and forth on if I should create a new metricset for this. This didn't seem quite "useful" enough on its own, but you might be right. In general, most containers only have one interface, so I wasn't too worried about the awkwardness of reporting this alongside the interface stats, but it also seems sub-optimal. |
Alright, I completely refactored this. It's now in its own metricset, and I also removed a lot of the summing logic--we shouldn't have multiple network namespaces per container I think, so we use Inspect to grab a PID and report the network counters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refactor, it looks great. Added some small suggestions, after they are addressed I think this can be merged.
we shouldn't have multiple network namespaces per container
This is actually possible in dind (docker in docker) and some other fancier cases, so we might actually sum counters of different namespaces found inside the same container. But I don't think we need to cover these cases, and actually the system module probably doesn't cover multiple namespaces in the host. So current approach looks good to me.
This makes me think that maybe we could have a linux/network_namespace
metricset that collects all these metrics per namespace from the host.
metricbeat/module/docker/network_summary/network_summary_test.go
Outdated
Show resolved
Hide resolved
metricbeat/module/docker/network_summary/network_summary_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Pinging @elastic/integrations (Team:Integrations) |
/test |
/package |
/packaging |
/test |
@fearful-symmetry current failure is not related, I think you can merge this. |
* add detailed network summary stats to docker/memory * add changelog * update deps, notice * update ref yml * fix test * move to new metricset, refactor * revert changes to docker/network * remove config file * update xpack * small fixes (cherry picked from commit f7e80da)
…25601) * Add detailed docker network summary stats (#25354) * add detailed network summary stats to docker/memory * add changelog * update deps, notice * update ref yml * fix test * move to new metricset, refactor * revert changes to docker/network * remove config file * update xpack * small fixes (cherry picked from commit f7e80da) * update docs
What does this PR do?
This PR Creates a new metricset,
docker/network_summary
. This uses network namespaces in order to fetch per-network namespace stats for a given container, similar to the counters insystem/network_summary
I also tried to achieve some level of parity with
system/network_summary
, as it contains many of the same fields. As a result of this, we're using the same fields and same mapping methods, which have been moved to libbeat.Why is it important?
A few users have been requesting data at this level, but associated with a given cgroup/container.
Checklist
CHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.How to test this PR locally
docker/network_summary
in the module options.docker.network_summary
is reporting, and that fields are populated.