Detect time slip and adjust uptime in the system input #7018
Comments
This is interesting, we do use |
I haven't dug in too deep, but it looks like
|
I think your gopsutil has a problem: it computes the uptime as the difference between the boot time and now(). So yeah, it can't work if there's a big time slip in between. The boot time stays at what it was when the Go process was started. This explains the problem and why restarting the process fixes it. |
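A minimal sketch of the failure mode described above, assuming the boot time is derived once (wall clock minus kernel uptime) and then reused for every later reading. The type and function names are hypothetical and are not gopsutil's actual identifiers; times are plain Unix seconds to keep the example self-contained.

```go
package main

// BootTimeCache models a reader that derives the boot time once and reuses
// it for every later uptime reading. Hypothetical type, for illustration only.
type BootTimeCache struct {
	bootTime int64 // Unix seconds, captured once at process start
}

// NewBootTimeCache captures the boot time as wall clock minus kernel uptime.
func NewBootTimeCache(wallNow, kernelUptime int64) *BootTimeCache {
	return &BootTimeCache{bootTime: wallNow - kernelUptime}
}

// Uptime returns the current wall clock minus the cached boot time.
// Any step of the wall clock after startup leaks directly into the result.
func (c *BootTimeCache) Uptime(wallNow int64) int64 {
	return wallNow - c.bootTime
}
```

For example, a process that starts 30 seconds after boot with the clock reading 1000 caches a boot time of 970. If the clock is later stepped forward by a day (say, by the first NTP sync on a board without an RTC), `Uptime` grows by a day as well, and only restarting the process, which recomputes the boot time, corrects it.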
Left a bug over there. I'll compensate by adding a manual telegraf |
I see, |
Would it make sense to add an option in the system plugin to get uptime through /proc/uptime? I don't know the policy regarding non-cross-platform options, but this could be trivial to implement. |
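As a rough illustration of the /proc/uptime option suggested above, reading the value takes only a few lines of Go. This is a sketch only: the function names are hypothetical, and this is not how Telegraf or gopsutil implement it. It relies on the kernel's own uptime counter, which does not depend on the wall clock.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// ParseProcUptime parses the contents of /proc/uptime, which holds two
// whitespace-separated numbers: seconds since boot and idle seconds.
// Hypothetical helper, not part of Telegraf or gopsutil.
func ParseProcUptime(content string) (uptimeSec, idleSec float64, err error) {
	fields := strings.Fields(content)
	if len(fields) < 2 {
		return 0, 0, fmt.Errorf("unexpected /proc/uptime format: %q", content)
	}
	if uptimeSec, err = strconv.ParseFloat(fields[0], 64); err != nil {
		return 0, 0, err
	}
	if idleSec, err = strconv.ParseFloat(fields[1], 64); err != nil {
		return 0, 0, err
	}
	return uptimeSec, idleSec, nil
}

// ReadProcUptime reads and parses /proc/uptime. Linux only.
func ReadProcUptime() (uptimeSec, idleSec float64, err error) {
	data, err := os.ReadFile("/proc/uptime")
	if err != nil {
		return 0, 0, err
	}
	return ParseProcUptime(string(data))
}
```

Keeping the parser separate from the file read makes it testable on any platform, while `ReadProcUptime` stays a thin Linux-specific wrapper.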
Let's try to handle it in gopsutil first; they are very responsive and I'm sure we will hear back soon. |
I didn't hear from them, and looking at the code I don't see any easy solution. I'll propose a PR that stops caching the value of boot_time, but I suspect this could be rather expensive. In the meantime I have worked around the problem with this:
# Read uptime from /proc/uptime instead of bogus gopsutil (see https://github.com/shirou/gopsutil/issues/837)
[[inputs.file]]
files = ['/proc/uptime']
data_format = "grok"
grok_patterns = ["%{BASE10NUM:uptime_sec:float} %{BASE10NUM:idletime_sec:float}"]
PR is here: https://github.com/shirou/gopsutil/pull/857/files You'll have to see whether they merge it and release a new version that you can update to. |
If gopsutil doesn't act on this, would you be open to a PR that adds an option (or switches outright) to get uptime from /proc/uptime on Linux? Right now the system input doesn't return the actual uptime. I suspect this is far more prevalent than you think, because NTP time adjustments are frequent and recurring. On a host with a large uptime, I suspect the error could amount to minutes (although I also suspect that, in the scheme of things, a discrepancy of minutes across 2 years of uptime is not too bad for a monitoring system). |
Also seeing the same issue on an embedded device (Netgear ORBI) that lacks an RTC. Here too, restarting the telegraf collector fixes the issue. |
Same here on 4 different Raspberry Pis that were unplugged for a few days. Restarting the telegraf service helped. |
FYI, my PR was merged in gopsutil. Maybe someone more competent than me can update the telegraf Go deps to the right gopsutil commit to fix this issue here? |
If you can remind me when gopsutil v2.20.5 is released, I can update Telegraf to use it. I expect it will be in the next few days. |
Looks like you were right on time, @danielnelson: https://github.com/shirou/gopsutil/releases/tag/v2.20.5 |
Relevant telegraf.conf:
System info:
Raspberry Pi 3A+ without RTC
Steps to reproduce:
Expected behavior:
Telegraf uses the correct uptime and reports the actual number of seconds since boot, independently of clock drift or adjustments (maybe use /proc/uptime?)
Actual behavior:
See Steps
Additional info:
Restarting telegraf correctly picks up the uptime.