Scope stops collecting container metrics #1795
What do you mean by "stops"? Do you see metrics and then you don't?
Yeah, I was testing out the 0.17.0 release, clicking around. At some point I started getting JS errors. On investigating, I found it was the sparklines failing to render on the details panel of an alpine container, where there had been sparklines a few minutes earlier.
After the metrics stopped I
This did happen last week too. It might be my env. What command should I try running next time this happens?
@foot Could you try to reproduce with 0.16.2 to confirm whether it's a regression?
Yep, the logs and the report are the first step.
I can't reliably reproduce it :(. Here are the logs from the last time it happened. If no one has seen this on dev/prod, I wouldn't block the release; I'll keep an eye out for it and start running Scope w/
Uhm, this is most probably the culprit:
However, the latter type of message could be a leftover (expected error) from #1687. It's difficult to tell because we don't print the error, but I also get it for containers which no longer exist. What Docker version are you running? I recently bumped the Docker client library (#1787); maybe that's part of the problem? Can you check whether the log errors are systematic, regardless of the UI error? Also, the logs suggest you were pausing/unpausing containers. I also get the second error when doing so. In the meantime, I am going to try silencing the expected stats errors.
@foot I am starting to suspect that the problem is that metrics are null while the container is paused, which I believe is legitimate. I have managed to reproduce it, with the following error in the UI:
Is that what you saw? I am worried about the decoding error, though. That one I cannot reproduce.
Yes!!! Nice one @2opremio! I can't reproduce it immediately; do you have to wait a little after pausing for the metrics to be dropped?
I am not sure; I just paused and unpaused a few times. I am more worried about the
I still can't reproduce it with pause/unpause, though I have been getting a couple of the other errors. What is the ID it gives? I can't find any containers or images with that ID.
Are you referring to the
Would you mind rebuilding Scope with the latest commits in the release branch?
Nope, the other one:
What is the
I got the system into a bad state again, by repeatedly stopping, starting, pausing, and restarting a container until Docker stopped responding.
So it seems that some undefined Docker state results in samples: null.
It's the container ID.
Cool, I couldn't find
But I guess we should handle null samples somewhere: backend or frontend? It's cleaner to strip them on the backend, but keeping them also exposes some extra information about the state of the system through the HTTP endpoints, which is nice.
@foot Have you run into this again?
Nope, I have not. Closing for now; I will re-open it if it comes up again.
From the node-details call, node.metrics.0.samples is null, which the UI doesn't handle very gracefully right now.
Process/host metrics are still collected.
env: docker-machine
report.2.json.zip