Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mtail keeps on running if producer crashes when reading from stdin #331

Closed
ema opened this issue Jul 29, 2020 · 6 comments · Fixed by #868
Closed

mtail keeps on running if producer crashes when reading from stdin #331

ema opened this issue Jul 29, 2020 · 6 comments · Fixed by #868
Assignees
Labels
bug This is considered a bug and shall get fixed mtail-Operating Issues related to deploying and running mtail

Comments

@ema
Copy link

ema commented Jul 29, 2020

Hi!

Wikimedia uses mtail for various purposes, including exposing varnish statistics. To do that, we've got a very simple shell script called varnishmtail. As you can see, the script boils down to varnishncsa | mtail -logs /dev/stdin. We used to run mtail with -logfds 0 and then moved to -logs /dev/stdin when -logfds was removed (see this comment on mtail issue #3).

Now, there's a problem we've discovered recently. If the producer dies (varnishncsa in this case) mtail keeps on running normally, hence the varnishmtail script keeps on running, and the systemd unit responsible for the whole thing does not notice anything. However, for all purposes the system is at that point not functioning, given that stats aren't updated any longer. See a more detailed description on our bug tracking system.

I think that, in the special case of when mtail is reading from stdin, receiving EOF should make the process exit. Thoughts?

@jaqx0r
Copy link
Contributor

jaqx0r commented Jul 30, 2020 via email

@ema
Copy link
Author

ema commented Jul 30, 2020

I wonder if there are any weird effects by doing so though, does it need another flag to turn on that behaviour? I think it doesn't, if there are no other log files open to read and the FD closes then it seems there's nothing it can do afterwards.

I also don't think another flag is needed.

In the interim if you're using bash can you turn on pipefail mode to make the shell kill mtail?

That is the first thing I thought of doing too! However, pipefail is about the exit status of the pipeline, not about making any command stop. If mtail keeps on running, the shell waits for it whether pipefail is enabled or not.

wmfgerrit pushed a commit to wikimedia/operations-puppet that referenced this issue Aug 5, 2020
As a workaround for T259020, run the varnishncsa | mtail pipeline in the
background, checking that varnishncsa is still up in a while loop. This
is an evidently baroque hack needed until mtail is fixed to do the right
thing when receiving EOF on stdin: google/mtail#331

Bug: T259020
Change-Id: I6a118e659b3fe7298a7a19ca3268924e8719638b
@jaqx0r jaqx0r self-assigned this Jan 13, 2021
@jaqx0r jaqx0r added the bug This is considered a bug and shall get fixed label Jan 13, 2021
@jaqx0r
Copy link
Contributor

jaqx0r commented Jan 15, 2021

As of HEAD right now I think that mtail will exit properly when stdin is closed if you only have one log (/dev/stdin) and you also add the flag --one_shot.

I want to remove the need for using --one_shot because that is supposed to be a debugging flag, so the issue isn't resolved yet.

If you can try out in the meantime that would be nice, but not necessary.

@shogos3
Copy link

shogos3 commented Mar 5, 2021

I want to remove the need for using --one_shot because that is supposed to be a debugging flag, so the issue isn't resolved yet.

Please do this. Because it looks --one_shot does not support bucket. I deserve to use stdin with bucket.

Thanks!

@jaqx0r jaqx0r added the mtail-Operating Issues related to deploying and running mtail label Mar 21, 2021
@jaqx0r
Copy link
Contributor

jaqx0r commented Apr 27, 2024

There's a bit to untangle here.

The tailer can easily shut itself down once there are no more log patterns to watch; i.e. when stdin is closed it is removed from the pattern list.

The VM runtime can shut down once the tailer has closed the lines channel.

The hard part is getting the exporter and http server to close once the VM is done. There's some TODOs I left around indicating I should do better with the server shutdown code which give me some hints.

jaqx0r added a commit that referenced this issue May 6, 2024
The `Exporter` and `HTTPServer` are not connected by channels to the `Runtime`, only
indirectly through mutating effects on the Store.  As such we can't rely on the
`Exporter` to shut down automatically when the `Runtime` shuts down.

Here instead we remove the `WaitGroup` and allow an `Exporter` to be shut down
explicitly after the `WaitGroup` is `Done`.

Issue: #331
@jaqx0r
Copy link
Contributor

jaqx0r commented May 7, 2024

Well it turns out it's not so easy to shut down the tailer because it is also looking for new log patterns, so it needs to know that there are no more logstreams, and that there will never be any more logstreams.

I think I have figured out a way to make that happen, needs a bit more refactoring in the pattern globbing code.

jaqx0r added a commit that referenced this issue May 26, 2024
The work of globbing a single pattern is separate from iterating over all
patterns, setting us up to start a goroutine per pattern instead of an
iterator.

Issue: #331
jaqx0r added a commit that referenced this issue May 26, 2024
We would like to use this in per-pattern goroutines.

Issue: #331
jaqx0r added a commit that referenced this issue May 26, 2024
Instead of collecting sockets and tailing them after, just tail them straight
away.

This makes the "early exit" check for "no things" broken, so replace that with
a check in the final shutdown handler that there were no active globs.  This
will be replaced in the next few changes once migrated to goroutines and we can
use only the `WaitGroup` for counting globs, too, but in the meantime this is
more correct than the previous code.

Issue: #331
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This is considered a bug and shall get fixed mtail-Operating Issues related to deploying and running mtail
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants