mtail keeps on running if producer crashes when reading from stdin #331

ema · 2020-07-29T09:22:05Z

Hi!

Wikimedia uses mtail for various purposes, including exposing varnish statistics. To do that, we've got a very simple shell script called varnishmtail. As you can see, the script boils down to varnishncsa | mtail -logs /dev/stdin. We used to run mtail with -logfds 0 and then moved to -logs /dev/stdin when -logfds was removed (see this comment on mtail issue #3).

Now, there's a problem we've discovered recently. If the producer dies (varnishncsa in this case) mtail keeps on running normally, hence the varnishmtail script keeps on running, and the systemd unit responsible for the whole thing does not notice anything. However, for all purposes the system is at that point not functioning, given that stats aren't updated any longer. See a more detailed description on our bug tracking system.

I think that, in the special case of when mtail is reading from stdin, receiving EOF should make the process exit. Thoughts?

The text was updated successfully, but these errors were encountered:

jaqx0r · 2020-07-30T04:09:44Z

That sounds like a good idea. I wonder if there are any weird effects by doing so though, does it need another flag to turn on that behaviour? I think it doesn't, if there are no other log files open to read and the FD closes then it seems there's nothing it can do afterwards. In the interim if you're using bash can you turn on pipefail mode to make the shell kill mtail?

…

On Wed, 29 Jul 2020, 19:22 Emanuele Rocca, ***@***.***> wrote: Hi! Wikimedia uses mtail for various purposes, including exposing varnish statistics. To do that, we've got a very simple shell script called varnishmtail <https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/varnish/files/varnishmtail>. As you can see, the script boils down to varnishncsa | mtail -logs /dev/stdin. We used to run mtail with -logfds 0 and then moved to -logs /dev/stdin when -logfds was removed (see this comment on mtail issue #3 <#3 (comment)>). Now, there's a problem we've discovered recently. If the producer dies (varnishncsa in this case) mtail keeps on running normally, hence the varnishmtail script keeps on running, and the systemd unit responsible for the whole thing does not notice anything. However, for all purposes the system is at that point not functioning, given that stats aren't updated any longer. See a more detailed description on our bug tracking system <https://phabricator.wikimedia.org/T259020>. I think that, in the special case of when mtail is reading from stdin, receiving EOF should make the process exit. Thoughts? — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#331>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAXFX67B34MC2QL7KRXHS4DR57S5DANCNFSM4PLMQLTQ> .

ema · 2020-07-30T07:54:22Z

I wonder if there are any weird effects by doing so though, does it need another flag to turn on that behaviour? I think it doesn't, if there are no other log files open to read and the FD closes then it seems there's nothing it can do afterwards.

I also don't think another flag is needed.

In the interim if you're using bash can you turn on pipefail mode to make the shell kill mtail?

That is the first thing I thought of doing too! However, pipefail is about the exit status of the pipeline, not about making any command stop. If mtail keeps on running, the shell waits for it whether pipefail is enabled or not.

As a workaround for T259020, run the varnishncsa | mtail pipeline in the background, checking that varnishncsa is still up in a while loop. This is an evidently baroque hack needed until mtail is fixed to do the right thing when receiving EOF on stdin: google/mtail#331 Bug: T259020 Change-Id: I6a118e659b3fe7298a7a19ca3268924e8719638b

jaqx0r · 2021-01-15T00:34:24Z

As of HEAD right now I think that mtail will exit properly when stdin is closed if you only have one log (/dev/stdin) and you also add the flag --one_shot.

I want to remove the need for using --one_shot because that is supposed to be a debugging flag, so the issue isn't resolved yet.

If you can try out in the meantime that would be nice, but not necessary.

shogos3 · 2021-03-05T08:15:20Z

I want to remove the need for using --one_shot because that is supposed to be a debugging flag, so the issue isn't resolved yet.

Please do this. Because it looks --one_shot does not support bucket. I deserve to use stdin with bucket.

Thanks!

jaqx0r · 2024-04-27T13:34:22Z

There's a bit to untangle here.

The tailer can easily shut itself down once there are no more log patterns to watch; i.e. when stdin is closed it is removed from the pattern list.

The VM runtime can shut down once the tailer has closed the lines channel.

The hard part is getting the exporter and http server to close once the VM is done. There's some TODOs I left around indicating I should do better with the server shutdown code which give me some hints.

The `Exporter` and `HTTPServer` are not connected by channels to the `Runtime`, only indirectly through mutating effects on the Store. As such we can't rely on the `Exporter` to shut down automatically when the `Runtime` shuts down. Here instead we remove the `WaitGroup` and allow an `Exporter` to be shut down explicitly after the `WaitGroup` is `Done`. Issue: #331

jaqx0r · 2024-05-07T12:15:28Z

Well it turns out it's not so easy to shut down the tailer because it is also looking for new log patterns, so it needs to know that there are no more logstreams, and that there will never be any more logstreams.

I think I have figured out a way to make that happen, needs a bit more refactoring in the pattern globbing code.

The work of globbing a single pattern is separate from iterating over all patterns, setting us up to start a goroutine per pattern instead of an iterator. Issue: #331

We would like to use this in per-pattern goroutines. Issue: #331

Instead of collecting sockets and tailing them after, just tail them straight away. This makes the "early exit" check for "no things" broken, so replace that with a check in the final shutdown handler that there were no active globs. This will be replaced in the next few changes once migrated to goroutines and we can use only the `WaitGroup` for counting globs, too, but in the meantime this is more correct than the previous code. Issue: #331

jaqx0r self-assigned this Jan 13, 2021

jaqx0r added the bug This is considered a bug and shall get fixed label Jan 13, 2021

jaqx0r added the mtail-Operating Issues related to deploying and running mtail label Mar 21, 2021

jaqx0r mentioned this issue May 6, 2024

Explicitly stop the Exporter when main routines are done. #860

Merged

jaqx0r added a commit that referenced this issue May 26, 2024

refactor: Extract doPatternGlob from PollLogPatterns.

45d2de8

The work of globbing a single pattern is separate from iterating over all patterns, setting us up to start a goroutine per pattern instead of an iterator. Issue: #331

jaqx0r added a commit that referenced this issue May 26, 2024

refactor: Store the log pattern poll waker in the Tailer struct.

926ef7b

We would like to use this in per-pattern goroutines. Issue: #331

jaqx0r mentioned this issue May 26, 2024

feat: Handle stdin closure and shut down the tailer. #868

Merged

jaqx0r closed this as completed in #868 May 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mtail keeps on running if producer crashes when reading from stdin #331

mtail keeps on running if producer crashes when reading from stdin #331

ema commented Jul 29, 2020

jaqx0r commented Jul 30, 2020 via email

ema commented Jul 30, 2020

jaqx0r commented Jan 15, 2021 •

edited

Loading

shogos3 commented Mar 5, 2021

jaqx0r commented Apr 27, 2024

jaqx0r commented May 7, 2024

mtail keeps on running if producer crashes when reading from stdin #331

mtail keeps on running if producer crashes when reading from stdin #331

Comments

ema commented Jul 29, 2020

jaqx0r commented Jul 30, 2020 via email

ema commented Jul 30, 2020

jaqx0r commented Jan 15, 2021 • edited Loading

shogos3 commented Mar 5, 2021

jaqx0r commented Apr 27, 2024

jaqx0r commented May 7, 2024

jaqx0r commented Jan 15, 2021 •

edited

Loading