This repository has been archived by the owner on Aug 25, 2021. It is now read-only.

Add ignition-virtio-dump.service #146

Merged 1 commit on Mar 31, 2020

cgwalters
Member

Debugging failures in the initrd is annoying; this code
looks for a virtio-serial port named com.coreos.ignition.journal,
and runs as part of emergency.target.

I plan to change mantle to set up this port by default, so if
something fails in the initramfs we'll at least reliably get
the journal in a sane parsable format.

This is a special targeted subset of
coreos/ignition#585
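
A host-side consumer of the dump could look roughly like the sketch below. It assumes the data arriving on the channel is newline-delimited JSON of the kind `journalctl -o json` produces (one object per line, with `PRIORITY` as a string); the function names are hypothetical and are not part of mantle or this PR.

```python
import json
from typing import Iterable, Iterator, List


def parse_journal_dump(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per journal entry from newline-delimited JSON."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # A truncated record is plausible if the VM stops mid-write.
            continue
        if isinstance(entry, dict):
            yield entry


def failure_messages(entries: Iterable[dict], max_priority: int = 3) -> List[str]:
    """Return MESSAGE text for entries at syslog priority 3 (err) or more severe."""
    messages = []
    for entry in entries:
        prio = entry.get("PRIORITY")
        msg = entry.get("MESSAGE")
        # MESSAGE can be non-text (for example binary data), so only keep strings.
        if isinstance(prio, str) and prio.isdigit() and int(prio) <= max_priority:
            if isinstance(msg, str):
                messages.append(msg)
    return messages
```

Parsing line by line keeps the consumer usable even when the last record is cut off, which seems likely in the failure cases this port is meant for.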

@jlebon
Member

jlebon commented Jan 6, 2020

We'll need something like this for rpm-ostree's CI too at least, where right now the journal kola collects stops on the first reboot (again, because kola doesn't know the node is being rebooted).

I had thought of something similar, though possibly using another console instead, and ForwardToConsole=. Using virtio-serial channels is a neat idea! We made use of them in SystemTap to support probing processes inside VMs. Though hmm, the downside is we'll probably need a custom unit to proxy the messages through.

Anyway, WDYT about having this functionality in https://github.com/coreos/fedora-coreos-config directly, and just conditionalizing the unit on ConditionPathExists=/dev/virtio-ports/com.coreos.journal?
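
The condition suggested here amounts to "only run if the named port exists under `/dev/virtio-ports`". A minimal sketch of the same check in Python, useful for example in a test helper running inside the guest; the function names and the `ports_dir` parameter are illustrative assumptions:

```python
import os

# Directory where named virtio-serial ports show up in the guest.
VIRTIO_PORTS_DIR = "/dev/virtio-ports"


def virtio_port_path(name: str, ports_dir: str = VIRTIO_PORTS_DIR) -> str:
    """Build the device path for a named virtio-serial port."""
    return os.path.join(ports_dir, name)


def port_available(name: str, ports_dir: str = VIRTIO_PORTS_DIR) -> bool:
    """Mirror ConditionPathExists=: true only if the port path exists."""
    return os.path.exists(virtio_port_path(name, ports_dir))
```

Keeping the directory as a parameter makes the check easy to exercise against a temporary directory instead of a real VM.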

@jlebon
Member

jlebon commented Jan 6, 2020

(To clarify, what I'm suggesting here is making this a streaming thing instead, installing it in both the initrd and the real root, and making it more of a "host API" than something Ignition-specific.)

@cgwalters
Member Author

I think these are strongly related but still orthogonal things. We don't need to stream the journal from the initrd - assuming the initrd works fine, if we do journal streaming in the real root we'll get the logs we need then.

Hence, I'd propose merging this PR mostly as is, and doing what you're suggesting as a separate virtio channel indeed owned by fedora-coreos-config (since it's not really related to Ignition).

@cgwalters
Member Author

BTW, I wrote exactly what you're suggesting for gnome-continuous for several reasons, but one of the most interesting is that the default for desktop systems is not to have ssh on.

(It could make sense to change mantle to default to 'exec over virtio' but that's a separate discussion)

@cgwalters
Member Author

Any further thoughts on this one?

Member

@jlebon jlebon left a comment

> I think these are strongly related but still orthogonal things. We don't need to stream the journal from the initrd - assuming the initrd works fine, if we do journal streaming in the real root we'll get the logs we need then.

Hmm, I don't follow. If the goal is to have a debugging hook like this, IMO it'd be even more useful if it streamed starting from the initrd too. E.g. `systemd.log_level=debug systemd.log_target=console` works on both the initrd systemd and real root systemd. And if we do that, I think it can be used in place of this.

But yeah, this is clearly useful to have today, so no issues from me getting this in meanwhile.

Anyway, a few optional comments, but LGTM as is too.

dracut/99emergency-timeout/ignition-virtio-dump.service (outdated, resolved)
dracut/99emergency-timeout/module-setup.sh (outdated, resolved)
@cgwalters
Member Author

> If the goal is to have a debugging hook like this, IMO it'd be even more useful if it streamed starting from the initrd too.

Yeah...though it would duplicate then what kola is doing with gathering the journals (we could replace that only on qemu of course).

We'd also need to handle being killed and restarted across the switchroot and think about how that appears in logs.

I guess again my main concern is getting the journal when things go wrong - when things go "right" (at least up till ssh) one has a ton of options.

Arguably, we should have a similar service in the real root that also handles failure to reach the default target.

@jlebon
Member

jlebon commented Jan 29, 2020

> Yeah...though it would duplicate then what kola is doing with gathering the journals (we could replace that only on qemu of course).

Yeah, the goal would definitely be to make kola use that for qemu (and fixing the rpm-ostree vmcheck test logs case).

> We'd also need to handle being killed and restarted across the switchroot and think about how that appears in logs.

This would be tricky to do but not unsolvable I think. E.g. the proxy service could just write out on shutdown the cursor of the last message it proxied?
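
The cursor idea could work along these lines. This is only a sketch under the assumption that each proxied entry carries the `__CURSOR` field found in `journalctl -o json` output; the helper names are hypothetical. In practice `journalctl --after-cursor=` can do the skipping on the journal side.

```python
from typing import Iterable, List, Optional


def last_cursor(entries: Iterable[dict]) -> Optional[str]:
    """Return the cursor of the last entry that has one (what to save on shutdown)."""
    cursor = None
    for entry in entries:
        cursor = entry.get("__CURSOR", cursor)
    return cursor


def entries_after_cursor(entries: Iterable[dict], saved: Optional[str]) -> List[dict]:
    """Return only entries that come after the saved cursor.

    If no cursor was saved, or it is not found, return everything so that
    nothing is silently dropped (duplicates are preferred over gaps).
    """
    items = list(entries)
    if saved is None:
        return items
    for index, entry in enumerate(items):
        if entry.get("__CURSOR") == saved:
            return items[index + 1:]
    return items
```

A proxy restarted after the switchroot would load the saved cursor, call `entries_after_cursor`, and continue from there, so each message appears once on the host side.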

> I guess again my main concern is getting the journal when things go wrong - when things go "right" (at least up till ssh) one has a ton of options.

The way I'm thinking of it, the contexts in which you would have this set up are also where you want to be ready for things to go wrong (e.g. Ignition debugging, test harnesses, etc.). I don't see it as re-implementing e.g. systemd-journal-remote but something lower level than that and situational.

But again, I definitely see the value of just something that fires on emergency in the initrd. So this WFM!

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Mar 26, 2020
Pairs with coreos/ignition-dracut#146

What we really want is to use this in kola, will do as a
separate followup.
@cgwalters
Member Author

Now pairs with coreos/coreos-assembler#1290 and tested to work (or I guess successfully fail?) together.

Will merge both when both are approved.

@cgwalters
Member Author

Actually now that I play with this more...it might be nice if we wrote just `{}` to the channel when we succeeded too - that would make the flow in mantle/kola saner because we could synchronously wait for either success or failure rather than waiting for (ignition failure or ssh works).
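
To illustrate the proposed flow: if the guest wrote `{}` on success and journal entries on failure, the host could classify whatever it has read so far with something like the following. This is a sketch of the idea in this comment, not of what mantle does; the function name and the three return values are assumptions.

```python
import json


def classify_channel_output(data: str) -> str:
    """Classify channel contents as 'pending', 'success' or 'failure'.

    Assumes newline-delimited JSON: an empty object means the initramfs
    succeeded, any other object is taken to be a journal entry from a failure.
    """
    lines = [line for line in data.splitlines() if line.strip()]
    if not lines:
        return "pending"  # nothing written yet
    try:
        first = json.loads(lines[0])
    except json.JSONDecodeError:
        return "pending"  # first record not fully received yet
    if first == {}:
        return "success"
    return "failure"
```

With this shape the caller has a single thing to wait on, instead of racing an Ignition failure against ssh coming up.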

@cgwalters
Member Author

I thought about the "generalize this to post-initramfs" and realized we don't necessarily need to bake it into CoreOS by default - it could be injected via Ignition.

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Mar 27, 2020
Pairs with coreos/ignition-dracut#146

This way, we error out fast if something went wrong in the initramfs
rather than timing out.  And further, we get the journal as JSON,
so we can do something intelligent in the future to analyze it.
@jlebon
Member

jlebon commented Mar 27, 2020

> I thought about the "generalize this to post-initramfs" and realized we don't necessarily need to bake it into CoreOS by default - it could be injected via Ignition.

The way I think of it is that a generalized version of this would be like the serial console output; it just streams from start to end of the VM on the same port. The same output you get from journalctl -o json really: logs there start from before switchroot.

@cgwalters
Member Author

cgwalters commented Mar 27, 2020

> The same output you get from journalctl -o json really: logs there start from before switchroot.

Sure, but post-switchroot any code injected via Ignition to write to a port or do whatever is going to get those logs too - it'll just be delayed until the switchroot happens.

Another important thing is that instead of getting all logs the calling code can also use e.g. journalctl -u or whatever to filter to specific units to avoid transferring all the data, etc.
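
For comparison, the same kind of filtering can be approximated on already-collected JSON entries. The sketch below assumes entries shaped like `journalctl -o json` output and matches on the `_SYSTEMD_UNIT` and `UNIT` fields; it only roughly imitates `journalctl -u`, which matches on more than these two fields, and the function name is hypothetical.

```python
from typing import Iterable, List


def filter_by_unit(entries: Iterable[dict], unit: str) -> List[dict]:
    """Keep entries logged by the unit, or logged by systemd about the unit."""
    return [
        entry
        for entry in entries
        if entry.get("_SYSTEMD_UNIT") == unit or entry.get("UNIT") == unit
    ]
```

Filtering inside the guest, as the comment suggests, has the advantage that unneeded entries never cross the channel at all; host-side filtering like this is mainly useful when the full dump is already available.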

cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Mar 28, 2020
Pairs with coreos/ignition-dracut#146

This way, we error out fast if something went wrong in the initramfs
rather than timing out.  And further, we get the journal as JSON,
so we can do something intelligent in the future to analyze it.

And add a test case for this.
@cgwalters
Member Author

OK last call on this one...if there aren't any further objections/thoughts I plan to merge.

@cgwalters cgwalters merged commit 6136be3 into coreos:master Mar 31, 2020
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 3, 2020
Pairs with coreos/ignition-dracut#146

This way, we error out fast if something went wrong in the initramfs
rather than timing out.  And further, we get the journal as JSON,
so we can do something intelligent in the future to analyze it.

And add a test case for this.
openshift-merge-robot pushed a commit to coreos/coreos-assembler that referenced this pull request Apr 3, 2020
Pairs with coreos/ignition-dracut#146

This way, we error out fast if something went wrong in the initramfs
rather than timing out.  And further, we get the journal as JSON,
so we can do something intelligent in the future to analyze it.

And add a test case for this.
cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request Apr 8, 2020
This is similar to: coreos/ignition-dracut#146

For our test system, it generally works really well to inject
things via Ignition.  That PR was about handling failures in the
initramfs *before* Ignition runs.

This PR is trying to help us test the scenario where no Ignition
is injected into the Live ISO.  Let's also use the virtio-channel
approach.
cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this pull request Apr 9, 2020
This is similar to: coreos/ignition-dracut#146

For our test system, it generally works really well to inject
things via Ignition.  That PR was about handling failures in the
initramfs *before* Ignition runs.

This PR is trying to help us test the scenario where no Ignition
is injected into the Live ISO.  Let's also use the virtio-channel
approach.
cgwalters added a commit to coreos/fedora-coreos-config that referenced this pull request Apr 9, 2020
This is similar to: coreos/ignition-dracut#146

For our test system, it generally works really well to inject
things via Ignition.  That PR was about handling failures in the
initramfs *before* Ignition runs.

This PR is trying to help us test the scenario where no Ignition
is injected into the Live ISO.  Let's also use the virtio-channel
approach.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 10, 2020
This finally unifies the advantages of `cosa run` and `kola spawn`.
I kept getting annoyed by how serial console sizing is broken
(e.g. trying to use `less` etc.).  Using `ssh` via `kola spawn`
addresses that, but it means you can't debug the initramfs.

Now things work in an IMO pretty cool way; if you do e.g.
`cosa run --kargs ignition.config.url=blah://` (or inject a bad
Ignition config) to cause a failure in the initramfs,
you'll see a nice error (building on
coreos/ignition-dracut#146 ) telling you
to rerun with `cosa run --devshell-console`.

Things are also wired up cleanly so that we support rebooting
with the equivalent of `kola spawn --reconnect` (which we should
probably remove now).  You can exit via *either* quitting SSH
cleanly or using `poweroff`, and the lifecycle of ssh and qemu
is wired together.

And finally, if we detect a cosa workdir we also bind it in by
default.

More to come here, such as auto-injecting debugging
tools and containers.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 15, 2020
This finally unifies the advantages of `cosa run` and `kola spawn`.
I kept getting annoyed by how serial console sizing is broken
(e.g. trying to use `less` etc.).  Using `ssh` via `kola spawn`
addresses that, but it means you can't debug the initramfs.

Now things work in an IMO pretty cool way; if you do e.g.
`cosa run --kargs ignition.config.url=blah://` (or inject a bad
Ignition config) to cause a failure in the initramfs,
you'll see a nice error (building on
coreos/ignition-dracut#146 ) telling you
to rerun with `cosa run --devshell-console`.

Things are also wired up cleanly so that we support rebooting
with the equivalent of `kola spawn --reconnect` (which we should
probably remove now).  You can exit via *either* quitting SSH
cleanly or using `poweroff`, and the lifecycle of ssh and qemu
is wired together.

And finally, if we detect a cosa workdir we also bind it in by
default.

More to come here, such as auto-injecting debugging
tools and containers.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 16, 2020
This finally unifies the advantages of `cosa run` and `kola spawn`.
I kept getting annoyed by how serial console sizing is broken
(e.g. trying to use `less` etc.).  Using `ssh` via `kola spawn`
addresses that, but it means you can't debug the initramfs.

Now things work in an IMO pretty cool way; if you do e.g.
`cosa run --kargs ignition.config.url=blah://` (or inject a bad
Ignition config) to cause a failure in the initramfs,
you'll see a nice error (building on
coreos/ignition-dracut#146 ) telling you
to rerun with `cosa run --devshell-console`.

Things are also wired up cleanly so that we support rebooting
with the equivalent of `kola spawn --reconnect` (which we should
probably remove now).  You can exit via *either* quitting SSH
cleanly or using `poweroff`, and the lifecycle of ssh and qemu
is wired together.

And finally, if we detect a cosa workdir we also bind it in by
default.

More to come here, such as auto-injecting debugging
tools and containers.
openshift-merge-robot pushed a commit to coreos/coreos-assembler that referenced this pull request Apr 17, 2020
This finally unifies the advantages of `cosa run` and `kola spawn`.
I kept getting annoyed by how serial console sizing is broken
(e.g. trying to use `less` etc.).  Using `ssh` via `kola spawn`
addresses that, but it means you can't debug the initramfs.

Now things work in an IMO pretty cool way; if you do e.g.
`cosa run --kargs ignition.config.url=blah://` (or inject a bad
Ignition config) to cause a failure in the initramfs,
you'll see a nice error (building on
coreos/ignition-dracut#146 ) telling you
to rerun with `cosa run --devshell-console`.

Things are also wired up cleanly so that we support rebooting
with the equivalent of `kola spawn --reconnect` (which we should
probably remove now).  You can exit via *either* quitting SSH
cleanly or using `poweroff`, and the lifecycle of ssh and qemu
is wired together.

And finally, if we detect a cosa workdir we also bind it in by
default.

More to come here, such as auto-injecting debugging
tools and containers.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 29, 2020
This came up in coreos/ignition-dracut#146
and since then we've been doing more "ad hoc unit writing to virtio"
in mantle, but let's add a general API that streams the journal.

This is just better for what devshell wants - we can more precisely
watch for sshd starting.  And more code in e.g. `testiso.go` could
use it too which can come later.
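
As a rough illustration of "watch for sshd starting" on a streamed journal, the sketch below scans entries lazily and stops at the first match. It assumes each entry is a dict decoded from `journalctl -o json` output, and that a finished job for a unit shows up with `UNIT` and `JOB_RESULT` fields; both helper names are hypothetical and this is not the API added in coreos-assembler.

```python
from typing import Callable, Iterable, Optional


def first_match(entries: Iterable[dict],
                predicate: Callable[[dict], bool]) -> Optional[dict]:
    """Consume the stream only until the predicate matches; None if it never does."""
    for entry in entries:
        if predicate(entry):
            return entry
    return None


def unit_job_done(unit: str) -> Callable[[dict], bool]:
    """Build a predicate for 'a job for this unit finished with result done'.

    Assumption: stop jobs can also report 'done', so a real harness may
    want a stricter check than this one.
    """
    def check(entry: dict) -> bool:
        return entry.get("UNIT") == unit and entry.get("JOB_RESULT") == "done"
    return check
```

Because `first_match` returns as soon as the predicate holds, the caller can keep reading later entries from the same stream afterwards.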
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request Apr 29, 2020
This came up in coreos/ignition-dracut#146
and since then we've been doing more "ad hoc unit writing to virtio"
in mantle, but let's add a general API that streams the journal.

This is just better for what devshell wants - we can more precisely
watch for sshd starting.  And more code in e.g. `testiso.go` could
use it too which can come later.

The immediate motivation here is I may add another kola test
which could use this.
cgwalters added a commit to cgwalters/coreos-assembler that referenced this pull request May 1, 2020
This came up in coreos/ignition-dracut#146
and since then we've been doing more "ad hoc unit writing to virtio"
in mantle, but let's add a general API that streams the journal.

This is just better for what devshell wants - we can more precisely
watch for sshd starting.  And more code in e.g. `testiso.go` could
use it too which can come later.

The immediate motivation here is I may add another kola test
which could use this.
openshift-merge-robot pushed a commit to coreos/coreos-assembler that referenced this pull request May 1, 2020
This came up in coreos/ignition-dracut#146
and since then we've been doing more "ad hoc unit writing to virtio"
in mantle, but let's add a general API that streams the journal.

This is just better for what devshell wants - we can more precisely
watch for sshd starting.  And more code in e.g. `testiso.go` could
use it too which can come later.

The immediate motivation here is I may add another kola test
which could use this.