
Upgrading Fedora Atomic Host 20180515.1 to 20180722.0 failed #1471

Closed
shannara opened this issue Jul 25, 2018 · 31 comments

@shannara

shannara commented Jul 25, 2018

Host system details

State: idle; auto updates disabled
Deployments:
  ostree://fedora-atomic:fedora/28/x86_64/atomic-host
                   Version: 28.20180722.0 (2018-07-23 00:38:05)
                    Commit: 106a7095fd46741948ba60dbd0586668ae7abe6336671444ec790d7541a88778
              GPGSignature: Valid signature by 128CF232A9371991C8A65695E08E7E629DB62FB1

● ostree://fedora-atomic:fedora/28/x86_64/atomic-host
                   Version: 28.20180515.1 (2018-05-15 16:32:35)
                    Commit: a29367c58417c28e2bd8306c1f438b934df79eba13706e078fe8564d9e0eb32b
              GPGSignature: Valid signature by 128CF232A9371991C8A65695E08E7E629DB62FB1

Steps to reproduce it

Install Fedora Atomic 28 from the 20180515.1 ISO.

First issue: /var/cache/rpm-ostree has to be created manually (fixed by #1366).

Second issue: after running rpm-ostree upgrade and rebooting into 20180722.0, various errors appear.

Boot waits to mount /var/home, /boot and /boot/efi, then hangs at
"Configure read-only root support" (5 minutes and counting / no timeout).

I have to hard reboot and switch back to the previous ostree (20180515), which works.

@shannara shannara changed the title Uppgrade ostree from 20180515.1 to 20180722.0 failed. Upgrade ostree from 20180515.1 to 20180722.0 failed. Jul 25, 2018
@cgwalters cgwalters added the bug label Jul 25, 2018
@jlebon
Member

jlebon commented Jul 25, 2018

So, it looks like "Configure read-only root support" comes from fedora-readonly.service, which I hadn't come across before. It sounds like something we don't actually need given that root is already read-only, though I haven't looked deeper into it yet.

Can you also paste /proc/cmdline and /etc/sysconfig/readonly-root?

@shannara
Author

/proc/cmdline

BOOT_IMAGE=/ostree/fedora-atomic-eeed31e4c67469d3fe08d0fe25a704f4208aec0702dce0c98b5e4efa9bddaca2/vmlinuz-4.16.8-300.fc28.x86_64 rd.lvm.lv=fedora_sle-fedora/root rd.luks.uuid=luks-0d7cb32b-2ff7-4fca-9788-4449ff70fbf0 root=/dev/mapper/fedora_sle--fedora-root ostree=/ostree/boot.1/fedora-atomic/eeed31e4c67469d3fe08d0fe25a704f4208aec0702dce0c98b5e4efa9bddaca2/0

/etc/sysconfig/readonly-root

# Set to 'yes' to mount the system filesystems read-only.
READONLY=no
# Set to 'yes' to mount various temporary state as either tmpfs
# or on the block device labelled RW_LABEL. Implied by READONLY
TEMPORARY_STATE=no
# Place to put a tmpfs for temporary scratch writable space
RW_MOUNT=/var/lib/stateless/writable
# Label on local filesystem which can be used for temporary scratch space
RW_LABEL=stateless-rw
# Options to use for temporary mount
RW_OPTIONS=
# Label for partition with persistent data
STATE_LABEL=stateless-state
# Where to mount to the persistent data
STATE_MOUNT=/var/lib/stateless/state
# Options to use for persistent mount
STATE_OPTIONS=
# NFS server to use for persistent data?
CLIENTSTATE=
# Use slave bind-mounts
SLAVE_MOUNTS=yes

@cgwalters
Member

Hmm. First, we should nuke fedora-readonly.service from orbit - OSTree is just superior to it and they overlap a lot.

That said, that looks like the default config, and AFAICS from reading /usr/lib/systemd/fedora-readonly it should be a no-op service if READONLY=no.

I think this is a symptom of something else; can you paste your /etc/fstab? I wonder if we regressed something related to the /var generator in a different way.
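
(As an aside, a quick way to check what the /var generator actually produced on the booted deployment — a diagnostic sketch; the var.mount unit name and the standard /run/systemd/generator locations are assumptions about where the generated unit ends up:)

# Show the /var mount unit as systemd sees it, including generated units
systemctl cat var.mount
# List everything the system generators wrote for this boot
ls -l /run/systemd/generator/ /run/systemd/generator.late/
# Current state of the unit
systemctl status var.mount --no-pager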

@shannara
Author

Hi @cgwalters,

This is my /etc/fstab

#
# /etc/fstab
# Created by anaconda on Wed Jul 25 14:07:11 2018
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/fedora_sle--fedora-root /                       xfs     defaults,x-systemd.device-timeout=0 0 0
UUID=6ccb7110-c197-47ec-b26b-4e406872b0fa /boot                   ext4    defaults        1 2
UUID=7B30-E2CA          /boot/efi               vfat    umask=0077,shortname=winnt 0 2
/dev/mapper/fedora_sle--fedora-home /home                   xfs     defaults,x-systemd.device-timeout=0 0 0

The fstab lists / and /home, but the effective mount points are /sysroot and /var/home respectively.

@jlebon
Member

jlebon commented Jul 25, 2018

Hmm, I can't reproduce this. I tried upgrading a cloud VM from that same version to the latest with /var/roothome and /var/home mounts. Though I didn't try to install from an older ISO. (Any reason you're not using the latest ISO BTW? See the bottom of https://getfedora.org/en/atomic/download/ for links).

@shannara
Author

@jlebon, thanks for trying to reproduce it.

I used this ISO because it was already in my download directory; I didn't think to download the latest one.
Since this is a new machine I can reinstall with the latest ISO, but it's strange that the same version works for you. Maybe it's the environment (UEFI / Secure Boot / encrypted LVM).

@jlebon
Member

jlebon commented Jul 25, 2018

For context, from IRC, @shannara also tried the latest ISO but got issues booting:

11:49:28 < shannara> with last iso, not grub display, boot seem break.

Though I just tried the latest ISO here as well in a UEFI VM with encrypted VG and a /home mount, and it worked OK (I did have to wait a bit after entering the passphrase at boot for some units that were taking a while). So it's potentially a VM vs bare metal issue, which would make this especially tricky to nail down.

@cgwalters
Member

Assuming the latest ISO picked up the updated kernel, I'm guessing this is a kernel regression; maybe something going wrong with the driver for whatever disks are in use.

You can test this by using rpm-ostree override to install just the kernel from 28.20180722.0.
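
Roughly like this (a sketch, not verified on this exact host: the kernel NVR in 28.20180722.0 is assumed here to be 4.17.7-200.fc28 based on a later comment, and koji is just one way to fetch the RPMs):

# Fetch the newer kernel packages (requires the koji client)
koji download-build --arch=x86_64 kernel-4.17.7-200.fc28

# Stay on the 20180515.1 tree but swap in only the newer kernel
rpm-ostree override replace \
    kernel-4.17.7-200.fc28.x86_64.rpm \
    kernel-core-4.17.7-200.fc28.x86_64.rpm \
    kernel-modules-4.17.7-200.fc28.x86_64.rpm
systemctl reboot

# To undo the override later:
#   rpm-ostree override reset kernel kernel-core kernel-modules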

@shannara
Author

shannara commented Jul 27, 2018

Hi,

Sorry for the late response, I was out of the office yesterday :(. But good news: I ran rpm-ostree upgrade this morning and it worked.
kernel: 4.17.7-200

:/
Docker failed to start and has to be started manually, and the /var/log/firewalld folder is missing:

Jul 27 10:00:48 sle-fedora systemd[1]: Starting Docker Application Container Engine...
Jul 27 10:00:48 sle-fedora dockerd-current[1035]: time="2018-07-27T10:00:48.241016706+02:00" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
Jul 27 10:00:48 sle-fedora dockerd-current[1035]: time="2018-07-27T10:00:48.241301402+02:00" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!>
Jul 27 10:00:48 sle-fedora dockerd-current[1035]: listen tcp 10.192.10.10:2376: bind: cannot assign requested address
Jul 27 10:00:48 sle-fedora systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Jul 27 10:00:48 sle-fedora systemd[1]: docker.service: Failed with result 'exit-code'.
Jul 27 10:00:48 sle-fedora systemd[1]: Failed to start Docker Application Container Engine.
Jul 27 10:02:40 sle-fedora systemd[1]: Starting firewalld - dynamic firewall daemon...
Jul 27 10:02:40 sle-fedora firewalld[1681]: ERROR: Failed to open log file '/var/log/firewalld': [Errno 13] Permission denied: '/var/log/firewalld'

@cgwalters
Member

It sounds like there's a problem with /var in general. Any errors in systemctl status systemd-tmpfiles-setup?

@shannara
Author

This is the output of systemctl status systemd-tmpfiles-setup:

[root@sle-fedora ~]# systemctl status systemd-tmpfiles-setup
● systemd-tmpfiles-setup.service - Create Volatile Files and Directories
   Loaded: loaded (/usr/lib/systemd/system/systemd-tmpfiles-setup.service; static; vendor preset: disabled)
   Active: active (exited) since Fri 2018-07-27 10:00:47 CEST; 7h ago
     Docs: man:tmpfiles.d(5)
           man:systemd-tmpfiles(8)
  Process: 858 ExecStart=/usr/bin/systemd-tmpfiles --create --remove --boot --exclude-prefix=/dev (code=exited, status=0/SUCCESS)
 Main PID: 858 (code=exited, status=0/SUCCESS)

Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: [/usr/lib/tmpfiles.d/systemd.conf:43] Duplicate line for path "/var/lib/systemd/coredump", ignoring.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: [/usr/lib/tmpfiles.d/tmp.conf:12] Duplicate line for path "/var/tmp", ignoring.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: [/usr/lib/tmpfiles.d/var.conf:19] Duplicate line for path "/var/cache", ignoring.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: [/usr/lib/tmpfiles.d/var.conf:21] Duplicate line for path "/var/lib", ignoring.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: [/usr/lib/tmpfiles.d/var.conf:23] Duplicate line for path "/var/spool", ignoring.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: "/home" already exists and is not a directory.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: "/srv" already exists and is not a directory.
Jul 27 10:00:47 sle-fedora systemd-tmpfiles[858]: "/tmp" already exists and is not a directory.
Jul 27 10:00:47 sle-fedora systemd[1]: Started Create Volatile Files and Directories.

@cgwalters cgwalters changed the title Upgrade ostree from 20180515.1 to 20180722.0 failed. Upgrading Fedora Atomic Host 20180515.1 to 20180722.0 failed Jul 30, 2018
@cgwalters
Member

FWIW https://src.fedoraproject.org/rpms/initscripts/c/7276c52c2cf057f5a82fd73498b1c7137a615e54?branch=master split out the fedora-readonly.service - though I don't think it's really related here.

@dustymabe
Member

I hit an issue when trying out Atomic Workstation (Silverblue). This was on a MacBook Pro (2015), so bare metal EFI and no dual boot. I installed from an ISO (which happened to be the beta ISO, Fedora-AtomicWorkstation-ostree-x86_64-28_Beta-1.3.iso) and then did an rpm-ostree upgrade from a local mirror. After the upgrade I got an error with ostree-remount.service; attempting to boot the older tree worked fine. Attached are all the logs.

I should really try this with the officially released ISO, but I'm tethered to my hotspot right now since the local ISP is having an outage.

journal.txt
ostree-remount-journal.txt
rpm-ostree-status.txt
systemctl-status-ostree-remount.txt

@dustymabe
Member

also, no /var/ mount.

@cgwalters
Member

That's quite strange...the service exits successfully according to the audit logs. But for some reason systemd (probably?) is trying to execute it again.

I wonder if there's a core race condition here somewhere in systemd where ostree-remount is going in and changing the mount status, systemd detects that and decides that it hasn't completed local-fs.target or something, and then tries again?

@jlebon
Member

jlebon commented Jul 31, 2018

Are you able to log in on the serial console when you reach emergency.target? If so, can you post the output of findmnt?

Alternatively/additionally, before running upgrade (run cleanup -p if already done), can you override ostree-remount.service so that the ExecStart is prefixed with /usr/bin/strace?
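
Something like this drop-in would do it (a sketch: the /usr/lib/ostree/ostree-remount path is an assumption — check the real command with systemctl cat ostree-remount.service — and strace has to be in the image, e.g. layered via rpm-ostree install strace):

# Wrap the existing ExecStart with strace; write the trace to /run, which is
# writable even before /var is fully set up
mkdir -p /etc/systemd/system/ostree-remount.service.d
cat > /etc/systemd/system/ostree-remount.service.d/strace.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/strace -f -o /run/ostree-remount.strace /usr/lib/ostree/ostree-remount
EOF
systemctl daemon-reload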

@cgwalters
Member

Hmm... wait, I have an idea. Today our fstab generator does:

                               /* We need /sysroot mounted writable first */
                               "After=ostree-remount.service\n"

But we don't have RemainAfterExit=yes in ostree-remount...I bet systemd may try to run it once for each mount...
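
If that's the cause, a local drop-in should be enough to confirm it before any upstream fix lands (a sketch of the idea, not the actual ostree patch):

# Keep the oneshot unit "active" after it exits, so later units ordered
# After=ostree-remount.service don't make systemd start it again
mkdir -p /etc/systemd/system/ostree-remount.service.d
cat > /etc/systemd/system/ostree-remount.service.d/remain.conf <<'EOF'
[Service]
RemainAfterExit=yes
EOF
systemctl daemon-reload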

@cgwalters
Member

Testing out that theory now 🔬

@dustymabe
Member

dustymabe commented Jul 31, 2018

Waiting on Colin to come back with the results of his test; after that I'll try to answer @jlebon's questions.

cgwalters added a commit to cgwalters/ostree that referenced this issue Jul 31, 2018
This is standard practice for units like this; e.g. it's what
`systemd-remount-fs.service` does.  I think it may be part of
or the whole cause for
coreos/rpm-ostree#1471

I haven't reproduced the problem exactly but it seems to me that
if the unit starts and is GC'd, then when systemd goes to execute
a later unit it might end up restarting it.

A noticeable side effect of this is that `systemctl status ostree-remount`
exits with code `0` as expected.
@jlebon
Member

jlebon commented Jul 31, 2018

At least https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=750683 confirms start-limit-hit can happen if not using RemainAfterExit.

@cgwalters
Member

Yeah...I wasn't able to reproduce this in some quick testing but it's clearly a race condition per the quote:

Without RemainAfterExit, when another service is started and depends on the
oneshot service (directly or indirectly, typically by depending on
something like basic.target which depends on the oneshot), systemd will
not consider the oneshot to be already active if it has finished
executing and exited, and it will thus be started again.

I actually find myself wondering now what the heck a use case is for a Type=oneshot without RemainAfterExit=yes apart from systemd-run type things.

rh-atomic-bot pushed a commit to ostreedev/ostree that referenced this issue Jul 31, 2018

Closes: #1697
Approved by: jlebon
@dustymabe
Member

Are you able to log on the serial console when you reach emergency.target? If so, can you post findmnt?

findmnt.txt

Alternatively/additionally, before running upgrade (run cleanup -p if already done), can you override ostree-remount.service so that the ExecStart is prefixed with /usr/bin/strace?

Unfortunately there's no strace in the initramfs.

@dustymabe
Member

dustymabe commented Aug 1, 2018

ostreedev/ostree#1697 seems to have solved it for me: ostreedev/ostree#1697 (comment).

I just wonder why we only see this sometimes? I'd expect to see this a lot more for some reason.

@cgwalters
Member

I just wonder why we only see this sometimes? I'd expect to see this a lot more for some reason.

I don't know offhand; do you have a lot of mount units configured? Any unusual configuration?
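
(For reference, a quick way to answer that on the affected machine:)

# List and count the mount units systemd knows about on this boot
systemctl list-units --type=mount --all --no-pager
systemctl list-units --type=mount --all --no-legend | wc -l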

@dustymabe
Member

I don't know offhand; do you have a lot of mount units configured? Any unusual configuration?

I just did a kickstart install with the following partition setup, then ran rpm-ostree upgrade after first login and after configuring the remote.

install
cdrom
reboot
lang en_US.UTF-8
keyboard us
rootpw  foo
authconfig --enableshadow --passalgo=sha512
selinux --enforcing
timezone --utc America/New_York
bootloader --location=mbr --driveorder=sda --append="crashkernel=auto"

# Partition info
zerombr
clearpart --all
reqpart
part /boot --size=1024 --fstype="ext4"
part pv.01 --grow --encrypted --passphrase foo
volgroup vgroot pv.01
logvol / --size=10240 --fstype="xfs" --name=root --vgname=vgroot
logvol swap --size=4096 --fstype="swap" --name=swap --vgname=vgroot

ostreesetup --osname="fedora" --remote="fedora" --url="file:///ostree/repo" --ref="fedora/28/x86_64/workstation" --nogpg

Maybe the fact that the disk is encrypted?

@cgwalters
Member

@shannara Do you see something like this in your log?

Jul 31 16:24:57 localhost.localdomain systemd[1]: Starting OSTree Remount OS/ bind mounts...
Jul 31 16:24:57 localhost.localdomain ostree-remount[689]: Remounted: /sysroot
Jul 31 16:24:57 localhost.localdomain systemd[1]: Started OSTree Remount OS/ bind mounts.
Jul 31 16:24:58 localhost.localdomain systemd[1]: Starting OSTree Remount OS/ bind mounts...
Jul 31 16:24:58 localhost.localdomain systemd[1]: Started OSTree Remount OS/ bind mounts.
Jul 31 16:24:58 localhost.localdomain systemd[1]: Starting OSTree Remount OS/ bind mounts...
Jul 31 16:24:58 localhost.localdomain systemd[1]: Started OSTree Remount OS/ bind mounts.
Jul 31 16:24:58 localhost.localdomain systemd[1]: Starting OSTree Remount OS/ bind mounts...
Jul 31 16:24:58 localhost.localdomain systemd[1]: Started OSTree Remount OS/ bind mounts.
Jul 31 16:24:58 localhost.localdomain systemd[1]: Starting OSTree Remount OS/ bind mounts...
Jul 31 16:24:58 localhost.localdomain systemd[1]: Started OSTree Remount OS/ bind mounts.
Jul 31 16:24:58 localhost.localdomain systemd[1]: ostree-remount.service: Start request repeated too quickly.
Jul 31 16:24:58 localhost.localdomain systemd[1]: ostree-remount.service: Failed with result 'start-limit-hit'.
Jul 31 16:24:58 localhost.localdomain systemd[1]: Failed to start OSTree Remount OS/ bind mounts.
Jul 31 16:24:58 localhost.localdomain systemd[1]: ostree-remount.service: Triggering OnFailure= dependencies.

?

@shannara
Author

shannara commented Aug 3, 2018

Hi @cgwalters,

Because I reinstalled with the 20180625 ISO, I no longer have the previous journal from the 20180515 install, but I do remember that during boot there were a few lines about mounting /var/home ("already mounted" or similar).

@dustymabe
Member

Do we think ostreedev/ostree#1697 may have fixed the problem for this issue? Should we consider closing it?

@cgwalters
Member

Maybe? I dunno. The OP's issue had a different symptom but it could have been related, it's hard to say for sure.

@cgwalters
Member

@shannara Any luck with a more recent FAH release?

@dustymabe
Member

I'm going to close this since I think ostreedev/ostree#1697 fixed the problem. @shannara - please re-open if you have more info to add here or if you think the problem is not fixed.
