
worker VM comes up in emergency mode when using tectonic install #318

Closed
bparees opened this issue Sep 26, 2018 · 28 comments

Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@bparees

bparees commented Sep 26, 2018

The bootstrap and master nodes come up OK, but the worker goes into emergency mode.

The only tweak is that I'm resizing the image filesystem:

wget http://aos-ostree.rhev-ci-vms.eng.rdu2.redhat.com/rhcos/images/cloud/latest/rhcos-qemu.qcow2.gz
gzip -d rhcos-qemu.qcow2.gz 
cp rhcos-qemu.qcow2 rhcos-qemu.new.qcow2 
qemu-img resize rhcos-qemu.new.qcow2  +20G
virt-resize --expand /dev/vda2 rhcos-qemu.qcow2 rhcos-qemu.new.qcow2 
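
For reference, a quick way to see how the resized image looks to libguestfs before booting it (a sketch; this only lists partitions and detected filesystems, it does not prove the filesystem is healthy):

virt-filesystems --long --parts --filesystems -a rhcos-qemu.new.qcow2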
@dustymabe
Member

FYI: @jlebon and I are taking point on the investigation.

@dustymabe
Member

> virt-resize --expand /dev/vda2 rhcos-qemu.qcow2 rhcos-qemu.new.qcow2

Note that with #293 this step shouldn't be necessary.

@ashcrow
Member

ashcrow commented Sep 26, 2018

Do we need to let those folks doing testing know to hold off for the time being?

@dustymabe
Member

> Do we need to let those folks doing testing know to hold off for the time being?

Only if they resize their disks like @bparees did above.

@jlebon
Member

jlebon commented Sep 26, 2018

Are we sure this is related to resizing the image though?

@dustymabe
Member

> Are we sure this is related to resizing the image though?

Not yet; hopefully we'll know soon.

@ashcrow
Member

ashcrow commented Sep 26, 2018

/assign @dustymabe @jlebon

@ashcrow
Member

ashcrow commented Sep 26, 2018

/kind bug

@openshift-ci-robot added the kind/bug label Sep 26, 2018
@jlebon
Member

jlebon commented Sep 26, 2018

Booting the node after virt-resize definitely works. Passing it through the installer now to see if I can reproduce the worker failure.

@dustymabe
Member

> Booting the node after virt-resize definitely works. Passing it through the installer now to see if I can reproduce the worker failure.

Yep, I'm seeing the same. I haven't run the installer yet.

@dustymabe
Member

FYI, I opened a bug for the filesystem not getting resized on boot: #319

@jlebon
Member

jlebon commented Sep 26, 2018

OK, reproduced this. So I think this might be a bug in virt-resize. It looks like it's corrupting the superblock, which is causing sysroot.mount to fail:

Sep 26 20:39:36 worker-dsvgr.mco.testing kernel: XFS (vda2): last sector read failed
Sep 26 20:39:36 worker-dsvgr.mco.testing mount[504]: mount: /dev/vda2: can't read superblock
Sep 26 20:39:36 worker-dsvgr.mco.testing systemd[1]: sysroot.mount mount process exited, code=exited status=32
Sep 26 20:39:36 worker-dsvgr.mco.testing systemd[1]: Failed to mount /sysroot.
Sep 26 20:39:36 worker-dsvgr.mco.testing systemd[1]: Dependency failed for Initrd Root File System.
Sep 26 20:39:36 worker-dsvgr.mco.testing systemd[1]: Dependency failed for Reload Configuration from the Real Root.
Sep 26 20:39:36 worker-dsvgr.mco.testing systemd[1]: Job initrd-parse-etc.service/start failed with result 'dependency'.
Sep 26 20:39:36 worker-dsvgr.mco.testing systemd[1]: Triggering OnFailure= dependencies of initrd-parse-etc.service.

You can see the same issue when trying to guestmount:

[root@pet /]# LIBGUESTFS_BACKEND=direct guestmount --ro -d worker-dsvgr -m /dev/sda2 /mnt/tmp
libguestfs: error: mount_options: mount exited with status 32: mount: /sysroot: can't read superblock on /dev/sda2.
guestmount: ‘/dev/sda2’ could not be mounted.
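
For reference, one way to double-check the filesystem without booting the guest is to expose the image over NBD and run a read-only XFS check. This is just a sketch, assuming the root filesystem ends up on the second partition at /dev/nbd0p2:

modprobe nbd
qemu-nbd --connect=/dev/nbd0 --read-only rhcos-qemu.new.qcow2
xfs_repair -n /dev/nbd0p2      # -n: check only, make no changes
qemu-nbd --disconnect /dev/nbd0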

@jlebon
Member

jlebon commented Sep 26, 2018

So I'd say for now, let's figure out #319, and I'll see about digging deeper into libguestfs and reporting an issue if there isn't already one.

@bparees
Author

bparees commented Sep 26, 2018

@jlebon is there an alternate way I can grow the filesystem in the meantime?

@jlebon
Member

jlebon commented Sep 26, 2018

Are you able to get the installer to finish with the default size at least? If so, then you can just xfs_growfs /sysroot manually once the workers are up.
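
A rough sketch of that manual step, assuming the worker's root partition already spans the extra space (device and mount point names are the ones from this thread):

sudo xfs_growfs /sysroot      # grow the XFS filesystem on /dev/vda2 to fill its partition
df -h /sysroot                # confirm the new size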

@dustymabe
Member

I wonder if it's because you are overwriting an existing disk image. @jlebon can you try to reproduce with an empty outdisk?

[dustymabe@media images]$ zcat rhcos-4.0.6179-qemu.qcow2.gz > rhcos-4.0.6179-qemu.qcow2
[dustymabe@media images]$ truncate -s 40G outdisk.qcow2
[dustymabe@media images]$ virt-resize --expand /dev/vda2 rhcos-4.0.6179-qemu.qcow2 outdisk.qcow2 | tee
[   0.0] Examining rhcos-4.0.6179-qemu.qcow2
**********

Summary of changes:

/dev/sda1: This partition will be left alone.

/dev/sda2: This partition will be resized from 7.7G to 39.7G.  The
filesystem xfs on /dev/sda2 will be expanded using the ‘xfs_growfs’
method.

**********
[   3.4] Setting up initial partition table on outdisk.qcow2
[   3.8] Copying /dev/sda1
[   4.7] Copying /dev/sda2
[  27.6] Expanding /dev/sda2 using the ‘xfs_growfs’ method

Resize operation completed with no errors.  Before deleting the old disk, 
carefully check that the resized disk boots and works correctly.

@jlebon
Member

jlebon commented Sep 26, 2018

Ah, good point. I did see the truncate guidelines in virt-resize(1), though I didn't think it would affect this. Will try that out.

@ashcrow
Member

ashcrow commented Sep 26, 2018

> So I'd say for now, let's figure out #319, and I'll see about digging deeper into libguestfs and reporting an issue if there isn't already one.

FWIW @jlebon worked on #319. #320 merged and should be in the next compose.

@bparees
Author

bparees commented Sep 26, 2018

The truncate approach left me with a totally hosed install; no VMs would even come up, and terraform just spun on:
module.libvirt_base_volume.libvirt_volume.coreos_base: Still creating... (1m30s elapsed)
module.bootstrap.libvirt_ignition.bootstrap: Still creating... (1m30s elapsed)

@dustymabe
Member

> no VMs would even come up

I bet this is because the installer specifies the disk format as qcow2, and truncate essentially produces a raw image. The pipeline is running now with #320 in it, so we should have a new image sometime soon.
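
A quick way to confirm what format the installer is being handed (file name from the example above; a truncate'd target reports "file format: raw" rather than qcow2):

qemu-img info outdisk.qcow2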

@jlebon
Member

jlebon commented Sep 26, 2018

This seems to work:

$ qemu-img create -f qcow2 -o preallocation=metadata rhcos-4.0.6179-qemu-larger.qcow2 12G
Formatting 'rhcos-4.0.6179-qemu-larger.qcow2', fmt=qcow2 size=12884901888 cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
$ virt-resize --expand /dev/vda2 rhcos-4.0.6179-qemu.qcow2 rhcos-4.0.6179-qemu-larger.qcow2

module.libvirt_base_volume.libvirt_volume.coreos_base: Still creating... (1m30s elapsed)
module.bootstrap.libvirt_ignition.bootstrap: Still creating... (1m30s elapsed)

Hmm, I did notice it taking longer, but it finished in the end. I think it's from copying over the disk to /var/lib/libvirt/images?

(Though again, now that #319 is merged, there's not much use in doing this, especially since virt-resize leaves you with a fully sized image sitting on your disk, plus a second copy for the base layer in /var/lib/libvirt/images.)
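
If the fully sized copy is a concern, one workaround (a sketch; file names follow the example above) is to rewrite the image with qemu-img convert, which skips unallocated clusters in the output:

qemu-img convert -O qcow2 rhcos-4.0.6179-qemu-larger.qcow2 rhcos-4.0.6179-qemu-larger-sparse.qcow2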

@dustymabe
Member

> The pipeline is running now with #320 in it, so we should have a new image sometime soon.

This will be fixed in 4.0.6185, so look for that image to show up in the output directory in the next 30 minutes to an hour.

@dustymabe
Member

> so look for that image to show up in the output directory

Just landed.

@bparees
Author

bparees commented Sep 26, 2018

The worker comes up and seems to have expanded properly. Well, sort of properly: it's got 16 GB. I had grown the image by 20 GB, and the master VM shows 36 GB as I'd expect.

$ ssh [email protected]
-bash-4.2$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/vda2       37430252 4798312  32631940  13% /
$ ssh [email protected]
-bash-4.2$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/vda2       16972780 4794784  12177996  29% /

I assume that may require enlisting the installer team, but I'm going to leave it with you guys for the moment...

@dustymabe
Member

I think we can close this now, since it seems the corruption was caused by the way the virt-resize was being done (copying over an existing non-empty image).

The fix for issue #319 should help you too, I think.

Fixed in #320.

@bparees
Author

bparees commented Sep 29, 2018

@dustymabe you saw my last #318 (comment), right?

@dustymabe
Member

> @dustymabe you saw my last #318 (comment), right?

I misread it.

This issue has been migrated to openshift/cluster-api-provider-libvirt#28

@bparees
Author

bparees commented Oct 1, 2018

Thanks @dustymabe!
