NBD fails to reconnect if started in initramfs #95

Open
jpf91 opened this issue Jan 31, 2019 · 2 comments

jpf91 commented Jan 31, 2019

Maybe this is more a RHEL/dracut bug, but I'm not really sure how to find the root cause of this problem.

Using CentOS 7.6 (3.10.0-957.1.3.el7.x86_64, nbd 3.14) and booting with an NBD root filesystem and the nbd-client options -p -t10, the nbd-client fails to reconnect after a network hiccup. If the nbd-client is ever restarted after boot (i.e. nbd-client -d /dev/nbd0 && nbd-client ... -p -t10 /dev/nbd0), the newly started nbd-client recovers just fine from network failures.
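
For anyone who wants to script the restart described above, here is a minimal Python sketch that only builds the two command lines; it does not run them. The helper name `restart_commands` and the `SERVER_ARGS` placeholder are illustrative assumptions; the placeholder stands in for the connection arguments elided as "..." above, which this sketch does not try to guess.

```python
import shlex


def restart_commands(device, connect_args):
    """Return the (stop, start) argv lists for restarting nbd-client.

    connect_args is whatever normally identifies the export (elided as
    "..." in the report); it is passed through unchanged.
    """
    stop = ["nbd-client", "-d", device]
    # -p and -t10 are the options named in the report.
    start = ["nbd-client", *connect_args, "-p", "-t10", device]
    return stop, start


if __name__ == "__main__":
    # "SERVER_ARGS" is a hypothetical placeholder, not a real argument.
    stop, start = restart_commands("/dev/nbd0", ["SERVER_ARGS"])
    print(shlex.join(stop), "&&", shlex.join(start))
```

Keeping the commands as argv lists means they can be handed to `subprocess.run` later without any shell quoting concerns.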

By adding -nofork and redirecting the nbd-client stderr output, I was able to capture the following output from an nbd-client started in the initramfs:

CentOS Linux 7 (Core)
Kernel 3.10.0-957.1.3.el7.x86_64 on an x86_64

localhost login: [   58.382947] fuse init (API version 7.22)
[  151.366971] block nbd0: Receive control failed (result -104)
[  151.370055] block nbd0: shutting down socket
[  151.372247] block nbd0: queue cleared
[  151.373853] nbd,3371: Kernel call returned: 104 Reconnecting
[  151.395901] Error: Socket failed: Connection refused
[  151.395901] Exiting.
[  151.949925] e1000: ens33 NIC Link is Down
[  152.397085]  Reconnecting
[  157.989809] e1000: ens33 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[  157.992670] IPv6: ADDRCONF(NETDEV_CHANGE): ens33: link becomes ready
[  158.007023] block nbd0: Attempted send on closed socket
[  158.008705] blk_update_request: I/O error, dev nbd0, sector 18518264
[  158.010371] XFS (nbd0): metadata I/O error: block 0x11a90f8 ("xfs_trans_read_buf_map") error 5 numblks 8
[  158.012997] block nbd0: Attempted send on closed socket
[  158.014379] blk_update_request: I/O error, dev nbd0, sector 55372432
[  158.016040] XFS (nbd0): metadata I/O error: block 0x34cea90 ("xfs_trans_read_buf_map") error 5 numblks 8
[  158.041987] block nbd0: Attempted send on closed socket
[  158.043606] blk_update_request: I/O error, dev nbd0, sector 18518264
[  158.045309] XFS (nbd0): metadata I/O error: block 0x11a90f8 ("xfs_trans_read_buf_map") error 5 numblks 8
[  158.048266] block nbd0: Attempted send on closed socket
[  158.049547] blk_update_request: I/O error, dev nbd0, sector 55372432
[  158.051324] XFS (nbd0): metadata I/O error: block 0x34cea90 ("xfs_trans_read_buf_map") error 5 numblks 8
[  158.118247] block nbd0: Attempted send on closed socket
[  158.119665] blk_update_request: I/O error, dev nbd0, sector 197568
[  158.121388] XFS (nbd0): metadata I/O error: block 0x303c0 ("xfs_trans_read_buf_map") error 5 numblks 32
[  158.123793] XFS (nbd0): xfs_imap_to_bp: xfs_trans_read_buf() returned error -5.
[  158.125818] block nbd0: Attempted send on closed socket
[  158.127059] blk_update_request: I/O error, dev nbd0, sector 55372432
[  158.133341] XFS (nbd0): metadata I/O error: block 0x34cea90 ("xfs_trans_read_buf_map") error 5 numblks 8
[  158.133786] block nbd0: Attempted send on closed socket
[  158.133788] blk_update_request: I/O error, dev nbd0, sector 55694096
[  158.133793] block nbd0: Attempted send on closed socket
[  158.133793] blk_update_request: I/O error, dev nbd0, sector 55694096
[  158.133821] block nbd0: Attempted send on closed socket
[  158.133821] blk_update_request: I/O error, dev nbd0, sector 55694096
[  159.419970] Error: Socket failed: Connection refused
[  159.419970] Exiting.
[  160.422105]  Reconnecting
[  160.424333] Error: Socket failed: Connection refused
[  160.424333] Exiting.
[  161.428520]  Reconnecting
[  161.431024] Error: Socket failed: Connection refused
[  161.431024] Exiting.
[  162.433949]  Reconnecting
[  162.436331] Error: Socket failed: Connection refused
[  162.436331] Exiting.
[  163.438283]  Reconnecting
[  163.441123] Error: Socket failed: Connection refused
[  163.441123] Exiting.
[  164.443784]  Reconnecting
[  164.446080] Error: Socket failed: Connection refused
[  164.446080] Exiting.
[  165.447956]  Reconnecting
[  165.450821] Error: Socket failed: Connection refused
[  165.450821] Exiting.
[  166.454441]  Reconnecting
[  166.457184] Error: Socket failed: Connection refused
[  166.457184] Exiting.
[  167.460289]  Reconnecting
[  167.462858] Error: Socket failed: Connection refused
[  167.462858] Exiting.
[  168.464403]  Reconnecting
[  168.466996] Error: Socket failed: Connection refused
[  168.466996] Exiting.
[  169.469852]  Reconnecting
[  169.472117] Error: Socket failed: Connection refused
[  169.472117] Exiting.
[  170.479696]  Reconnecting
[  170.482395] Error: Socket failed: Connection refused
[  170.482395] Exiting.
[  171.492696]  Reconnecting
[  171.494984] Error: Socket failed: Connection refused
[  171.494984] Exiting.
[  172.505718]  Reconnecting
[  172.508425] Error: Socket failed: Connection refused
[  172.508425] Exiting.
[  173.518706]  Reconnecting
[  173.521055] Error: Socket failed: Connection refused
[  173.521055] Exiting.
[  174.531709]  Reconnecting
[  174.534078] Error: Socket failed: Connection refused
[  174.534078] Exiting.
[  175.544887]  Reconnecting
[  175.547474] Error: Socket failed: Connection refused
[  175.547474] Exiting.
[  176.557703]  Reconnecting
[  176.559988] Error: Socket failed: Connection refused
[  176.559988] Exiting.
[  177.570708]  Reconnecting
[  177.574317] Error: Socket failed: Connection refused
[  177.574317] Exiting.
[  178.583621]  Reconnecting
[  178.587565] Error: Socket failed: Connection refused
[  178.587565] Exiting.
[  179.596929]  Reconnecting
[  179.600292] Error: Socket failed: Connection refused
[  179.600292] Exiting.
[  180.609744]  Reconnecting
[  180.613501] Error: Socket failed: Connection refused
[  180.613501] Exiting.
[  181.622742]  Reconnecting
[  181.624653] Error: Socket failed: Connection refused
[  181.624653] Exiting.
[  182.635739]  Reconnecting
[  182.639482] Error: Socket failed: Connection refused
[  182.639482] Exiting.
[  183.648633]  Reconnecting
[  183.650941] Error: Socket failed: Connection refused
[  183.650941] Exiting.
[  184.661574]  Reconnecting
[  185.674957] Error: Socket failed: Connection refused
[  185.674957] Exiting
[  186.095634] e1000: ens33 NIC Link is Down
[  186.687704]  Reconnecting
[  190.117042] e1000: ens33 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[  193.714773] Error: Cannot open NBD: No such file or directory
[  193.714773] Exiting.

So it fails at https://github.com/NetworkBlockDevice/nbd/blob/master/nbd-client.c#L1317. I also tried to reproduce this with newer kernels, but there the nbd-client was always stopped once the initramfs finished executing, so the situation was actually much worse.
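
To make a long console capture of this kind easier to read, the following is a minimal Python sketch (not part of nbd) that counts the reconnect attempts and records the last error the client printed. The names `summarize`, `LINE_RE` and `SAMPLE` are illustrative, and the sketch assumes the client's stderr lines carry dmesg-style timestamps and the "Reconnecting" / "Error: ..." wording seen in this report.

```python
import re
from collections import Counter

# Optional dmesg-style "[  151.395901] " prefix, then the message text.
LINE_RE = re.compile(r"^\s*(?:\[\s*(\d+\.\d+)\]\s*)?(.*)$")


def summarize(log_text):
    """Count reconnect attempts and client errors in a captured console log."""
    reconnects = 0
    errors = Counter()
    last_error = None
    for raw in log_text.splitlines():
        message = LINE_RE.match(raw).group(2).strip()
        if message.endswith("Reconnecting"):
            reconnects += 1
        elif message.startswith("Error: "):
            text = message[len("Error: "):]
            errors[text] += 1
            last_error = text
    return {
        "reconnects": reconnects,
        "errors": dict(errors),
        "last_error": last_error,
    }


# Short excerpt copied from the capture in this report.
SAMPLE = """\
[  151.373853] nbd,3371: Kernel call returned: 104 Reconnecting
[  151.395901] Error: Socket failed: Connection refused
[  151.395901] Exiting.
[  152.397085]  Reconnecting
[  193.714773] Error: Cannot open NBD: No such file or directory
[  193.714773] Exiting.
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run against a full capture, a tally like this separates the repeated "Socket failed: Connection refused" retries from the final "Cannot open NBD: No such file or directory" error after which the client stops retrying.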

I think we'll try to use iSCSI for our root devices now (especially as the nbd behaviour is not particularly nice when a connection drops: we get lots of I/O errors until the connection is restored, which means most running programs will crash), but I still wanted to file this report as it may help others.

Member

yoe commented Sep 9, 2019

Can you please try to reproduce this using a more recent version of NBD? 3.14 is quite old, and I think I fixed some issues related to persist mode since then.


chabad360 commented Jan 27, 2021

I've been able to consistently reproduce this issue, but without the first error line. My errors just start with "block nbd0: shutting down socket". The issue only showed up in version 3.21.
