Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iscsi: capture and log iscsi disconnect error #43

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pkg/iscsi/iscsi.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,7 +60,7 @@ func getISCSIInfo(req *csi.NodePublishVolumeRequest) (*iscsiDisk, error) {
}

for _, portal := range portals {
bkportal = append(bkportal, portalMounter(string(portal)))
bkportal = append(bkportal, portalMounter(portal))
}

iface := req.GetVolumeContext()["iscsiInterface"]
Expand Down
7 changes: 4 additions & 3 deletions pkg/iscsi/iscsi_util.go
Original file line number Diff line number Diff line change
Expand Up @@ -109,9 +109,10 @@ func (util *ISCSIUtil) DetachDisk(c iscsiDiskUnmounter, targetPath string) error
return err
}

iscsiLib.Disconnect(connector.TargetIqn, connector.TargetPortals)

if err := os.RemoveAll(targetPath); err != nil {
if disConnectErr := iscsiLib.Disconnect(connector.TargetIqn, connector.TargetPortals); disConnectErr != nil {
klog.Warningf("Warning: Disconnect failed for IQN: %v", connector.TargetIqn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a scenario where disconnect fails but we want to continue? Would this potentially leak connections?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should always continue when dealing with iscsi disconnects IMO, and in fact we are; this change only add a log warning if the disconnect fails but it doesn't change or stop the disconnect sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we are just capturing and moving on.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm wondering about is if we should be retrying to disconnect after the first failed attempt. Could we end up leaking a bunch of connections after some time and cause other issues down the line?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@humblec is this resolved? Should we be returning an error and failing instead of continuing? This is the error sequence I'm concerned about:

  1. Disconnect fails. We log and move on
  2. Next step to os.Remove succeeds.
  3. Since NodeUnpublishVolume succeeds we never come back and try to clean up this connection and the LUN still remains attached to this node.

The reason why I'm concerned about this is because we encountered data corruption in in-tree fc plugin due to the volume not being successfully cleaned up from the node and a pod rescheduled to another node and then rescheduled again back to the original node. However the disk was not reattached since it was already connected and the pod ended up updating stale data causing corruption. kubernetes/kubernetes#101791

A couple more questions about this disconnect and general unpublish flow:

  • Can you have multiple volumes on the same node sharing the same connection? Would disconnecting here cause other mounted volumes to fail?
  • Does disconnect also detach volumes from the node?

cc @jsafrane

Copy link
Contributor Author

@humblec humblec Sep 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msau42 there are some good amount of changes required in the csi-lib-iscsi which has been done at my local test , but not part of this PR. There is also a PR (kubernetes-csi/csi-lib-iscsi#29) from @paullaffitte which carry almost the same fixes , for example # kubernetes-csi/csi-lib-iscsi#29 (comment), but need to make sure we are not breaking the consumers with those changes. I have a discussion scheduled with Paul about coordinating the effort and to get the library in a good shape first so that it can be pulled over here.

Cc @xing-yang @jsafrane @saad-ali

I will revert on the final action items to take these forward.

Till then we can hold this PR.

}
if err := os.Remove(targetPath); err != nil {
klog.Errorf("iscsi: failed to remove mount path Error: %v", err)
return err
}
Expand Down