cli: cockroach zip will not complete if there are decommissioned nodes #43966
Comments
Triaging with Ron now. We should fix this ASAP, as it blocks support for any cluster with decommissioned nodes. Will have Ron bring this up at the SIG while I'm OOO.
Seems to be trivially reproducible. I started a local cluster with 4 nodes, decommissioned one, and ran debug zip; it failed when it reached the 3rd node, which was the decommissioned one. A file is created, but it seems to write the file out completely before zipping it, so the file on my local machine only has encoded data in it.
@roncrdb do you know if this is specific to 19.2/master, or does it also repro with 19.1/2.1? If it also occurs in other versions, I'll go for a simpler fix that can be more readily backported.
@knz I have not tested on 19.1/2.1.
@knz I tested this on a roachprod cluster: I decommissioned a node on both 2.1.10 and 19.1.6, and both completed the debug zip file without the decommissioned node, as expected. So it does not fail on those versions: it complains that it cannot connect to the node that is offline, skips that node, and finishes creating the zip file. That is what 19.2 should do, but instead it fails.
Found the bug, will send a PR out.
Reported by @roncrdb
Currently, if some nodes are fully decommissioned (i.e. also down), cockroach zip will still try (and fail) to connect to them and retrieve data. Failing to do so, it reports noisy error messages like this:
This stops cockroach debug zip from completing. A rolling restart may fix this issue, but it would be good to find the root cause of what is happening as well.