acceptance: TestDockerCLI failed #61896
Seems like something changed with node recommissioning: maybe we're no longer allowed to recommission a node that has been fully decommissioned? I can remove the test, but I'd rather have someone who has been working on this confirm that to be the case. |
Yes, that is exactly correct. We changed that behavior a couple of weeks ago. I am surprised this wasn't detected by the test when the corresponding PR was merged. |
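For reference, a minimal sketch of what the test's expectation becomes under the new behavior. Everything here is hypothetical scaffolding: demoDecommission and demoRecommission are invented stand-ins for whatever helpers drive \demo decommission and \demo recommission in the acceptance test.

```go
package acceptance_test // hypothetical placement

import "testing"

// Invented stand-ins for the helpers that drive `\demo decommission`
// and `\demo recommission`; only their error results matter here.
func demoDecommission(t *testing.T, nodeID int) error { panic("sketch only") }
func demoRecommission(t *testing.T, nodeID int) error { panic("sketch only") }

func TestRecommissionAfterDecommission(t *testing.T) {
	if err := demoDecommission(t, 4); err != nil {
		t.Fatal(err)
	}
	// Under the new behavior, a fully decommissioned node can no longer
	// be recommissioned, so this call must return an error.
	if err := demoRecommission(t, 4); err == nil {
		t.Fatal("expected recommissioning a decommissioned node to fail")
	}
}
```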
On second thought, however, we're failing to start node 6 with |
I am confused: this later error suggests that the same (virtual) store was used to add a new node. That would be an error; we want new/fresh stores for new nodes. Is that error also happening for a fresh store? If so, probably @irfansharif or @lunevalex can help. |
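To make that invariant concrete, here is a tiny sketch (not the actual demo_cluster.go code): every node being added should get a brand-new, empty store directory, never one left behind by a decommissioned node.

```go
import "os"

// freshStoreDir returns a brand-new, empty store directory for a node that
// is about to be added; its path would then be passed to the new node via
// the --store flag. Reusing a directory that belonged to a previously
// decommissioned node is exactly the error described above.
func freshStoreDir() (string, error) {
	return os.MkdirTemp("", "demo-node-store-")
}
```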
This is failing on the "add a fresh store" step, AFAICT. |
I wonder if @erikgrinaker has an opinion about this, as someone who recently worked in this space? |
I took a look, and I can blame neither the decommissioning mechanism nor the transient cluster (though the transient cluster code is a bit jumbled; why do we have both of these: cockroach/pkg/cli/demo_cluster.go, lines 55 to 56 in 8c5253b?).
And we certainly don't decommission properly: we never wait for the replicas to drain; we just yank the node out right away (cockroach/pkg/cli/demo_cluster.go, lines 457 to 480 in 8c5253b; a sketch of the missing wait follows this comment).
Another weird thing is that we seem to be using the […]. The thing I really would like to see, and which it seems is not available in the artifacts, is the logs for the servers. |
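Picking up the drain point from the comment above: a minimal sketch of the kind of wait that is missing. The replicaCount callback is hypothetical, standing in for however the caller observes the node's remaining replica count (for example, the counts the server reports back during decommissioning).

```go
package main

import (
	"context"
	"time"
)

// waitForDrain polls until the decommissioning node holds zero replicas,
// and only then returns, so the caller can safely stop the node. The
// replicaCount callback is a hypothetical hook, not a real CockroachDB API.
func waitForDrain(
	ctx context.Context,
	nodeID int,
	replicaCount func(context.Context, int) (int, error),
) error {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		n, err := replicaCount(ctx, nodeID)
		if err != nil {
			return err
		}
		if n == 0 {
			return nil // fully drained: now it is safe to yank the node
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```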
Demo server logs are disabled, unfortunately. |
We can probably repro locally too. |
I couldn't repro locally, but weird things did happen and my CPU got pegged at 100%:
```
***@***.***:26257/movr> \demo add region=ca-central,zone=a
node 2 has been added with locality "region=ca-central,zone=a"
***@***.***:26257/movr> \demo add region=ca-central,zone=a
node 3 has been added with locality "region=ca-central,zone=a"
***@***.***:26257/movr> \demo add region=ca-central,zone=a
node 4 has been added with locality "region=ca-central,zone=a"
***@***.***:26257/movr> \demo decommission 4
node 4 has been decommissioned
***@***.***:26257/movr> \demo recommission 4
internal server error: failed to connect to the node: initial connection heartbeat failed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:62056: i/o timeout"
***@***.***:26257/movr> \demo recommission 4
internal server error: failed to connect to the node: initial connection heartbeat failed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:62056: i/o timeout"
***@***.***:26257/movr> \demo add region=ca-central,zone=a
node 5 has been added with locality "region=ca-central,zone=a"
```
Note how I didn't get the expected error during the recommission :shrug: |
It's not failing all the time; otherwise it would have failed in the PR that caused the failure :) I'm going to stress this locally and report results. |
After 100 runs locally, I am not able to repro either... However, Tobias' remarks above about how the decommissioning process is carried out still apply. I think the code should be massaged to mimic what is done by the |
Or rather, the work should be done on the server side, so that we don't have two identical-but-different implementations on the client side. (Or the client-side impls should be made shareable.) |
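A rough sketch of the shape that refactor could take; the interface and its names are illustrative only, not the actual CockroachDB API. The idea is one server-side entry point that both the `cockroach node decommission` CLI and `\demo decommission` call, so the sequencing lives in exactly one place.

```go
import "context"

// Decommissioner is an illustrative server-side interface; not real API.
type Decommissioner interface {
	// Decommission marks the given nodes, waits server-side for their
	// replicas to drain, and only then finalizes membership. Progress is
	// streamed through cb so each CLI can render status without
	// reimplementing the sequencing client-side.
	Decommission(ctx context.Context, nodeIDs []int, cb func(nodeID, replicasLeft int)) error
}
```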
(acceptance).TestDockerCLI failed on release-21.1@8a4e9de1cc8c150f9c95b505a4f748056974bba7:
Parameters:
Related:
See this test on roachdash |
Closing since this test failure is from an old branch |
(acceptance).TestDockerCLI failed on release-21.1@050385adeb3094405ce52e73f6deb03f75a96f1a:
Parameters:
Related:
See this test on roachdash
powered by pkg/cmd/internal/issues
Jira issue: CRDB-6260