RPC Error #101

Closed
dtonnesen opened this issue Nov 16, 2021 · 9 comments

@dtonnesen

dtonnesen commented Nov 16, 2021

How can the Team help you today?

I'm attempting to install CSM on vanilla Kubernetes v1.20.9 on Ubuntu, using the default parameters and modifying only four parameters in values.yaml, with no certificates.

root@dsib0211:~/csm# more values.yaml
jwtKey: key
cipherKey: "aasdfgafhgshsffadgshsdffgsdggggg"
adminUserName: admin
adminPassword: admin

helm install -n csm-installer --set-string scheme=http --set-string dbSSLEnabled="false" --create-namespace csm-installer dell/csm-installer -f values.yaml

NAME: csm-installer
LAST DEPLOYED: Tue Nov 16 10:15:21 2021
NAMESPACE: csm-installer
STATUS: deployed
REVISION: 1
TEST SUITE: None

The deployment seems to succeed, but the cockroachdb pods never become ready:
NAMESPACE       NAME                                   READY   STATUS
csm-installer   cluster-init-sjtdr                     1/1     Running
csm-installer   cockroachdb-0                          0/1     Running
csm-installer   cockroachdb-1                          0/1     Running
csm-installer   dell-csm-installer-86665ffb7d-dphzh    1/1     Running

Looking at the logs for the cluster-init pod, I see the error below repeated constantly, which I assume refers to the database:

kubectl logs -f cluster-init-sjtdr -n csm-installer

warning: node not ready to perform cluster initialization: initial connection heartbeat failed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: i/o timeout" (retrying)

The cockroachdb logs show connectivity issues:

kubectl logs -f cockroachdb-0 -n csm-installer

W211116 15:16:03.965422 115 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ [-] 26 ‹grpc: addrConn.createTransport failed to connect to {cockroachdb-1.cockroachdb:26257 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout". Reconnecting...›
W211116 15:16:03.965681 46 server/init.go:374 ⋮ [n?] 27 outgoing join rpc to ‹cockroachdb-1.cockroachdb:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"›
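
To see at a glance which peers a pod cannot reach, the "failed to connect" lines can be reduced to a list of addresses. The helper below is a minimal sketch, assuming the log format shown above; `failed_peers` is a hypothetical name, not part of CSM or cockroachdb.

```shell
#!/bin/sh
# Sketch only: assumes log lines shaped like the grpc warnings quoted above.

# Read cockroachdb log text on stdin and print each unreachable peer once.
failed_peers() {
  # Capture the host:port that follows "failed to connect to {".
  sed -n 's/.*failed to connect to {\([^ }]*\).*/\1/p' | sort -u
}
```

It could be fed from the pod logs, for example `kubectl logs cockroachdb-0 -n csm-installer | failed_peers`.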

I have thoroughly searched for troubleshooting information and tried changing values such as the host and API server in values.yaml, but I get the same error. I'm happy to provide additional information if useful. Thanks.

@dtonnesen dtonnesen added the type/question Ask a question. This is the default label associated with a question issue. label Nov 16, 2021
@hoppea2
Collaborator

hoppea2 commented Nov 17, 2021

Thanks for your submission @dtonnesen, we'll triage this shortly.

@tdawe
Collaborator

tdawe commented Nov 17, 2021

Hi @dtonnesen, can you re-install the CSM installer with dbInstanceCount set to 1 in the values.yaml file? This deploys only a single instance of cockroachdb and may help get around the issue where multiple instances are unable to join.
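
A minimal sketch of that re-install, reusing the release name, namespace, chart, and flags from the original post. It assumes `dbInstanceCount` is a top-level key in values.yaml; the function names and the bracketed placeholder values are illustrative only.

```shell
#!/bin/sh
# Sketch only: key placement and placeholder values are assumptions from this thread.

# Write a values file that asks for a single cockroachdb instance.
write_values() {
  cat > "$1" <<'EOF'
jwtKey: key
cipherKey: "<32-character key>"
adminUserName: admin
adminPassword: "<password>"
dbInstanceCount: 1
EOF
}

# Remove the existing release, then install again with the new values file.
reinstall_csm() {
  helm uninstall -n csm-installer csm-installer
  helm install -n csm-installer --create-namespace \
    --set-string scheme=http --set-string dbSSLEnabled="false" \
    csm-installer dell/csm-installer -f "$1"
}
```

Typical use would be `write_values values.yaml && reinstall_csm values.yaml`.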

@dtonnesen
Author

dtonnesen commented Nov 17, 2021

Thank you. I tried that too; sorry, I should have mentioned it. I can do it again if you think the logs might be different.

@tdawe
Collaborator

tdawe commented Nov 17, 2021

Yes, please try again and check if the logs are different. Prior to re-installing you can delete the /var/lib/cockroachdb directories from the k8s worker nodes (see https://dell.github.io/csm-docs/docs/deployment/troubleshooting/#why-does-the-cluster-init-pod-show-the-error-cluster-has-already-been-initialized) which are used to persist the database data. This will ensure it's a fresh install and not using any old data from cockroachdb.
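
The cleanup step could be scripted roughly as follows, after uninstalling the release and before installing again. This is a sketch under assumptions: root SSH access to each worker node, and node names (`worker-1`, `worker-2`) that are purely hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes root SSH access to the workers; node names are hypothetical.

DB_DIR=/var/lib/cockroachdb

# Print the command that would wipe persisted cockroachdb data on one node.
cleanup_command() {
  printf 'ssh root@%s rm -rf %s\n' "$1" "$DB_DIR"
}

# Run the cleanup on every node given as an argument.
# With DRY_RUN=1 the commands are only printed, not executed.
cleanup_nodes() {
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      cleanup_command "$node"
    else
      ssh "root@$node" rm -rf "$DB_DIR"
    fi
  done
}
```

Running `DRY_RUN=1 cleanup_nodes worker-1 worker-2` first shows what would be deleted; dropping `DRY_RUN=1` performs the deletion.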

@dtonnesen
Author

Sure, I did that last time as well, but I'll do it again.

@dtonnesen
Author

Looks to be the same errors:

I211117 13:12:51.808172 28 server/init.go:197 ⋮ [n?] 30 awaiting cockroach init or join with an already initialized node
W211117 13:13:11.808789 95 vendor/google.golang.org/grpc/internal/channelz/logging.go:73 ⋮ [-] 31 ‹grpc: addrConn.createTransport failed to connect to {cockroachdb-0.cockroachdb:26257 0 }. Err: connection error: desc = "transport: Error while dialing dial tcp: i/o timeout". Reconnecting...›
W211117 13:13:11.809074 93 server/init.go:374 ⋮ [n?] 32 outgoing join rpc to ‹cockroachdb-0.cockroachdb:26257› unsuccessful: ‹rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"›

transport: Error while dialing dial tcp: i/o timeout" (retrying)
warning: node not ready to perform cluster initialization: initial connection heartbeat failed: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: i/o timeout" (retrying)

@dtonnesen
Author

@tdawe @hoppea2 Any additional thoughts on what I can try here? I've done all I can outside of the CSM installation, including implementing the replication module, so this is impeding my ability to finish the project. Thanks.

@tdawe
Collaborator

tdawe commented Nov 18, 2021

Currently reviewing the environment

@tdawe
Collaborator

tdawe commented Nov 22, 2021

Closing this question as we've discussed a workaround using the helm charts.

@tdawe tdawe closed this as completed Nov 22, 2021
csmbot pushed a commit that referenced this issue Aug 1, 2023