New hosts connection details are not updated #1667

Open
raghu-nandan-bs opened this issue Dec 30, 2022 · 2 comments

raghu-nandan-bs commented Dec 30, 2022

What version of Cassandra are you using?

datastax 6.8.28

What version of Gocql are you using?

1.3.1

What version of Go are you using?

go1.19.2

What did you do?

Cassandra is deployed and managed using k8ssandra-operator.
Certain Cassandra pods were replaced due to a Kubernetes node-down event.

What did you expect to see?

gocql should gracefully handle changes in topology

What did you see instead?

When Cassandra nodes (pods) are replaced, their IPs change, but the gocql client does not update its hostinfo with the new IPs.
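
To make the expected behaviour concrete, here is a minimal sketch of host tracking keyed by host ID rather than by IP. It is not gocql's implementation: the `host` and `registry` types, the `upsert` method, the host ID and both addresses are hypothetical, and it assumes a replaced pod rejoins with the same host ID but a different IP.

```go
package main

import "fmt"

// host is a hypothetical, simplified stand-in for a driver's per-node record.
type host struct {
	ID   string // stable host ID reported by the cluster
	Addr string // current connect address; may change when a pod is replaced
}

// registry tracks hosts by host ID instead of by IP (hypothetical type).
type registry struct {
	byID map[string]*host
}

func newRegistry() *registry {
	return &registry{byID: make(map[string]*host)}
}

// upsert records a host seen during a topology refresh. It returns true when
// a known host ID reappears with a different address, which is the signal
// that connections to the old address should be dropped and re-dialed.
func (r *registry) upsert(id, addr string) (addrChanged bool) {
	h, ok := r.byID[id]
	if !ok {
		r.byID[id] = &host{ID: id, Addr: addr}
		return false
	}
	if h.Addr == addr {
		return false
	}
	h.Addr = addr
	return true
}

func main() {
	r := newRegistry()
	r.upsert("host-a", "10.0.0.1") // made-up ID and address

	// The pod is replaced and comes back with a new IP (made-up address).
	changed := r.upsert("host-a", "10.0.0.2")
	fmt.Println(changed, r.byID["host-a"].Addr)
}
```

Keying on the host ID means an IP change shows up as an update to an existing entry instead of leaving a stale entry that is retried forever.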

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

  • output of nodetool status
    all nodes are in UN state
  • output of SELECT peer, rpc_address FROM system.peers
  • rebuild your application with the gocql_debug tag and post the output
time="2022-12-23T07:00:55Z" level=debug msg="gocql: Session.handleNodeDown: 10.244.101.39:9042\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:00Z" level=error msg="error connecting: dial tcp 10.244.125.76:9042: i/o timeout (hostinfo=[HostInfo hostname=\"10.244.125.76\" connectAddress=\"10.244.125.76\" peer=\"10.244.125.76\" rpc_address=\"10.244.125.76\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.244.125.76\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"datacenter-id\" rack=\"rack-2\" host_id=\"59b3e71a-319e-4438-8104-67e536c1977f\" version=\"v4.0.0\" state=DOWN num_tokens=8])" Id=2
time="2022-12-23T07:01:00Z" level=debug msg="gocql: connection failed \"10.244.125.76\": dial tcp 10.244.125.76:9042: i/o timeout, reconnecting with *gocql.ConstantReconnectionPolicy\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:06Z" level=error msg="error connecting: dial tcp 10.244.125.76:9042: i/o timeout (hostinfo=[HostInfo hostname=\"10.244.125.76\" connectAddress=\"10.244.125.76\" peer=\"10.244.125.76\" rpc_address=\"10.244.125.76\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.244.125.76\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"datacenter-id\" rack=\"rack-2\" host_id=\"59b3e71a-319e-4438-8104-67e536c1977f\" version=\"v4.0.0\" state=DOWN num_tokens=8])" Id=2
time="2022-12-23T07:01:06Z" level=debug msg="gocql: connection failed \"10.244.125.76\": dial tcp 10.244.125.76:9042: i/o timeout, reconnecting with *gocql.ConstantReconnectionPolicy\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:12Z" level=error msg="error connecting: dial tcp 10.244.125.76:9042: i/o timeout (hostinfo=[HostInfo hostname=\"10.244.125.76\" connectAddress=\"10.244.125.76\" peer=\"10.244.125.76\" rpc_address=\"10.244.125.76\" broadcast_address=\"<nil>\" preferred_ip=\"<nil>\" connect_addr=\"10.244.125.76\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"datacenter-id\" rack=\"rack-2\" host_id=\"59b3e71a-319e-4438-8104-67e536c1977f\" version=\"v4.0.0\" state=DOWN num_tokens=8])" Id=2
time="2022-12-23T07:01:12Z" level=debug msg="gocql: connection failed \"10.244.125.76\": dial tcp 10.244.125.76:9042: i/o timeout, reconnecting with *gocql.ConstantReconnectionPolicy\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:13Z" level=debug msg="gocql: unable to dial \"[HostInfo hostname=\\\"10.244.125.76\\\" connectAddress=\\\"10.244.125.76\\\" peer=\\\"10.244.125.76\\\" rpc_address=\\\"10.244.125.76\\\" broadcast_address=\\\"<nil>\\\" preferred_ip=\\\"<nil>\\\" connect_addr=\\\"10.244.125.76\\\" connect_addr_source=\\\"connect_address\\\" port=9042 data_centre=\\\"datacenter-id\\\" rack=\\\"rack-2\\\" host_id=\\\"59b3e71a-319e-4438-8104-67e536c1977f\\\" version=\\\"v4.0.0\\\" state=DOWN num_tokens=8]\": dial tcp 10.244.125.76:9042: i/o timeout\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:13Z" level=debug msg="gocql: filling stopped \"10.244.125.76\": dial tcp 10.244.125.76:9042: i/o timeout\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:13Z" level=debug msg="gocql: conns of pool after stopped \"10.244.125.76\": 0\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:13Z" level=debug msg="gocql: Session.handleNodeDown: 10.244.125.76:9042\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:22Z" level=debug msg="gocql: pool connection error \"10.244.111.100:9042\": read tcp 10.244.113.56:60504->10.244.111.100:9042: read: connection reset by peer\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:22Z" level=error msg="error in query: read tcp 10.244.113.56:60504->10.244.111.100:9042: read: connection reset by peer" Id=1750
time="2022-12-23T07:01:22Z" level=error msg="error in query: gocql: connection closed waiting for response" Id=3934
time="2022-12-23T07:01:22Z" level=debug msg="gocql: pool connection error \"10.244.111.100:9042\": read tcp 10.244.113.56:60492->10.244.111.100:9042: read: connection reset by peer\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:22Z" level=error msg="error in query: read tcp 10.244.113.56:60492->10.244.111.100:9042: read: connection reset by peer" Id=1906
time="2022-12-23T07:01:22Z" level=error msg="error in query: read tcp 10.244.113.56:60504->10.244.111.100:9042: read: connection reset by peer" Id=2833
time="2022-12-23T07:01:22Z" level=error msg="error in query: gocql: no hosts available in the pool" Id=1
time="2022-12-23T07:01:22Z" level=error msg="error in query: read tcp 10.244.113.56:60504->10.244.111.100:9042: read: connection reset by peer" Id=1439
time="2022-12-23T07:01:22Z" level=error msg="error in query: read tcp 10.244.113.56:60492->10.244.111.100:9042: read: connection reset by peer" Id=3729
time="2022-12-23T07:01:22Z" level=error msg="error in query: gocql: no hosts available in the pool" Id=1246
time="2022-12-23T07:01:22Z" level=error msg="error connecting: dial tcp 10.244.111.100:9042: connect: connection refused (hostinfo=[HostInfo hostname=\"10.244.111.100\" connectAddress=\"10.244.111.100\" peer=\"<nil>\" rpc_address=\"10.244.111.100\" broadcast_address=\"10.244.111.100\" preferred_ip=\"<nil>\" connect_addr=\"10.244.111.100\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"datacenter-id\" rack=\"rack-3\" host_id=\"52fbc88e-8f99-4f42-9882-a16a6018527a\" version=\"v4.0.0\" state=UP num_tokens=8])" Id=2
time="2022-12-23T07:01:22Z" level=debug msg="gocql: unable to dial control conn 10.244.111.100:9042: dial tcp 10.244.111.100:9042: connect: connection refused\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:22Z" level=error msg="error in query: gocql: no hosts available in the pool" Id=3871
time="2022-12-23T07:01:22Z" level=error msg="error connecting: dial tcp 10.244.111.100:9042: connect: connection refused (hostinfo=[HostInfo hostname=\"10.244.111.100\" connectAddress=\"10.244.111.100\" peer=\"<nil>\" rpc_address=\"10.244.111.100\" broadcast_address=\"10.244.111.100\" preferred_ip=\"<nil>\" connect_addr=\"10.244.111.100\" connect_addr_source=\"connect_address\" port=9042 data_centre=\"datacenter-id\" rack=\"rack-3\" host_id=\"52fbc88e-8f99-4f42-9882-a16a6018527a\" version=\"v4.0.0\" state=UP num_tokens=8])" Id=2
time="2022-12-23T07:01:22Z" level=debug msg="gocql: unable to dial \"[HostInfo hostname=\\\"10.244.111.100\\\" connectAddress=\\\"10.244.111.100\\\" peer=\\\"<nil>\\\" rpc_address=\\\"10.244.111.100\\\" broadcast_address=\\\"10.244.111.100\\\" preferred_ip=\\\"<nil>\\\" connect_addr=\\\"10.244.111.100\\\" connect_addr_source=\\\"connect_address\\\" port=9042 data_centre=\\\"datacenter-id\\\" rack=\\\"rack-3\\\" host_id=\\\"52fbc88e-8f99-4f42-9882-a16a6018527a\\\" version=\\\"v4.0.0\\\" state=UP num_tokens=8]\": dial tcp 10.244.111.100:9042: connect: connection refused\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:22Z" level=debug msg="gocql: filling stopped \"10.244.111.100\": dial tcp 10.244.111.100:9042: connect: connection refused\n" sess-key="database-dummy.domain.com:9042"
time="2022-12-23T07:01:22Z" level=error msg="error in query: gocql: no hosts available in the pool" Id=1159
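
For anyone triaging similar output, a small helper can show which addresses the client keeps dialing. This is only a sketch using the Go standard library: `countDialFailures` is a hypothetical name, the sample lines in `main` are trimmed versions of the log format above (hostinfo details removed), and it assumes IPv4 addresses in the `dial tcp <ip>:<port>` form.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// dialRe matches the "dial tcp <ipv4>:<port>" fragment of a log message.
var dialRe = regexp.MustCompile(`dial tcp (\d+\.\d+\.\d+\.\d+:\d+)`)

// countDialFailures counts, per address, the "error connecting:" log lines
// that mention a failed dial to that address. Other lines are ignored.
func countDialFailures(logs string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // debug lines can be long
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "error connecting:") {
			continue
		}
		if m := dialRe.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	// Trimmed sample lines in the same format as the debug output.
	sample := `level=error msg="error connecting: dial tcp 10.244.125.76:9042: i/o timeout"
level=debug msg="gocql: connection failed: dial tcp 10.244.125.76:9042: i/o timeout"
level=error msg="error connecting: dial tcp 10.244.111.100:9042: connect: connection refused"`

	counts := countDialFailures(sample)
	addrs := make([]string, 0, len(counts))
	for a := range counts {
		addrs = append(addrs, a)
	}
	sort.Strings(addrs)
	for _, a := range addrs {
		fmt.Printf("%s failed dials: %d\n", a, counts[a])
	}
}
```

An address that keeps accumulating failures after its pod has been replaced is a likely candidate for a stale hostinfo entry.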

@raghu-nandan-bs
Author

https://github.com/gocql/gocql/pull/1009/files

I think the changes in the pull request linked above affected how the client updates IPs during event handling.

@mikelococo

It seems likely to me that #1668 is another case of this happening. Note in that ticket we've conducted manual tests to reproduce the issue and bisected it to #1632. If they are indeed the same underlying cause, I'm skeptical that https://github.com/gocql/gocql/pull/1009/files is related as is suggested in a comment above.
