fix(cluster) use 'listen_address' for contact point in refresh()
Previously, `coordinator.host` was used to add the contact point to the
LB policy, so if the user specified a hostname, the node was indexed by
that hostname instead of its IP address. On its own this is harmless,
apart from inconsistent log messages (sometimes an IP address shows up,
other times a hostname).

Problem
-------

An issue arises, however, when both of the following hold:

1. Several Cluster instances call `:refresh()` on the same C* cluster
2. DNS round-robin is in effect for the contact point hostnames

Let's consider clusterA and clusterB, both instances of the Cluster
module. Let's also consider the following C* cluster:

    10.16.0.1 node1
    10.16.0.2 node2

And the following DNS record:

    cassandra.default.svc.cluster.local. 30    IN A    10.16.0.1
    cassandra.default.svc.cluster.local. 30    IN A    10.16.0.2

First, clusterA calls `refresh()` with `contact_points = { "cassandra"
}`. DNS resolves `cassandra` to `10.16.0.1` on this call, so clusterA
inserts the following topology in the cluster's shm:

    cassandra:[peer info]
    10.16.0.2:[peer info]

Its LB policy now has 2 entries: `cassandra` and `10.16.0.2`.

Then, clusterB calls `refresh()` as well, with the same `contact_points`
option, and as a result first purges the cluster's shm content, before
inserting the following:

    10.16.0.1:[peer info]
    cassandra:[peer info]

Note that because of the round-robin DNS resolution, `cassandra` pointed
to `10.16.0.2` this time.

Now, when clusterA invokes its LB policy to elect a peer for a given
query, it will eventually look up `10.16.0.2`. However, no such entry
exists in the cluster's shm anymore, so the following error is
returned:

    no host details for 10.16.0.2
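
The sequence above can be reproduced with a small, self-contained model.
This is only a sketch: `shm` is a plain Lua table standing in for the shared
dict, and `refresh_by_contact_point` and `host_details` are hypothetical
helpers, not functions of the Cluster module.

```lua
-- Simplified model of the problem; every name here is hypothetical.
local shm = {}  -- stands in for the cluster's shared dict

-- Purge shm, then store the coordinator under the user-supplied contact
-- point and every other node under its IP. Returns the keys that the
-- caller's LB policy would hold afterwards.
local function refresh_by_contact_point(contact_point, resolved_ip, all_ips)
  for k in pairs(shm) do shm[k] = nil end  -- purge previous topology
  local lb_hosts = {}
  for _, ip in ipairs(all_ips) do
    local key = (ip == resolved_ip) and contact_point or ip
    shm[key] = { ip = ip }
    lb_hosts[#lb_hosts + 1] = key
  end
  return lb_hosts
end

local function host_details(key)
  local peer = shm[key]
  if not peer then return nil, "no host details for " .. key end
  return peer
end

local nodes = { "10.16.0.1", "10.16.0.2" }

-- clusterA: DNS returns 10.16.0.1 for "cassandra"
local lb_a = refresh_by_contact_point("cassandra", "10.16.0.1", nodes)
-- clusterB: DNS round-robin returns 10.16.0.2 this time
local lb_b = refresh_by_contact_point("cassandra", "10.16.0.2", nodes)

-- clusterA's LB policy still holds keys from its own refresh
local missing = {}
for _, key in ipairs(lb_a) do
  local peer, err = host_details(key)
  if not peer then missing[#missing + 1] = err end
end

for _, err in ipairs(missing) do print(err) end
```

In this model, clusterA's stale `10.16.0.2` key is the only one that fails
the lookup after clusterB's refresh.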

Proposed solution
-----------------

Change the cache key of the peer's info in the shm from the specified
`contact_point` value (which is the user's input) to the
`listen_address` column of the `system.local` table, so that host
details are no longer stored by hostname.

This has the added benefit of ensuring that all logs and other
operations done by the Cluster module always use the IP address of the
node.
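
As a rough illustration of why keying by IP address avoids the stale entry,
here is a minimal sketch under the same simplifying assumptions as the
scenario in the Problem section. `shm` is a plain table, and
`refresh_by_listen_address` is a hypothetical helper, not the module's actual
`refresh()`.

```lua
-- Simplified model of the fix; every name here is hypothetical.
local shm = {}  -- stands in for the cluster's shared dict

-- Purge shm, then store the coordinator under the listen_address it reports
-- (an IP address) and every other peer under its own IP. The hostname the
-- user supplied never becomes a key.
local function refresh_by_listen_address(listen_address, peer_ips)
  for k in pairs(shm) do shm[k] = nil end  -- purge previous topology
  local lb_hosts = { listen_address }
  shm[listen_address] = { ip = listen_address }
  for _, ip in ipairs(peer_ips) do
    shm[ip] = { ip = ip }
    lb_hosts[#lb_hosts + 1] = ip
  end
  return lb_hosts
end

-- clusterA connects to 10.16.0.1, clusterB to 10.16.0.2 (DNS round-robin)
local lb_a = refresh_by_listen_address("10.16.0.1", { "10.16.0.2" })
local lb_b = refresh_by_listen_address("10.16.0.2", { "10.16.0.1" })

-- Every key held by clusterA's LB policy is still present after
-- clusterB's refresh, whichever node each instance connected to.
local all_found = true
for _, key in ipairs(lb_a) do
  if shm[key] == nil then all_found = false end
end
print(all_found)
```

Both instances end up with the same set of keys, so a purge followed by a
re-insert from another instance no longer invalidates them in this model.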

From #118
thibaultcha authored Aug 9, 2018
1 parent f43c638 commit 8b9fcd9
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions lib/resty/cassandra/cluster.lua
@@ -479,7 +479,7 @@ function _Cluster:refresh()
   if not coordinator then return nil, err end

   local local_rows, err = coordinator:execute [[
-    SELECT data_center,rpc_address,release_version FROM system.local
+    SELECT data_center,listen_address,release_version FROM system.local
   ]]
   if not local_rows then return nil, err end

@@ -493,7 +493,7 @@ function _Cluster:refresh()
   coordinator:setkeepalive()

   rows[#rows+1] = { -- local host
-    rpc_address = coordinator.host,
+    rpc_address = local_rows[1].listen_address,
     data_center = local_rows[1].data_center,
     release_version = local_rows[1].release_version
   }
