Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto-connect for nextgenrepl real-time #1804

Closed
martinsumner opened this issue Nov 12, 2021 · 1 comment
Closed

Auto-connect for nextgenrepl real-time #1804

martinsumner opened this issue Nov 12, 2021 · 1 comment
Assignees

Comments

@martinsumner
Copy link
Contributor

As a general rule, a design decision was made for nextgenrepl to lean more heavily on setup being discovered by operator configuration rather than discovery. This reduces complexity in the code.

However, this creates an overhead for operators, especially when considering cluster changes where nextgenrepl real-time replication is used.

Currently each node in the "sink" is configured with a static set of peers in the source to connect with:

%% @doc A list of peers is required to inform the sink node how to reach the
%% src.  All src nodes will need to have entries consumed - so it is
%% recommended that each src node is referred to in multiple sink node
%% configurations.
%% The list of peers is tokenised as host:port:protocol
%% In exceptional circumstances this definition can be extended to
%% queuename:host:port:protocol - but restricting the definitions of queuename
%% to the single queue specified in replrtq_sinkqueue is strongly recommended.
{mapping, "replrtq_sinkpeers", "riak_kv.replrtq_sinkpeers", [
    {datatype, string},
    {commented, "127.0.0.1:8098:http"}
]}.

However, if we join nodes into the source cluster, as soon as that node can start to coordinate PUTs a queue of real-time replication events will begin to form. This now requires the operator to script real-time changes to the sink cluster to reflect this.

There exists too much scope for operator error in this case.

What is proposed instead is that there will be a new config item:

%% @doc Use peer list for discovery only
%% If enabled rplrtq_sinkpeers will be used for discovery only.
%% A call will be made to one of those peers (at random) to get a list of upnodes, and
%% their open port. Periodically (every 60s), the call to the peer will be repeated.  The
%% actual replrtq_sinkpeers used in the configuration will be those returned by this
%% request
{mapping, "replrtq_peer_discovery", "riak_kv.replrtq_peer_discovery", [
    {datatype, {flag, enabled, disabled}},
    {default, disabled},
    {commented, disabled}
]}.

Ideally, different sink peers should be configured with different source peers for discovery (or a load-balance address be used).

This will not work with NAT - in this case replrtq_peer_discovery must be disabled, and the NAT'd addresses used in the replrtq_sinkpeers configuration for static peer relationships.

@martinsumner
Copy link
Contributor Author

martinsumner commented Feb 25, 2022

The implementation of this will use a separate process riak_kv_replrtq_peer. This process will always be started, but will only actively poll if peer_discovery is enabled.

When triggered the peer discovery process will:

  • View the Queue/Peer configuration passed in at startup app_helper:get_env(riak_kv, replrtq_sinkpeers, "").
  • Call the riak_kv_replrtq_snk for a {list of current peers, worker_count, per_peer_limit} for each of the configured queues.
  • Call each configured peer for that queue, and discover the active peers within the cluster (this process should be managed by the riak_kv_replrtq_peer process on the source via a new API call).
  • Find the union of all active peers discovered for that queue.
  • Compare the new set of peers with the previous set of current peers returned from the riak_kv_replrtq_snk - if the sets of peers match, then no action should be taken.
  • If the peers differ, use the remove_snkqueue/1 & add_snkqueue/4 to update the configuration of the riak_kv_replrtq_snk.

The trigger will be a slow poll, perhaps once per hour by default (randomised to reduce risk of coordination). If a peer temporarily goes down/up, the existing riak_kv_replrtq_snk should handle this as normal.

There needs to be two additional console commands:

  • reset_all_peers(QueueName): visit each active peer in this cluster and trigger (one at a time) the peer discovery process for each active node in the cluster, for the given queue. The command will have no effect where discovery is not enabled.
  • reset_all_workercounts(WorkerCount, PerPeerLimit): visit each active peer, change the default values for the WorkerCount and PerPeerLimit, and then reset each Queue configuration to reflect these new counts

The only change required to riak_kv_replrtq_snk is the capability to extract the current peer information. If peer discovery is not enabled (which will be the default), this will behave exactly as before - so the new behaviour is backwards compatible. Any issues with peer discovery, and it can be disabled and previous configuration be relied upon.

If peer discovery is enabled, the behaviour of riak_kv_replrtq_snk is only interrupted should there be a need for configuration change. From a configuration perspective, all known peers can continue to be configured, without penalty. The riak_kv_replrtq_peer process should handle exceptions when contacting a peer, the unavailability of a configured peer should not crash the process.

If no peers are discovered, then the riak_kv_replrtq_snk will be reverted back to the peer settings as configured at startup.

The new peer discovery process will only work should the capability exist in both clusters. Replication will not work as expected if peer discovery is enabled, and all nodes in all clusters are at the minimum required version. Peer discovery must only be enabled, once the administrator has confirmed that it is supported across the domain - this will not fail gracefully.

If peer discovery is enabled, when riak starts, the behaviour of the snk will be as if discovery was disabled until the first peer discovery event occurs (i.e. just the configured peers will be used).

martinsumner added a commit that referenced this issue May 11, 2022
See #1804

The heart of the problem is how to avoid needing configuration changes on sink clusters when source clusters are bing changed. This allows for new nodes to be discovered automatically, from configured nodes.

Default behaviour is to always fallback to configured behaviour.

Worker Counts and Per Peer Limits need to be set based on an understanding of whether this will be enabled. Although, if per peer limit is left to default, the consequence will be the worker count will be evenly distributed (independently by each node). Note, if Worker Count mod (Src Node Count) =/= 0 - then there will be no balancing of the excess workers across the sink nodes.
hmmr pushed a commit to TI-Tokyo/basho-riak_kv that referenced this issue Nov 23, 2022
See basho#1804

The heart of the problem is how to avoid needing configuration changes on sink clusters when source clusters are bing changed. This allows for new nodes to be discovered automatically, from configured nodes.

Default behaviour is to always fallback to configured behaviour.

Worker Counts and Per Peer Limits need to be set based on an understanding of whether this will be enabled. Although, if per peer limit is left to default, the consequence will be the worker count will be evenly distributed (independently by each node). Note, if Worker Count mod (Src Node Count) =/= 0 - then there will be no balancing of the excess workers across the sink nodes.
# Conflicts:
#	rebar.config
#	src/riak_kv_replrtq_peer.erl
#	src/riak_kv_replrtq_snk.erl
#	src/riak_kv_replrtq_src.erl
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant