Auto-connect for nextgenrepl real-time #1804
The implementation of this will use a separate peer discovery process. When triggered, the peer discovery process will update the sink with the currently available peers in the source cluster.

The trigger will be a slow poll, perhaps once per hour by default (randomised to reduce the risk of coordination between nodes). If a peer temporarily goes down and comes back up, the existing sink behaviour will handle this without waiting for a discovery event. There will also need to be two additional console commands to control this behaviour.

Only a small change is required to the existing sink process. If peer discovery is enabled but no peers are discovered, then the statically configured peers will continue to be used.

The new peer discovery process will only work if the capability exists in both clusters. Replication will not work as expected if peer discovery is enabled but not all nodes in all clusters are at the minimum required version. Peer discovery must only be enabled once the administrator has confirmed that it is supported across the domain; this will not fail gracefully. If peer discovery is enabled, then when Riak starts the sink will behave as if discovery were disabled until the first peer discovery event occurs (i.e. just the configured peers will be used).
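As a minimal sketch of the trigger mechanism only (the module name, function names and the console-style prompt below are hypothetical, not the actual riak_kv_replrtq_peer implementation), a randomised slow poll could be structured like this:

```erlang
%% Illustrative sketch only: a periodic, randomised discovery trigger.
%% Names here are hypothetical and do not reflect the real riak_kv code.
-module(replrtq_peer_discovery_sketch).
-behaviour(gen_server).

-export([start_link/0, discover_now/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(BASE_POLL_MS, 3600 * 1000).  %% roughly one hour by default

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Console-style prompt for an immediate discovery pass (hypothetical).
discover_now() ->
    gen_server:cast(?MODULE, discover).

init([]) ->
    schedule_poll(),
    {ok, #{}}.

handle_call(_Msg, _From, State) ->
    {reply, ok, State}.

handle_cast(discover, State) ->
    run_discovery(),
    {noreply, State};
handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(poll, State) ->
    run_discovery(),
    schedule_poll(),
    {noreply, State};
handle_info(_Msg, State) ->
    {noreply, State}.

%% Randomise the poll delay so that sink nodes do not all query the
%% source cluster at the same moment.
schedule_poll() ->
    Delay = ?BASE_POLL_MS div 2 + rand:uniform(?BASE_POLL_MS),
    erlang:send_after(Delay, self(), poll).

run_discovery() ->
    %% Placeholder: the real process would query a configured source
    %% peer for current cluster membership and update the sink workers.
    ok.
```

The randomised delay is what reduces the risk of coordination; each node picks its own poll interval independently.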
See #1804. The heart of the problem is how to avoid needing configuration changes on sink clusters when source clusters are being changed. This allows new nodes to be discovered automatically from the configured nodes. The default behaviour is to always fall back to the configured behaviour. Worker Counts and Per Peer Limits need to be set with an understanding of whether this will be enabled; although, if the per-peer limit is left at its default, the consequence will be that the worker count is distributed evenly (independently by each node). Note that if Worker Count mod (Src Node Count) =/= 0, there will be no balancing of the excess workers across the sink nodes.
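As a rough illustration of that arithmetic only (a hypothetical helper, not riak_kv code):

```erlang
%% How a sink node's worker count splits across the source peers when the
%% per-peer limit is left at its default.
-module(worker_split_sketch).
-export([split/2]).

%% split(WorkerCount, SrcNodeCount) -> {EvenSharePerPeer, UnbalancedExcess}.
split(WorkerCount, SrcNodeCount) ->
    {WorkerCount div SrcNodeCount, WorkerCount rem SrcNodeCount}.
```

For example, split(8, 3) returns {2, 2}: each of the three source peers gets two workers, and the two excess workers are not evenly balanced.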
As a general rule, a design decision was made for nextgenrepl to lean more heavily on setup being defined by operator configuration rather than discovery. This reduces complexity in the code.
However, this creates an overhead for operators, especially when considering cluster changes where nextgenrepl real-time replication is used.
Currently each node in the "sink" is configured with a static set of peers in the source to connect with:
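A rough illustration of such a static configuration in riak.conf is shown below; the addresses and queue name are examples only, and the exact option names and peer string syntax should be checked against the nextgenrepl documentation:

```
## Illustrative example only - a sink node statically configured with
## three source peers, each given as host:port:protocol.
replrtq_enablesink = enabled
replrtq_sinkqueue = replq
replrtq_sinkpeers = 10.0.1.1:8087:pb|10.0.1.2:8087:pb|10.0.1.3:8087:pb
```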
However, if we join a node into the source cluster, then as soon as that node can start to coordinate PUTs, a queue of real-time replication events will begin to form on it. This requires the operator to script changes to the sink cluster's real-time configuration to reflect the new source node.
There exists too much scope for operator error in this case.
What is proposed instead is a new config item, replrtq_peer_discovery:
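A sketch of how this might look in riak.conf (illustrative only; the precise option name and accepted values should be confirmed against the shipped schema):

```
## Illustrative example only - enable source peer discovery on a sink node.
## The statically configured peer(s) act as the seed for discovery.
replrtq_peer_discovery = enabled
replrtq_sinkpeers = 10.0.1.1:8087:pb
```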
Ideally, different sink peers should be configured with different source peers for discovery (or a load-balance address be used).
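For example (illustrative addresses only), two sink nodes might each seed discovery from a different source peer:

```
## Illustrative example only.
## riak.conf on sink node A:
replrtq_sinkpeers = 10.0.1.1:8087:pb
## riak.conf on sink node B:
replrtq_sinkpeers = 10.0.1.2:8087:pb
```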
This will not work with NAT - in this case replrtq_peer_discovery must be disabled, and the NAT'd addresses used in the replrtq_sinkpeers configuration for static peer relationships.
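For example (illustrative addresses only), a NAT'd deployment would keep static peer relationships:

```
## Illustrative example only - discovery disabled, NAT'd source addresses
## listed explicitly.
replrtq_peer_discovery = disabled
replrtq_sinkpeers = 203.0.113.10:8087:pb|203.0.113.11:8087:pb
```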