Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add draft implementation for canonical raft #5933

Closed
wants to merge 2 commits into from
Closed

feat: add draft implementation for canonical raft #5933

wants to merge 2 commits into from

Conversation

niebayes
Copy link

Goals:

Add draft implementation for canonical Raft, i.e. all requests, whether read or write requests, go to the Raft layer.

@niebayes niebayes marked this pull request as draft June 26, 2023 16:09
Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we consider consistency mlde as part of the general raft_configuration?

@@ -280,3 +281,9 @@ def mixin_stateful_parser(parser):
nargs='+',
)

gp.add_argument(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be at Deployment level, not Pod level

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, there is a specific section whrre u will find raft related args

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -55,9 +55,11 @@ message EndpointsProto {

// list of endpoints exposed by an Executor
repeated string endpoints = 1;
repeated string write_endpoints = 2;
// TODO(niebayes): regenerate proto files.
repeated string read_endpoints = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to add an extra list, the difference between all the endpoints and write endpoints is already the read endpoints

Copy link
Author

@niebayes niebayes Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any endpoints not for read nor for write? For e.g. endpoints about shutting down an executor, or about upgrade an executor?
On the other hand, some endpoints may not want to involve Raft at all, i.e. we simply forward them to the executor directly rather than letting raft worker forward them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only write and read.

@@ -332,6 +411,13 @@ def _unwrap_batching_decorator(self, fn):
else:
return fn

def _unwrap_read_decorator(self, fn):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forget about read decorator, everything that js not write, is read

@JoanFM
Copy link
Member

JoanFM commented Jun 26, 2023

Are we sure this is what we want? I think it does not make sense to have Read requests go fully into the Raft Layer. This would make the log grow insanely large for no reason. Isn't there a better way to achieve strong consensus?

@niebayes
Copy link
Author

@JoanFM You are right, if letting read requests go into the Raft layer, the log will grow insanely if the workload is read skewed. However, if not do so, the strong consistency mode is not guaranteed.
I'm currently adding an optimization called read index which makes the read requests not go into the Raft layer. It works as such:

  • client sends a read request to the leader
  • leader asks the Raft layer for its committed index. This index is stored and associated with the read request. We call this index the read index.
  • leader then broadcasts a checkQuorum RPC to all others.
  • followers respond with acknowledging the leadership or rejecting.
  • if leader accepts a majority of acknowledgements, the leader could confirm its leadership and it assures the leader has the most up-to-date log. Now, the leader waits for its applied index go equal or beyond the read index. At that moment, the leader could safely read the FSM without worrying breaks the strong consistency mode.

This optimization only involves an extra round of RPC interaction and read requests do not need to go fully into the Raft layer and hence log would not grow

I've investigated that Hashicorp Raft supports querying about committed index and applied index. But I'm not sure if it supports invoking a checkQuorum-like RPC. If not, I have to slightly modify Hashicorp Raft manually. Don't worry, I could handle it.

@niebayes
Copy link
Author

@JoanFM About whether or not wrapping consistency mode into the raft general configuration:
If you mean the configuration for Hashicorp Raft, no, we shall not. Since the consistency mode controls the behavior of the server layer.
Besides, I'm reading through Hashicorp consul codebase so that I can get more clue about how consul interacts with Hashicorp Raft.

@niebayes
Copy link
Author

@JoanFM About the RpcInterface:
image

Why we create an RpcInterface instance for each server registration? Is there any chance we could pass the reference for a single RpcInterface instance?

I want to implement the RpcInterface as a coordinator which manages the communication between the raft layer and the executor FSM. Can you provide any advices?

@JoanFM
Copy link
Member

JoanFM commented Jun 27, 2023

@JoanFM About the RpcInterface: image

Why we create an RpcInterface instance for each server registration? Is there any chance we could pass the reference for a single RpcInterface instance?

I want to implement the RpcInterface as a coordinator which manages the communication between the raft layer and the executor FSM. Can you provide any advices?

For this I am not sure, maybe u are right and we could go with only an instance

@JoanFM
Copy link
Member

JoanFM commented Jun 27, 2023

@JoanFM About whether or not wrapping consistency mode into the raft general configuration: If you mean the configuration for Hashicorp Raft, no, we shall not. Since the consistency mode controls the behavior of the server layer. Besides, I'm reading through Hashicorp consul codebase so that I can get more clue about how consul interacts with Hashicorp Raft.

@JoanFM You are right, if letting read requests go into the Raft layer, the log will grow insanely if the workload is read skewed. However, if not do so, the strong consistency mode is not guaranteed. I'm currently adding an optimization called read index which makes the read requests not go into the Raft layer. It works as such:

  • client sends a read request to the leader
  • leader asks the Raft layer for its committed index. This index is stored and associated with the read request. We call this index the read index.
  • leader then broadcasts a checkQuorum RPC to all others.
  • followers respond with acknowledging the leadership or rejecting.
  • if leader accepts a majority of acknowledgements, the leader could confirm its leadership and it assures the leader has the most up-to-date log. Now, the leader waits for its applied index go equal or beyond the read index. At that moment, the leader could safely read the FSM without worrying breaks the strong consistency mode.

This optimization only involves an extra round of RPC interaction and read requests do not need to go fully into the Raft layer and hence log would not grow

I've investigated that Hashicorp Raft supports querying about committed index and applied index. But I'm not sure if it supports invoking a checkQuorum-like RPC. If not, I have to slightly modify Hashicorp Raft manually. Don't worry, I could handle it.

Manually editing Hashicorl Raft is not a good option as then we would need to mantain the 2 codebases and make sure that updates on hashicorp are compatible, to me this is only good if Hashicorp takes a PR.

@JoanFM
Copy link
Member

JoanFM commented Jun 27, 2023

general configuration: If you mean the configuration

this can be a useful link:
hashicorp/raft#436

@niebayes
Copy link
Author

niebayes commented Jun 27, 2023

I've made a PR hashicorp/raft#560 about adding a CommitIndex for Hashicorp Raft.
With this, we can request the current commit index from Hashicorp Raft

@niebayes
Copy link
Author

niebayes commented Jun 27, 2023

I've also noticed that Hashicorp Raft has provided an API verifyLeader which seems equivalent to checkQuorum.
image
I'm not familiar about the usage of the API and the future. I'm wondering what I need to do is calling the verifyLeader API which gives me a future and then calling Error on the future to wait for the response. Am I right?

Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested, I like the refactoring on the RpcInterface.

Highlights:

  1. Remove the logic about ReadEndpoint as it is unnecessary.
  2. Processing ReadRequests in Strong consistency mode has to be done more optimally but without changing Hashicorp Raft library.

Good progress and understanding of the library and codebase I am seeing, thanks a lot and congrats

gp.add_argument(
'--consistency-mode',
type=str,
default='Strong',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make Eventual the default. Plus get the string from the Enum

@@ -280,3 +281,9 @@ def mixin_stateful_parser(parser):
nargs='+',
)

gp.add_argument(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

// TODO(niebayes): regenerate proto files.
read_endpoints := response.ReadEndpoints
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this, the logic of write and read was there, there is no other set of endpoints

}
}

if !found {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not exist

@@ -268,6 +268,12 @@ def __init__(
threading.Lock()
) # watch because this makes it no serializable

overlap_endpoints = set(self.read_endpoints).intersection(self.write_endpoints)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this logic

@@ -382,6 +388,23 @@ def requests(self):
self._requests = copy.copy(self.requests_by_class[self.__class__.__name__])
return self._requests

@property
def read_endpoints(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -90,6 +90,83 @@ def _inherit_from_parent_class_inner(cls_):
_inherit_from_parent_class_inner(cls)


def read(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@@ -877,6 +877,8 @@ async def endpoint_discovery(self, empty, context) -> jina_pb2.EndpointsProto:
self.logger.debug('got an endpoint discovery request')
endpoints_proto = jina_pb2.EndpointsProto()
endpoints_proto.endpoints.extend(list(self._executor.requests.keys()))
# TODO(niebayes): add read_endpoints to the proto.
endpoints_proto.read_endpoints.extend(list(self._executor.read_endpoints))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@JoanFM
Copy link
Member

JoanFM commented Jun 27, 2023

I've also noticed that Hashicorp Raft has provided an API verifyLeader which seems equivalent to checkQuorum. image I'm not familiar about the usage of the API and the future. I'm wondering what I need to do is calling the verifyLeader API which gives me a future and then calling Error on the future to wait for the response. Am I right?

I believe so yes

@niebayes
Copy link
Author

As we know, the read index requires the applied index. However, the Raft layer only knows the last applied index which is the index of the log it most recently sends to the FSM but the FSM maybe not consume the log yet. So the AppliedIndex API maybe returns a more up-to-date index.
image
So, in order to implement the read index optimization. We need to record the last applied index the executor FSM has consumed.

@niebayes
Copy link
Author

Another issue is the determinism of the FSM Apply. As you can see, the Hashicorp Raft requires the Apply implementation to be deterministic.
image

But our Apply implementation has many error handlings. Would they make the Apply not deterministic?
image

@JoanFM
Copy link
Member

JoanFM commented Jun 27, 2023

recently sends to the FSM but the FSM maybe not consume the log yet. So th

This is a known issue. Indeed, if there is a connection errror to the Executor, it may be problematic to the FSM consistency.

This is why I would like to have the Python object wrapped inside the Golang code to avoid having this communication layer.

The Unmarshalling error should be deterministic however.

@niebayes niebayes closed this by deleting the head repository Jan 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants