Repo analysis of uncontended register behaviour #101185
WIP because:
Today repository analysis verifies that a register behaves correctly under contention, retrying until successful, but it turns out that some repository implementations cannot even perform uncontended register writes correctly, which may cause endless retries in the contended case. This commit adds another repository analyser which verifies that uncontended register writes work correctly on the first attempt.
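As a rough illustration of the uncontended check described above, here is a minimal Java sketch. It assumes a hypothetical `Register` interface whose `compareAndExchange` returns the witnessed value and writes only if that value matched, and it uses `long` values in place of the real byte-array registers; none of these names are Elasticsearch APIs. The key point is that with no competing writers every exchange must succeed on the first attempt, so a mismatch is reported as a failure rather than retried.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the Elasticsearch API: register values are longs and
// compareAndExchange returns the witnessed value, writing only if it matched.
class UncontendedRegisterCheck {

    interface Register {
        long compareAndExchange(String key, long expected, long updated);
    }

    /** A correct, linearizable in-memory register. */
    static final class InMemoryRegister implements Register {
        private final Map<String, Long> values = new HashMap<>();

        @Override
        public synchronized long compareAndExchange(String key, long expected, long updated) {
            long witness = values.getOrDefault(key, 0L);
            if (witness == expected) {
                values.put(key, updated);
            }
            return witness;
        }
    }

    /** A broken register that silently drops every write. */
    static final class LossyRegister implements Register {
        @Override
        public long compareAndExchange(String key, long expected, long updated) {
            return 0L;
        }
    }

    /**
     * Steps the register 0 -> 1 -> ... -> steps with no competing writers. Each
     * exchange must succeed on the first attempt, so an unexpected witness is
     * reported as a failure instead of being retried.
     */
    static boolean verify(Register register, String key, int steps) {
        for (long expected = 0; expected < steps; expected++) {
            if (register.compareAndExchange(key, expected, expected + 1) != expected) {
                return false;
            }
        }
        // Read back the final value with a no-op exchange (expected == updated).
        return register.compareAndExchange(key, steps, steps) == steps;
    }

    public static void main(String[] args) {
        System.out.println(verify(new InMemoryRegister(), "uncontended-register", 5)); // true
        System.out.println(verify(new LossyRegister(), "uncontended-register", 5));    // false
    }
}
```

A contended analysis would loop on a failed exchange; against `LossyRegister` such a loop would never terminate, which is exactly the endless-retry symptom this check is meant to surface early.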
LGTM
```java
for (final var multipartUpload : uploads.values()) {
    if (multipartUpload.getPath().startsWith(prefix)) {
        multipartUpload.appendXml(uploadsList);
    }
}
```
Interesting. This was essentially a bug in the fixture.
Yes, or at least an unimplemented feature :) This fixture takes quite a lot of shortcuts and only really emulates the bits of S3's API that matter to us.
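For context, S3's ListMultipartUploads API accepts a `prefix` parameter that limits the listing to uploads whose keys begin with it, which is the behaviour the fixture change above emulates. The following is a standalone Java sketch of that filtering; the `MultipartUpload` record, its XML layout and the sample paths are simplified, hypothetical stand-ins for the real fixture, and values are not XML-escaped.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of prefix filtering in an S3-like fixture. The record and
// XML layout are simplified illustrations; values are not XML-escaped.
class ListUploadsByPrefix {

    record MultipartUpload(String uploadId, String path) {
        void appendXml(StringBuilder xml) {
            xml.append("<Upload><Key>").append(path).append("</Key><UploadId>")
                .append(uploadId).append("</UploadId></Upload>");
        }
    }

    /** Lists only uploads whose key starts with the requested prefix. */
    static String listUploads(Map<String, MultipartUpload> uploads, String prefix) {
        StringBuilder uploadsList = new StringBuilder();
        for (final var multipartUpload : uploads.values()) {
            if (multipartUpload.path().startsWith(prefix)) {
                multipartUpload.appendXml(uploadsList);
            }
        }
        return uploadsList.toString();
    }

    public static void main(String[] args) {
        Map<String, MultipartUpload> uploads = new TreeMap<>();
        uploads.put("u1", new MultipartUpload("u1", "repo/registers/a"));
        uploads.put("u2", new MultipartUpload("u2", "repo/indices/b"));
        // Prints only the u1 entry: u2 lies outside the prefix.
        System.out.println(listUploads(uploads, "repo/registers/"));
    }
}
```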
```java
public static final TransportVersion PIPELINES_IN_BULK_RESPONSE_ADDED = def(8_519_00_0);
public static final TransportVersion PLUGIN_DESCRIPTOR_STRING_VERSION = def(8_520_00_0);
public static final TransportVersion TOO_MANY_SCROLL_CONTEXTS_EXCEPTION_ADDED = def(8_521_00_0);
public static final TransportVersion UNCONTENDED_REGISTER_ANALYSIS_ADDED = def(8_522_00_0);
```
I think it's fine to use transportVersion for now. But strictly speaking, this feels like it belongs more to the in-development Feature interface.
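One way to use a version-style check here, as discussed in this thread, is to skip the new remote action unless every node in the cluster understands it. The Java sketch below models transport versions as plain `int`s and uses a hypothetical `Node` record rather than Elasticsearch's `DiscoveryNode` or `TransportVersion` types; only the id `8_522_00_0` comes from the diff above.

```java
import java.util.List;

// Hypothetical sketch: transport versions are modelled as plain ints and the
// Node record is illustrative, not Elasticsearch's DiscoveryNode.
class VersionGating {

    // Same id as the constant added in the diff above.
    static final int UNCONTENDED_REGISTER_ANALYSIS_ADDED = 8_522_00_0;

    record Node(String name, int transportVersion) {}

    /** The oldest node bounds what the whole cluster can understand. */
    static int minimumVersion(List<Node> nodes) {
        return nodes.stream().mapToInt(Node::transportVersion).min().orElseThrow();
    }

    /** Only send the new remote action once every node can handle it. */
    static boolean shouldRunUncontendedAnalysis(List<Node> nodes) {
        return minimumVersion(nodes) >= UNCONTENDED_REGISTER_ANALYSIS_ADDED;
    }

    public static void main(String[] args) {
        List<Node> mixed = List.of(new Node("a", 8_522_00_0), new Node("b", 8_521_00_0));
        List<Node> upgraded = List.of(new Node("a", 8_522_00_0), new Node("b", 8_522_00_0));
        System.out.println(shouldRunUncontendedAnalysis(mixed));    // false
        System.out.println(shouldRunUncontendedAnalysis(upgraded)); // true
    }
}
```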
IMO adding a new remote action is a change to the transport protocol, although I do see that we could reasonably avoid calling the new action based on whether the cluster supports the feature or not. I don't expect we will be able to migrate assertions like these over to features, though (but maybe that is the eventual plan?):
```java
private final String registerName;
private final List<DiscoveryNode> nodes;
private final AtomicBoolean otherAnalysisComplete;
private int currentValue; // actions run in strict sequence so no need for synchronization
```
Sending the request and handling the response can be performed by different transport worker threads. So I think they could potentially see different values even when the actions run in strict sequence?
I think this isn't the case, but it's a good observation. If it were, then we would indeed potentially need synchronization here. Fortunately, for remote requests the request and response go over the same TCP channel, which means they use the same transport worker thread (docs). Therefore the request handling happens-before the response handling in program order on that thread, so it's OK. Local requests bypass the transport worker threads, of course, but they all happen within the same JVM, so we have a proper happens-before relationship there too.
Thanks, you are right. In previous conversations I heard that we could, in theory, send a request via one channel but receive the response via another, since transport does not have the ordered-processing constraint of HTTP 1.1. It is a theoretical possibility, not what we have today. Sorry that I misremembered.
Technically speaking, I think there may be no happens-before relationship between sending a request A and receiving a different request B that was caused by the remote node's handling of request A, because those will certainly use different TCP channels and may therefore land on different transport threads. We do use nested requests in various places, e.g. recovery. I'm not sure whether this can happen in practice, but it is definitely something to watch out for.
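To make the happens-before argument concrete, here is a minimal Java sketch (hypothetical names, not Elasticsearch code) in which a single-threaded executor stands in for the one transport worker thread that serves a TCP channel. Every step that touches the plain `currentValue` field runs on that thread, so program order alone orders the reads and writes; the comment marks where the cross-thread case described in this thread would instead need `volatile` or an `AtomicInteger`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a single-threaded executor stands in for the one transport
// worker thread that serves a given TCP channel. Not Elasticsearch code.
class SequentialOnOneThread {

    // Plain (non-volatile) field, like currentValue in the snippet above. It is safe
    // only because every read and write runs on the single executor thread, so
    // program order on that thread supplies the happens-before edges. If the
    // "response" step could run on a different thread with no synchronization in
    // between, this would need to be volatile or an AtomicInteger.
    private int currentValue;

    /** Simulates `rounds` request/response pairs, each side bumping currentValue once. */
    static int simulate(int rounds) {
        SequentialOnOneThread action = new SequentialOnOneThread();
        ExecutorService channelThread = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < rounds; i++) {
                channelThread.execute(() -> action.currentValue++); // "send request"
                channelThread.execute(() -> action.currentValue++); // "handle response"
            }
            // Read on the same thread; Future.get() safely publishes the value to the caller.
            return channelThread.submit(() -> action.currentValue).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            channelThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(1000)); // 2000
    }
}
```

A data race cannot be demonstrated deterministically, so the sketch only shows the safe single-thread case; the nested-request scenario above is the one where that guarantee no longer holds.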
```java
// Registers are not supported on all repository types, and that's ok.
listener.onResponse(null);
```
For my understanding: I don't think we indicate in the response that this operation is unsupported? Are we not interested in that? I am aware that the existing "contended" version does the same, so it is likely OK.
That's correct. Specifically, we do not support register operations for the (somewhat unloved) HDFS repository implementation, and we have no plan to address this in future, so we just skip all these checks for HDFS repositories.
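The skip-if-unsupported behaviour in this thread can be sketched in Java as follows. The `Listener` and `RegisterContainer` types are hypothetical, and signalling "unsupported" via `UnsupportedOperationException` is an assumption made for illustration. As noted above, the caller receives the same `null` response whether the check passed or was skipped.

```java
// Hypothetical sketch: the Listener and RegisterContainer types are illustrative,
// and signalling "unsupported" via UnsupportedOperationException is an assumption.
class SkipUnsupportedRegisters {

    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    interface RegisterContainer {
        long compareAndExchange(String key, long expected, long updated);
    }

    static void analyseRegister(RegisterContainer container, String key, Listener<Void> listener) {
        final long witness;
        try {
            witness = container.compareAndExchange(key, 0L, 1L);
        } catch (UnsupportedOperationException e) {
            // Registers are not supported on all repository types, and that's ok:
            // respond with null, exactly as for a passing check.
            listener.onResponse(null);
            return;
        } catch (RuntimeException e) {
            listener.onFailure(e);
            return;
        }
        if (witness == 0L) {
            listener.onResponse(null);
        } else {
            listener.onFailure(new IllegalStateException("uncontended write to [" + key + "] saw " + witness));
        }
    }

    /** Runs the check and reports "ok" or "failed", for demonstration. */
    static String outcome(RegisterContainer container) {
        StringBuilder result = new StringBuilder();
        analyseRegister(container, "register-key", new Listener<>() {
            @Override
            public void onResponse(Void response) {
                result.append("ok");
            }

            @Override
            public void onFailure(Exception e) {
                result.append("failed");
            }
        });
        return result.toString();
    }

    public static void main(String[] args) {
        RegisterContainer unsupported = (key, expected, updated) -> {
            throw new UnsupportedOperationException("registers not supported");
        };
        RegisterContainer working = (key, expected, updated) -> expected;
        RegisterContainer broken = (key, expected, updated) -> 42L;
        System.out.println(outcome(unsupported)); // ok (indistinguishable from a pass)
        System.out.println(outcome(working));     // ok
        System.out.println(outcome(broken));      // failed
    }
}
```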
```java
} else if (key.startsWith(RepositoryAnalyzeAction.UNCONTENDED_REGISTER_NAME_PREFIX) || randomBoolean()) {
    listener.onResponse(OptionalBytesReference.of(registers.computeIfAbsent(key, ignored -> new BytesRegister()).get()));
} else {
    final var bogus = randomFrom(BytesArray.EMPTY, new BytesArray(new byte[] { randomByte() }));
```
This logic here is a bit hard to follow. IIUC, we don't want to return anything wrong for the uncontended analysis, but the code here seems to suggest that we could return a bogus result for it. We don't really, because the call to `compareAndExchangeRegister` always returns a new `BytesRegister()`, which always returns 0 regardless of the bogus value. I think it would be better to make this more explicit for the uncontended operations. Maybe have the prefix check as the top-level switch, e.g.?

```java
if (key.startsWith(RepositoryAnalyzeAction.UNCONTENDED_REGISTER_NAME_PREFIX)) {
    ...
} else {
    // everything else, essentially the existing code
}
```
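A standalone sketch of the suggested structure, with the prefix check as the top-level branch, might look like the Java below. It is not the real fixture: `FixtureRegisters`, the `"uncontended-"` prefix value and the use of `long` values instead of `BytesReference` are all hypothetical. Uncontended keys always get a correct compare-and-exchange, while other keys sometimes receive a bogus witness without a write, forcing the caller to retry.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a fixture that branches on the key prefix at the top
// level. The prefix value and all names are illustrative, not the real fixture.
class FixtureRegisters {

    static final String UNCONTENDED_PREFIX = "uncontended-"; // assumed value

    private final Map<String, AtomicLong> registers = new ConcurrentHashMap<>();
    private final Random random;

    FixtureRegisters(long seed) {
        this.random = new Random(seed);
    }

    /** Returns the witnessed value; the write happens only if it equals expected. */
    long compareAndExchange(String key, long expected, long updated) {
        AtomicLong register = registers.computeIfAbsent(key, ignored -> new AtomicLong());
        if (key.startsWith(UNCONTENDED_PREFIX)) {
            // Uncontended analysis: always behave correctly, so any
            // first-attempt failure points at a genuine bug.
            return register.compareAndExchange(expected, updated);
        } else {
            // Everything else: sometimes report a bogus witness without writing,
            // to exercise the caller's retry loop.
            if (random.nextBoolean()) {
                return register.compareAndExchange(expected, updated);
            }
            return expected + 1 + random.nextInt(10); // never equals expected
        }
    }

    public static void main(String[] args) {
        FixtureRegisters fixture = new FixtureRegisters(42L);

        boolean firstAttemptOk = true;
        for (long v = 0; v < 5; v++) {
            firstAttemptOk &= fixture.compareAndExchange(UNCONTENDED_PREFIX + "a", v, v + 1) == v;
        }
        System.out.println("uncontended first-attempt success: " + firstAttemptOk); // true

        int retries = 0;
        while (fixture.compareAndExchange("contended-a", 0L, 1L) != 0L) {
            retries++; // the caller must retry until the fixture lets the write through
        }
        System.out.println("contended retries before success: " + retries);
    }
}
```

Branching once at the top keeps the "never lie about uncontended keys" rule visible in one place instead of relying on a side effect of how the register is created.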
++ yes, this is all a little unsatisfactory indeed. This area is going to need a little rework once #101184 is merged; I'll try to bring the checks on the key prefix to the top level in these methods.
OK, now merged and cleaned up; see 2dd4250.
Hi @DaveCTurner, I've created a changelog YAML for you.
Pinging @elastic/es-distributed (Team:Distributed)
LGTM
Verification of uncontended operations on linearizable registers was introduced in elastic#101185, so it does not apply in versions before 8.12. Fixes up the backport of elastic#102050.