Deprecating _primary preference makes getting consistent results impossible(?) #31929
Comments
Pinging @elastic/es-search-aggs |
Pinging @elastic/es-distributed |
@bra-fsn could you explain a bit about your use case and why you only want to retrieve results from the primary? If the use case is down to wanting every client to see the exact same view of the data and not have stale reads I think you'd have a hard time achieving this even with
I also wonder why you do not consider custom preference an option since if you wanted to replicate the behaviour of |
What @colings86 said is true: using
It is not the case that the primary always serves all acknowledged writes, because it might not have refreshed. It is also not the case that replicas lag behind the primary like this: writes are only acknowledged once written to all copies, not just to the primary. |
We discussed it on Fix-it Friday. The main use cases for preferences are data locality (for attribute-based preferences) and more efficient use of shard caches (where the custom preference value sends requests to the same shard copy, whose caches are potentially hot). We should update the docs to better reflect that and also to clarify that this is why the |
Today it is unclear what guarantees are offered by the search preference feature, and we claim a guarantee that is stronger than what we really offer:
> A custom value will be used to guarantee that the same shards will be used for the same custom value.
This commit clarifies this documentation and explains more clearly why `_primary`, `_replica`, etc. are deprecated in `6.x` and removed in `master`. Relates elastic#31929 elastic#26335 elastic#26791.
@colings86, @bleskes
So if I'm right, a consistent view can be achieved by using the doc APIs in the above way. Or at least this is how I understand it after reading the docs. Could you please tell me what I'm getting wrong? Thanks, |
@DaveCTurner for searches, right. But for the doc APIs, in-flight data will either be served from the translog or trigger a refresh (which means no stale data). BTW, this could even be true for searches. Where I need consistency, I use refresh=wait_for on index operations, so consistency here means that after an index/update operation with refresh=wait_for has returned, a new search should include that data (and currently you can only assume this when setting _primary). "writes are only acknowledged once written to all copies, not just to the primary"
|
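The write-then-search pattern mentioned above (an index request with `refresh=wait_for`, followed by a search with a `preference` value) could be sketched as request URLs like the ones below. This is only an illustration: the base URL, the index name `my-index`, the document ID and the preference string `session-42` are hypothetical, and the snippet only builds the URLs without contacting a cluster.

```python
from urllib.parse import urlencode

BASE = "http://localhost:9200"  # assumption: a local node on the default port
INDEX = "my-index"              # hypothetical index name


def index_url(doc_id: str) -> str:
    """URL for an index request that returns only after a refresh has made the write searchable."""
    return f"{BASE}/{INDEX}/_doc/{doc_id}?" + urlencode({"refresh": "wait_for"})


def search_url(preference: str) -> str:
    """URL for a search request that carries a custom preference string."""
    return f"{BASE}/{INDEX}/_search?" + urlencode({"preference": preference})


print(index_url("1"))
print(search_url("session-42"))
```

Whether the subsequent search actually observes the write under failure conditions is exactly what the rest of this thread discusses; the URLs alone do not guarantee it.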
When an indexing request with
|
The docs were updated in #32098 to address the original question. I can see a case for addressing the followup questions with further doc improvements, so I'm leaving this open and marking it for further discussion. |
@bra-fsn thanks for explaining. I think you can do what you're doing, but you have to change a bit how you reason about the API. Also, there are some edge cases when things fail that you should be aware of. @DaveCTurner explained some details of the API, and the higher-level constructs are explained here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-replication.html . Please let me know if you find anything unclear in that document. It's very important for us that users understand what they can (and cannot) expect from Elasticsearch. |
Oh, I missed those, thanks for pointing that out! |
@bleskes What's still unclear to me is the answer to the original question.
Given all of these are true:
So I think it would be nicer to keep _primary and change this sentence (from #32098):
to something like this: _primary helps to avoid inconsistent results, but there should be a caveat: if a primary fails and it can't update the in-sync allocation IDs, unacknowledged writes may have broken linearizability. |
The situation you are describing is a dirty read which is indeed a counterexample to linearizability. The documentation there indicates that dirty reads can be exposed by an isolated primary, but in fact there is nothing special about primaries: they can be exposed by any shard copy. The
This isn't the case - there's also |
To add to what David said: in your description the update to doc(2) was never acknowledged (because it was not fully replicated), but all of the above can happen even for acked writes. Say a primary got isolated but doesn't know it yet. Another replica is promoted, and that one happens to receive and process doc(2). Now you issue a get and that happens to hit the node with the old primary. It will happily respond (in that moment) with doc(1). Bottom line: whatever you do, the model we use (for good reasons, see the doc I linked to) doesn't offer linearizability under failure modes but is way more efficient under normal operations. Instead of fighting to reduce the error window (it's already pretty small) and making the system more complex, we prefer to clearly communicate this and make it clear via the API. The goal is that people will build systems that take it into account (potentially not caring) rather than assuming we give guarantees that only seem to be true. |
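The isolated-primary scenario described above can be illustrated with a toy model. This is a deliberately simplified sketch, not Elasticsearch internals: the `ShardCopy` class, the node names and the `believes_primary` flag are all invented for the example.

```python
class ShardCopy:
    """Toy shard copy: a dict of documents plus a belief about its own role."""

    def __init__(self, name):
        self.name = name
        self.docs = {}
        self.believes_primary = False

    def get(self, doc_id):
        # A copy answers reads from its local state; it cannot know it is stale.
        return self.docs.get(doc_id)


old_primary = ShardCopy("node-a")  # hypothetical node names
new_primary = ShardCopy("node-b")

# Both copies start out holding version 1 of the document.
for copy in (old_primary, new_primary):
    copy.docs["doc"] = 1
old_primary.believes_primary = True

# node-a gets isolated but has not noticed yet; node-b is promoted meanwhile.
new_primary.believes_primary = True
new_primary.docs["doc"] = 2  # the write is processed by the new primary only

stale = old_primary.get("doc")  # read routed to the old primary
fresh = new_primary.get("doc")  # read routed to the new primary
print(stale, fresh)
```

In this model both copies believe they are the primary at the moment of the read, so routing a get "to the primary" does not by itself rule out the stale answer.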
Thanks for your questions @bra-fsn, this kind of discussion is a good way to show us where our docs might be improved. If there's anything more that we can clarify then don't hold back. I'm closing this now as there's been no activity for a while and there's no further action to be taken at the moment. |
Elasticsearch up to 6.x has a preference setting (e.g. for get and search operations) named _primary.
According to the docs, this makes the query run on the primary shards:
"The operation will go and be executed only on the primary shards."
On master (7.x) this setting has been removed (in 6.x it's deprecated), and for the doc API only _local remains.
This makes it impossible to use the doc (and search) API to get consistent results. A subsequent query could return stale data. This may be true even if a custom preference value is used, because it just hashes the client to a given shard copy, which may not have been updated yet (while asking the primary would return the correct result). This way, two different clients could get two different results for the same query at the same time (or, if all clients use the same custom preference, they could all see the same stale data).
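To make the "hashes the client to a given shard" point concrete, here is a toy model of preference-based routing. It is an assumption-laden sketch: the hash function, the copy names and the `pick_copy` helper are invented for illustration and are not how Elasticsearch actually selects a copy.

```python
import hashlib
from typing import Sequence

COPIES = ["primary", "replica-1", "replica-2"]  # hypothetical copies of one shard


def pick_copy(preference: str, copies: Sequence[str]) -> str:
    """Toy routing: hash the preference string and pick one of the available copies."""
    digest = hashlib.sha256(preference.encode("utf-8")).digest()
    return copies[int.from_bytes(digest[:8], "big") % len(copies)]


# The same preference string maps to the same copy while the set of copies is unchanged.
first = pick_copy("session-42", COPIES)
second = pick_copy("session-42", COPIES)
print(first == second)
```

Nothing in such a mapping says the chosen copy is the primary or that it has refreshed, which is the concern raised in the paragraph above.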
One could argue that using wait_for_active_shards is the solution here, but even if it works correctly in this respect (i.e. all replicas and the primary are updated atomically from the client's point of view, which I guess is not true), it would require all of the shard copies to always be available for writes, which makes replication somewhat useless, or at least much less useful.
Given all of this, I would like to ask you to restore the _primary option for preference.