Use separate searchers for "search visibility" vs "move indexing buffer to disk" #26972
Conversation
…er to disk" Today, when ES detects it's using too much heap vs the configured indexing buffer (default 10% of JVM heap) it opens a new searcher to force Lucene to move the bytes to disk, clear the version map, etc. But this has the unexpected side effect of making newly indexed/deleted documents visible to future searches, which is not nice for users who are trying to prevent that, e.g. elastic#3593. This is also an indirect spinoff from elastic#26802 where we potentially pay a big price on rebuilding caches etc. when updates / realtime-get are used. We are refreshing the internal reader for realtime gets, which causes for instance global ords to be rebuilt. I think we can gain quite a bit if we'd use a reader that is only used for GETs and not for searches etc. That way we can also solve problems of searchers being refreshed unexpectedly, aside from replica recovery / relocation. Closes elastic#15768 Closes elastic#26912
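The core idea of the change — one searcher manager per scope, where an internal-only refresh (e.g. to move the indexing buffer to disk) leaves external search visibility untouched — can be sketched roughly like this. This is a minimal standalone sketch with illustrative names, not the actual InternalEngine or Lucene SearcherManager code:

```java
import java.util.EnumMap;
import java.util.Map;

public class ScopedSearcherSketch {
    enum SearcherScope { INTERNAL, EXTERNAL }

    // Stand-in for a Lucene SearcherManager: just tracks a refresh "generation".
    static final class Manager {
        long generation = 0;
        void refresh() { generation++; }
    }

    private final Map<SearcherScope, Manager> managers = new EnumMap<>(SearcherScope.class);

    ScopedSearcherSketch() {
        managers.put(SearcherScope.INTERNAL, new Manager());
        managers.put(SearcherScope.EXTERNAL, new Manager());
    }

    Manager acquire(SearcherScope scope) {
        Manager m = managers.get(scope);
        if (m == null) {
            throw new IllegalStateException("unknown scope: " + scope);
        }
        return m;
    }

    // An internal-only refresh (e.g. writeIndexingBuffer) leaves the external
    // manager untouched, so newly indexed docs stay invisible to searches.
    void refresh(boolean refreshExternal) {
        managers.get(SearcherScope.INTERNAL).refresh();
        if (refreshExternal) {
            managers.get(SearcherScope.EXTERNAL).refresh();
        }
    }

    public static void main(String[] args) {
        ScopedSearcherSketch engine = new ScopedSearcherSketch();
        engine.refresh(false); // move indexing buffer to disk, internal only
        System.out.println(engine.acquire(SearcherScope.INTERNAL).generation); // 1
        System.out.println(engine.acquire(SearcherScope.EXTERNAL).generation); // 0
    }
}
```

In the real engine both managers wrap readers over the same directory, so (as noted below) the extra manager costs essentially no memory — only the refresh timing differs per scope.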
/cc @uschindler
Having 2 IndexReaders (in fact DirectoryReaders) open at the same time will not have any impact on memory usage, as both share the same LeafReaders. The memory of stale segments is only freed once the one with the oldest / no-longer up-to-date segments is released. @s1monw was this your question, so you pinged me?
@uschindler thanks, I was aware of this; that's why I added it. I pinged you since you were involved in the original issue here: #15768
Thanks for picking this up so quickly. I did a pass and left mostly nits around naming (sorry :)). I did leave some comments about what seems like excessive refreshes which is why I marked this as request changes.
Also - with the double searcher managers and their relationship to the version map, I got a bit worried about the IndexingMemoryController calling writeIndexingBuffer. I couldn't find any test for that. Do we have one? I think we should have a concurrent read/write-by-ID engine unit test which validates it's safe to call refresh/flush/writeIndexingBuffer without us messing up. WDYT?
manager = createSearcherManager();
manager = createSearcherManager(searcherFactory);
internalSearcherManager = createSearcherManager(new SearcherFactory());
this.internalSearcherManager = internalSearcherManager;
shall we name these by scope? i.e., getScopeSearcherManager?
@@ -215,9 +217,11 @@ public InternalEngine(EngineConfig engineConfig) throws EngineException {
throw e;
}
}
manager = createSearcherManager();
manager = createSearcherManager(searcherFactory);
Nit - now that we call createSearcherManager twice, shall we move the following line out of that method and into this constructor? It feels unrelated:
lastCommittedSegmentInfos = readLastCommittedSegmentInfos(searcherManager, store);
Another nit - searcherFactory -> searcherFactoryForSearch?
refresh(source, true);
}

final void refresh(String source, boolean refreshExternal) throws EngineException {
shall we make the refreshExternal an EnumSet of SearcherScope?
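The EnumSet idea could look roughly like this — a standalone sketch, where SearcherScope and the refresh signature are simplified stand-ins for the real engine code:

```java
import java.util.EnumSet;

public class RefreshScopes {
    enum SearcherScope { INTERNAL, EXTERNAL }

    // Instead of a boolean flag, the caller states exactly which searcher
    // managers to refresh; the returned string just records what happened.
    static String refresh(String source, EnumSet<SearcherScope> scopes) {
        StringBuilder sb = new StringBuilder();
        for (SearcherScope scope : scopes) { // EnumSet iterates in declaration order
            sb.append(source).append(':').append(scope).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // internal-only refresh, e.g. for "write indexing buffer"
        System.out.print(refresh("write indexing buffer", EnumSet.of(SearcherScope.INTERNAL)));
        // full refresh, e.g. for an explicit refresh request
        System.out.print(refresh("api", EnumSet.allOf(SearcherScope.class)));
    }
}
```

Compared to a boolean, the EnumSet reads self-documenting at call sites and extends naturally if a third scope is ever added.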
} catch (AlreadyClosedException e) {
failOnTragicEvent(e);
throw e;
} catch (Exception e) {
try {
failEngine("refresh failed", e);
failEngine("refresh failed source[" + source + "]", e);
++
@@ -1231,7 +1240,7 @@ public void writeIndexingBuffer() throws EngineException {
// The version map is using > 25% of the indexing buffer, so we do a refresh so the version map also clears
logger.debug("use refresh to write indexing buffer (heap size=[{}]), to also clear version map (heap size=[{}])",
new ByteSizeValue(indexingBufferBytes), new ByteSizeValue(versionMapBytes));
refresh("write indexing buffer");
refresh("write indexing buffer", false);
I wonder if we can now always use this "refresh" path here rather than IndexWriter.flush - refresh is now much lighter? (not sure what other side effects there are).
+1 I think that's good
@@ -1347,7 +1355,7 @@ public CommitId flush(boolean force, boolean waitIfOngoing) throws EngineExcepti
commitIndexWriter(indexWriter, translog, null);
logger.trace("finished commit for flush");
// we need to refresh in order to clear older version values
refresh("version_table_flush");
refresh("version_table_flush"); // TODO technically we could also only refresh the internal searcher
Doesn't the comment about releasing segments in tryRenewSyncCommit hold here too?
I think I made some copy-paste mess here. I will fix it.
@@ -1500,6 +1508,8 @@ public void forceMerge(final boolean flush, int maxNumSegments, boolean onlyExpu
if (flush) {
if (tryRenewSyncCommit() == false) {
flush(false, true);
} else {
refresh("renew sync commit"); // we have to refresh both searchers here to ensure we release unreferenced segments.
Isn't tryRenewSyncCommit already doing it in this case?
see my comment above, sorry for the noise
@@ -1867,6 +1884,10 @@ protected void doRun() throws Exception {
// free up transient disk usage of the (presumably biggish) segments that were just merged
if (tryRenewSyncCommit() == false) {
flush();
} else {
// we only refresh the rather cheap internal searcher manager in order to not trigger new datastructures
same comment - if tryRenewSyncCommit
returned true, it already refreshed, no?
Searcher getSearcher = engine.acquireSearcher("test", Engine.SearcherScope.GET);
Searcher searchSearcher = engine.acquireSearcher("test", Engine.SearcherScope.SEARCH);
assertSameReader(getSearcher, searchSearcher);
IOUtils.close(getSearcher, searchSearcher);
nit - should we use try-with-resources to avoid leaking on failure?
will do
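The try-with-resources nit, sketched: Engine.Searcher is releasable, so declaring both searchers as resources guarantees they are released even if an assertion between acquire and close fails. The Searcher class below is a minimal stand-in, not the real Elasticsearch type:

```java
public class TryWithResourcesSketch {
    static final class Searcher implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        Searcher getSearcher = new Searcher();
        Searcher searchSearcher = new Searcher();
        try (Searcher g = getSearcher; Searcher s = searchSearcher) {
            // assertions on the two searchers would go here; on any failure
            // both close() calls still run, in reverse declaration order
        }
        System.out.println(getSearcher.closed && searchSearcher.closed); // true
    }
}
```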
@bleskes I am not sure if you love or hate this 🤷♂️ I took it one step further and solely use that internal searcher everywhere. The external searcher is used for non-RT get and search, but internally it's all done via the private reader. This also means we don't refresh anymore during … I personally think this is much cleaner now.
Yay!!
@mikemccand you're alive!
case SEARCH:
return searcherManager;
default:
throw new IllegalStateException("unknonw scope: " + scope);
typo: unknown
This is great ❤️ . I left suggestions but there is no need for another review.
@@ -305,9 +305,9 @@ private void buildFieldStatistics(XContentBuilder builder, Terms curTerms) throw
long sumDocFreq = curTerms.getSumDocFreq();
int docCount = curTerms.getDocCount();
long sumTotalTermFrequencies = curTerms.getSumTotalTermFreq();
if (docCount > 0) {
assert ((sumDocFreq > 0)) : "docCount >= 0 but sumDocFreq ain't!";
:) it's a good sign you bumped into this. Isn't this changing semantics? Before, when there were no docs, we wouldn't output the Field Stats sub-object.
expand the diff below and you will see all the void 💃
return acquireSearcher(source, Engine.SearcherScope.EXTERNAL);
}

private Engine.Searcher acquireSearcher(String source, Engine.SearcherScope scope) {
with the move to internal/external (rather than get/search), I think this method can go, and that also means that Engine.acquireSearcher(source, scope) can be made protected.
Never mind. I see it now :(
scope = SearcherScope.INTERNAL;
} else {
// we expose what has been externally expose in a point in time snapshot via an explicit refresh
scope = SearcherScope.EXTERNAL;
++ sneaky and nice. I think we can extend InternalEngineTests.testSimpleOperations with:
// but, we can still get it (in realtime)
getResult = engine.get(newGet(true, doc), searcherFactory);
assertThat(getResult.exists(), equalTo(true));
assertThat(getResult.docIdAndVersion(), notNullValue());
getResult.release();
+ // but it is not yet visible to non-realtime gets
+ getResult = engine.get(newGet(false, doc), searcherFactory);
+ assertThat(getResult.exists(), equalTo(false));
+ getResult.release();
// even though we maintain 2 managers we really do the heavy-lifting only once.
// the second refresh will only do the extra work we have to do for warming caches etc.
externalSearcherManager.maybeRefreshBlocking();
// fall-through is intentional
I don't see the fall through?
@@ -98,21 +96,6 @@
/** Tracks bytes used by tombstones (deletes) */
final AtomicLong ramBytesUsedTombstones = new AtomicLong();

/** Sync'd because we replace old mgr. */
synchronized void setManager(ReferenceManager<?> newMgr) {
++
@@ -1005,6 +1005,7 @@ public CompletionStats completionStats(String... fields) {
}
final long time = System.nanoTime();
final Engine.CommitId commitId = engine.flush(force, waitIfOngoing);
engine.refresh("flush"); // TODO this is technically wrong we should remove this in 7.0
OK. Now I see why the engine changes can be done in 6.x. I would opt for keeping it as it was (i.e. full refresh on flush) and do a small follow-up in 7.0 to change the refresh command in the engine + mark it as breaking etc. Just a suggestion.
@uschindler @bleskes thanks for the reviews. I will wait for CI and merge it.
Thanks @s1monw !
…er to disk (#26972) Today, when ES detects it's using too much heap vs the configured indexing buffer (default 10% of JVM heap) it opens a new searcher to force Lucene to move the bytes to disk, clear the version map, etc. But this has the unexpected side effect of making newly indexed/deleted documents visible to future searches, which is not nice for users who are trying to prevent that, e.g. #3593. This is also an indirect spinoff from #26802 where we potentially pay a big price on rebuilding caches etc. when updates / realtime-get are used. We are refreshing the internal reader for realtime gets, which causes for instance global ords to be rebuilt. I think we can gain quite a bit if we'd use a reader that is only used for GETs and not for searches etc. That way we can also solve problems of searchers being refreshed unexpectedly, aside from replica recovery / relocation. Closes #15768 Closes #26912
Today all these API calls have a side effect of making documents visible to search requests. While this is sometimes desired, it's an unnecessary side effect, and now that we have an internal (engine-private) index reader (elastic#26972) we artificially add a refresh call for bwc. This change removes this side effect in 7.0.
The changes introduced in #26972 missed two places where an internal searcher should be used.
* master: (356 commits) Do not set SO_LINGER on server channels (elastic#26997) Fix inconsistencies in the rest api specs for *_script (elastic#26971) fix inconsistencies in the rest api specs for cat.snapshots (elastic#26996) Add docs on full_id parameter in cat nodes API [TEST] Add test that replicates versioned updates with random flushes Use internal searcher for all indexing related operations in the engine Reformat paragraph in template docs to 80 columns Clarify settings and template on create index Fix reference to TcpTransport in documentation Allow Uid#decodeId to decode from a byte array slice (elastic#26987) Fix a typo in the similarity docs (elastic#26970) Use separate searchers for "search visibility" vs "move indexing buffer to disk (elastic#26972) Create weights lazily in filter and filters aggregation (elastic#26983) Use a dedicated ThreadGroup in rest sniffer (elastic#26897) Fire global checkpoint sync under system context Update by Query is modified to accept short `script` parameter. (elastic#26841) Cat shards bytes (elastic#26952) Add support for parsing inline script (elastic#23824) (elastic#26846) Change default value to true for transpositions parameter of fuzzy query (elastic#26901) Adding unreleased 5.6.4 version number to Version.java ...
* master: Remove unnecessary exception for engine constructor Update docs about `script` parameter (#27010) Don't refresh on `_flush` `_force_merge` and `_upgrade` (#27000) Do not set SO_LINGER on server channels (#26997) Fix inconsistencies in the rest api specs for *_script (#26971) fix inconsistencies in the rest api specs for cat.snapshots (#26996) Add docs on full_id parameter in cat nodes API [TEST] Add test that replicates versioned updates with random flushes Use internal searcher for all indexing related operations in the engine Reformat paragraph in template docs to 80 columns Clarify settings and template on create index Fix reference to TcpTransport in documentation Allow Uid#decodeId to decode from a byte array slice (#26987) Fix a typo in the similarity docs (#26970) Use separate searchers for "search visibility" vs "move indexing buffer to disk (#26972)
* 6.x: Remove unnecessary exception for engine constructor Update docs about `script` parameter (#27010) Do not set SO_LINGER on server channels (#26997) Fix inconsistencies in the rest api specs for *_script (#26971) fix inconsistencies in the rest api specs for cat.snapshots (#26996) Add docs on full_id parameter in cat nodes API Add removal of types to the 6.0 breaking changes Create weights lazily in filter and filters aggregation (#26983) [TEST] Add test that replicates versioned updates with random flushes Use internal searcher for all indexing related operations in the engine Use separate searchers for "search visibility" vs "move indexing buffer to disk (#26972) Reformat paragraph in template docs to 80 columns Clarify settings and template on create index Fix reference to TcpTransport in documentation Allow Uid#decodeId to decode from a byte array slice (#26987) Fix a typo in the similarity docs (#26970)
We do some accounting in IndexShard that is not necessarily correct since we maintain two different index readers. This change moves the accounting under the engine, which knows what reader we are refreshing. Relates to elastic#26972
…mize segment creation (#27253) We cut over to internal and external IndexReader/IndexSearcher in #26972 which uses two independent searcher managers. This has the downside that refreshes of the external reader will never clear the internal version map which in-turn will trigger additional and potentially unnecessary segment flushes since memory must be freed. Under heavy indexing load with low refresh intervals this can cause excessive segment creation which causes high GC activity and significantly increases the required segment merges. This change adds a dedicated external reference manager that delegates refreshes to the internal reference manager that then `steals` the refreshed reader from the internal reference manager for external usage. This ensures that external and internal readers are consistent on an external refresh. As a sideeffect this also releases old segments referenced by the internal reference manager which can potentially hold on to already merged away segments until it is refreshed due to a flush or indexing activity.
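The follow-up (#27253) idea, sketched: the external manager never refreshes on its own; it delegates to the internal manager and then adopts ("steals") the internal manager's freshly refreshed reader, so both managers stay consistent on an external refresh and the internal manager's old segments are released. This is a hedged standalone sketch — class names and the generation counter are illustrative, not the real ReferenceManager implementation:

```java
public class StealingManagerSketch {
    // Stand-in for a DirectoryReader: identified only by its generation.
    static final class Reader {
        final long gen;
        Reader(long gen) { this.gen = gen; }
    }

    // Stand-in for the engine-private (internal) reference manager.
    static class InternalManager {
        private Reader current = new Reader(0);
        Reader refresh() { current = new Reader(current.gen + 1); return current; }
        Reader current() { return current; }
    }

    // Stand-in for the external manager: delegates refreshes to the internal
    // manager, then adopts its reader instead of reopening independently.
    static final class ExternalManager {
        private final InternalManager internal;
        private Reader current;
        ExternalManager(InternalManager internal) {
            this.internal = internal;
            this.current = internal.current();
        }
        void refresh() { current = internal.refresh(); }
        Reader current() { return current; }
    }

    public static void main(String[] args) {
        InternalManager internal = new InternalManager();
        ExternalManager external = new ExternalManager(internal);
        external.refresh();
        // after an external refresh both managers see the same reader generation,
        // so the internal version map is cleared and old segments are released
        System.out.println(external.current().gen == internal.current().gen); // true
        System.out.println(external.current().gen); // 1
    }
}
```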