Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use separate searchers for "search visibility" vs "move indexing buffer to disk #26972

Merged
merged 9 commits into from
Oct 12, 2017

Conversation

s1monw
Copy link
Contributor

@s1monw s1monw commented Oct 11, 2017

Today, when ES detects it's using too much heap vs the configured indexing
buffer (default 10% of JVM heap) it opens a new searcher to force Lucene to move
the bytes to disk, clear version map, etc.

But this has the unexpected side effect of making newly indexed/deleted
documents visible to future searches, which is not nice for users who are trying
to prevent that, e.g. #3593.

This is also an indirect spinoff from #26802 where we potentially pay a big
price on rebuilding caches etc. when updates / realtime-get is used. We are
refreshing the internal reader for realtime gets which causes for instance
global ords to be rebuild. I think we can gain quite a bit if we'd use a reader
that is only used for GETs and not for searches etc. that way we can also solve
problems of searchers being refreshed unexpectedly aside of replica recovery /
relocation.

Closes #15768
Closes #26912

…er to disk"

Today, when ES detects it's using too much heap vs the configured indexing
buffer (default 10% of JVM heap) it opens a new searcher to force Lucene to move
the bytes to disk, clear version map, etc.

But this has the unexpected side effect of making newly indexed/deleted
documents visible to future searches, which is not nice for users who are trying
to prevent that, e.g. elastic#3593.

This is also an indirect spinoff from elastic#26802 where we potentially pay a big
price on rebuilding caches etc. when updates / realtime-get is used. We are
refreshing the internal reader for realtime gets which causes for instance
global ords to be rebuild. I think we can gain quite a bit if we'd use a reader
that is only used for GETs and not for searches etc. that way we can also solve
problems of searchers being refreshed unexpectedly aside of replica recovery /
relocation.

Closes elastic#15768
Closes elastic#26912
@s1monw s1monw added :Core/Infra/Core Core issues without another label >enhancement v6.1.0 v7.0.0 labels Oct 11, 2017
@s1monw s1monw requested review from jpountz and bleskes October 11, 2017 14:22
@s1monw
Copy link
Contributor Author

s1monw commented Oct 11, 2017

/cc @uschindler

@uschindler
Copy link
Contributor

Having 2 IndexReaders (in fact DirectoryReaders) open at same time will not have any impact on memory usage, as both share the same LeafReaders. The memory of stale segments is only freed if the one with oldest/nolonger uptodate segments is released.

@s1monw was this your question so you pinged me?

@s1monw
Copy link
Contributor Author

s1monw commented Oct 11, 2017

@uschindler thanks, I was aware of this that's why I added it. I pinged you since you where involved in the original issue here #15768

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up so quickly. I did a pass and left mostly nits around naming (sorry :)). I did leave some comments about what seems like excessive refreshes which is why I marked this as request changes.

Also - with the double searcher managers and their relationship to the version map I got a bit worried about the IndexinMemoryController calling writeIndexingBuffer. I couldn't find any test for that. Do we have one? I think we should have a concurrent read/write by ID engine unit test which validates it's safe to call refresh/flush/writeIndexingBuffer without us messing up. WDYT?

manager = createSearcherManager();
manager = createSearcherManager(searcherFactory);
internalSearcherManager = createSearcherManager(new SearcherFactory());
this.internalSearcherManager = internalSearcherManager;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we name these by scope? i.e., getScopeSearcherManager?

@@ -215,9 +217,11 @@ public InternalEngine(EngineConfig engineConfig) throws EngineException {
throw e;
}
}
manager = createSearcherManager();
manager = createSearcherManager(searcherFactory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - now that we call createSearchManager twice, shall we move the following line out of that method and into this constructor? it feels unrelated.

lastCommittedSegmentInfos = readLastCommittedSegmentInfos(searcherManager, store);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nit - searcherFactory -> searcherFactoryForSearch?

refresh(source, true);
}

final void refresh(String source, boolean refreshExternal) throws EngineException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make the refreshExternal an EnumSet of SearcherScope?

} catch (AlreadyClosedException e) {
failOnTragicEvent(e);
throw e;
} catch (Exception e) {
try {
failEngine("refresh failed", e);
failEngine("refresh failed source[" + source + "]", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@@ -1231,7 +1240,7 @@ public void writeIndexingBuffer() throws EngineException {
// The version map is using > 25% of the indexing buffer, so we do a refresh so the version map also clears
logger.debug("use refresh to write indexing buffer (heap size=[{}]), to also clear version map (heap size=[{}])",
new ByteSizeValue(indexingBufferBytes), new ByteSizeValue(versionMapBytes));
refresh("write indexing buffer");
refresh("write indexing buffer", false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can now always use this "refresh" path here rather than the IndexWriter.flush - refresh is now much lighter? (not sure what other side effects are there).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think that's good

@@ -1347,7 +1355,7 @@ public CommitId flush(boolean force, boolean waitIfOngoing) throws EngineExcepti
commitIndexWriter(indexWriter, translog, null);
logger.trace("finished commit for flush");
// we need to refresh in order to clear older version values
refresh("version_table_flush");
refresh("version_table_flush"); // TODO technically we could also only refresh the internal searcher
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the comment about releasing segments in tryRenewSyncCommit holds here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I did some copy past mess here. I will fix

@@ -1500,6 +1508,8 @@ public void forceMerge(final boolean flush, int maxNumSegments, boolean onlyExpu
if (flush) {
if (tryRenewSyncCommit() == false) {
flush(false, true);
} else {
refresh("renew sync commit"); // we have to refresh both searchers here to ensure we release unreferenced segments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't tryRenewSyncCommit already doing it in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment above sorry for the noise

@@ -1867,6 +1884,10 @@ protected void doRun() throws Exception {
// free up transient disk usage of the (presumably biggish) segments that were just merged
if (tryRenewSyncCommit() == false) {
flush();
} else {
// we only refresh the rather cheap internal searcher manager in order to not trigger new datastructures
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment - if tryRenewSyncCommit returned true, it already refreshed, no?

Searcher getSearcher = engine.acquireSearcher("test", Engine.SearcherScope.GET);
Searcher searchSearcher = engine.acquireSearcher("test", Engine.SearcherScope.SEARCH);
assertSameReader(getSearcher, searchSearcher);
IOUtils.close(getSearcher, searchSearcher);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - should we use try-with-resources to avoid leaking on failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

@s1monw
Copy link
Contributor Author

s1monw commented Oct 11, 2017

@bleskes I am not sure if you love or hate this 🤷‍♂️ I took it one step further and solely use that internal searcher everywhere. the external searcher is used for non-RT get and search but internally it's all the done via the private reader. This also means we don't refresh anymore during force_merge and flush but to keep it BWC I moved these calls into IndexShard. Conceptually I think we should not refresh in these situations and we might change that in 7.0 another option is to make refresh optional on these API and change a default in 7 so folks can choose.

I personally think this is much cleaner now.

@mikemccand
Copy link
Contributor

Yay!!

@bleskes
Copy link
Contributor

bleskes commented Oct 11, 2017

@mikemccand you're alive!

case SEARCH:
return searcherManager;
default:
throw new IllegalStateException("unknonw scope: " + scope);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: unknown

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great ❤️ . I left suggestions but there is no need for another review.

@@ -305,9 +305,9 @@ private void buildFieldStatistics(XContentBuilder builder, Terms curTerms) throw
long sumDocFreq = curTerms.getSumDocFreq();
int docCount = curTerms.getDocCount();
long sumTotalTermFrequencies = curTerms.getSumTotalTermFreq();
if (docCount > 0) {
assert ((sumDocFreq > 0)) : "docCount >= 0 but sumDocFreq ain't!";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) it's a good sign you bumped into this, isn't this changing semantics - before when there were no docs, we won't output the Field Stats sub object

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expand the diff below and you will see all the void 💃

return acquireSearcher(source, Engine.SearcherScope.EXTERNAL);
}

private Engine.Searcher acquireSearcher(String source, Engine.SearcherScope scope) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the move to internal external (rather then get/search), I think this method can go and that also means that the Engine.acquireSearch(source, scope) can be made protected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind. I see it now :(

scope = SearcherScope.INTERNAL;
} else {
// we expose what has been externally expose in a point in time snapshot via an explicit refresh
scope = SearcherScope.EXTERNAL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ sneaky and nice. I think we can extend InternalEngineTests.testSimpleOperations with:

        // but, we can still get it (in realtime)
        getResult = engine.get(newGet(true, doc), searcherFactory);
        assertThat(getResult.exists(), equalTo(true));
        assertThat(getResult.docIdAndVersion(), notNullValue());
        getResult.release();

+        // but not real time is not yet visible
+        getResult = engine.get(newGet(false, doc), searcherFactory);
+        assertThat(getResult.exists(), equalTo(false));
+        getResult.release();

// even though we maintain 2 managers we really do the heavy-lifting only once.
// the second refresh will only do the extra work we have to do for warming caches etc.
externalSearcherManager.maybeRefreshBlocking();
// fall-through is intentional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the fall through?

@@ -98,21 +96,6 @@
/** Tracks bytes used by tombstones (deletes) */
final AtomicLong ramBytesUsedTombstones = new AtomicLong();

/** Sync'd because we replace old mgr. */
synchronized void setManager(ReferenceManager<?> newMgr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@@ -1005,6 +1005,7 @@ public CompletionStats completionStats(String... fields) {
}
final long time = System.nanoTime();
final Engine.CommitId commitId = engine.flush(force, waitIfOngoing);
engine.refresh("flush"); // TODO this is technically wrong we should remove this in 7.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Now I see why the engine changes can be done in 6.x. I would opt for keeping as it was (i.e. full refresh on flush) and do as a small following in 7.0 to change the refresh command in the engine + mark it as breaking etc. Just a suggestion.

@s1monw
Copy link
Contributor Author

s1monw commented Oct 12, 2017

@uschindler @bleskes thanks for the reviews I will wait for CI and merge it

@s1monw s1monw merged commit 21eb9bd into elastic:master Oct 12, 2017
@s1monw s1monw deleted the use_interal_reader_manager branch October 12, 2017 15:19
@uschindler
Copy link
Contributor

Thanks @s1monw !

s1monw added a commit that referenced this pull request Oct 13, 2017
…er to disk (#26972)

Today, when ES detects it's using too much heap vs the configured indexing
buffer (default 10% of JVM heap) it opens a new searcher to force Lucene to move
the bytes to disk, clear version map, etc.

But this has the unexpected side effect of making newly indexed/deleted
documents visible to future searches, which is not nice for users who are trying
to prevent that, e.g. #3593.

This is also an indirect spinoff from #26802 where we potentially pay a big
price on rebuilding caches etc. when updates / realtime-get is used. We are
refreshing the internal reader for realtime gets which causes for instance
global ords to be rebuild. I think we can gain quite a bit if we'd use a reader
that is only used for GETs and not for searches etc. that way we can also solve
problems of searchers being refreshed unexpectedly aside of replica recovery /
relocation.

Closes #15768
Closes #26912
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Oct 13, 2017
Today all these API calls have a sideeffect of making documents visible
to search requests. While this is sometimes desired it's an unnecessary sideeffect
and now that we have an internal (engine-private) index reader (elastic#26972) we artificially
add a refresh call for bwc. This change removes this sideeffect in 7.0.
s1monw added a commit that referenced this pull request Oct 13, 2017
The changes introduced in #26972 missed two places where an internal searcher should be used.
s1monw added a commit that referenced this pull request Oct 13, 2017
The changes introduced in #26972 missed two places where an internal searcher should be used.
s1monw added a commit that referenced this pull request Oct 16, 2017
Today all these API calls have a sideeffect of making documents visible
to search requests. While this is sometimes desired it's an unnecessary sideeffect
and now that we have an internal (engine-private) index reader (#26972) we artificially
add a refresh call for bwc. This change removes this sideeffect in 7.0.
jasontedor added a commit to olcbean/elasticsearch that referenced this pull request Oct 16, 2017
* master: (356 commits)
  Do not set SO_LINGER on server channels (elastic#26997)
  Fix inconsistencies in the rest api specs for *_script (elastic#26971)
  fix inconsistencies in the rest api specs for cat.snapshots (elastic#26996)
  Add docs on full_id parameter in cat nodes API
  [TEST] Add test that replicates versioned updates with random flushes
  Use internal searcher for all indexing related operations in the engine
  Reformat paragraph in template docs to 80 columns
  Clarify settings and template on create index
  Fix reference to TcpTransport in documentation
  Allow Uid#decodeId to decode from a byte array slice (elastic#26987)
  Fix a typo in the similarity docs (elastic#26970)
  Use separate searchers for "search visibility" vs "move indexing buffer to disk (elastic#26972)
  Create weights lazily in filter and filters aggregation (elastic#26983)
  Use a dedicated ThreadGroup in rest sniffer (elastic#26897)
  Fire global checkpoint sync under system context
  Update by Query is modified to accept short `script` parameter. (elastic#26841)
  Cat shards bytes (elastic#26952)
  Add support for parsing inline script (elastic#23824) (elastic#26846)
  Change default value to true for transpositions parameter of fuzzy query (elastic#26901)
  Adding unreleased 5.6.4 version number to Version.java
  ...
jasontedor added a commit that referenced this pull request Oct 16, 2017
* master:
  Remove unnecessary exception for engine constructor
  Update docs about `script` parameter (#27010)
  Don't refresh on `_flush` `_force_merge` and `_upgrade` (#27000)
  Do not set SO_LINGER on server channels (#26997)
  Fix inconsistencies in the rest api specs for *_script (#26971)
  fix inconsistencies in the rest api specs for cat.snapshots (#26996)
  Add docs on full_id parameter in cat nodes API
  [TEST] Add test that replicates versioned updates with random flushes
  Use internal searcher for all indexing related operations in the engine
  Reformat paragraph in template docs to 80 columns
  Clarify settings and template on create index
  Fix reference to TcpTransport in documentation
  Allow Uid#decodeId to decode from a byte array slice (#26987)
  Fix a typo in the similarity docs (#26970)
  Use separate searchers for "search visibility" vs "move indexing buffer to disk (#26972)
jasontedor added a commit that referenced this pull request Oct 16, 2017
* 6.x:
  Remove unnecessary exception for engine constructor
  Update docs about `script` parameter (#27010)
  Do not set SO_LINGER on server channels (#26997)
  Fix inconsistencies in the rest api specs for *_script (#26971)
  fix inconsistencies in the rest api specs for cat.snapshots (#26996)
  Add docs on full_id parameter in cat nodes API
  Add removal of types to the 6.0 breaking changes
  Create weights lazily in filter and filters aggregation (#26983)
  [TEST] Add test that replicates versioned updates with random flushes
  Use internal searcher for all indexing related operations in the engine
  Use separate searchers for "search visibility" vs "move indexing buffer to disk (#26972)
  Reformat paragraph in template docs to 80 columns
  Clarify settings and template on create index
  Fix reference to TcpTransport in documentation
  Allow Uid#decodeId to decode from a byte array slice (#26987)
  Fix a typo in the similarity docs (#26970)
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Nov 1, 2017
We do some accouting in IndexShard that is not necessarily correct since
we maintain two different index readers. This change moves the accounting under
the engine which knows what reader we are refreshing.

Relates to elastic#26972
s1monw added a commit that referenced this pull request Nov 2, 2017
We do some accounting in IndexShard that is not necessarily correct since
we maintain two different index readers. This change moves the accounting under
the engine which knows what reader we are refreshing.

Relates to #26972
s1monw added a commit that referenced this pull request Nov 2, 2017
We do some accounting in IndexShard that is not necessarily correct since
we maintain two different index readers. This change moves the accounting under
the engine which knows what reader we are refreshing.

Relates to #26972
s1monw added a commit to s1monw/elasticsearch that referenced this pull request Nov 3, 2017
…ize segment creation

We cut over to internal and external IndexReader/IndexSeacher in elastic#26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.

This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
s1monw added a commit that referenced this pull request Nov 9, 2017
…mize segment creation (#27253)

We cut over to internal and external IndexReader/IndexSearcher in #26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.

This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
s1monw added a commit that referenced this pull request Nov 11, 2017
…mize segment creation (#27253)

We cut over to internal and external IndexReader/IndexSearcher in #26972 which uses
two independent searcher managers. This has the downside that refreshes of the external
reader will never clear the internal version map which in-turn will trigger additional
and potentially unnecessary segment flushes since memory must be freed. Under heavy
indexing load with low refresh intervals this can cause excessive segment creation which
causes high GC activity and significantly increases the required segment merges.

This change adds a dedicated external reference manager that delegates refreshes to the
internal reference manager that then `steals` the refreshed reader from the internal
reference manager for external usage. This ensures that external and internal readers
are consistent on an external refresh. As a sideeffect this also releases old segments
referenced by the internal reference manager which can potentially hold on to already merged
away segments until it is refreshed due to a flush or indexing activity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants