-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize source filtering in SourceFieldMapper #81970
optimize source filtering in SourceFieldMapper #81970
Conversation
Rally Result of MapFilter
Rally Result of ParserFilter
|
Pinging @elastic/es-search (Team:Search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great @mushao999, thanks for opening. I'd like @nik9000 to have a look as well before we merge it but LGTM in general. I left a couple of suggestions.
server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
Outdated
Show resolved
Hide resolved
Ah! @romseygeek this is the one I'd hope you'd review. And you already did. Cool. I'll take a look too. |
I'm a little surprised there isn't less GC. I mean, there is a little less GC, but I'd hoped for better. I'll bet the GC from the rest of the indexing process is just swamping the gains. |
server/src/main/java/org/elasticsearch/index/mapper/SourceFieldMapper.java
Outdated
Show resolved
Hide resolved
I wonder if it's worth moving your change into |
@elasticmachine test this please |
Applying the new filtering mechanism throughout all the filtering cases is a good idea. But I do not think |
This is a good point! Can we instead create a new interface |
Thanks, this is a great suggestion. I'v added two commits.
By the way, I think |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of comments but this looks good overall. Thanks for being patient with all of our suggestions!
server/src/main/java/org/elasticsearch/common/xcontent/XContentHelper.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/common/xcontent/XContentHelper.java
Outdated
Show resolved
Hide resolved
@elasticmachine test this please |
The build is failing on checkstyle - can you also run |
@romseygeek Thanks for you suggestion. The |
server/src/main/java/org/elasticsearch/common/xcontent/XContentHelper.java
Show resolved
Hide resolved
@elasticmachine test this please |
@romseygeek There are still two test failed , I will try to fix them tomorrow. Thanks again. |
@romseygeek Sorry for the test failure. Tests failed due to null value of xContentType in |
@elasticmachine test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for your patience on this
This reverts commit 4560a0c. It unexpectedly caused elastic#82891.
This commit introduces XContentFieldFilter, which applies field includes/excludes to XContent without having to realise the xcontent itself as a java map. SourceFieldMapper and ShardGetService are cut over to use this class
@nik9000 and @weizijun did a great job in introducing a memory efficient xcontent filtering approach and benchmarked its peformance (#77154 #81575). From their contribution, source filtering will be faster significantly.
This PR apply this approach into SourceFilterMapper where map filtering is used to include or exclude fields.
We did a benchmark using esrally, and the details list as follow:
--track=nyc_taxis --challenge=append-no-conflicts-index-only --client-options="timeout:60,basic_auth_user:'elastic',basic_auth_password:'Admin@gal'" --track-params='{"bulk_size": 1000, "index_settings": {"index.number_of_replicas": "1", "index.number_of_shards": "6"}, "bulk_indexing_clients": 10}'
"exludes": ["surcharge", "pickup_location"]
So we believe indexing operation with source filtering will be more memory efficient after this PR being applied.