Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Segment Replication] Support realtime TermVector requests with Segment Replication #9585

Merged

Conversation

Rishikesh1159
Copy link
Member

@Rishikesh1159 Rishikesh1159 commented Aug 28, 2023

Description

This PR supports realtime reads for TermVectors requests when segment replication is enabled.
If there are no preferences mentioned for the incoming TermVectors requests then we route it to primary shards

Related Issues

Resolves #9563

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov
Copy link

codecov bot commented Aug 28, 2023

Codecov Report

Merging #9585 (4e43c57) into main (4ccecd6) will decrease coverage by 0.07%.
Report is 5 commits behind head on main.
The diff coverage is 26.13%.

@@             Coverage Diff              @@
##               main    #9585      +/-   ##
============================================
- Coverage     71.16%   71.10%   -0.07%     
- Complexity    58115    58125      +10     
============================================
  Files          4831     4831              
  Lines        273999   274062      +63     
  Branches      39920    39925       +5     
============================================
- Hits         195005   194880     -125     
- Misses        62604    62890     +286     
+ Partials      16390    16292      -98     
Files Changed Coverage Δ
...arch/index/recovery/RemoteStoreRestoreService.java 12.03% <0.00%> (ø)
server/src/main/java/org/opensearch/node/Node.java 85.54% <0.00%> (ø)
...arch/gateway/remote/RemoteClusterStateService.java 65.37% <23.37%> (-13.05%) ⬇️
...action/termvectors/TransportTermVectorsAction.java 51.35% <33.33%> (-3.20%) ⬇️
...search/gateway/remote/ClusterMetadataManifest.java 96.95% <100.00%> (+0.03%) ⬆️

... and 468 files with indirect coverage changes

📢 Have feedback on the report? Share it here.

@github-actions
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change 6f429b6

Incompatible components

Skipped components

Compatible components

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Compatibility status:

Checks if related components are compatible with change 2648fd4

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 7, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2023

Compatibility status:

Checks if related components are compatible with change 4e43c57

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git]

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2023

Gradle Check (Jenkins) Run Completed with:

@Rishikesh1159 Rishikesh1159 merged commit caf4c80 into opensearch-project:main Sep 8, 2023
@Rishikesh1159 Rishikesh1159 added backport 2.x Backport to 2.x branch backport 2.10 Backport to 2.10 branch labels Sep 8, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 8, 2023
…nt Replication (#9585)

* support realtime TermVector and MultiTermVector requests with segment replication.

Signed-off-by: Rishikesh1159 <[email protected]>

* Fix TermVector requests with segrep.

Signed-off-by: Rishikesh1159 <[email protected]>

* Refacotring.

Signed-off-by: Rishikesh1159 <[email protected]>

* Address comments on PR.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
(cherry picked from commit caf4c80)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 8, 2023
…nt Replication (#9585)

* support realtime TermVector and MultiTermVector requests with segment replication.

Signed-off-by: Rishikesh1159 <[email protected]>

* Fix TermVector requests with segrep.

Signed-off-by: Rishikesh1159 <[email protected]>

* Refacotring.

Signed-off-by: Rishikesh1159 <[email protected]>

* Address comments on PR.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
(cherry picked from commit caf4c80)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Rishikesh1159 pushed a commit that referenced this pull request Sep 8, 2023
…nt Replication (#9585) (#9936)

* support realtime TermVector and MultiTermVector requests with segment replication.



* Fix TermVector requests with segrep.



* Refacotring.



* Address comments on PR.



---------



(cherry picked from commit caf4c80)

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Rishikesh1159 pushed a commit that referenced this pull request Sep 8, 2023
…nt Replication (#9585) (#9935)

* support realtime TermVector and MultiTermVector requests with segment replication.



* Fix TermVector requests with segrep.



* Refacotring.



* Address comments on PR.



---------



(cherry picked from commit caf4c80)

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
if (request.request().doc() != null && request.request().routing() == null) {
// artificial document without routing specified, ignore its "id" and use either random shard or according to preference
GroupShardsIterator<ShardIterator> groupShardsIter = clusterService.operationRouting()
.searchShards(state, new String[] { request.concreteIndex() }, null, request.request().preference());
.searchShards(state, new String[] { request.concreteIndex() }, null, preference);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rishikesh1159 I don't think the primary override here is necessary. In this case, an "artificial" document has been supplied and the user is asking to compute term vectors as if this was a real document. I don't think the "real time" parameter is applicable here since it doesn't need to look up any index data.

@msfroh Does that sound right to you?

Copy link
Collaborator

@msfroh msfroh Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does sound right to me.

I think the override condition should be:

        if (request.request().realtime()
            && preference == null
            && state.getMetadata().isSegmentReplicationEnabled(request.concreteIndex())
            && request.request().doc() == null) {

Or you could revert the edit to this line and continue to use request.request().preference() for the artificial doc case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the code seems to unconditionally attempt to fetch the document even if an artificial document is supplied. Not sure if that complicates things here.

Copy link
Collaborator

@msfroh msfroh Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's probably worth fixing -- from the looks of it, that GetResult is ignored, which seems a little wasteful.

That said, I don't think it hurts anything. In particular, if the doc parameter is specified, then it GETs a document with ID based on an autoincrementing (not really random) integer:

public TermVectorsRequest doc(BytesReference doc, boolean generateRandomId, MediaType mediaType) {
// assign a random id to this artificial document, for routing
if (generateRandomId) {
this.id(String.valueOf(randomInt.getAndAdd(1)));
}
this.doc = doc;
this.mediaType = mediaType;
return this;
}

(generateRandomId is always true on the REST parsing code path.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, there is no value in routing to a primary shard if an artificial doc is specified.

kaushalmahi12 pushed a commit to kaushalmahi12/OpenSearch that referenced this pull request Sep 12, 2023
…nt Replication (opensearch-project#9585)

* support realtime TermVector and MultiTermVector requests with segment replication.

Signed-off-by: Rishikesh1159 <[email protected]>

* Fix TermVector requests with segrep.

Signed-off-by: Rishikesh1159 <[email protected]>

* Refacotring.

Signed-off-by: Rishikesh1159 <[email protected]>

* Address comments on PR.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
Signed-off-by: Kaushal Kumar <[email protected]>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/OpenSearch that referenced this pull request Sep 20, 2023
…nt Replication (opensearch-project#9585)

* support realtime TermVector and MultiTermVector requests with segment replication.

Signed-off-by: Rishikesh1159 <[email protected]>

* Fix TermVector requests with segrep.

Signed-off-by: Rishikesh1159 <[email protected]>

* Refacotring.

Signed-off-by: Rishikesh1159 <[email protected]>

* Address comments on PR.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
brusic pushed a commit to brusic/OpenSearch that referenced this pull request Sep 25, 2023
…nt Replication (opensearch-project#9585)

* support realtime TermVector and MultiTermVector requests with segment replication.

Signed-off-by: Rishikesh1159 <[email protected]>

* Fix TermVector requests with segrep.

Signed-off-by: Rishikesh1159 <[email protected]>

* Refacotring.

Signed-off-by: Rishikesh1159 <[email protected]>

* Address comments on PR.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
Signed-off-by: Ivan Brusic <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…nt Replication (opensearch-project#9585)

* support realtime TermVector and MultiTermVector requests with segment replication.

Signed-off-by: Rishikesh1159 <[email protected]>

* Fix TermVector requests with segrep.

Signed-off-by: Rishikesh1159 <[email protected]>

* Refacotring.

Signed-off-by: Rishikesh1159 <[email protected]>

* Address comments on PR.

Signed-off-by: Rishikesh1159 <[email protected]>

---------

Signed-off-by: Rishikesh1159 <[email protected]>
Signed-off-by: Rishikesh Pasham <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch backport 2.10 Backport to 2.10 branch skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG][Segment Replication] Support realtime reads with TermVector Requests for segrep enabled indices
5 participants