Implementation for match_only_text field #11039

Merged

@rishabhmaurya (Contributor) commented Oct 31, 2023

Description

  • Implementation of the match_only_text field, with index_options fixed to docs and norms disabled to reduce storage.
  • Supports these query types, with added latency: Phrase, Prefix, MultiPhrase, and MultiPhrasePrefix.
  • It supports the same features as the text field type, except for those listed below.
Features not supported:
  • Interval queries
  • Span queries
  • Queries on index-time phrase and prefix fields (index_phrases and index_prefixes)
  • Aggregations, just as the text field isn't meant for aggregations
  • Disabling the _source field disables all positional query support. (Note: instead of disallowing disabling the _source field, I decided to throw an exception when a positional query is executed, so that users who don't need positional queries can still disable _source to save space.)
  • Updating index_options to any value other than docs isn't allowed.
  • Scoring and boosting may not work as expected, so this field type isn't meant to be used for relevance ranking.
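Because index_options is fixed to docs, the index records which documents contain a term but not how often or where it occurs. The plain-Java sketch below only illustrates that trade-off and is not OpenSearch or Lucene code: the class, its naive lowercase tokenizer, and its method names are hypothetical. It shows that term lookups can be answered from such an index alone, while a phrase can only be narrowed to documents containing all of its terms, which is why positional queries need a second check against _source.

```java
import java.util.*;

/** Hypothetical docs-only inverted index: term -> sorted doc IDs, with no frequencies or positions. */
class DocsOnlyIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Naive analyzer stand-in: lowercase and split on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            // Only the doc ID is kept; how often and where the term occurs is discarded.
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Term query: answerable from the index alone. */
    Set<Integer> termQuery(String term) {
        return Collections.unmodifiableSet(postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>()));
    }

    /** Phrase candidates: docs containing every phrase term, in any order; order cannot be checked here. */
    Set<Integer> phraseCandidates(String phrase) {
        Set<Integer> result = null;
        for (String term : tokenize(phrase)) {
            Set<Integer> docs = termQuery(term);
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        DocsOnlyIndex index = new DocsOnlyIndex();
        index.add(1, "connection timeout error");
        index.add(2, "error: timeout on connection");
        System.out.println(index.termQuery("timeout"));              // prints [1, 2]
        System.out.println(index.phraseCandidates("timeout error")); // prints [1, 2]
    }
}
```

Doc 2 is a false positive for the phrase "timeout error": it contains both terms, but not in that order. Discarding it requires the original text, which match_only_text reads from _source.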

Best used for:

  • In log analytics, where the goal is to find log entries containing specific keywords, error codes, or exception types in error messages, the focus may be on retrieving relevant logs rather than on analyzing the frequency or position of terms within each entry.
  • In a job portal, users often search for jobs based on job titles, skills, or locations. The frequency and position of terms within job descriptions may be less significant for these searches.
  • In a news application, users might search for articles based on keywords in headlines or summaries. The focus is on finding relevant news articles rather than analyzing the frequency or position of terms within each article.

Migration from text field

The reindex API can be used to migrate an index from the text field type to match_only_text, and vice versa.

Implementation breakdown:

  1. A new MatchOnlyTextFieldMapper, which extends TextFieldMapper with a few constraints and different defaults.
  2. A new FieldType for MatchOnlyTextFieldMapper, which adds constraints specific to the match_only_text field and supports positional queries by reading the field's value from the _source field.
  3. SourceFieldMatchQuery: a query that accepts a delegate query, which applies filters to prune the result set, and a source filter query. For each hit returned by the delegate query, it loads the _source, builds a single-document Lucene MemoryIndex, and runs the source filter query against it. It uses ConstantScoreWeight to disable scoring.
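The two-phase flow of SourceFieldMatchQuery can be sketched in plain Java under simplifying assumptions: _source is modeled as one map per document, the delegate query's hits are passed in as a set of doc IDs, a naive lowercase tokenizer stands in for the field's analyzer, and a consecutive-token check with Collections.indexOfSubList replaces the single-document MemoryIndex that the real query builds. All names are hypothetical, and the exception type thrown when _source is disabled is an assumption.

```java
import java.util.*;

/** Hypothetical sketch of SourceFieldMatchQuery's flow: delegate hits are re-verified against _source. */
class SourcePhraseVerifier {

    /** Naive analyzer stand-in: lowercase and split on non-word characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** True if the phrase tokens appear consecutively, which indexed positions would otherwise prove. */
    static boolean containsPhrase(String text, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        return !phraseTokens.isEmpty() && Collections.indexOfSubList(tokenize(text), phraseTokens) >= 0;
    }

    /**
     * @param delegateHits doc IDs matched by the cheap, docs-only delegate query
     * @param sources      doc ID -> parsed _source; null models an index with _source disabled
     * @return matching doc IDs, each with the same constant score
     */
    static Map<Integer, Float> search(Set<Integer> delegateHits, Map<Integer, Map<String, Object>> sources,
                                      String field, String phrase) {
        if (sources == null) {
            // Assumed exception type; the PR only states that positional queries fail without _source.
            throw new IllegalArgumentException("positional query on [" + field + "] requires _source");
        }
        Map<Integer, Float> hits = new TreeMap<>();
        for (int docId : delegateHits) {
            Map<String, Object> source = sources.get(docId);
            Object value = source == null ? null : source.get(field);
            // A missing or non-string field value cannot match in this sketch.
            if (value instanceof String && containsPhrase((String) value, phrase)) {
                hits.put(docId, 1.0f); // constant score, mirroring ConstantScoreWeight
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, Object>> sources = Map.of(
                1, Map.of("message", "connection timeout error"),
                2, Map.of("message", "error: timeout on connection"),
                3, Map.of("level", "WARN"));
        // Pretend the delegate matched all three docs; only doc 1 contains the exact phrase.
        System.out.println(search(new TreeSet<>(List.of(1, 2, 3)), sources, "message", "timeout error")); // prints {1=1.0}
    }
}
```

Doc 2 is rejected because its terms are out of order, and doc 3 because the field is missing. Loading and re-analyzing _source for every delegate hit is where the extra latency of positional queries on this field comes from.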

Testing done

  1. Unit tests for the mapper, asserting the SourceFieldMatchQuery that MatchOnlyTextFieldMapper creates from an OpenSearch DSL query. They also validate various unsupported cases.
  2. Unit tests for SourceFieldMatchQuery, which create a Lucene index and check scenarios such as:
    1. Docs matching both the delegate and filter queries
    2. Docs matching only the delegate query
    3. Docs matching only the filter query
    4. Docs matching neither the delegate nor the filter query
    5. Expected behavior when the _source field is disabled
    6. Expected behavior when the field is missing
  3. Cloned various integration tests that use the text field, replacing text with match_only_text, and overrode the expected behavior where necessary.

TODOs

  1. Support for interval queries?
  2. Add support for the match_only_text field type in OpenSearch Dashboards
  3. Public documentation: [DOC] match_only_text handler documentation-website#5427
  4. Performance testing -
  5. Cache logic validation

Related Issues

#6836
opensearch-project/documentation-website#5427

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rishabhmaurya force-pushed the rishma-match-only-field branch from e437959 to 0f5485c on October 31, 2023 22:07
Contributor

Gradle Check (Jenkins) Run Completed with:

Contributor

github-actions bot commented Oct 31, 2023

Compatibility status:

Checks if related components are compatible with change f1fb443

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/performance-analyzer.git]

Contributor

Gradle Check (Jenkins) Run Completed with:

Contributor

github-actions bot commented Nov 1, 2023

Gradle Check (Jenkins) Run Completed with:

Contributor

❌ Gradle check result for ccab297: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for a9d1f6d: FAILURE

Contributor

❌ Gradle check result for 05d96bd: FAILURE

Contributor

❌ Gradle check result for 6fdb8da: FAILURE

@rishabhmaurya force-pushed the rishma-match-only-field branch from 6fdb8da to 2fdc207 on November 16, 2023 01:27
Contributor

❌ Gradle check result for 2fdc207: FAILURE

Contributor

❌ Gradle check result for e213f84: FAILURE

Contributor

❌ Gradle check result for c20df4a: FAILURE

Contributor

❌ Gradle check result for c6ce7f9: FAILURE

Contributor

❌ Gradle check result for e2c1886: FAILURE

@rishabhmaurya force-pushed the rishma-match-only-field branch 2 times, most recently from f9fc10f to b765a36 on November 17, 2023 20:47
Contributor

❌ Gradle check result for f9fc10f: FAILURE

Contributor

❌ Gradle check result for b765a36: FAILURE

Contributor

❌ Gradle check result for 40e205e: FAILURE

@rishabhmaurya force-pushed the rishma-match-only-field branch from 2bdec4b to e879baf on January 2, 2024 20:00
Contributor

github-actions bot commented Jan 2, 2024

✅ Gradle check result for e879baf: SUCCESS

codecov bot commented Jan 2, 2024

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Comparison is base (63f4f13) 71.47% compared to head (f1fb443) 71.40%.
Report is 1 commit behind head on main.

Files                                                  Patch %  Lines
.../opensearch/index/query/SourceFieldMatchQuery.java  67.34%   8 Missing and 8 partials ⚠️
...nsearch/index/mapper/MatchOnlyTextFieldMapper.java  93.54%   4 Missing and 4 partials ⚠️
...a/org/opensearch/index/search/MultiMatchQuery.java  50.00%   1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11039      +/-   ##
============================================
- Coverage     71.47%   71.40%   -0.07%     
+ Complexity    59253    59240      -13     
============================================
  Files          4907     4909       +2     
  Lines        278248   278426     +178     
  Branches      40428    40460      +32     
============================================
- Hits         198871   198817      -54     
- Misses        62850    63093     +243     
+ Partials      16527    16516      -11     
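The summary percentages can be recomputed from the counts in the diff. The sketch below assumes Codecov's formula hits / (hits + misses + partials) and its default of rounding down to two decimal places (both configurable in codecov.yml); the class name is hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper that recomputes a Codecov-style coverage percentage from line counts. */
class CoverageCheck {

    /** Coverage = hits / (hits + misses + partials), truncated (rounded down) to two decimal places. */
    static BigDecimal coverage(long hits, long misses, long partials) {
        long lines = hits + misses + partials;
        return BigDecimal.valueOf(hits * 100L)
                .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        System.out.println(coverage(198871, 62850, 16527)); // base (main): prints 71.47
        System.out.println(coverage(198817, 63093, 16516)); // head (PR): prints 71.40
    }
}
```

The head ratio is about 71.407%, so rounding down (rather than to nearest) is what yields the reported 71.40% instead of 71.41%. The three counts also sum to the Lines row (278248 and 278426).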

Member

@andrross left a comment

Looks good, just one changelog entry nitpick

CHANGELOG.md (outdated, resolved)
Signed-off-by: Rishabh Maurya <[email protected]>
Contributor

github-actions bot commented Jan 2, 2024

❕ Gradle check result for f1fb443: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@andrross added the backport 2.x label (Backport to 2.x branch) on Jan 3, 2024
@andrross merged commit 7b1c2c7 into opensearch-project:main on Jan 3, 2024
33 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 3, 2024
* Implementation for match_only_text field

Signed-off-by: Rishabh Maurya <[email protected]>

* Fix build failures

Signed-off-by: Rishabh Maurya <[email protected]>

* Fix bugs

Signed-off-by: Rishabh Maurya <[email protected]>

* Added mapper tests, stil failing on prefix and phrase tests

Signed-off-by: Rishabh Maurya <[email protected]>

* Disable index prefix and phrase mapper

Signed-off-by: Rishabh Maurya <[email protected]>

* Added unit tests for phrase and multiphrase query validation

Signed-off-by: Rishabh Maurya <[email protected]>

* Add unit tests for prefix and prefix phrase queries

Signed-off-by: Rishabh Maurya <[email protected]>

* Add a test to cover 3 word with synonym match phrase prefix query

Signed-off-by: Rishabh Maurya <[email protected]>

* Add unit test for SourceFieldMatchQuery

Signed-off-by: Rishabh Maurya <[email protected]>

* Added test for _source disabled case

Signed-off-by: Rishabh Maurya <[email protected]>

* Add unit test for missing field

Signed-off-by: Rishabh Maurya <[email protected]>

* more validation tests and changelog update

Signed-off-by: Rishabh Maurya <[email protected]>

* Added integration tests for match_only_text replicating text field integ tests

Signed-off-by: Rishabh Maurya <[email protected]>

* Added skip section in integ test to fix mixed cluster failures

Signed-off-by: Rishabh Maurya <[email protected]>

* remove unused import

Signed-off-by: Rishabh Maurya <[email protected]>

* Address PR comments

Signed-off-by: Rishabh Maurya <[email protected]>

* fix integ tests

Signed-off-by: Rishabh Maurya <[email protected]>

* Fix flaky test due to random indexwriter

Signed-off-by: Rishabh Maurya <[email protected]>

* pr comment: header modification

Signed-off-by: Rishabh Maurya <[email protected]>

* Address PR comments

Signed-off-by: Rishabh Maurya <[email protected]>

* addded change to the right section of CHANGELOG

Signed-off-by: Rishabh Maurya <[email protected]>

* overriding the textFieldType before every test

Signed-off-by: Rishabh Maurya <[email protected]>

* rename @before method

Signed-off-by: Rishabh Maurya <[email protected]>

* update changelog description

Signed-off-by: Rishabh Maurya <[email protected]>

---------

Signed-off-by: Rishabh Maurya <[email protected]>
(cherry picked from commit 7b1c2c7)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request Jan 3, 2024
* Implementation for match_only_text field

(cherry picked from commit 7b1c2c7)

Signed-off-by: Rishabh Maurya <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
* Implementation for match_only_text field

Signed-off-by: Rishabh Maurya <[email protected]>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
* Implementation for match_only_text field

Signed-off-by: Rishabh Maurya <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>

4 participants