extend FuzzyTermQuery to support json field #2173

Merged on Sep 11, 2023 (5 commits)

Conversation

@PingXia-at (Contributor) commented Sep 7, 2023

Summary

Discord discussion here: https://discord.com/channels/908281611840282624/908286403086024724/1148718777651970126

Test Plan

  • Added test for fuzzyTermQuery

@fulmicoton fulmicoton requested a review from PSeitz September 7, 2023 14:45
@fulmicoton (Collaborator) commented

@PSeitz can you review?

))

if let Some(json_path_bytes) = term_value.as_json_path_bytes() {
return Ok(AutomatonWeight::new_for_json_path(

nit: drop return and semicolon
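
For readers unfamiliar with the nit, here is a minimal sketch of the two styles. The functions and their string results are hypothetical stand-ins, not tantivy's API, and the surrounding code of the PR is not shown above, so this only illustrates the general Rust idiom of ending a function with a tail expression instead of an early `return ...;`.

```rust
// Hypothetical example: both functions behave identically.
// Only the control-flow style differs.

/// Early-return style: `return` plus a trailing semicolon.
fn describe_with_return(json_path: Option<&str>) -> Result<String, String> {
    if let Some(path) = json_path {
        return Ok(format!("json path: {}", path));
    }
    Ok(String::from("plain term"))
}

/// Tail-expression style: the `if let ... else` is the function's value,
/// so neither `return` nor a semicolon is needed.
fn describe_with_tail(json_path: Option<&str>) -> Result<String, String> {
    if let Some(path) = json_path {
        Ok(format!("json path: {}", path))
    } else {
        Ok(String::from("plain term"))
    }
}
```

Note that dropping `return` only works when the `if let` is the last expression of the function, which usually means moving the fallback into an `else` branch.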

@codecov-commenter commented Sep 8, 2023

Codecov Report

Patch coverage: 91.45% and no project coverage change.

Comparison is base (1932513) 94.42% compared to head (4d0ccf4) 94.42%.

Additional details and impacted files
@@           Coverage Diff            @@
##             main    #2173    +/-   ##
========================================
  Coverage   94.42%   94.42%            
========================================
  Files         322      322            
  Lines       63104    63207   +103     
========================================
+ Hits        59583    59682    +99     
- Misses       3521     3525     +4     
Files Changed                          Coverage Δ
src/query/fuzzy_query.rs               92.56% <88.50%> (-1.89%) ⬇️
src/query/automaton_weight.rs          93.10% <100.00%> (+1.26%) ⬆️
src/query/phrase_prefix_query/mod.rs   100.00% <100.00%> (ø)
src/schema/term.rs                     91.69% <100.00%> (+0.47%) ⬆️

... and 4 files with indirect coverage changes

/// Returns the json path bytes (including the JSON_END_OF_PATH byte)
///
/// Returns `None` if the value is not JSON.
pub(crate) fn as_json_path_bytes(&self) -> Option<&[u8]> {
@PSeitz (Contributor) commented Sep 8, 2023

I don't think we need an extra method, as_json already covers the path. as_json seems unused currently, you can edit it to include JSON_END_OF_PATH if needed

@PingXia-at (Contributor, Author) commented Sep 8, 2023

The json_path_bytes are used here:

fn automaton_stream<'a>(
    &'a self,
    term_dict: &'a TermDictionary,
) -> io::Result<TermStreamer<'a, &'a A>> {
    let automaton: &A = &self.automaton;
    let mut term_stream_builder = term_dict.search(automaton);

    // For JSON fields, restrict the stream to terms under this exact path:
    // the lower bound is the path bytes, the upper bound is the next larger prefix.
    if let Some(json_path_bytes) = &self.json_path_bytes {
        term_stream_builder = term_stream_builder.ge(json_path_bytes);
        if let Some(end) = prefix_end(json_path_bytes) {
            term_stream_builder = term_stream_builder.lt(&end);
        }
    }

    term_stream_builder.into_stream()
}

Two reasons for this new method:

  1. We need to include the JSON_END_OF_PATH byte. Otherwise, automaton_stream would also include path aa when we are only interested in path a.
  2. The StreamBuilder ge / lt methods require byte references, not a string, for the path.
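
The first point can be illustrated with a toy sketch. This is not tantivy code: a `BTreeSet` of byte keys stands in for the term dictionary, and `END_OF_PATH = 0`, `toy_key`, `paths_in_range`, and `toy_dict` are hypothetical names and values assumed for illustration only.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

// Assumption for this sketch: byte 0 terminates a path and never occurs inside one.
const END_OF_PATH: u8 = 0;

/// Builds a toy term key: path bytes, then the terminator, then the value bytes.
fn toy_key(path: &str, value: &str) -> Vec<u8> {
    let mut key = path.as_bytes().to_vec();
    key.push(END_OF_PATH);
    key.extend_from_slice(value.as_bytes());
    key
}

/// Returns the path part of every key in the half-open range [lower, upper).
fn paths_in_range(dict: &BTreeSet<Vec<u8>>, lower: &[u8], upper: &[u8]) -> Vec<String> {
    dict.range::<[u8], _>((Bound::Included(lower), Bound::Excluded(upper)))
        .map(|key| {
            // The path is everything before the first terminator byte.
            let end = key.iter().position(|&b| b == END_OF_PATH).unwrap_or(key.len());
            String::from_utf8_lossy(&key[..end]).into_owned()
        })
        .collect()
}

/// A tiny sorted "dictionary" with three paths that all start with `a`.
fn toy_dict() -> BTreeSet<Vec<u8>> {
    [("a", "japan"), ("aa", "japan"), ("ab", "jaban")]
        .iter()
        .map(|(path, value)| toy_key(path, value))
        .collect()
}
```

With `toy_dict()`, scanning the range from `b"a"` to `b"b"` (no terminator) yields the paths `a`, `aa`, and `ab`, whereas scanning from `b"a\0"` to `b"a\x01"` (terminator included in the bound) yields only `a`.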

Contributor Author

> as_json seems unused currently, you can edit it to include JSON_END_OF_PATH if needed

FYI it is used in debug_value_bytes.

@PingXia-at PingXia-at requested a review from PSeitz September 8, 2023 13:30
@@ -6,7 +6,7 @@ pub use phrase_prefix_query::PhrasePrefixQuery;
pub use phrase_prefix_scorer::PhrasePrefixScorer;
pub use phrase_prefix_weight::PhrasePrefixWeight;

-fn prefix_end(prefix_start: &[u8]) -> Option<Vec<u8>> {
+pub(crate) fn prefix_end(prefix_start: &[u8]) -> Option<Vec<u8>> {
Contributor

I don't understand the u8::MAX logic here

Contributor Author

I referred to the PhrasePrefixQuery implementation, which also filters terms by a term-value prefix.

Contributor Author

I believe it computes the next larger prefix, to serve as an exclusive upper bound for the scan. Typically, that means incrementing the last u8 by 1. The edge case is when the last u8 is u8::MAX, which cannot be incremented.
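
A minimal sketch of that idea follows. The name `prefix_upper_bound` is hypothetical and the body is written from the description above, so it is not necessarily identical to the `prefix_end` function touched by this PR. It assumes that a trailing `u8::MAX` byte is handled by dropping it and incrementing the byte before it.

```rust
/// Returns the smallest byte string that is greater than every string
/// starting with `prefix`, or `None` if no such bound exists
/// (empty prefix, or a prefix made only of 0xFF bytes).
fn prefix_upper_bound(prefix: &[u8]) -> Option<Vec<u8>> {
    let mut bound = prefix.to_vec();
    // Walk backwards: a trailing u8::MAX cannot be incremented, so drop it
    // and try to increment the byte before it instead.
    while let Some(last) = bound.pop() {
        if last < u8::MAX {
            bound.push(last + 1);
            return Some(bound);
        }
    }
    None
}
```

For example, `prefix_upper_bound(b"ab")` gives `Some(b"ac".to_vec())`, a prefix of `[b'a', 0xFF]` gives `Some(b"b".to_vec())`, and `[0xFF, 0xFF]` gives `None`, in which case the scan simply has no upper bound.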

@PingXia-at PingXia-at requested a review from PSeitz September 11, 2023 01:31
@PSeitz PSeitz merged commit e4e416a into quickwit-oss:main Sep 11, 2023
@PSeitz (Contributor) commented Sep 11, 2023

LGTM. Thanks!

6 participants