fix: #2078 return error when tokenizer not found while indexing #2093

Merged: 3 commits merged into quickwit-oss:main on Jun 16, 2023

Conversation

@naveenann (Contributor) commented Jun 15, 2023

This fixes #2078 by returning an error when the tokenizer for a field is not found while indexing.
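
For illustration, a rough sketch of the kind of change this implies in the segment writer (my paraphrase, not the actual diff; text_field_indexing, field_name, and the error message are placeholders):

    // Hypothetical shape of the change: resolving a field's tokenizer now
    // surfaces a missing registration as an error instead of deferring it.
    let tokenizer_name = text_field_indexing.tokenizer();
    let tokenizer = index
        .tokenizers()
        .get(tokenizer_name)
        .ok_or_else(|| {
            TantivyError::SchemaError(format!(
                "Tokenizer `{tokenizer_name}` not found for field `{field_name}`"
            ))
        })?;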

@naveenann marked this pull request as ready for review on June 15, 2023 at 15:46
Comment on lines 933 to 936
let index = Index::create_in_dir(&tempdir_path, schema).unwrap();
index
    .tokenizers()
    .register("custom_en", custom_en_tokenizer);
@PSeitz (Contributor) commented Jun 15, 2023

I think that can be removed; it's unused.

@naveenann (Contributor, Author) replied:

I guess we can remove the tokenizer registration, but we may need the index creation; otherwise open_in_dir will fail.
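
A minimal sketch of why the create call is still needed (assumed test shape using standard tantivy APIs; tempdir_path is the same temp directory as in the snippet above):

    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT);
    let schema = schema_builder.build();

    // create_in_dir writes the index files and metadata into the directory...
    let _index = Index::create_in_dir(&tempdir_path, schema).unwrap();

    // ...so a later re-open in the same test can succeed; without the
    // create above, open_in_dir would return an error.
    let index = Index::open_in_dir(&tempdir_path).unwrap();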

let mut document = Document::default();
document.add_text(title, "The Old Man and the Sea");
index_writer.add_document(document).unwrap();
match index_writer.commit() {
Reviewer comment (Contributor):

You can use unwrap_err(), which will panic if the result is not an error.
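
A sketch of the suggestion applied to the snippet above (the exact error message text is an assumption):

    // Instead of matching on the Result, assert the error directly:
    // unwrap_err() panics if commit() unexpectedly succeeds.
    let err = index_writer.commit().unwrap_err();
    assert!(err.to_string().contains("tokenizer")); // message content is assumed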

@@ -900,4 +911,42 @@ mod tests {
postings.positions(&mut positions);
assert_eq!(positions, &[4]); //< as opposed to 3 if we had a position length of 1.
}

// ISSUE-#2078 - writing and searching shall throw error when the field tokenizer is missing
Reviewer comment (Contributor):

The comment implies this also tests search, which is not the case.

@naveenann requested a review from PSeitz on June 15, 2023 at 16:16
@codecov-commenter commented:

Codecov Report

Merging #2093 (46b96c2) into main (ebc7812) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2093   +/-   ##
=======================================
  Coverage   94.38%   94.38%           
=======================================
  Files         321      321           
  Lines       60583    60612   +29     
=======================================
+ Hits        57179    57210   +31     
+ Misses       3404     3402    -2     
Impacted Files                   Coverage Δ
src/indexer/segment_writer.rs    97.87% <100.00%> (+0.08%) ⬆️

... and 4 files with indirect coverage changes

@PSeitz merged commit 5996209 into quickwit-oss:main on Jun 16, 2023
@naveenann deleted the issue/2078-tokenizer-not-found-while-indexing branch on June 23, 2023 at 02:24
Development

Successfully merging this pull request may close these issues:
writing and searching shall throw error when the field tokenizer is missing (#2078)

3 participants