Add multilang tokenizer #3608

Merged
merged 12 commits into from
Jul 17, 2023
34 changes: 34 additions & 0 deletions config/tutorials/wikipedia/multilang-index-config.yaml
@@ -0,0 +1,34 @@
#
# Index config file for multilang wikipedia datasets.
#

version: 0.6

index_id: multilang-wikipedia

doc_mapping:
  tokenizers:
    - name: multilang
      type: multilang
  field_mappings:
    - name: title
      type: text
      tokenizer: multilang
      record: position
      stored: true
      fieldnorms: true
    - name: body
      type: text
      tokenizer: multilang
      record: position
      stored: true
      fieldnorms: true
    - name: url
      type: text
      tokenizer: raw

search_settings:
  default_search_fields: [title, body]

indexing_settings:
  commit_timeout_secs: 10
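
The config ties each text field to a tokenizer by name: `title` and `body` use the custom `multilang` tokenizer declared under `doc_mapping.tokenizers`, while `url` uses `raw`. A minimal sketch of a consistency check for that wiring follows. The helper `undefined_tokenizers` and the set `ASSUMED_BUILTIN_TOKENIZERS` are hypothetical and not part of Quickwit, and the mapping is copied by hand into a Python dict rather than parsed from the YAML file.

```python
# Hypothetical helper, not part of Quickwit: reports fields whose tokenizer
# is neither declared under doc_mapping.tokenizers nor in an assumed set of
# built-in tokenizer names.
ASSUMED_BUILTIN_TOKENIZERS = {"raw", "default"}  # assumption, not exhaustive

# The doc_mapping section of the config above, transcribed as a dict
# (only the keys relevant to the check are kept).
doc_mapping = {
    "tokenizers": [{"name": "multilang", "type": "multilang"}],
    "field_mappings": [
        {"name": "title", "type": "text", "tokenizer": "multilang"},
        {"name": "body", "type": "text", "tokenizer": "multilang"},
        {"name": "url", "type": "text", "tokenizer": "raw"},
    ],
}


def undefined_tokenizers(mapping, builtins=ASSUMED_BUILTIN_TOKENIZERS):
    """Return (field name, tokenizer name) pairs that reference an unknown tokenizer."""
    declared = {t["name"] for t in mapping.get("tokenizers", [])}
    known = declared | set(builtins)
    return [
        (field["name"], field["tokenizer"])
        for field in mapping.get("field_mappings", [])
        if "tokenizer" in field and field["tokenizer"] not in known
    ]


if __name__ == "__main__":
    print(undefined_tokenizers(doc_mapping))
```

If the `tokenizers` entry were dropped from the mapping, the same call would list `title` and `body` as referencing an undeclared tokenizer, which is the kind of mistake this check is meant to catch before the config is submitted.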