opensearch-project · kolchfa-aws · Jan 2, 2025 · Sep 23, 2024 · Sep 23, 2024 · Sep 24, 2024
diff --git a/_analyzers/tokenizers/character-group-tokenizer.md b/_analyzers/tokenizers/character-group-tokenizer.md
@@ -0,0 +1,43 @@
+---
+layout: default
+title: Character group tokenizer
+parent: Tokenizers
+nav_order: 20
+has_children: false
+has_toc: false
+---
+
+# Character group tokenizer
+
+The character group tokenizer splits text into tokens wherever it encounters one of a specified set of characters. Because it matches individual characters rather than evaluating a pattern, it is a good fit when simple splitting is sufficient and you want to avoid the complexity and overhead of pattern-based tokenizers.
+
+The character group tokenizer accepts the following parameters:
+
+1. `tokenize_on_chars`: Specifies the set of characters on which to split the text. The tokenizer starts a new token whenever it encounters a character from this set. The set can contain single characters (for example, `-` or `@`) and the character classes `whitespace`, `letter`, `digit`, `punctuation`, and `symbol`.
+2. `max_token_length`: Specifies the maximum token length. If a token exceeds this length, the tokenizer splits it at `max_token_length` intervals. Default is `255`.
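+
+As a minimal sketch of how the two parameters combine, the following request splits only on `whitespace` and sets `max_token_length` to `4`. The value `4` and the sample text are illustrative assumptions chosen to keep the example short. Based on the parameter description above, the 10-character word would be expected to be split at 4-character intervals:
+
+```json
+POST _analyze
+{
+  "tokenizer": {
+    "type": "char_group",
+    "tokenize_on_chars": [
+      "whitespace"
+    ],
+    "max_token_length": 4
+  },
+  "text": "OpenSearch"
+}
+```
+{% include copy-curl.html %}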
+
+## Example: Using the character group tokenizer
+
+The following example request tokenizes text on `whitespace`, `-`, and `:`:
+
+```json
+POST _analyze
+{
+  "tokenizer": {
+    "type": "char_group",
+    "tokenize_on_chars": [
+      "whitespace",
+      "-",
+      ":"
+    ]
+  },
+  "text": "Fast-cars: drive fast!"
+}
+```
+{% include copy-curl.html %}
+
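+The response contains the tokens produced by splitting the text on the specified characters. The split characters are removed, while the `!` is retained because it is not listed in `tokenize_on_chars`:
+
+```
+Fast cars drive fast!
+```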
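+
+To use the tokenizer at index time rather than in an ad hoc `_analyze` request, it can be wrapped in a custom analyzer in the index settings. The following is a minimal sketch under stated assumptions: the index name `my-index` and the names `my_char_group_tokenizer` and `my_char_group_analyzer` are hypothetical placeholders, not names defined by OpenSearch:
+
+```json
+PUT /my-index
+{
+  "settings": {
+    "analysis": {
+      "tokenizer": {
+        "my_char_group_tokenizer": {
+          "type": "char_group",
+          "tokenize_on_chars": [
+            "whitespace",
+            "-",
+            ":"
+          ]
+        }
+      },
+      "analyzer": {
+        "my_char_group_analyzer": {
+          "type": "custom",
+          "tokenizer": "my_char_group_tokenizer"
+        }
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+
+Defining the tokenizer under a name lets several analyzers in the same index reuse one `tokenize_on_chars` configuration.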
diff --git a/_analyzers/tokenizers/index.md b/_analyzers/tokenizers/index.md
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Tokenizers
-nav_order: 60
+nav_order: 10
 has_children: false
 has_toc: false
 redirect_from: