Development

VincentFoulon80 edited this page Jun 10, 2019 · 1 revision

Create your own Tokenizers

To create a Tokenizer, you only need to implement the TokenizerInterface.

Implement a tokenize function that receives an array of tokens as its parameter and returns an array containing those tokens after processing. For reference, take a look at the Tokenizers already bundled with this engine, such as WhiteSpaceTokenizer, LowerCaseTokenizer, and TrimPunctuationTokenizer (see the full list).
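
As a rough illustration of the array-in, array-out contract described above, here is a minimal sketch of a custom Tokenizer. It is self-contained and does not use the engine's real interface: `ExampleTokenizerInterface` and `MinLengthTokenizer` are hypothetical names, and the exact signature of the real TokenizerInterface (namespace, type hints, static or instance method) is an assumption that should be checked against the library source before adapting this.

```php
<?php

// Stand-in for the engine's TokenizerInterface so this sketch runs on its own.
// Assumption: the real interface may differ in namespace, type hints,
// or in whether tokenize() is static.
interface ExampleTokenizerInterface
{
    public function tokenize(array $tokens): array;
}

// Hypothetical tokenizer: drops tokens shorter than a minimum length.
class MinLengthTokenizer implements ExampleTokenizerInterface
{
    private $minLength;

    public function __construct(int $minLength = 2)
    {
        $this->minLength = $minLength;
    }

    public function tokenize(array $tokens): array
    {
        // strlen() counts bytes, which is enough for this ASCII example.
        $kept = array_filter($tokens, function ($token) {
            return strlen($token) >= $this->minLength;
        });

        // Re-index so the result is a plain list of tokens.
        return array_values($kept);
    }
}

$tokenizer = new MinLengthTokenizer(3);
print_r($tokenizer->tokenize(['a', 'quick', 'fox', 'is']));
// Keeps 'quick' and 'fox'; 'a' and 'is' are shorter than 3 characters.
```

The same shape applies to any other treatment: receive the token array, transform or filter it, and return the resulting array.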
