Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add tokenizer and sparse encoding (#1301)
* add tokenizer and sparse encoding Signed-off-by: xinyual <[email protected]> * add tokenizer and sparse encoding Signed-off-by: xinyual <[email protected]> * add tokenizer and sparse encoding Signed-off-by: xinyual <[email protected]> * add tokenizer and sparse encoding Signed-off-by: xinyual <[email protected]> * add tokenizer and sparse encoding Signed-off-by: xinyual <[email protected]> * remove special token Signed-off-by: xinyual <[email protected]> * add filter Signed-off-by: xinyual <[email protected]> * try empty model Signed-off-by: xinyual <[email protected]> * remove warm up Signed-off-by: xinyual <[email protected]> * try empty model Signed-off-by: xinyual <[email protected]> * add block Signed-off-by: xinyual <[email protected]> * add log Signed-off-by: xinyual <[email protected]> * add log Signed-off-by: xinyual <[email protected]> * add log Signed-off-by: xinyual <[email protected]> * remove log Signed-off-by: xinyual <[email protected]> * remove pt file detect Signed-off-by: xinyual <[email protected]> * add log Signed-off-by: xinyual <[email protected]> * add functionName pipeline Signed-off-by: xinyual <[email protected]> * remove verify log Signed-off-by: xinyual <[email protected]> * skip special token in sparse encoding Signed-off-by: xinyual <[email protected]> * skip omit tokenize config Signed-off-by: xinyual <[email protected]> * skip omit tokenize config-change warm up logic Signed-off-by: xinyual <[email protected]> * reArch Signed-off-by: xinyual <[email protected]> * deduplicate Signed-off-by: xinyual <[email protected]> * omit ml config in sparse encoding Signed-off-by: xinyual <[email protected]> * add null config in warm up Signed-off-by: xinyual <[email protected]> * fix original test Signed-off-by: xinyual <[email protected]> * add tokenize ut half Signed-off-by: xinyual <[email protected]> * fix sparse encoding bug Signed-off-by: xinyual <[email protected]> * add UT for sparse encoding and tokenize Signed-off-by: xinyual <[email protected]> * remove useless framwork type Signed-off-by: xinyual <[email protected]> * common/src/test/java/org/opensearch/ml/common/input/MLInputTest.java Signed-off-by: xinyual <[email protected]> * change key for tokenize Signed-off-by: xinyual <[email protected]> * reArch DLModel Signed-off-by: xinyual <[email protected]> * reArch DLModel again Signed-off-by: xinyual <[email protected]> * response format Signed-off-by: xinyual <[email protected]> * tokenize only one output Signed-off-by: xinyual <[email protected]> * clean sparse output Signed-off-by: xinyual <[email protected]> * clean sparse output Signed-off-by: xinyual <[email protected]> * change UT number Signed-off-by: xinyual <[email protected]> * remove useless predict code Signed-off-by: xinyual <[email protected]> * remove useless part Signed-off-by: xinyual <[email protected]> * change tokenize way Signed-off-by: xinyual <[email protected]> * reArch add textEmbedding model Signed-off-by: xinyual <[email protected]> * add tokenize logic Signed-off-by: xinyual <[email protected]> * add abstract Signed-off-by: xinyual <[email protected]> * clear code Signed-off-by: xinyual <[email protected]> * fix it class Signed-off-by: xinyual <[email protected]> * fix it class Signed-off-by: xinyual <[email protected]> * add IT file Signed-off-by: xinyual <[email protected]> * reformulate Signed-off-by: xinyual <[email protected]> * reformulate remote inference Signed-off-by: xinyual <[email protected]> * reformulate remote inference Signed-off-by: xinyual <[email protected]> * reformulate remote inference json and array Signed-off-by: xinyual <[email protected]> * verify Signed-off-by: xinyual <[email protected]> * undo string utils Signed-off-by: xinyual <[email protected]> * skip dummy model Signed-off-by: xinyual <[email protected]> * skip dummy model Signed-off-by: xinyual <[email protected]> * skip dummy model Signed-off-by: xinyual <[email protected]> * skip dummy model Signed-off-by: xinyual <[email protected]> * skip dummy model Signed-off-by: xinyual <[email protected]> * skip dummy model Signed-off-by: xinyual <[email protected]> * add inner load Model Signed-off-by: xinyual <[email protected]> * rename variable Signed-off-by: xinyual <[email protected]> * add default for idf Signed-off-by: xinyual <[email protected]> * add ut for sparse encoding and tokenizer Signed-off-by: xinyual <[email protected]> * add close model Signed-off-by: xinyual <[email protected]> * change mock class Signed-off-by: xinyual <[email protected]> * remove buffer for sparse encoding output Signed-off-by: xinyual <[email protected]> * change tokenize model ready logic Signed-off-by: xinyual <[email protected]> * rewrite input functionName Signed-off-by: xinyual <[email protected]> * deduplicate Signed-off-by: xinyual <[email protected]> * change UT usage Signed-off-by: xinyual <[email protected]> * fix downloadAndSplit test Signed-off-by: xinyual <[email protected]> * fix Helper test Signed-off-by: xinyual <[email protected]> * remove meaningless change Signed-off-by: xinyual <[email protected]> * remove complie change Signed-off-by: xinyual <[email protected]> * rename Signed-off-by: xinyual <[email protected]> * fix typo error and simplify wrap code Signed-off-by: xinyual <[email protected]> * add comment Signed-off-by: xinyual <[email protected]> * using gson and remove useless close logic Signed-off-by: xinyual <[email protected]> * update comment and import problem Signed-off-by: xinyual <[email protected]> * add static idf name Signed-off-by: xinyual <[email protected]> * fix format problem Signed-off-by: xinyual <[email protected]> * extract an abstract model for sparse and dense sentence transformer translator Signed-off-by: xinyual <[email protected]> * fix typo error Signed-off-by: xinyual <[email protected]> * remove duplicate tokenizer file, fix import problem and add comment for tokenizer model Signed-off-by: xinyual <[email protected]> --------- Signed-off-by: xinyual <[email protected]> (cherry picked from commit 31a4e25)
- Loading branch information