Refactor & improve stability #4

t83714 · 2025-01-28T00:53:47Z

What this PR does

Fixes:

Rename EmbeddingGenerator to EmbeddingEncoder
Fix serverOptions not being passed through properly in test cases
Upgrade @huggingface/transformers to v3.2.4
Upgrade onnxruntime-node to v1.20.1
Avoid including unused models in docker images (smaller image size)
Increase probe timeout seconds
Use worker pool
Process each item of a sentence list in a separate model run
Set default workerTaskTimeout to 60 seconds
Use the quantized (q8) version of the default model
Set default limits.memory to 850M
Set default number of replicas to 2
Add max_length config to model config (configurable via helm config)
Set max_length of the default model to 1024 due to excessive memory usage when working on text longer than 2048 (the default model supports up to 8192)
Only use padding when multiple inputs are received for encoding
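
The last two items (the max_length cap and padding only for multiple inputs) could be sketched roughly as below. This is a minimal illustration, not the code from this PR: the helper name `buildTokenizerOptions` and the `TokenizerOptions` interface are hypothetical, and treating "multiple inputs" as an array with more than one element is an assumption. The option names `padding`, `truncation` and `max_length` are the ones accepted by the @huggingface/transformers tokenizer call.

```typescript
// Hypothetical helper: the name and shape are illustrative, not taken from the PR diff.
interface TokenizerOptions {
  padding: boolean;
  truncation: boolean;
  max_length: number;
}

function buildTokenizerOptions(
  input: string | string[],
  maxLength: number
): TokenizerOptions {
  // Padding only matters when several sequences must share one tensor shape.
  // Assumption: an array holding a single string is treated like a single input.
  const isBatch = Array.isArray(input) && input.length > 1;
  return {
    padding: isBatch,
    // Always truncate so that long text cannot exceed the configured cap.
    truncation: true,
    max_length: maxLength,
  };
}

// Usage: 1024 mirrors the default max_length chosen in this PR.
console.log(buildTokenizerOptions("a single sentence", 1024));
console.log(buildTokenizerOptions(["first sentence", "second sentence"], 1024));
```

The resulting object would be passed as the second argument of the tokenizer call, so a single string is encoded without padding while a batch is padded to a common length.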

Checklist

There are unit tests to verify my changes are correct or unit tests aren't applicable
I've updated CHANGES.md with what I changed.

t83714 · 2025-01-28T00:59:29Z

tested via alpha releases: v1.1.0-alpha.3

t83714 and others added 30 commits January 7, 2025 18:49

refactor: use non-quantized default model by default & rename EmbeddingGenerator to EmbeddingEncoder

98dca3c

The embedding performance will be better but require more memory

update default config value

eb4e5f1

fixed: serverOptions weren't passed through properly

3686785

- upgrade to @huggingface/transformers 3.2.4

c1deeae

- onnxruntime-node 1.20.1

fixed build issue

5570ed8

v1.1.0-alpha.0

8a2449b

fixed docker build

db80b70

Merge branch 'refactor' into release/v1.1.0-alpha.0

a833509

adjust docker build logic to fix sharp installation issue

72fecee

fix sharp installation in docker

0ea1e49

- avoid including unused models in docker images

bb2bb26

- use full size model (unquantized) by default
- increase timeout seconds

fixed: .cache folder doesn't have write permission

8a58ab0

limit request processing concurrency to 1

211a160

change default model to q8 quantized version

0c416a0

skip checking embeddingEncoder ready status in readiness probe

19e3a0f

test cases adjustment

eb73161

use worker pool

1d2d818

fixed broken startup probe

aef657a

make sure encode is only called when worker ready

ae46442

set minWorker

ebfc925

print debug info

0099ebe

print debug info when DEBUG env var set to "true"

c7e6077

use separate session to process string array items

a5f35b3

clean up code

e62de4f

make maxWorkers, minWorkers & workerTaskTimeout configurable

6a19b22

move waitTillReady call out of encode function in worker to avoid future request timeout during new worker creation

73fc76d

- set default workerTaskTimeout to 60 seconds

bdc759f

- set default limits.memory to 1100M
- set default replicas number to 2
- set default model to quantized (q8) Alibaba-NLP/gte-base-en-v1.5

increase the default memory limit to 2G

6f461d0

- set max_length of default model to 1024 due to excessive memory usage when working on text longer than 2048 (the model supports up to 8192)

b5ed686

- set limits.memory to 850M

tokenizer: only use padding when multiple inputs are received

c9d1b8a

set workerTaskTimeout in package.json start script

1f70046

t83714 changed the title ~~Refactor~~ Refactor & improve stability Jan 28, 2025

update changes.md

2226460

t83714 merged commit 3fdd248 into main Jan 28, 2025
2 checks passed
