Refactor & improve stability #4

Merged
merged 32 commits into from
Jan 28, 2025
Conversation

@t83714 commented Jan 28, 2025

What this PR does

Fixes:

  • Rename EmbeddingGenerator to EmbeddingEncoder
  • Fix serverOptions not being passed through properly in test cases
  • Upgrade @huggingface/transformers to v3.2.4
  • Upgrade onnxruntime-node to v1.20.1
  • Avoid including unused models in Docker images (smaller image size)
  • Increase probe timeout seconds
  • Use a worker pool
  • Process sentence lists with separate model runs
  • Set default workerTaskTimeout to 60 seconds
  • Use the quantized (q8) version of the default model
  • Set default limits.memory to 850M
  • Set default replicas number to 2
  • Add max_length to the model config (configurable via Helm config)
  • Set max_length of the default model to 1024 due to excessive memory usage when working on text longer than 2048 (the default model supports up to 8192)
  • Only use padding when multiple inputs are received for encoding
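
For reviewers, here is a minimal sketch of how the last few items might fit together. It is not the code in this PR. All identifiers (`TokenizerOptions`, `buildTokenizerOptions`, `encodeSeparately`, `encodeOne`, `DEFAULT_MAX_LENGTH`) are hypothetical. The option names `padding`, `truncation`, and `max_length` follow the tokenizer call options in @huggingface/transformers; enabling `truncation` is an assumption, since the description only mentions `max_length`. Running one model call per sentence is one reading of "separate model runs".

```typescript
// Hypothetical sketch: none of these identifiers are taken from the PR.
interface TokenizerOptions {
    padding: boolean;
    truncation: boolean;
    max_length: number;
}

// Value from the PR description: max_length of the default model.
const DEFAULT_MAX_LENGTH = 1024;

// Pad only when several inputs are tokenized together;
// a single input has nothing to be aligned with, so padding is skipped.
function buildTokenizerOptions(
    input: string | string[],
    maxLength: number = DEFAULT_MAX_LENGTH
): TokenizerOptions {
    const isBatch = Array.isArray(input) && input.length > 1;
    // Assumption: truncation is enabled so that max_length is enforced.
    return { padding: isBatch, truncation: true, max_length: maxLength };
}

// Encode each sentence in its own model run instead of one large batch.
// `encodeOne` stands in for whatever performs a single model run.
async function encodeSeparately(
    sentences: string[],
    encodeOne: (sentence: string) => Promise<number[]>
): Promise<number[][]> {
    const results: number[][] = [];
    for (const sentence of sentences) {
        // Sequential on purpose: one run finishes before the next starts.
        results.push(await encodeOne(sentence));
    }
    return results;
}
```

With this shape, `buildTokenizerOptions("hello")` disables padding, while `buildTokenizerOptions(["a", "b"])` enables it; a `max_length` set through the Helm config would be passed as the second argument.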

Checklist

  • There are unit tests to verify my changes are correct or unit tests aren't applicable
  • I've updated CHANGES.md with what I changed.

t83714 and others added 30 commits January 7, 2025 18:49
…ngGenerator to EmbeddingEncoder

The embedding performance will be better but require more memory
- use full size model (unquantized) by default
- increase timeout seconds
…ure request timeout during new work creation
- set default limits.memory to 1100M
- set default replicas number to 2
- set default model to quantized (q8) Alibaba-NLP/gte-base-en-v1.5
…ge when working on text longer than 2048 (the model supports up to 8192)

- set limits.memory to 850M
@t83714 changed the title from "Refactor" to "Refactor & improve stability" on Jan 28, 2025
t83714 commented Jan 28, 2025

Tested via alpha releases: v1.1.0-alpha.3

@t83714 merged commit 3fdd248 into main on Jan 28, 2025
2 checks passed