Releases: huggingface/text-embeddings-inference
v1.0.0
Highlights
- Support for Nomic models
- Support for Flash Attention for Jina models
- Metal backend for M* users
- /tokenize route to directly access the internal TEI tokenizer
- /embed_all route to allow client level pooling
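Since the per-token embeddings route exists precisely so that clients can apply their own pooling, a minimal sketch of what "client level pooling" might look like follows. This is illustrative only: the response shape (a list of per-token vectors per input) is an assumption based on the release notes, not TEI's documented schema.

```python
import math

def mean_pool(token_embeddings):
    """Average a list of per-token vectors into one sentence embedding.

    token_embeddings: list of equal-length float lists, one per token,
    such as a client might receive from a per-token embeddings route.
    """
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    for vec in token_embeddings:
        for i, value in enumerate(vec):
            pooled[i] += value
    return [p / len(token_embeddings) for p in pooled]

def l2_normalize(vec):
    """Optional client-side L2 normalization of the pooled vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]
```

Mean pooling is only one choice; with raw per-token vectors available, a client can also do CLS pooling (take the first vector) or max pooling without any server-side changes.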
What's Changed
- fix: limit the number of buckets for prom metrics by @OlivierDehaene in #114
- feat: support flash attention for Jina by @OlivierDehaene in #119
- feat: add support for Metal by @OlivierDehaene in #120
- fix: fix turing for Jina and limit concurrency in docker build by @OlivierDehaene in #121
- fix(router): fix panics on partial_cmp and empty req.texts by @OlivierDehaene in #138
- feat(router): add /tokenize route by @OlivierDehaene in #139
- feat(backend): support classification for bert by @OlivierDehaene in #155
- feat: add embed_raw route to get all embeddings without pooling by @OlivierDehaene in #154
- added docs for OTLP_ENDPOINT around the defaults and format sent by @MarcusDunn in #157
- fix: use mimalloc to solve memory "leak" by @OlivierDehaene in #161
- fix: remove modif of tokenizer by @OlivierDehaene in #163
- fix: add cors_allow_origin to cli by @OlivierDehaene in #162
- fix: use st max_seq_length by @OlivierDehaene in #167
- feat: support nomic models by @OlivierDehaene in #166
New Contributors
- @MarcusDunn made their first contribution in #157
Full Changelog: v0.6.0...v1.0.0
v0.6.0
What's Changed
- Doc build only if doc files were changed by @mishig25 in #85
- fix: fix inappropriate title of API docs page by @ucyang in #88
- fix: hf hub redirects by @OlivierDehaene in #89
- feat: add grpc router by @OlivierDehaene in #90
- fix: fix padding support in batch tokens by @OlivierDehaene in #93
- fix: fix tokenizers with both whitespace and metaspace by @OlivierDehaene in #96
- fix: enable http feature in http-builder by @zhangfand in #98
- feat: add integration tests by @OlivierDehaene in #101
New Contributors
- @mishig25 made their first contribution in #85
- @ucyang made their first contribution in #88
- @zhangfand made their first contribution in #98
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- feat: accept batches in predict by @OlivierDehaene in #78
- feat: rerank route by @OlivierDehaene in #84
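A rerank route returns a relevance score per candidate text for a query; the client's remaining job is to order candidates by those scores. The sketch below assumes scores arrive as a list aligned with the submitted texts, which is a common reranker API shape but should be checked against the TEI docs.

```python
def order_by_scores(texts, scores):
    """Pair each candidate text with its relevance score and sort best-first.

    texts: list of candidate strings sent to a rerank endpoint.
    scores: list of floats, one per text, as a reranker would return.
    """
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked]
```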
Full Changelog: v0.4.0...v0.5.0
v0.4.0
What's Changed
- feat: USE_FLASH_ATTENTION env var by @OlivierDehaene in #57
- docs: The initial version of the TEI docs for the hf.co/docs/ by @MKhalusova in #60
- feat: support roberta by @kozistr in #62
- fix: GH workflows update: added --not_python_module flag by @MKhalusova in #66
- docs: Images links updated by @MKhalusova in #72
- feat: add normalize option by @OlivierDehaene in #70
- ci: Migrate CI to new Runners by @glegendre01 in #74
- feat: add support for classification models by @OlivierDehaene in #76
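Sequence-classification models like the ones added here produce one logit per label; converting logits to probabilities is a softmax. This is a generic illustration of that step, not TEI's internal code (TEI returns scores server-side).

```python
import math

def softmax(logits):
    """Convert raw classification logits to probabilities.

    Subtracting the max logit first is the standard trick to avoid
    overflow in exp() without changing the result.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```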
New Contributors
- @MKhalusova made their first contribution in #60
- @kozistr made their first contribution in #62
- @glegendre01 made their first contribution in #74
Full Changelog: v0.3.0...v0.4.0
v0.3.0
v0.2.2
What's Changed
- fix: max_input_length should take into account position_offset (aec5efd)
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- fix: only use position offset for xlm-roberta (8c507c3)
Full Changelog: v0.2.0...v0.2.1
v0.2.0
v0.1.0
- No compilation step
- Dynamic shapes
- Small docker images and fast boot times. Get ready for true serverless!
- Token based dynamic batching
- Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
- Safetensors weight loading
- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
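Token based dynamic batching means batches are capped by total token count rather than by number of requests, so many short inputs can share a batch while a few long ones do not blow the memory budget. A minimal sketch of the grouping idea, under that assumption (illustrative, not TEI's actual scheduler):

```python
def batch_by_tokens(requests, max_batch_tokens):
    """Group requests into batches bounded by total token count.

    requests: list of (request_id, token_count) pairs in arrival order.
    max_batch_tokens: token budget per batch.
    Returns a list of batches, each a list of request ids.
    """
    batches, current, used = [], [], 0
    for req_id, n_tokens in requests:
        # Flush the current batch if this request would exceed the budget.
        if current and used + n_tokens > max_batch_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(req_id)
        used += n_tokens
    if current:
        batches.append(current)
    return batches
```

With a budget of 7 tokens, requests of 3, 4, and 5 tokens form two batches: the first two requests share one, the third gets its own.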