Releases: huggingface/text-embeddings-inference
v1.0.0
Highlights
- Support for Nomic models
- Support for Flash Attention for Jina models
- Metal backend for M* users
- /tokenize route to directly access the internal TEI tokenizer
- /embed_all route to allow client level pooling
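Since the per-token embeddings route exists precisely so that clients can apply their own pooling, a minimal sketch of what "client level pooling" might look like follows. This is illustrative only: the response shape (a list of per-token vectors per input) is an assumption based on the release notes, not TEI's documented schema.

```python
import math

def mean_pool(token_embeddings):
    """Average a list of per-token vectors into one sentence embedding.

    token_embeddings: list of equal-length float lists, one per token,
    such as a client might receive from a per-token embeddings route.
    """
    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    for vec in token_embeddings:
        for i, value in enumerate(vec):
            pooled[i] += value
    return [p / len(token_embeddings) for p in pooled]

def l2_normalize(vec):
    """Optional client-side L2 normalization of the pooled vector."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]
```

Mean pooling is only one choice; with raw per-token vectors available, a client can also do CLS pooling (take the first vector) or max pooling without any server-side changes.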
What's Changed
- fix: limit the number of buckets for prom metrics by @OlivierDehaene in #114
- feat: support flash attention for Jina by @OlivierDehaene in #119
- feat: add support for Metal by @OlivierDehaene in #120
- fix: fix turing for Jina and limit concurrency in docker build by @OlivierDehaene in #121
- fix(router): fix panics on partial_cmp and empty req.texts by @OlivierDehaene in #138
- feat(router): add /tokenize route by @OlivierDehaene in #139
- feat(backend): support classification for bert by @OlivierDehaene in #155
- feat: add embed_raw route to get all embeddings without pooling by @OlivierDehaene in #154
- added docs for OTLP_ENDPOINT around the defaults and format sent by @MarcusDunn in #157
- fix: use mimalloc to solve memory "leak" by @OlivierDehaene in #161
- fix: remove modif of tokenizer by @OlivierDehaene in #163
- fix: add cors_allow_origin to cli by @OlivierDehaene in #162
- fix: use st max_seq_length by @OlivierDehaene in #167
- feat: support nomic models by @OlivierDehaene in #166
New Contributors
- @MarcusDunn made their first contribution in #157
Full Changelog: v0.6.0...v1.0.0
v0.6.0
What's Changed
- Doc build only if doc files were changed by @mishig25 in #85
- fix: fix inappropriate title of API docs page by @ucyang in #88
- fix: hf hub redirects by @OlivierDehaene in #89
- feat: add grpc router by @OlivierDehaene in #90
- fix: fix padding support in batch tokens by @OlivierDehaene in #93
- fix: fix tokenizers with both whitespace and metaspace by @OlivierDehaene in #96
- fix: enable http feature in http-builder by @zhangfand in #98
- feat: add integration tests by @OlivierDehaene in #101
New Contributors
- @mishig25 made their first contribution in #85
- @ucyang made their first contribution in #88
- @zhangfand made their first contribution in #98
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- feat: accept batches in predict by @OlivierDehaene in #78
- feat: rerank route by @OlivierDehaene in #84
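A rerank route returns a relevance score per candidate text for a query; the client's remaining job is to order candidates by those scores. The sketch below assumes scores arrive as a list aligned with the submitted texts, which is a common reranker API shape but should be checked against the TEI docs.

```python
def order_by_scores(texts, scores):
    """Pair each candidate text with its relevance score and sort best-first.

    texts: list of candidate strings sent to a rerank endpoint.
    scores: list of floats, one per text, as a reranker would return.
    """
    ranked = sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked]
```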
Full Changelog: v0.4.0...v0.5.0
v0.4.0
What's Changed
- feat: USE_FLASH_ATTENTION env var by @OlivierDehaene in #57
- docs: The initial version of the TEI docs for the hf.co/docs/ by @MKhalusova in #60
- feat: support roberta by @kozistr in #62
- fix: GH workflows update: added --not_python_module flag by @MKhalusova in #66
- docs: Images links updated by @MKhalusova in #72
- feat: add normalize option by @OlivierDehaene in #70
- ci: Migrate CI to new Runners by @glegendre01 in #74
- feat: add support for classification models by @OlivierDehaene in #76
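Sequence-classification models like the ones added here produce one logit per label; converting logits to probabilities is a softmax. This is a generic illustration of that step, not TEI's internal code (TEI returns scores server-side).

```python
import math

def softmax(logits):
    """Convert raw classification logits to probabilities.

    Subtracting the max logit first is the standard trick to avoid
    overflow in exp() without changing the result.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```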
New Contributors
- @MKhalusova made their first contribution in #60
- @kozistr made their first contribution in #62
- @glegendre01 made their first contribution in #74
Full Changelog: v0.3.0...v0.4.0
v0.3.0
v0.2.2
What's Changed
- fix: max_input_length should take into account position_offset (aec5efd)
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- fix: only use position offset for xlm-roberta (8c507c3)
Full Changelog: v0.2.0...v0.2.1
v0.2.0
v0.1.0
- No compilation step
- Dynamic shapes
- Small docker images and fast boot times. Get ready for true serverless!
- Token based dynamic batching
- Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
- Safetensors weight loading
- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
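Token based dynamic batching means batches are capped by total token count rather than by number of requests, so many short inputs can share a batch while a few long ones do not blow the memory budget. A minimal sketch of the grouping idea, under that assumption (illustrative, not TEI's actual scheduler):

```python
def batch_by_tokens(requests, max_batch_tokens):
    """Group requests into batches bounded by total token count.

    requests: list of (request_id, token_count) pairs in arrival order.
    max_batch_tokens: token budget per batch.
    Returns a list of batches, each a list of request ids.
    """
    batches, current, used = [], [], 0
    for req_id, n_tokens in requests:
        # Flush the current batch if this request would exceed the budget.
        if current and used + n_tokens > max_batch_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(req_id)
        used += n_tokens
    if current:
        batches.append(current)
    return batches
```

With a budget of 7 tokens, requests of 3, 4, and 5 tokens form two batches: the first two requests share one, the third gets its own.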