Releases: huggingface/text-embeddings-inference

v1.0.0

23 Feb 16:43
41b692d

Highlights

  • Support for Nomic models
  • Support for Flash Attention for Jina models
  • Metal backend for Apple silicon (M*) users
  • /tokenize route to directly access the internal TEI tokenizer
  • /embed_all route to allow client level pooling
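
The two new routes can be exercised with any HTTP client. A minimal sketch using only the standard library, assuming a TEI server listening on localhost:8080 (the address and example input are assumptions, not part of the release):

```python
import json
import urllib.request

# Assumed address of a locally running TEI server; adjust as needed.
BASE_URL = "http://localhost:8080"

def build_request(route: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a TEI route (without sending it)."""
    return urllib.request.Request(
        BASE_URL + route,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def post_json(route: str, payload: dict):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(route, payload)) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    text = {"inputs": "What is Deep Learning?"}
    # /tokenize exposes the internal TEI tokenizer directly
    print(post_json("/tokenize", text))
    # /embed_all returns per-token embeddings so pooling can happen client side
    print(post_json("/embed_all", text))
```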

Full Changelog: v0.6.0...v1.0.0

v0.6.0

30 Nov 14:28

Full Changelog: v0.5.0...v0.6.0

v0.5.0

20 Nov 15:28

Full Changelog: v0.4.0...v0.5.0

v0.4.0

15 Nov 18:20

Full Changelog: v0.3.0...v0.4.0

v0.3.0

27 Oct 12:46

What's Changed

  • feat: faster CPU image on AMD in #35
  • feat: support camembert in #42
  • feat: support float32 on cuda in #41
  • feat: support jinaAI variant in #48

Full Changelog: v0.2.2...v0.3.0

v0.2.2

19 Oct 12:12

What's Changed

  • fix: max_input_length should take into account position_offset (aec5efd)

Full Changelog: v0.2.1...v0.2.2

v0.2.1

18 Oct 17:39

What's Changed

  • fix: only use position offset for xlm-roberta (8c507c3)

Full Changelog: v0.2.0...v0.2.1

v0.2.0

18 Oct 11:40

What's Changed

  • add support for XLM-RoBERTa in #5
  • get number of tokenization workers from the number of CPU cores in #8
  • prefetch batch in #10
  • support loading from .pth in #12
  • add --pooling arg in #14
  • fix compute cap matching in #21

Full Changelog: v0.1.0...v0.2.0

v0.1.0

13 Oct 13:46
  • No compilation step
  • Dynamic shapes
  • Small docker images and fast boot times. Get ready for true serverless!
  • Token based dynamic batching
  • Optimized transformers code for inference using Flash Attention, Candle and cuBLASLt
  • Safetensors weight loading
  • Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
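
Token-based dynamic batching groups queued requests by their total token count rather than by request count, so one long input does not starve many short ones. A minimal sketch of the idea only; the names and token budget are illustrative, not TEI's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    id: int
    tokens: list  # tokenized input

@dataclass
class Batcher:
    """Group queued requests so each batch stays under a token budget."""
    max_batch_tokens: int
    queue: list = field(default_factory=list)

    def add(self, req: Request) -> None:
        self.queue.append(req)

    def next_batch(self) -> list:
        """Pop requests in arrival order until the token budget is filled."""
        batch, used = [], 0
        while self.queue and used + len(self.queue[0].tokens) <= self.max_batch_tokens:
            req = self.queue.pop(0)
            used += len(req.tokens)
            batch.append(req)
        return batch
```

A batch is cut as soon as the next request would overflow the budget, which keeps per-batch latency bounded regardless of how many requests are waiting.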