Releases: segment-any-text/wtpsplit

Release 2.1.4

25 Jan 16:43
  • Introduce optional hat weighting by @lsorber
  • Clarify LoRA adaptation
  • Rename treat_newline_as_space to split_on_input_newlines for clarity. treat_newline_as_space will be deprecated in a future release.
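
A renamed option with a pending deprecation is commonly handled by accepting both keywords for a while and warning on the old one. The following is a minimal sketch of that pattern, not the library's actual code: the helper `resolve_newline_option` is hypothetical, and treating the old flag's value as a direct stand-in for the new one is an assumption made for illustration.

```python
import warnings


def resolve_newline_option(split_on_input_newlines=True, treat_newline_as_space=None):
    """Hypothetical helper: return the effective newline setting.

    Assumption: the old keyword, when given, simply overrides the new one.
    """
    if treat_newline_as_space is not None:
        # Warn callers who still pass the old keyword name.
        warnings.warn(
            "treat_newline_as_space is deprecated; use split_on_input_newlines instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return bool(treat_newline_as_space)
    return bool(split_on_input_newlines)
```

Callers that pass only the new keyword see no warning, so existing code can migrate gradually.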

Release 2.1.2

14 Dec 11:06
  • Fixes #142: AssertionError when the input string consists only of newlines or whitespace, or is an empty string.

Release 2.1.1

27 Oct 14:19
  • Change default behaviour for newlines in SaT.split.
    • Now, while the model ignores them, they are used to split the text in a simple post-processing step.
  • Small bugfixes for LoRA training
  • Update Readme for advanced usage
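
The newline behaviour described above can be pictured as a second pass over the model's output. This is a rough sketch under stated assumptions, not the library's implementation: the function `split_on_newlines` is a hypothetical name, and it assumes the model has already returned a list of segments that may still contain `\n` characters.

```python
def split_on_newlines(segments):
    """Hypothetical post-processing: cut each segment after every '\n'."""
    pieces = []
    for segment in segments:
        start = 0
        for i, ch in enumerate(segment):
            if ch == "\n":
                # Keep the newline attached to the piece it terminates.
                pieces.append(segment[start:i + 1])
                start = i + 1
        # Append whatever remains after the last newline, if anything.
        if start < len(segment):
            pieces.append(segment[start:])
    return pieces


# Example: the first model segment spans a line break.
model_segments = ["Hello world.\nSecond line ", "continues here."]
pieces = split_on_newlines(model_segments)
```

Because each newline stays attached to its piece, joining the pieces reproduces the original text unchanged.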

Release 2.1.0

24 Sep 21:37
00d2d6c
  • Adds ONNX support for SaT models.
    • Including export scripts and an updated README.
    • This results in 50% improved inference time on GPU.

Release 2.0.8

09 Sep 10:49
  • Fix splitting of short sequences into individual characters (#127)

Release 2.0.7

02 Sep 13:26
  • Allow numpy>=2.0
  • Fix adaptation code
  • Add some comments

Release 2.0.5

08 Jul 07:41
  • Fixes potential CUDA device error when the input has exactly 511 tokens (#121).

Release 2.0.4

01 Jul 09:32
  • Fix a speed issue with SaT (#118). Now it is (as expected) ~6x faster than WtP.

Release 2.0.3

26 Jun 08:05

Implement SaT (https://arxiv.org/abs/2406.16678) and switch the default models to SaT🚀

The previous WtP models are still available but SaT is strictly better in accuracy and speed. See the updated README for details: https://github.com/segment-any-text/wtpsplit.

SaT was implemented and developed by @markus583 and @igorsterner.

Release 1.3.0

22 Jan 15:30