v1.15
Backend updates
Transformers: bump to 4.45.
ExLlamaV2: bump to 0.2.3.
flash-attention: bump to 2.6.3.
llama-cpp-python: bump to 0.3.1.
bitsandbytes: bump to 0.44.
PyTorch: bump to 2.4.1.
ROCm: bump wheels to 6.1.2.
Remove AutoAWQ, AutoGPTQ, HQQ, and AQLM from requirements.txt:
AutoAWQ and AutoGPTQ were removed due to lack of support for PyTorch 2.4.1 and CUDA 12.1.
HQQ and AQLM were removed to make the project leaner, since they are experimental and see limited use.
You can still install these libraries manually if you are interested.
Changes
Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335). Thanks @p-e-w.
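The core idea of XTC can be sketched as follows. This is a minimal illustration, not the project's implementation: parameter names (`threshold`, `probability`) and the assumption that `probs` is sorted in descending order are choices made here for clarity.

```python
import random

def xtc_filter(probs, threshold=0.1, probability=0.5, rand=None):
    """Sketch of Exclude Top Choices: with some probability, drop every
    token whose probability clears the threshold, except the least
    likely of them, pushing sampling away from the most clichéd choices.
    `probs` is assumed sorted in descending order."""
    if rand is None:
        rand = random.random()
    if rand >= probability:
        return probs  # sampler not triggered on this step
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return probs  # fewer than two "top choices": nothing to exclude
    out = list(probs)
    for i in above[:-1]:  # zero out all top choices except the last (least probable)
        out[i] = 0.0
    total = sum(out)
    return [p / total for p in out]
```

For example, with probabilities `[0.5, 0.3, 0.15, 0.05]` and a threshold of 0.1, the two most probable tokens are excluded and the remaining mass is renormalized over the last two.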
Make it possible to sort repetition penalties with "Sampler priority". The new keywords are:
repetition_penalty
presence_penalty
frequency_penalty
dry
encoder_repetition_penalty
no_repeat_ngram
xtc (not a repetition penalty, but also added in this update)
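In the UI, these keywords go into the "Sampler priority" text field, one per line, mixed in with the other sampler names. One possible ordering (illustrative only, not the default) that applies the classic penalties before DRY and XTC:

```
repetition_penalty
presence_penalty
frequency_penalty
encoder_repetition_penalty
no_repeat_ngram
dry
xtc
```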
Don't import PEFT unless necessary. This makes the web UI launch faster.
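The deferred-import pattern behind this speedup is straightforward; here is a sketch using a stdlib module as a stand-in for PEFT:

```python
def summarize(data):
    # Deferred import: the module is loaded the first time this function
    # runs, so programs that never call it skip the import cost entirely.
    # (The web UI applies the same idea to PEFT, which is only needed
    # when a LoRA is actually loaded.)
    import statistics
    return statistics.mean(data)

print(summarize([1, 2, 3]))
```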
Add beforeunload event to add a confirmation dialog when leaving the page (#6279). Thanks @leszekhanusz.
Update API documentation with examples to list/load models (#5902). Thanks @joachimchauvet.
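For reference, a minimal client-side sketch of listing and loading models over the API. The endpoint paths (`/v1/internal/model/list`, `/v1/internal/model/load`) and the default port 5000 are assumptions based on the project's API docs; adjust them to your setup.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # assumed default API address

def list_models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    # GET /v1/internal/model/list returns the available model names
    return urllib.request.Request(f"{base_url}/v1/internal/model/list")

def load_model_request(model_name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # POST /v1/internal/model/load switches the currently loaded model
    payload = json.dumps({"model_name": model_name}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/internal/model/load",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Requires a running server started with --api
    with urllib.request.urlopen(list_models_request()) as resp:
        print(json.load(resp))
```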
Training pro: update script.py (#6359). Thanks @FartyPants.
Bug fixes
Fix UnicodeDecodeError for BPE-based models (especially GLM-4) (#6357). Thanks @GralchemOz.
API: Relax multimodal format, fixes HuggingFace Chat UI (#6353). Thanks @Papierkorb.
Force /bin/bash shell for conda (#6386). Thanks @Thireus.
Do not set value for histories in chat when --multi-user is used (#6317). Thanks @mashb1t.
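The general technique behind this kind of fix is decoding with an explicit error handler instead of letting a stray byte raise. A minimal illustration (not the actual patch):

```python
# A valid UTF-8 sequence with one invalid byte appended, mimicking a
# partially-emitted multi-byte token from a BPE tokenizer
data = "中".encode("utf-8") + b"\xff"

# Strict decoding raises UnicodeDecodeError on the bad byte:
try:
    data.decode("utf-8")
except UnicodeDecodeError as e:
    print("strict decode failed:", e.reason)

# An explicit error handler degrades gracefully instead;
# the invalid byte becomes U+FFFD (the replacement character)
text = data.decode("utf-8", errors="replace")
print(text)
```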
Fix a typo in the OpenAI response format (#6365). Thanks @jsboige.