IDEFICS, GPTQ Quantization
IDEFICS
The IDEFICS model was proposed in OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents by Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh.
IDEFICS is the first open state-of-the-art visual language model at the 80B scale!
The model accepts arbitrary sequences of images and text and produces text, similarly to a multimodal ChatGPT.
Blogpost: hf.co/blog/idefics
Playground: HuggingFaceM4/idefics_playground
MPT
MPT has been added and is now officially supported within Transformers. The repositories from MosaicML have been updated to work best with the model integration within Transformers.
- [MPT] Add MosaicML's MPT model to transformers by @ArthurZucker & @younesbelkada in #24629
GPTQ Integration
GPTQ quantization is now supported in Transformers through the optimum library. The backend relies on the auto_gptq library, from which we use the GPTQ and QuantLinear classes.
See below for an example of the API, quantizing a model using the new GPTQConfig configuration utility.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer, group_size=128, desc_act=False)
# works also with device_map (cpu offload works but not disk offload)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, quantization_config=config)
Most models under the TheBloke namespace with the GPTQ suffix should be supported. For example, to load a GPTQ-quantized model from TheBloke/Llama-2-13B-chat-GPTQ, simply run (after installing the latest optimum and auto-gptq libraries):
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
For more information about this feature, we recommend taking a look at the following announcement blogpost: https://huggingface.co/blog/gptq-integration
Pipelines
A new pipeline, dedicated to text-to-audio and text-to-speech models, has been added to Transformers. It currently supports the three text-to-audio models integrated into transformers: SpeechT5ForTextToSpeech, MusicGen and Bark.
See below for an example:
from transformers import pipeline

# instantiate a text-to-audio pipeline and call it on a prompt
pipe = pipeline(model="suno/bark")
output = pipe("Hey it's HuggingFace on the phone!")

audio = output["audio"]
sampling_rate = output["sampling_rate"]
Classifier-Free Guidance decoding
Classifier-Free Guidance decoding is a text generation technique developed by EleutherAI, announced in this paper. With this technique, you can increase prompt adherence in generation. You can also set it up with negative prompts, ensuring your generation doesn't go in specific directions. See its docs for usage instructions.
- add CFG for .generate() by @Vermeille in #24654
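As a quick illustration, the sketch below shows CFG being enabled by passing guidance_scale to .generate(); the model name, prompt, and guidance_scale value are purely illustrative.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative example: any causal LM checkpoint and prompt can be used here
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Today, a dragon flew over Paris, France,", return_tensors="pt")
# guidance_scale > 1 strengthens adherence to the prompt; guidance_scale == 1 disables CFG
outputs = model.generate(**inputs, max_new_tokens=30, guidance_scale=1.5)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])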
Task guides
A new task guide going into Visual Question Answering has been added to Transformers.
- VQA task guide by @MKhalusova in #25244
Model deprecation
We continue the deprecation of models that was introduced in #24787.
By deprecating a model, we indicate that we will stop maintaining it, but there is no intention of actually removing it or breaking support (it might one day move into a separate repo or onto the Hub, but we would still add the necessary imports to preserve backward compatibility). The main point is that we stop testing those models. This choice is driven by how little the models are used and aims to ease the burden on our CI so that it can focus on more critical aspects of the library.
- Deprecate unused OpenLlama architecture by @tomaarsen in #24922
Translation Efforts
There are ongoing efforts to translate the Transformers documentation into other languages. These efforts are driven by groups independent of Hugging Face, and their work is greatly appreciated as it further lowers the barrier of entry to ML and Transformers.
If you'd like to kickstart such an effort or help out on an existing one, please feel free to reach out by opening an issue.
- 🌐 [i18n-KO] Translated tasks/document_question_answering.md to Korean by @jungnerd in #24588
- 🌐 [i18n-KO] Fixed Korean and English quicktour.md by @wonhyeongseo in #24664
- 🌐 [i18n-KO] Updated Korean serialization.md by @wonhyeongseo in #24686
- 🌐 [i18n-KO] Translated performance.md to Korean by @augustinLib in #24883
- 🌐 [i18n-KO] Translated testing.md to Korean by @Sunmin0520 in #24900
- 🌐 [i18n-KO] Translated perf_train_cpu.md to Korean by @seank021 in #24911
- 🌐 [i18n-KO] Translated <tf_xla>.md to Korean by @54data in #24904
- 🌐 [i18n-KO] Translated perf_hardware.md to Korean by @augustinLib in #24966
- 🌐 [i18n-KO] Translated hpo_train.md to Korean by @harheem in #24968
- 🌐 [i18n-KO] Translated perf_infer_cpu.md to Korean by @junejae in #24920
- 🌐 [i18n-KO] Translated pipeline_webserver.md to Korean by @kihoon71 in #24828
- 🌐 [i18n-KO] Translated transformers_agents.md to Korean by @sim-so in #24881
- 🌐 [i18n-KO] Translated perf_infer_gpu_many.md to Korean by @heuristicwave in #24943
- 🌐 [i18n-KO] Translated perf_infer_gpu_one.md to Korean by @eenzeenee in #24978
- 🌐 [i18n-KO] Translated add_tensorflow_model.md to Korean by @keonju2 in #25017
- 🌐 [i18n-KO] Translated perf_train_cpu_many.md to Korean by @nuatmochoi in #24923
- 🌐 [i18n-KO] Translated add_new_model.md to Korean by @mjk0618 in #24957
- 🌐 [i18n-KO] Translated model_summary.md to Korean by @0525hhgus in #24625
- 🌐 [i18n-KO] Translated philosophy.md to Korean by @TaeYupNoh in #25010
- 🌐 [i18n-KO] Translated perf_train_tpu_tf.md to Korean by @0525hhgus in #25433
- 🌐 [i18n-KO] Translated docs: ko: pr_checks.md to Korean by @sronger in #24987
Explicit input data format for image processing
Addition of an input_data_format argument to image transforms and ImageProcessor methods, allowing the user to explicitly set the data format of the images being processed. This enables processing of images with a non-standard number of channels, e.g. 4, and avoids errors that occurred when the data format was inferred but the channel dimension was ambiguous.
import numpy as np
from transformers import ViTImageProcessor

# A (4, 6, 3) array is ambiguous: the channel dimension could be the first or the last axis.
# Passing input_data_format tells the processor to treat it as channels-first (4 channels, 6x3 image).
img = np.random.randint(0, 256, (4, 6, 3))
image_processor = ViTImageProcessor()
inputs = image_processor(img, image_mean=0, image_std=1, input_data_format="channels_first")
- Input data format by @amyeroberts in #25464
- Add input_data_format argument, image transforms by @amyeroberts in #25462
Documentation clarification about efficient inference through torch.scaled_dot_product_attention & Flash Attention
Users are not always aware that it is possible to force torch.scaled_dot_product_attention to dispatch to Flash Attention kernels. This leads to considerable speedups and memory savings, and is also compatible with quantized models. We decided to make this explicit to users in the documentation.
- [Docs / BetterTransformer] Added more details about flash attention + SDPA: #25265
In a nutshell, one can just run:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m").to("cuda")
# convert the model to BetterTransformer
model.to_bettertransformer()
input_text = "Hello my dog is cute and"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
# force dispatch to Flash Attention kernels via the SDPA context manager
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
to enable Flash Attention in their model. However, this feature does not support padding yet.
FSDP and DeepSpeed Changes
Users will no longer encounter CPU RAM OOM when using FSDP to train very large models in multi-GPU or multi-node multi-GPU settings.
Users no longer have to pass fsdp_transformer_layer_cls_to_wrap, as the code now uses _no_split_modules by default, which is available for most popular models. DeepSpeed ZeRO-3 init now works properly with the Accelerate launcher + Trainer.
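As a rough sketch (the output directory and batch size are illustrative), an FSDP run configured through TrainingArguments no longer needs the wrap class spelled out:

from transformers import TrainingArguments

# Illustrative FSDP configuration: auto-wrapping now falls back to the model's
# _no_split_modules, so fsdp_transformer_layer_cls_to_wrap can be omitted.
training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=1,
    fsdp="full_shard auto_wrap",
)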
- add util for ram efficient loading of model when using fsdp by @pacman100 in #25107
- fix fsdp checkpointing issues by @pacman100 in #24926
- fsdp fixes and enhancements by @pacman100 in #24980
- fix deepspeed load best model at end when the model gets sharded by @pacman100 in #25057
- resolving zero3 init when using accelerate config with Trainer by @pacman100 in #25227
- fix z3 init when using accelerate launcher by @pacman100 in #25589
Breaking changes
Default optimizer in the Trainer class
The default optimizer in the Trainer class has been updated to adamw_torch rather than our own adamw_hf, as the official PyTorch optimizer is more robust and fixes some issues.
In order to keep the old behavior, ensure that you pass "adamw_hf" as the optim value in your TrainingArguments.
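For example (the output directory is illustrative):

from transformers import TrainingArguments

# Opt back into the previous optimizer explicitly
training_args = TrainingArguments(output_dir="outputs", optim="adamw_hf")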
- 🚨🚨🚨 Change default from adamw_hf to adamw_torch 🚨🚨🚨 by @muellerzr in #25109
ViVit and EfficientNet rescale bugfix
There was an issue with how pixel values were rescaled for ViVit and EfficientNet. This has been fixed, but the fix will result in different model outputs for both of these models. To understand the change and see what needs to be done to obtain previous results, please take a look at the following PRs.
- 🚨🚨🚨 Fix rescale ViVit Efficientnet by @amyeroberts in #25174
- 🚨🚨🚨 Vivit update default rescale_factor value by @amyeroberts in #25547
Removing softmax for the image classification EfficientNet class
The EfficientNetForImageClassification model class did not follow conventions and added a softmax to the model logits. This was removed so that it respects the convention set by other models.
In order to obtain previous results, pass the model logits through a softmax.
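As a minimal sketch (the checkpoint name and dummy input are illustrative):

import numpy as np
import torch
from transformers import AutoImageProcessor, EfficientNetForImageClassification

checkpoint = "google/efficientnet-b0"  # illustrative checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = EfficientNetForImageClassification.from_pretrained(checkpoint)

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # stand-in image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probabilities = logits.softmax(dim=-1)  # the softmax is no longer applied inside the model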
- 🚨🚨🚨 Remove softmax for EfficientNetForImageClassification 🚨🚨🚨 by @amyeroberts in #25501
Bug fixes with SPM models
Some SPM models had issues with their handling of added tokens. Namely the Llama and T5 tokenizers, among others, were behaving incorrectly. These have been updated in #25224.
An option to obtain the previous behavior was added through the legacy flag, as explained in the PR linked above.
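As a minimal sketch (the checkpoint name is illustrative), the previous behavior can be restored by passing the flag at load time:

from transformers import AutoTokenizer

# legacy=True restores the previous (pre-#25224) handling of added tokens
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b", legacy=True)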
- 🚨🚨🚨 [SPM] Finish fix spm models 🚨🚨🚨 by @ArthurZucker in #25224
Bugfixes and improvements
- Disable ipex env var if false by @muellerzr in #24885
- Check for accelerate env var when doing CPU only by @muellerzr in #24890
- Avoid some pipeline tasks to use use_cache=True by @ydshieh in #24893
- Update tested versions in READMEs by @EliahKagan in #24895
- Fix test_model_parallelism for FalconModel by @ydshieh in #24914
- Fixed issue where ACCELERATE_USE_CPU="False" results in bool(True) by @madhavajay in #24907
- fix typo in BARK_PRETRAINED_MODEL_ARCHIVE_LIST by @21jun in #24902
- Fix minor llama2.md model doc typos by @tmc in #24909
- [Llama2] replace self.pretraining_tp with self.config.pretraining_tp by @younesbelkada in #24906
- [doc] image_processing_vilt.py wrong default documented by @stas00 in #24931
- Add multi-label text classification support to pytorch example by @ranchlai in #24770
- replace no_cuda with use_cpu in test_pytorch_examples by @statelesshz in #24944
- Generate: sequence bias can handle same terminations by @gante in #24822
- Update processing_vision_text_dual_encoder.py by @premsa in #24950
- Fix main_input_name in src/transformers/keras_callbacks.py by @ydshieh in #24916
- [DOCS] Example for LogitsProcessor class by @shauray8 in #24848
- fix type annotations for arguments in training_args by @shauray8 in #24550
- [RWKV] Add Gradient Checkpointing support for RWKV by @younesbelkada in #24955
- Change logic for logging in the examples by @muellerzr in #24956
- Contrastive Search peak memory reduction by @blbadger in #24120
- Fallback for missing attribute Parameter.ds_numel by @apoorvkh in #24942
- fix fsdp checkpointing issues by @pacman100 in #24926
- fix: cast input pixels to appropriate dtype for image_to_text pipelines by @JimAllanson in #24947
- fsdp fixes and enhancements by @pacman100 in #24980
- Fix missing spaces in system prompt of Llama2 tokenizer by @chenjoya in #24930
- [LlamaConfig] Nit: pad token should be None by default by @ArthurZucker in #24958
- Remove tokenizers from the doc table by @sgugger in #24963
- Avoid importing all models when instantiating a pipeline by @sgugger in #24960
- Fix type annotation for deepspeed training arg by @sgugger in #24988
- Use main_input_name for include_inputs_for_metrics by @sgugger in #24993
- Fix llama tokenization doctest by @ydshieh in #24990
- [bnb] Add simple check for bnb import by @younesbelkada in #24995
- [Llama] remove persistent inv_freq tensor by @ArthurZucker in #24998
- improve from_pretrained for zero3 multi gpus mode by @1ytic in #24964
- Move template doc file to md by @sgugger in #25004
- [check_config_docstrings.py] improve diagnostics by @stas00 in #25012
- [logging.py] set default stderr path if None by @ArthurZucker in #25033
- fix(integrations): store serialized TrainingArgs to wandb.config without sanitization. by @parambharat in #25035
- [docs] Performance docs tidy up, part 1 by @MKhalusova in #23963
- Support GatedRepoError + use raise from by @Wauplin in #25034
- Better handling missing SYS in llama conversation tokenizer by @ichernev in #24997
- Add dispatch_batches to training arguments by @muellerzr in #25038
- Fix typo in LlamaTokenizerFast docstring example by @sbrunk in #25018
- Make more test models smaller by @sgugger in #25005
- Pvt model by @Xrenya in #24720
- compute_loss in trainer failing to label shift for PEFT model when label smoothing enabled. by @njbrake in #25044
- [8bit] Fix 8bit corner case with Blip2 8bit by @younesbelkada in #25047
- Better error message when signal is not supported on OS by @sgugger in #25049
- [RWKV] Add note in doc on RwkvStoppingCriteria by @ArthurZucker in #25055
- Generate - add beam indices output in constrained beam search by @gante in #25042
- [Docs] fix rope_scaling doc string by @kashif in #25072
- Fix last models for common tests that are too big. by @sgugger in #25058
- fix: add TOC anchor link by @eenzeenee in #25066
- Set TF32 flag for PyTorch cuDNN backend by @XuehaiPan in #25075
- Fix broken link in README_hd.md by @susnato in #25067
- replace per_gpu_eval_batch_size with per_device_eval_batch_size in readme of multiple-choice task by @statelesshz in #25078
- [generate] Only warn users if the generation_config's max_length is set to the default value by @ArthurZucker in #25030
- Fix: repeat per sample for SAM image embeddings by @xk-huang in #25074
- [DOCS] add example NoBadWordsLogitsProcessor by @SoyGema in #25046
- Allow generic composite models to pass more kwargs by @ydshieh in #24927
- [ForSequenceClassification] Support left padding by @ArthurZucker in #24979
- [TF] Also apply patch to support left padding by @ArthurZucker in #25085
- Edit err message and comment in test_model_is_small by @connor-henderson in #25087
- [PreTrainedTokenizerFast] Keep properties from fast tokenizer by @ArthurZucker in #25053
- Hotfix for failing MusicgenForConditionalGeneration tests by @ydshieh in #25091
- [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification by @sjrl in #24726
- Fix doctest by @ydshieh in #25031
- fix tied_params for meta tensor by @SunMarc in #25101
- documentation for llama2 models by @shauray8 in #25102
- Fix PvtModelIntegrationTest::test_inference_fp16 by @ydshieh in #25106
- Add descriptive docstring to TemperatureLogitsWarper by @nablabits in #24892
- fix "UserWarning: Creating a tensor from a list of numpy.ndarrays is … by @liucw2012 in #24772
- update use_auth_token -> token by @ydshieh in #25083
- Fix past CI after #24334 by @ydshieh in #25113
- Move common image processing methods to BaseImageProcessor by @amyeroberts in #25089
- Fix ViT docstring regarding default dropout values. by @ebezzam in #25118
- MaskFormer - enable return_dict in order to compile by @amyeroberts in #25052
- Move center_crop to BaseImageProcessor by @amyeroberts in #25122
- fix deepspeed load best model at end when the model gets sharded by @pacman100 in #25057
- fix delete all checkpoints when save_total_limit is set to 1 by @Pbihao in #25136
- [T5/LlamaTokenizer] default legacy to None to not always warn by @ArthurZucker in #25131
- Clarify 4/8 bit loading log message by @BramVanroy in #25134
- [MptConfig] support from pretrained args by @ArthurZucker in #25116
- Add offload support to Bark by @ylacombe in #25037
- More token things by @ydshieh in #25146
- Add bloom flax by @sanchit-gandhi in #25094
- Add new model in doc table of content by @sgugger in #25148
- Fix .push_to_hub and cleanup get_full_repo_name usage by @Wauplin in #25120
- Add test when downloading from gated repo by @Wauplin in #25039
- override .cuda() to check if model is already quantized by @ranchlai in #25166
- Represent query_length in a different way to solve jit issue by @jiqing-feng in #25164
- make run_generation more generic for other devices by @statelesshz in #25133
- added compiled model support for inference by @markovalexander in #25124
- Update use_auth_token -> token in example scripts by @ydshieh in #25167
- [Mpt] Fix mpt slow test by @younesbelkada in #25170
- [InstructBlip] Fix instructblip slow test by @younesbelkada in #25171
- Fix beam search to sample at least 1 non eos token by @yonigottesman in #25103
- [MusicGen] Fix integration tests by @sanchit-gandhi in #25169
- Musicgen: CFG is manually added by @gante in #25173
- Better error message in _prepare_output_docstrings by @ydshieh in #25202
- [PreTrainedModel] Wrap cuda and to method correctly by @younesbelkada in #25206
- Fix all_model_classes in FlaxBloomGenerationTest by @ydshieh in #25211
- [quantization.md] fix by @stas00 in #25190
- [pipeline] revisit device check for pipeline by @younesbelkada in #25207
- Update tiny model info. and pipeline testing by @ydshieh in #25213
- Fix docker image build failure by @ydshieh in #25214
- make build_mpt_alibi_tensor a method of MptModel so that deepspeed co… by @sywangyi in #25193
- [Pix2Struct] Fix pix2struct cross attention by @younesbelkada in #25200
- [Docs / quantization] Clearer explanation on how things work under the hood. + remove outdated info by @younesbelkada in #25216
- [MPT] Add require_bitsandbytes on MPT integration tests by @younesbelkada in #25201
- [Detr] Fix detr BatchNorm replacement issue by @younesbelkada in #25230
- Move rescale dtype recasting to match torchvision ToTensor by @amyeroberts in #25229
- Fix set of model parallel in the Trainer when no GPUs are available by @sgugger in #25239
- fix get_keys_to_not_convert() to return correct modules for full precision inference by @ranchlai in #25105
- add pathname and line number to logging formatter in debug mode by @ranchlai in #25203
- Add token argument in example scripts by @ydshieh in #25172
- resolving zero3 init when using accelerate config with Trainer by @pacman100 in #25227
- Update rescale tests - cast to float after rescaling to reflect #25229 by @amyeroberts in #25259
- Fix some bugs for two stage training of deformable detr by @jypjypjypjyp in #25045
- [DOCS] Add example and modified docs of EtaLogitsWarper by @ashishthomaschempolil in #25125
- Fix return_dict_in_generate bug in InstructBlip generate function by @eohomegrownapps in #25246
- Remove pytest_options={"rA": None} in CI by @ydshieh in #25263
- recommend DeepSpeed's Argument Parsing documentation by @BurnzZ in #25268
- [MMS] Fix mms by @patrickvonplaten in #25267
- CI with num_hidden_layers=2 🚀🚀🚀 by @ydshieh in #25266
- CI with pytest_num_workers=8 for torch/tf jobs by @ydshieh in #25274
- Docs: Update list of report_to logging integrations in docstring by @tomaarsen in #25281
- Update InstructBLIP & Align values after rescale update by @amyeroberts in #25209
- Docs: separate generate section by @gante in #25235
- Update bark doc by @ylacombe in #25234
- add generate method to SpeechT5ForTextToSpeech by @ylacombe in #25233
- Add timeout parameter to load_image function by @rolisz in #25184
- [JAX] Bump min version by @sanchit-gandhi in #25286
- [small] llama2.md typo by @H-Huang in #25295
- Fix typo: Roberta -> RoBERTa by @MrGeislinger in #25302
- Move usage of deprecated logging.warn to logging.warning by @PeterJCLaw in #25310
- Give more memory in test_disk_offload by @sgugger in #25315
- Generate: get generation mode as an enum by @gante in #25292
- Add offline mode for agents by @sgugger in #25226
- Deal with nested configs better in base class by @sgugger in #25237
- Document check copies by @sgugger in #25291
- Make bark could have tiny model by @ydshieh in #25290
- Document toc check and doctest check scripts by @sgugger in #25319
- [Whisper] Better error message for outdated generation config by @sanchit-gandhi in #25298
- Remove jnp.DeviceArray since it is deprecated. by @mariecwhite in #24875
- Update TF pin in docker image by @ydshieh in #25343
- Generalize CFG to allow for positive prompts by @oobabooga in #25339
- Loosen output shape restrictions on GPT-style models by @calpt in #25188
- Allow trust_remote_code in example scripts by @Jackmin801 in #25248
- Generate: remove Marian hack by @gante in #25294
- Fix more offload edge cases by @ydshieh in #25342
- Migrate Trainer from Repository to upload_folder by @sgugger in #25095
- Adding more information in help parser on train_file and validation_file by @pphuc25 in #25324
- [DOCS] Add NoRepeatNGramLogitsProcessor Example for LogitsProcessor class by @Rishab26 in #25186
- Docs: Added benchmarks for torch.compile() for vision models by @merveenoyan in #24748
- Add mask2former fp16 support by @pedrohml in #25093
- [DOCS] Add descriptive docstring to MinNewTokensLength by @nablabits in #25196
- Register ModelOutput subclasses as supported torch.utils._pytree nodes by @ringohoffman in #25358
- Fix test_model_parallelism by @ydshieh in #25359
- Add warning for missing attention mask when pad tokens are detected by @hackyon in #25345
- [ASR Pipeline] Clarify return timestamps by @sanchit-gandhi in #25344
- MaskFormer, Mask2Former - replace einsum for tracing by @amyeroberts in #25297
- Load state in else by @muellerzr in #25318
- Fix token in example template by @ydshieh in #25351
- Enable tests to run on third-party devices by @statelesshz in #25327
- Fix torch_job worker(s) crashing by @ydshieh in #25374
- Generate: add config-level validation by @gante in #25381
- Fix missing usage of token by @ydshieh in #25382
- Use small config for OneFormerModelTest.test_model_with_labels by @ydshieh in #25383
- Add copied from for image processor methods by @amyeroberts in #25121
- change version by @SunMarc in #25387
- [DOCS] Add example for TopPLogitsWarper by @chiral-carbon in #25361
- 16059 - Add missing type hints for ASTModel by @nablabits in #25364
- rm useless condition since the previous condition contains it. by @jiqing-feng in #25403
- Fix path for dynamic module creation by @sgugger in #25402
- YOLOS - Revert default return_pixel_mask value by @amyeroberts in #25404
- Docs: introduction to generation with LLMs by @gante in #25240
- Generate: length validation by @gante in #25384
- Improve training args by @statelesshz in #25401
- Generate: generation config validation fixes in docs by @gante in #25405
- 16059 - Add extra type hints for AltCLIPModel by @nablabits in #25399
- Generate: lower severity of parameterization checks by @gante in #25407
- Update Bark generation configs and tests by @ylacombe in #25409
- aligned sample_beam output selection with beam_search by @hukuda222 in #25375
- Enable passing number of channels when inferring data format by @amyeroberts in #25412
- Bark: flexible generation config overload by @gante in #25414
- [DINOv2] Update pooler output by @NielsRogge in #25392
- Doc checks by @sgugger in #25408
- Generation: strict generation config validation at save time by @gante in #25411
- [WavLM] Fix Arxiv link and authors by @sanchit-gandhi in #25415
- Generate: Load generation config when device_map is passed by @gante in #25413
- Fix rendering for torch.compile() docs by @merveenoyan in #25432
- Add examples to tests to run when setup.py is modified by @ydshieh in #25437
- Fix issue with ratio evaluation steps and auto find batch size by @muellerzr in #25436
- docs: add LLaMA-Efficient-Tuning to awesome-transformers by @statelesshz in #25441
- Fix for #25437 by @ydshieh in #25454
- Refactor image processor testers by @amyeroberts in #25450
- Switch Transformers: remove overwritten beam sample test by @gante in #25458
- Reuse the cache created for latest main on PRs/branches if setup.py is not modified by @ydshieh in #25445
- Update run_translation.py broken link example Pytorch by @SoyGema in #25461
- Add input_data_format argument, image transforms by @amyeroberts in #25462
- Mark flaky tests by @amyeroberts in #25463
- Revert "Reuse the cache created for latest
main
on PRs/branches" by @ydshieh in #25466 - import required torch and numpy libraries by @eze1376 in #25483
- fix : escape key of start_token from special characters before search end_token in token2json function of DonutProcessor by @nour-elkamel in #25472
- Remove logging code in TF Longformer that fails to compile by @Rocketknight1 in #25496
- Add type hints to Blip2QFormer, BigBirdForQA and ConditionalDetr family models by @nablabits in #25488
- Set can_generate for SpeechT5ForTextToSpeech by @ylacombe in #25493
- MaskFormer post_process_instance_segmentation bug fix convert out side of loop by @amyeroberts in #25497
- fix gptq nits by @SunMarc in #25500
- Conditional DETR type hint fix by @Rocketknight1 in #25505
- Check for case where auxiliary_head is None in UperNetPreTrainedModel by @mmurray in #25514
- add repr to the BitsAndBytesConfig class by @ranchlai in #25517
- Make training args fully immutable by @muellerzr in #25435
- Use dynamic past key-values shape in TF-Whisper by @Rocketknight1 in #25523
- [TYPO] fix typo/format in quicktour.md by @lishukan in #25519
- Fix nested configs of Jukebox by @sgugger in #25533
- Marian: post-hack-fix correction by @gante in #25459
- Document the test fetcher by @sgugger in #25521
- Generate: fix default max length warning by @gante in #25539
- fix vit hybrid test by @SunMarc in #25543
- Fix MaskFormerModelIntegrationTest OOM by @ydshieh in #25544
- More frozen args by @muellerzr in #25540
- Input data format by @amyeroberts in #25464
- [ASR Pipeline] Fix init with timestamps by @sanchit-gandhi in #25438
- More utils doc by @sgugger in #25457
- Update trainer.py by @yundai424 in #25553
- Add documentation to dynamic module utils by @sgugger in #25534
- Fix MPT CI by @ydshieh in #25548
- Fix torch.fx tests on nightly CI by @ydshieh in #25549
- YOLOS - reset default return_pixel_mask value by @amyeroberts in #25559
- Skip test_onnx_runtime_optimize for now by @ydshieh in #25560
- [Docs] Fix un-rendered images by @younesbelkada in #25561
- Adds TRANSFORMERS_TEST_DEVICE by @vvvm23 in #25506
- Skip test_beam_search_xla_generate_simple for T5 by @ydshieh in #25566
- [resize_embedding] Introduce pad_to_multiple_of and guidance by @ArthurZucker in #25088
- [SwitchTransformers] Remove unused module by @ArthurZucker in #25427
- Inconsistency in PreTrainedModel.resize_token_embeddings When ZeRO3 Is Enabled by @sinamoeini in #25394
- [NllbMoe] Update code to properly support loss computation by @ArthurZucker in #25429
- [Tests] Fix failing 8bit test by @younesbelkada in #25564
- Revert "change version by @SunMarc in #25387"
- add util for ram efficient loading of model when using fsdp by @pacman100 in #25107
- Skip test_contrastive_generate for TFXLNet by @ydshieh in #25574
- add warning for 8bit optimizers by @SunMarc in #25575
- Fix typo in example code by @amelietamreymond in #25583
- Suggestions on Pipeline_webserver by @kihoon71 in #25570
- [Docs / BetterTransformer] Added more details about flash attention + SDPA by @younesbelkada in #25265
- Added missing parenthesis in call to is_fsdp_enabled by @marma in #25585
- Replaces calls to .cuda with .to(torch_device) in tests by @vvvm23 in #25571
- [split_special_tokens] Add support for split_special_tokens argument to encode by @ArthurZucker in #25081
- [Llama] remove prompt and fix prefix finetuning by @ArthurZucker in #25565
- [Time series Informer] fix dtype of cumsum by @kashif in #25431
- fix z3 init when using accelerate launcher by @pacman100 in #25589
- [TokenizerFast] Fix setting prefix space in init by @ArthurZucker in #25563
- Make TTS automodels importable by @osanseviero in #25595
- reattach hooks when using resize_token_embeddings by @SunMarc in #25596
- Ignore all exceptions from signal in dynamic code by @sgugger in #25623
- Fix PEFT integration failures on nightly CI by @younesbelkada in #25624
- Run doctest for new files by @ydshieh in #25588
- Fix test_modeling_mpt typo in model id by @JuanFKurucz in #25606
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @ranchlai
- Add multi-label text classification support to pytorch example (#24770)
- override .cuda() to check if model is already quantized (#25166)
- fix get_keys_to_not_convert() to return correct modules for full precision inference (#25105)
- add pathname and line number to logging formatter in debug mode (#25203)
- add repr to the BitsAndBytesConfig class (#25517)
- @wonhyeongseo
- @Sunmin0520
- 🌐 [i18n-KO] Translated testing.md to Korean (#24900)
- @Xrenya
- Pvt model (#24720)
- @susnato
- @sjrl
- [T5, MT5, UMT5] Add [T5, MT5, UMT5]ForSequenceClassification (#24726)
- @Jackmin801
- Allow trust_remote_code in example scripts (#25248)
- @mjk0618
- 🌐 [i18n-KO] Translated add_new_model.md to Korean (#24957)