Training crashes with Accelerate 0.33.0 #22

Closed
rationalism opened this issue Aug 24, 2024 · 1 comment

@rationalism

After upgrading to the new Accelerate release, 0.33.0, BNB QLoRA training crashes with this stack trace:

loading checkpoint file model-00001-of-00030.safetensors
load params into module <class 'llama_pipe.LlamaDecoderLayerPipe'>
Traceback (most recent call last):
  File "/home/alyssa/lm_fun/qlora-pipe/train.py", line 418, in <module>
    pipeline_model, lora_model, lora_config = load_pipeline_model_with_lora(config, model_type)
  File "/home/alyssa/lm_fun/qlora-pipe/train.py", line 279, in load_pipeline_model_with_lora
    pipeline_model = engine.CustomPipelineModule(
  File "/home/alyssa/lm_fun/qlora-pipe/engine.py", line 274, in __init__
    super().__init__(layers, **kwargs)
  File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 212, in __init__
    self._build()
  File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 268, in _build
    module = layer.build()
  File "/home/alyssa/lm_fun/qlora-pipe/pipeline_model.py", line 75, in build
    return self.typename(*self.module_args, **self.module_kwargs)
  File "/home/alyssa/lm_fun/qlora-pipe/llama_pipe.py", line 113, in __init__
    loader_util.load_state_dict_into_module(self)
  File "/home/alyssa/lm_fun/qlora-pipe/pipeline_model.py", line 316, in load_state_dict_into_module
    transformers.modeling_utils._load_state_dict_into_meta_model(
  File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/transformers/modeling_utils.py", line 961, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 436, in set_module_tensor_to_device
    new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
TypeError: Params4bit.__new__() got an unexpected keyword argument 'original_name'
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/alyssa/lm_fun/qlora-pipe/train.py", line 418, in <module>
[rank0]:     pipeline_model, lora_model, lora_config = load_pipeline_model_with_lora(config, model_type)
[rank0]:   File "/home/alyssa/lm_fun/qlora-pipe/train.py", line 279, in load_pipeline_model_with_lora
[rank0]:     pipeline_model = engine.CustomPipelineModule(
[rank0]:   File "/home/alyssa/lm_fun/qlora-pipe/engine.py", line 274, in __init__
[rank0]:     super().__init__(layers, **kwargs)
[rank0]:   File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 212, in __init__
[rank0]:     self._build()
[rank0]:   File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/deepspeed/runtime/pipe/module.py", line 268, in _build
[rank0]:     module = layer.build()
[rank0]:   File "/home/alyssa/lm_fun/qlora-pipe/pipeline_model.py", line 75, in build
[rank0]:     return self.typename(*self.module_args, **self.module_kwargs)
[rank0]:   File "/home/alyssa/lm_fun/qlora-pipe/llama_pipe.py", line 113, in __init__
[rank0]:     loader_util.load_state_dict_into_module(self)
[rank0]:   File "/home/alyssa/lm_fun/qlora-pipe/pipeline_model.py", line 316, in load_state_dict_into_module
[rank0]:     transformers.modeling_utils._load_state_dict_into_meta_model(
[rank0]:   File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/transformers/modeling_utils.py", line 961, in _load_state_dict_into_meta_model
[rank0]:     set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
[rank0]:   File "/home/alyssa/anaconda3/envs/lm_fun/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 436, in set_module_tensor_to_device
[rank0]:     new_value = param_cls(new_value, requires_grad=old_value.requires_grad, **kwargs).to(device)
[rank0]: TypeError: Params4bit.__new__() got an unexpected keyword argument 'original_name'
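
The last frame of the traceback forwards `**kwargs` into the parameter class constructor, and the constructor rejects the `original_name` key. The following is a minimal sketch of that failure mode, not the actual bitsandbytes or Accelerate code: `FakeParams4bit` and `filter_kwargs` are hypothetical names invented for illustration, and filtering by signature is only one possible way to avoid the error.

```python
import inspect


class FakeParams4bit:
    """Hypothetical stand-in for a parameter class with a strict __new__ signature."""

    def __new__(cls, data=None, requires_grad=False, quant_type="nf4"):
        obj = super().__new__(cls)
        obj.data = data
        obj.requires_grad = requires_grad
        obj.quant_type = quant_type
        return obj


def filter_kwargs(cls, kwargs):
    """Keep only the keyword arguments that cls.__new__ explicitly names."""
    accepted = inspect.signature(cls.__new__).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}


# Attributes carried over from an old parameter, including one key
# ("original_name") that the constructor does not accept.
kwargs = {"quant_type": "nf4", "original_name": "weight"}

# Forwarding everything reproduces the same kind of TypeError as in the traceback.
try:
    FakeParams4bit([1.0], requires_grad=False, **kwargs)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Dropping the unknown key first lets construction succeed.
rebuilt = FakeParams4bit(
    [1.0], requires_grad=False, **filter_kwargs(FakeParams4bit, kwargs)
)
```

Where the extra `original_name` attribute comes from is not established in this thread, so the sketch only shows why an unexpected key reaching `__new__` is fatal.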

I suspect it's because of this PR:

huggingface/accelerate#2934

This PR might also be relevant:

huggingface/accelerate#2986

Reverting to Accelerate 0.32.0 resolves the crash. Thank you!
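
For anyone stuck on an older checkout, a startup guard along these lines could flag the problematic release before training begins. This is a sketch under stated assumptions: only 0.33.0 (crashing) and 0.32.0 (working) are reported in this issue, so every other version is treated as unknown, and the helper names `parse_release`, `accelerate_status`, and `check_installed_accelerate` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = (0, 32, 0)  # reported working in this issue
KNOWN_BAD = (0, 33, 0)   # reported crashing in this issue


def parse_release(version_string):
    """Return the leading numeric components, e.g. '0.33.0' -> (0, 33, 0)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at suffixes such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def accelerate_status(version_string):
    """Classify a version string using only the two data points from this issue."""
    release = parse_release(version_string)
    if release == KNOWN_BAD:
        return "known-bad"
    if release == KNOWN_GOOD:
        return "known-good"
    return "unknown"  # assumption: no claim is made about other releases


def check_installed_accelerate():
    """Look up the installed accelerate package, if any, and classify it."""
    try:
        installed = version("accelerate")
    except PackageNotFoundError:
        return "not-installed"
    return accelerate_status(installed)
```

If the guard reports `known-bad`, pinning with `pip install accelerate==0.32.0` matches the workaround described above.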

@tdrussell
Owner

This should be fixed as of the latest commits.
