
Add support for weights_only flag when loading state_dict #32481

Merged: 2 commits into huggingface:main on Oct 3, 2024

Conversation

@jerryzh168 (Contributor) commented Aug 7, 2024:

Summary:
This is to enable loading a state_dict with wrapper tensor subclasses (used in torchao for quantized weights).
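
For context, a minimal sketch of the `torch.load` behavior the new flag is forwarded to (the checkpoint path here is illustrative):

```
import torch

# With weights_only=True, torch.load uses a restricted unpickler that only
# accepts plain tensors and other allowlisted types, so wrapper tensor
# subclasses (like torchao's quantized tensors) are rejected. Passing
# weights_only=False opts out of that restriction for trusted checkpoints.
state_dict = torch.load("pytorch_model.bin", weights_only=False)
```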

Test Plan:
Tested locally with torchao weights; also needs #32306:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TorchAoConfig
from torchao.utils import benchmark_model
import torchao

DEVICE_TYPE = "cuda"

def init_model_and_benchmark(model_id, torch_dtype=torch.bfloat16, quantization_config=None):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    if quantization_config is not None:
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map=DEVICE_TYPE, torch_dtype=torch_dtype, quantization_config=quantization_config)
    else:
        # weights_only=False is the flag added by this PR; it is needed to load
        # checkpoints whose state_dict contains wrapper tensor subclasses
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map=DEVICE_TYPE, torch_dtype=torch_dtype, weights_only=False)

    # sanity check: run the model
    input_text = "What are we having for dinner?"
    input_ids = tokenizer(input_text, return_tensors="pt").to(DEVICE_TYPE)
    output = model.generate(**input_ids, max_new_tokens=1000)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

    NUM_WARMUP = 1
    NUM_RUNS = 5

    if quantization_config is not None:
        torchao.quantization.utils.recommended_inductor_config_setter()

    model = torch.compile(model, mode="max-autotune")

    # one warmup pass, then the timed runs
    benchmark_model(model.generate, NUM_WARMUP, kwargs=input_ids, device_type=DEVICE_TYPE)
    print("running benchmark")
    results = benchmark_model(model.generate, NUM_RUNS, kwargs=input_ids, device_type=DEVICE_TYPE)
    return model, results

model_id = "jerryzh168/test-model"
torchao.quantization.utils.recommended_inductor_config_setter()
bf16_model, bf16_time = init_model_and_benchmark(model_id)
print(f"bf16: {bf16_time}")
```


@amyeroberts (Collaborator) left a comment:

Thanks for adding this support!

Overall, I think this looks good. Let's get a second 👍 from @ArthurZucker too, as it's touching core code.

(One inline review comment on src/transformers/modeling_utils.py, since resolved.)
@jerryzh168 (Contributor, Author):
I learned from @mikaylagawarecki that `weights_only` is going to default to `True` in the future, but I think this flag could still be helpful for people who are using older versions of PyTorch and want to use torchao.
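
For illustration, a hedged sketch of the two ways to load such a checkpoint (the file name and the import path are assumptions):

```
import torch

# Option 1: trust the checkpoint and opt out of the restricted unpickler.
state_dict = torch.load("quantized_model.pt", weights_only=False)

# Option 2 (recent PyTorch): keep weights_only=True but allowlist the
# tensor subclass so the safe unpickler accepts it.
from torchao.dtypes import AffineQuantizedTensor  # assumed import path
torch.serialization.add_safe_globals([AffineQuantizedTensor])
state_dict = torch.load("quantized_model.pt", weights_only=True)
```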

@jerryzh168 (Contributor, Author):
Hi @ArthurZucker, can you take a look at this PR?

@jerryzh168 (Contributor, Author):
Please let me know if we want to add a test. A load test is a bit harder to write because it needs some models uploaded to the Hub, like:

model = BertModel.from_pretrained("hf-internal-testing/tiny-random-bert")
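
For what it's worth, a minimal sketch of the shape such a test could take, reusing the tiny repo above (this only exercises the flag's plumbing; a real regression test would need an uploaded checkpoint that actually contains tensor subclasses):

```
from transformers import BertModel

def test_from_pretrained_weights_only_false():
    # tiny-random-bert loads fine either way; the point is just that the
    # new weights_only kwarg is accepted and forwarded to torch.load.
    model = BertModel.from_pretrained(
        "hf-internal-testing/tiny-random-bert", weights_only=False
    )
    assert model is not None
```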

@ArthurZucker (Collaborator) left a comment:

LGTM, sorry for being late on the review here!

@ArthurZucker (Collaborator):
Can you rebase on main and resolve the conflicts?

@HuggingFaceDocBuilderDev:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jerryzh168 (Contributor, Author) commented Sep 3, 2024:

Thanks for the review @ArthurZucker and @amyeroberts! I just updated the PR; please feel free to merge when the CI is green.

@amyeroberts (Collaborator):
@jerryzh168 Could you run `make fix-copies` and push the changes? This should resolve the failing quality checks.

@jerryzh168 (Contributor, Author):
Thanks, updated; please take a look again @amyeroberts.

@ArthurZucker (Collaborator) left a comment:

Let's revert the unrelated changes and let's go!

@ArthurZucker (Collaborator), in an inline review comment:

This unrelated change should not be included here! 🤗

@jerryzh168 (Contributor, Author):

@ArthurZucker this is the fix from running `make fix-copies` 😅

@ArthurZucker (Collaborator):

OK, no worries then, let's merge!

@ArthurZucker (Collaborator) left a comment:

Sorry for the delay! I would kindly ask you to rebase to make sure we are up to date, as other things were merged.

@ArthurZucker merged commit 15a4d24 into huggingface:main on Oct 3, 2024
21 checks passed
@ArthurZucker (Collaborator):

Thanks once more @jerryzh168

BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
Add support for `weights_only` flag when loading state_dict (huggingface#32481)

* Add support for `weights_only` flag when loading state_dict

* format