Fix bug: when moving quantized model between CPU and GPU, the 4bit weights gets re-quantized #953

michaelnny · 2024-01-04T04:21:29Z

when moving quantized model between CPU and GPU, the 4bit weights gets re-quantized, this issue is reported #937

The purposed code add a condition check and only apply the quantization if no quant_state exists.

The full test code can be found in this Colab notebook:
https://colab.research.google.com/drive/1rBWjA5VsWdTE6GiATJWHFKHbdt-HrHE4?usp=sharing

…ights gets requantized and cause shape mismatch in forward

Titus-von-Koeller · 2024-01-18T13:06:56Z

Hi @michaelnny,

sorry for not reacting earlier. I was off sick and we're generally still struggling to get on top of the maintenance load, as @younesbelkada and I are taking over maintenance from Tim.

Thanks for raising this issue and taking the initiative to contribute code. In this specific case, we've already had a related PR that we had to revert, due to failing PEFT integration tests with an error thrown due to something off about stuff being on the wrong device. So good catch, I think your code would have fixed that with the code

            if self.data.device != device:
                self.data = self.data.to(device)
                self.quant_state.to(device)

We actually discussed this now in this new PR for FSDP where this issue now also got fixed. This will be part of the next release.

Sorry for not being able to react earlier and thanks again for your valuable contribution, even if your solution didn't make it over the finishing line this time! Would be happy to see other future contributions by you :)

Seems now this is fixed, due to the new implementation here, as part of the PR for the FSDP functionality.

michaelnny · 2024-01-18T13:20:13Z

@Titus-von-Koeller Hi, thanks for the update. Yes, earlier today I saw the new PR for FSDP has already fixed this issue, so I'm closing this particular pull request now.

Fix bug: when moving quantized model between CPU and GPU, the 4bit we…

ed69fa3

…ights gets requantized and cause shape mismatch in forward

michaelnny closed this by deleting the head repository Jan 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix bug: when moving quantized model between CPU and GPU, the 4bit weights gets re-quantized #953

Fix bug: when moving quantized model between CPU and GPU, the 4bit weights gets re-quantized #953

michaelnny commented Jan 4, 2024 •

edited

Loading

Titus-von-Koeller commented Jan 18, 2024

michaelnny commented Jan 18, 2024

Fix bug: when moving quantized model between CPU and GPU, the 4bit weights gets re-quantized #953

Fix bug: when moving quantized model between CPU and GPU, the 4bit weights gets re-quantized #953

Conversation

michaelnny commented Jan 4, 2024 • edited Loading

Titus-von-Koeller commented Jan 18, 2024

michaelnny commented Jan 18, 2024

michaelnny commented Jan 4, 2024 •

edited

Loading