3.x SQ supports calib_func for auto-tune #1812
Conversation
Signed-off-by: Cheng, Zixuan <[email protected]>
🌩️ Required checks status: Pending 🟡

Groups summary
- 🟡 Code Scan Tests workflow (required after the changes)
- 🟡 Model Tests 3x workflow (required after the changes)
- 🟡 Unit Tests 3x-PyTorch workflow (required after the changes)

Thank you for your contribution! 💜
Please refer to #1810
Capture input data from run_fn and rebuild CapturedDataloader for SQ calibration
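For readers following along, here is a minimal sketch of the capture-and-replay idea described in that comment. The class name CapturedDataloader comes from the comment itself; everything else (the helper name, the forward monkey-patching approach) is illustrative, not the PR's actual implementation.

```python
import torch


class CapturedDataloader:
    """Replays (args, kwargs) pairs recorded while run_fn executed the model."""

    def __init__(self, args_list, kwargs_list):
        self.args_list = args_list
        self.kwargs_list = kwargs_list

    def __iter__(self):
        # Yield the recorded positional/keyword inputs pairwise, which is why
        # downstream code may receive a zip of (args, kwargs).
        yield from zip(self.args_list, self.kwargs_list)


def capture_inputs(model, run_fn):
    """Run the user's run_fn once while recording every input the model receives."""
    args_list, kwargs_list = [], []
    original_forward = model.forward

    def recording_forward(*args, **kwargs):
        args_list.append(args)
        kwargs_list.append(kwargs)
        return original_forward(*args, **kwargs)

    model.forward = recording_forward
    try:
        run_fn(model)  # user calibration function drives the model
    finally:
        model.forward = original_forward
    return CapturedDataloader(args_list, kwargs_list)
```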
@@ -465,6 +473,9 @@ def forward_wrapper(model, input, device=torch.device("cpu")):  # pragma: no cover
             output = model(*input)
         except:
             output = model(input)
+    elif isinstance(input, zip):
+        for args, kwargs in input:
+            output = model(*args, **kwargs)
good point
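To make the new branch above concrete: the captured calibration data arrives as a zip of (args, kwargs) pairs, roughly as in this toy example (the model and tensors here are hypothetical stand-ins, not taken from the PR).

```python
import torch
from torch import nn

# Hypothetical stand-in model; the real caller passes the LLM being calibrated.
model = nn.Linear(16, 16)

# Captured positional args and keyword args for two calibration samples.
args_list = [(torch.randn(1, 16),), (torch.randn(1, 16),)]
kwargs_list = [{}, {}]

# Replaying the captured inputs as a zip of (args, kwargs) pairs is exactly
# the shape the new isinstance(input, zip) branch consumes.
for args, kwargs in zip(args_list, kwargs_list):
    output = model(*args, **kwargs)
```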
    block_modules = {}
    for key in self.block_names:
        block_modules[key] = get_module(self.model, key)
    self._add_blockwise_observer(block_modules)
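A rough sketch of what the two helpers used here typically do. Both bodies are illustrative assumptions; the repo's own get_module and _add_blockwise_observer may differ in details.

```python
import torch
from torch import nn


def get_module(model: nn.Module, key: str) -> nn.Module:
    """Resolve a dotted module name such as 'model.layers.0' to the submodule."""
    module = model
    for name in key.split("."):
        module = getattr(module, name)
    return module


def add_blockwise_observer(block_modules: dict, storage: dict):
    """Illustrative stand-in for self._add_blockwise_observer: hook each block so
    its fp32 outputs can be cached and later compared against quantized outputs."""
    handles = []
    for key, module in block_modules.items():
        def hook(mod, inputs, output, key=key):
            storage.setdefault(key, []).append(output)
        handles.append(module.register_forward_hook(hook))
    return handles
```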
    forward_wrapper(self.model, input, self.device)  # disable quant and get fp32 output
    # get input args and kwargs for the first block, then do forward
    total_block_args, total_block_kwargs = get_hidden_states(self.model, calib_sample_num, calib_func)
It seems we fuse all outputs at once instead of handling them per input? Will it impact the final result? @yintong-lu
See the new design in #1821
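For reference on the block-wise path discussed above: get_hidden_states has to record the (args, kwargs) that reach the first block while the user's calib_func drives the model. A hedged sketch of that idea follows; the function name and hook mechanics are illustrative, only calib_func and the two returned lists mirror the snippet in the thread.

```python
import torch
from torch import nn


def capture_first_block_inputs(model: nn.Module, first_block: nn.Module, calib_func):
    """Record the (args, kwargs) that reach the first decoder block while
    calib_func runs the model; an illustrative version of the idea only."""
    total_block_args, total_block_kwargs = [], []

    def pre_hook(module, args, kwargs):
        total_block_args.append(args)
        total_block_kwargs.append(kwargs)
        # A real implementation may raise here to skip the remaining blocks and
        # save calibration time; letting the forward finish keeps the sketch simple.

    handle = first_block.register_forward_pre_hook(pre_hook, with_kwargs=True)
    try:
        calib_func(model)  # user-provided calibration loop, no dataloader needed
    finally:
        handle.remove()
    return total_block_args, total_block_kwargs
```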
Type of Change
SQ supports calib_func for auto-tune; no dataloader is needed.
Description
- Enable layer-wise & block-wise calibration
- Add a unit test to check auto-tune
- Check LLM examples
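A hedged usage sketch of what the description above enables. The import path and the autotune/TuningConfig/SmoothQuantConfig names follow the 3.x torch API, but the exact signatures and the calibration-callback keyword (calib_func vs. run_fn) should be checked against the current docs; model, tokenizer, calibration_prompts, and evaluate_accuracy are user-provided placeholders.

```python
from neural_compressor.torch.quantization import SmoothQuantConfig, TuningConfig, autotune


def run_fn(model):
    # Drive the model on a few samples; the SQ path captures these inputs
    # internally, so no calibration dataloader has to be built.
    for prompt in calibration_prompts:
        model(**tokenizer(prompt, return_tensors="pt"))


def eval_fn(model):
    return evaluate_accuracy(model)


tune_config = TuningConfig(
    config_set=[SmoothQuantConfig(alpha=0.5), SmoothQuantConfig(alpha=0.8)]
)
best_model = autotune(model, tune_config=tune_config, eval_fn=eval_fn, run_fn=run_fn)
```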
Expected Behavior & Potential Risk
How has this PR been tested?
Dependency Change?