[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA #9438
Conversation
Hey @WoosukKwon - I have been testing with this PR. I noticed that even with this PR and #9437, the reported value can still be misleading: in `load_model`, `peak_bytes_used` already exceeds `bytes_used` once the weights have been loaded, before the dummy profile run even starts.

```python
def load_model(self, *, model_config: ModelConfig,
               device_config: DeviceConfig,
               lora_config: Optional[LoRAConfig],
               parallel_config: ParallelConfig,
               scheduler_config: SchedulerConfig,
               cache_config: CacheConfig) -> nn.Module:
    target_device = torch.device(device_config.device)
    with set_default_torch_dtype(model_config.dtype):
        with target_device:
            model = _initialize_model(model_config, self.load_config,
                                      lora_config, cache_config,
                                      scheduler_config)
            # << here, peak_bytes_used == bytes_used == weight_size
        model.load_weights(self._get_all_weights(model_config, model))
        # here, peak_bytes_used > bytes_used
```

So, I was thinking we might want to reset the peak counter (or record it as a baseline) after weight loading, so that the profile run measures the forward-pass peak rather than a transient spike from loading the weights.
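For reference, a minimal probe along these lines reproduces the observation. The `report` helper is hypothetical; the `bytes_used`/`peak_bytes_used` keys are the ones discussed in this PR and require the upgraded PyTorch XLA nightly:

```python
import torch_xla.core.xla_model as xm

def report(tag: str, device) -> None:
    # Hypothetical helper: dump live vs. peak HBM usage at a given stage.
    m = xm.get_memory_info(device)
    print(f"{tag}: bytes_used={m['bytes_used']} "
          f"peak_bytes_used={m['peak_bytes_used']}")

# report("after _initialize_model", device)  # bytes_used == peak_bytes_used
# report("after load_weights", device)       # peak_bytes_used > bytes_used
```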
Should be merged after #9437 and after the 10/17 PyTorch XLA nightly is available.

This PR upgrades PyTorch XLA and uses `peak_bytes_used` to correctly profile the peak HBM usage during the dummy profile run.
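For context, here is a minimal sketch of how the profiling step can use the new counter to size the KV cache. The `bytes_limit` key and the surrounding function are assumptions for illustration; `peak_bytes_used` is the field this PR relies on:

```python
import torch_xla.core.xla_model as xm

def available_kv_cache_bytes(device, gpu_memory_utilization: float) -> int:
    # Call after the dummy profile run. peak_bytes_used captures the true
    # high-water mark (weights + intermediate activations), which the
    # current bytes_used alone would understate.
    m = xm.get_memory_info(device)
    total = m["bytes_limit"]  # assumed key: total HBM on this device
    profiled_peak = m["peak_bytes_used"]
    usable = int(total * gpu_memory_utilization)
    return max(usable - profiled_peak, 0)
```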