
Characterizing Power Management Opportunities for LLMs in the Cloud #294

Open
gaocegege opened this issue Jul 25, 2024 · 1 comment
https://dl.acm.org/doi/pdf/10.1145/3620666.3651329

@gaocegege

By capping GPU frequency and power, the paper measures the impact of power management on LLMs, especially during training.


Several of the paper's insights are quite interesting:

The peak power draw across GPUs in LLM training iterations often reaches or exceeds their TDP. For cluster power design, this means that LLM training clusters need to overprovision GPU power to ensure power safety.

Large power swings are common in LLM training due to alternating computation- and communication-intensive phases across many GPUs. Since current power delivery infrastructure cannot always safely support large-scale power swings, LLM training clusters need specialized power infrastructure and management.

Power capping reduces peak power draw without affecting troughs, making it effective at reducing the magnitude of training power swings. Frequency locking lowers the overall power consumption, making it effective at reclaiming power on demand. Thus, both are useful in improving the power management in LLM training clusters.
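A minimal sketch (not from the paper) of how these two knobs could be applied on a single GPU using the NVML Python bindings (pynvml). The device index, cap fraction, and clock values are placeholder assumptions, and changing limits typically requires root/admin privileges.

```python
# Sketch: power capping and frequency locking via NVML (pynvml).
# Values below are illustrative placeholders, not from the paper.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust as needed

# Power capping: lower the board power limit (milliwatts) to reduce peak draw.
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
cap_mw = int(max_mw * 0.75)  # e.g. cap at 75% of the default limit (example value)
pynvml.nvmlDeviceSetPowerManagementLimit(handle, max(cap_mw, min_mw))

# Frequency locking: pin graphics clocks to lower overall power consumption.
pynvml.nvmlDeviceSetGpuLockedClocks(handle, 1200, 1200)  # MHz, example value

# To undo the frequency lock later:
# pynvml.nvmlDeviceResetGpuLockedClocks(handle)

pynvml.nvmlShutdown()
```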

LLM inference has distinct power consumption phases corresponding to prompt computation and token generation: prompt phases are brief and typically reach or exceed GPU TDP, whereas token phases are longer and draw less power. For cluster power design, this means that peak power in LLM inference clusters must be provisioned for the prompt phases, but doing so leads to underutilization during token phases; this mismatch must be addressed to improve power efficiency.
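As a rough way to observe this two-phase profile on a single GPU, one could sample power draw while an inference request runs. The sketch below (not from the paper) uses pynvml; the device index, sampling rate, and duration are placeholder assumptions.

```python
# Sketch: sample GPU power draw over time to see the prompt vs. token phases.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU serving the model

samples = []
start = time.time()
while time.time() - start < 30:                    # sample for 30 seconds
    mw = pynvml.nvmlDeviceGetPowerUsage(handle)    # instantaneous draw, milliwatts
    samples.append((time.time() - start, mw / 1000.0))
    time.sleep(0.05)                               # ~20 Hz sampling

pynvml.nvmlShutdown()

# Expectation from the insight above: a short, high plateau near TDP during the
# prompt phase, followed by a longer, lower plateau during token generation.
for t, w in samples[::20]:
    print(f"{t:6.2f} s  {w:7.1f} W")
```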
