The peak power draw of GPUs during LLM training iterations often reaches or exceeds their TDP. For cluster power design, this means LLM training clusters must overprovision GPU power to ensure power safety.
Large power swings are common in LLM training due to alternating computation- and communication-intensive phases across many GPUs. Since current power delivery infrastructure cannot always safely support large-scale power swings, LLM training clusters need specialized power infrastructure and management.
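A minimal monitoring sketch for reproducing these two observations locally, assuming the nvidia-ml-py (`pynvml`) bindings and an NVIDIA GPU are available. It samples power draw during a running job and reports the peak, trough, and swing relative to the board's default power limit (approximately the TDP); the sampling window and interval are illustrative choices, not values from the paper.

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0; adjust index as needed
tdp_w = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle) / 1000.0  # mW -> W

# Sample power draw for ~60 s at 100 ms intervals while a training job runs.
samples = []
for _ in range(600):
    samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)  # mW -> W
    time.sleep(0.1)

peak, trough = max(samples), min(samples)
print(f"default limit (~TDP): {tdp_w:.0f} W")
print(f"peak draw:  {peak:.0f} W ({peak / tdp_w:.0%} of TDP)")
print(f"trough:     {trough:.0f} W")
print(f"swing:      {peak - trough:.0f} W")

pynvml.nvmlShutdown()
```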
Power capping reduces peak power draw without affecting troughs, making it effective at reducing the magnitude of training power swings. Frequency locking lowers overall power consumption, making it effective at reclaiming power on demand. Thus, both knobs are useful for improving power management in LLM training clusters.
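A sketch of how both knobs can be applied through the standard NVML interface via `pynvml`. The 350 W cap and 1410 MHz locked clock are illustrative placeholder values, not recommendations from the paper, and both calls require administrative privileges and a supported GPU.

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Power capping: bound the peak draw (clips the top of the power swing).
pynvml.nvmlDeviceSetPowerManagementLimit(handle, 350_000)  # value is in milliwatts

# Frequency locking: pin the SM clock range to lower overall power on demand.
pynvml.nvmlDeviceSetGpuLockedClocks(handle, 1410, 1410)    # min/max clock in MHz

# ... run or continue the training job under the new settings ...

# Restore defaults afterwards.
default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)
pynvml.nvmlDeviceSetPowerManagementLimit(handle, default_mw)
pynvml.nvmlDeviceResetGpuLockedClocks(handle)
pynvml.nvmlShutdown()
```

The same settings can be applied from the command line with `nvidia-smi -pl <watts>` and `nvidia-smi -lgc <min>,<max>`.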
LLM inference has distinct power consumption phases corresponding to prompt computation and token generation: prompt phases are brief and typically reach or exceed GPU TDP, whereas token phases are longer and draw less power. For cluster power design, this means that peak power in LLM inference clusters must be provisioned for the prompt phases, but doing so leads to underutilization during token phases; this mismatch must be addressed to improve power efficiency.
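A back-of-envelope sketch of the peak-vs-average mismatch described above. All numbers are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
# Per-GPU power draw in each inference phase (hypothetical values).
prompt_power_w = 700.0   # prompt (prefill) phases, near TDP
token_power_w = 350.0    # token (decode) phases
prompt_time_frac = 0.15  # fraction of wall-clock time spent in prompt phases

# Time-weighted average draw across both phases.
avg_power_w = prompt_time_frac * prompt_power_w + (1 - prompt_time_frac) * token_power_w

# Provisioning for the prompt-phase peak leaves this much headroom unused on average.
overprovision_ratio = prompt_power_w / avg_power_w
print(f"average draw: {avg_power_w:.0f} W per GPU")
print(f"peak provisioning is {overprovision_ratio:.2f}x the average draw")
```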
https://dl.acm.org/doi/pdf/10.1145/3620666.3651329