Enable cpu/xpu support for the benchmarking suite #905

louie-tsai · 2024-05-22T19:10:10Z

Enable Intel CPU and Intel XPU support for the benchmark suite.
Many customers use DeepSpeed on CPU and XPU for LLM models, and this benchmark suite helps them debug communication issues in their environment.

A screenshot of a two-node run of all_reduce.py on CPU

A screenshot of a two-card run of run_all.py on XPU

louie-tsai · 2024-05-23T01:09:19Z

@louie-tsai please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
@microsoft-github-policy-service agree [company="{your company}"]
Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"
Contributor License Agreement

@microsoft-github-policy-service agree [company="{Intel}"]

louie-tsai · 2024-05-23T01:09:42Z

@louie-tsai the command you issued was incorrect. Please try again.

Examples are:
@microsoft-github-policy-service agree
and
@microsoft-github-policy-service agree company="your company"

@microsoft-github-policy-service agree [company="{Intel}"]

louie-tsai · 2024-05-23T01:10:19Z

@microsoft-github-policy-service agree [company="{Intel}"]

louie-tsai · 2024-05-23T01:11:13Z

@louie-tsai the command you issued was incorrect. Please try again.

Examples are:
@microsoft-github-policy-service agree
and
@microsoft-github-policy-service agree company="your company"

@microsoft-github-policy-service agree company="Intel"

louie-tsai · 2024-05-23T01:12:11Z

@microsoft-github-policy-service agree company="Intel"

@microsoft-github-policy-service agree company=Intel

tjruwase · 2024-06-27T14:01:05Z

@louie-tsai, thanks so much. This is an amazing PR. We will review and merge shortly.

tjruwase · 2024-06-27T14:03:13Z

@louie-tsai, can you confirm whether this PR is ready for review? I noticed that the output (e.g., Gbps) is incorrect/missing.

benchmarks/communication/utils.py

benchmarks/communication/all_gather.py

louie-tsai · 2024-07-26T07:29:35Z

@louie-tsai, can you confirm whether this PR is ready for review? I noticed that the output (e.g., Gbps) is incorrect/missing.

The output issue is related to the duration calculated from XPU events.
If I use time.time to measure the duration instead of XPU events, the output looks good.

I will escalate the XPU event issue and ask for a fix.
In the meantime, I am removing XPU support from the README.
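
For readers unfamiliar with the workaround described above, here is a minimal sketch of host-side wall-clock timing for a communication op. It is an illustration, not the code in this PR: `time_op_wall_clock`, `throughput_gbps`, and the `sync` callback are hypothetical names, and `time.perf_counter` is swapped in for `time.time` because it is monotonic. On an accelerator, `sync` would need to be a device synchronization call so that asynchronous work has finished before the clock is read.

```python
import time


def time_op_wall_clock(op, sync=None, warmups=2, trials=5):
    """Average wall-clock duration of op() in seconds (hypothetical helper).

    op    -- zero-argument callable that launches the operation
    sync  -- optional zero-argument callable that blocks until the device
             has finished its queued work (assumption: the caller supplies it)
    """
    # Warm-up iterations are excluded from the measurement.
    for _ in range(warmups):
        op()
    if sync is not None:
        sync()

    start = time.perf_counter()
    for _ in range(trials):
        op()
    # Wait for outstanding work before stopping the clock.
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / trials


def throughput_gbps(size_bytes, seconds):
    """Convert a message size and a duration into gigabits per second."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    # Stand-in workload; a real run would call a collective such as all_reduce.
    avg = time_op_wall_clock(lambda: time.sleep(0.01), trials=3)
    print(f"avg duration: {avg * 1e3:.2f} ms")
    print(f"throughput for 1 MiB: {throughput_gbps(1 << 20, avg):.3f} Gbps")
```

Wall-clock timing includes host-side launch overhead that device events would exclude, so the two methods are not expected to agree exactly, especially for small messages.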

louie-tsai requested review from tjruwase, awan-10, eltonzheng, duli2012, mrwyattii, arashb and xiaoxiawu-microsoft as code owners May 22, 2024 19:10

louie-tsai force-pushed the benchmark_cpu branch from 14d47e8 to 1d2856d Compare May 24, 2024 21:15

louie-tsai force-pushed the benchmark_cpu branch from 1d2856d to 9a97faf Compare June 11, 2024 23:53

enable cpu/xpu support for the benchmarking suite

ad3c3bf

aice-support force-pushed the benchmark_cpu branch from 9a97faf to ad3c3bf Compare June 26, 2024 16:40

tjruwase requested review from costin-eseanu and removed request for arashb, duli2012, awan-10, mrwyattii, eltonzheng and xiaoxiawu-microsoft June 27, 2024 13:59

costin-eseanu reviewed Jun 27, 2024

benchmarks/communication/utils.py (outdated, resolved)

costin-eseanu reviewed Jun 27, 2024

benchmarks/communication/all_gather.py (outdated, resolved)

fixes according to review feedback

8f835ea

louie-tsai requested a review from costin-eseanu July 26, 2024 07:44

loadams approved these changes Aug 14, 2024

loadams merged commit b04fedd into microsoft:master Aug 14, 2024
2 checks passed
