Add benchmark job with IO #884

Merged: 2 commits merged into main from gb/benchmark_with_io on Jul 3, 2024
Conversation

@Sbozzolo Sbozzolo (Member) commented Jul 3, 2024

The benchmarks we've been running are not representative of real use cases because they don't save anything to disk (no diagnostics or checkpoints). As a result, it is impossible to estimate the cost of any real run (where we care about the output). This PR adds a job with default diagnostics/checkpoints that can be used as a reference for real-world usage.

┌──────────────────────────┬───────────────────────┬────────────────────────┬─────────────────────────┐
│            Build ID: 200 │ Horiz. res.: 30 elems │ CPU Run [64 processes] │       GPU Run [4 A100s] │
│                          │ Vert. res.: 63 levels │                        │                         │
│                          │           dt: 120secs │                        │                         │
├──────────────────────────┼───────────────────────┼────────────────────────┼─────────────────────────┤
│                          │               job ID: │          amip_diagedmf │       gpu_amip_diagedmf │
│                  Coupled │                 SYPD: │                 0.0378 │                  1.0704 │
│                          │          CPU max RSS: │              6.714 GiB │              10.208 GiB │
├──────────────────────────┼───────────────────────┼────────────────────────┼─────────────────────────┤
│                          │               job ID: │       amip_diagedmf_io │    gpu_amip_diagedmf_io │
│          Coupled with IO │                 SYPD: │                 0.0357 │                  0.5353 │
│                          │          CPU max RSS: │              6.699 GiB │              11.015 GiB │
├──────────────────────────┼───────────────────────┼────────────────────────┼─────────────────────────┤
│                          │               job ID: │    climaatmos_diagedmf │ gpu_climaatmos_diagedmf │
│    Atmos with diag. EDMF │                 SYPD: │                 0.0403 │                  1.0845 │
│                          │          CPU max RSS: │              6.737 GiB │               9.446 GiB │
├──────────────────────────┼───────────────────────┼────────────────────────┼─────────────────────────┤
│                          │               job ID: │             climaatmos │          gpu_climaatmos │
│ Atmos without diag. EDMF │                 SYPD: │                 0.1856 │                  4.4748 │
│                          │          CPU max RSS: │              6.071 GiB │               8.755 GiB │
└──────────────────────────┴───────────────────────┴────────────────────────┴─────────────────────────┘
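
As a quick illustration of how to read this table: SYPD is simulated years per wallclock day, so its reciprocal gives the wallclock days needed per simulated year. A minimal sketch (my own arithmetic, not part of the PR):

```python
# Back-of-the-envelope cost estimate from the benchmark table above.
# SYPD = simulated years per wallclock day, so 1 / SYPD is the number
# of wallclock days needed per simulated year.

def walltime_days(simulated_years: float, sypd: float) -> float:
    """Wallclock days needed to simulate `simulated_years` at a given SYPD."""
    return simulated_years / sypd

# "Coupled with IO" row from the table:
print(walltime_days(1.0, 0.0357))  # CPU, 64 processes: ~28 wallclock days per sim. year
print(walltime_days(1.0, 0.5353))  # GPU, 4 A100s: ~1.9 wallclock days per sim. year
```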

@Sbozzolo Sbozzolo requested a review from szy21 July 3, 2024 14:45
@szy21 szy21 (Member) left a comment
Thanks! Hourly diagnostics (the default for `t_end: 12h`) are too frequent for production runs, so this will be an upper limit on how long diagnostics take. Any reason why diagnostics take much more time for the GPU run than for the CPU run?

@szy21 szy21 (Member) commented Jul 3, 2024

We also have these draft PRs in atmos: CliMA/ClimaAtmos.jl#2646 and CliMA/ClimaAtmos.jl#2852

@Sbozzolo Sbozzolo (Member, Author) commented Jul 3, 2024

> Thanks! Hourly diagnostics (the default for `t_end: 12h`) are too frequent for production runs, so this will be an upper limit on how long diagnostics take.

Good point. I don't want us to keep dismissing IO, so I will decrease the frequency to one output over the 12 h. This is a factor of 60 more frequent than monthly means, and I can also reduce the number of variables we output from 55 to 5, so that we are roughly in the same ballpark as a more realistic production run.
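
A quick sanity check of these factors (my arithmetic, assuming a ~720 h month; not from the PR):

```python
# Sanity check of the factors above (assumes a ~720 h month for the
# monthly-mean comparison; my arithmetic, not from the PR).

frequency_factor = 720 / 12   # one output per 12 h is 60x more frequent than monthly
variable_factor = 55 / 5      # cutting the output fields from 55 to 5 is an 11x reduction
net_volume = frequency_factor / variable_factor
print(net_volume)             # ~5.5x the output volume of a monthly-mean production run
```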

> Any reason why diagnostics take much more time for the GPU run than for the CPU run?

The reason is that the CPU runs spend much more time computing, so adding a little bit of IO doesn't change the SYPD very much because IO is not the dominant cost there.
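
Using the SYPD numbers in the table, this can be made concrete under the simplifying assumption that wallclock cost per simulated year splits into a compute term plus the IO term added by the `_io` jobs (my decomposition, not from the PR):

```python
# Decompose wallclock cost per simulated year into compute + IO, using
# the SYPD numbers from the table (assumption: the IO cost per simulated
# year is what the "_io" job adds on top of the plain job).

def days_per_sim_year(sypd: float) -> float:
    return 1.0 / sypd

io_cost_cpu = days_per_sim_year(0.0357) - days_per_sim_year(0.0378)  # ~1.6 days/sim. year
io_cost_gpu = days_per_sim_year(0.5353) - days_per_sim_year(1.0704)  # ~0.9 days/sim. year

# The absolute IO cost is comparable on the two machines, but GPU compute
# is ~28x faster (1.0704 / 0.0378), so the same IO roughly doubles the GPU
# wallclock while shifting the CPU number by only a few percent.
print(io_cost_cpu, io_cost_gpu)
```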

Sbozzolo added 2 commits July 3, 2024 09:04
The bucket test was really minimal. Essentially, it just checked that
DSS was being applied. Now that we no longer apply DSS in Land, the
test is no longer informative.
@Sbozzolo Sbozzolo force-pushed the gb/benchmark_with_io branch from d1b5323 to 163a733 on July 3, 2024 16:04
@Sbozzolo Sbozzolo (Member, Author) commented Jul 3, 2024

@kmdeck I removed the bucket test here

@akshaysridhar akshaysridhar (Member) commented
Can we run the CPU benchmarks on Caltech HPC? Currently we have:

CPU ClimaAtmos with diagnostic EDMF
   NodeList=clima
   BatchHost=clima

@Sbozzolo
Copy link
Member Author

Sbozzolo commented Jul 3, 2024

> Can we run the CPU benchmarks on Caltech HPC? Currently we have:
>
> CPU ClimaAtmos with diagnostic EDMF
>    NodeList=clima
>    BatchHost=clima

We could, but it is much simpler to run everything on the same machine, and it makes the comparison a little more meaningful (e.g., we are using the same disk). Typically this workflow does not run during working hours; sorry if I disrupted your work.

@Sbozzolo Sbozzolo merged commit b4733e9 into main Jul 3, 2024
8 checks passed