Add IO benchmark job #2646

Closed
wants to merge 4 commits into from
386 changes: 201 additions & 185 deletions .buildkite/gpu_pipeline/pipeline.yml

Large diffs are not rendered by default.

1,432 changes: 716 additions & 716 deletions .buildkite/pipeline.yml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion config/gpu_configs/gpu_aquaplanet_diagedmf.yml
Original file line number Diff line number Diff line change
@@ -25,5 +25,5 @@ edmfx_sgs_mass_flux: true
edmfx_sgs_diffusive_flux: true
precip_model: 0M
dt: 100secs
-t_end: 12hours
+t_end: 1days
toml: [toml/diagnostic_edmfx_box.toml]
2 changes: 1 addition & 1 deletion config/gpu_configs/gpu_aquaplanet_dyamond_ss_1process.yml
@@ -18,6 +18,6 @@ dt_cloud_fraction: 1hours
surface_setup: DefaultMoninObukhov
rayleigh_sponge: true
dt: 100secs
-t_end: 12hours
+t_end: 1days
job_id: gpu_aquaplanet_dyamond_ss_1process
toml: [toml/longrun_aquaplanet_dyamond.toml]
2 changes: 1 addition & 1 deletion config/gpu_configs/gpu_aquaplanet_dyamond_ss_2process.yml
@@ -18,6 +18,6 @@ approximate_linear_solve_iters: 2
surface_setup: "DefaultMoninObukhov"
rayleigh_sponge: true
dt: "100secs"
-t_end: "12hours"
+t_end: "1days"
job_id: "gpu_aquaplanet_dyamond_ss_2process"
toml: [toml/longrun_aquaplanet_dyamond.toml]
2 changes: 1 addition & 1 deletion config/gpu_configs/gpu_aquaplanet_dyamond_ss_4process.yml
@@ -18,6 +18,6 @@ approximate_linear_solve_iters: 2
surface_setup: "DefaultMoninObukhov"
rayleigh_sponge: true
dt: "100secs"
-t_end: "12hours"
+t_end: "1days"
job_id: "gpu_aquaplanet_dyamond_ss_4process"
toml: [toml/longrun_aquaplanet_dyamond.toml]
22 changes: 22 additions & 0 deletions config/gpu_configs/gpu_aquaplanet_dyamond_ss_diag_1process.yml
@@ -0,0 +1,22 @@
dt_save_state_to_disk: Inf
dt_save_to_sol: Inf
h_elem: 30
z_max: 55000.0
z_elem: 63
dz_bottom: 30.0
dz_top: 3000.0
moist: equil
precip_model: 1M
rad: allskywithclear
idealized_insolation: false
dt_rad: 1hours
vert_diff: FriersonDiffusion
implicit_diffusion: true
approximate_linear_solve_iters: 2
dt_cloud_fraction: 1hours
surface_setup: DefaultMoninObukhov
rayleigh_sponge: true
dt: 100secs
t_end: 1days
job_id: gpu_aquaplanet_dyamond_ss_diag_1process
toml: [toml/longrun_aquaplanet_dyamond.toml]
2 changes: 1 addition & 1 deletion config/gpu_configs/gpu_aquaplanet_dyamond_ws_1process.yml
@@ -18,6 +18,6 @@ approximate_linear_solve_iters: 2
surface_setup: "DefaultMoninObukhov"
rayleigh_sponge: true
dt: "100secs"
-t_end: "12hours"
+t_end: "1days"
job_id: "gpu_aquaplanet_dyamond_ws_1process"
toml: [toml/longrun_aquaplanet_dyamond.toml]
2 changes: 1 addition & 1 deletion config/gpu_configs/gpu_aquaplanet_dyamond_ws_2process.yml
@@ -18,6 +18,6 @@ approximate_linear_solve_iters: 2
surface_setup: "DefaultMoninObukhov"
rayleigh_sponge: true
dt: "100secs"
-t_end: "12hours"
+t_end: "1days"
job_id: "gpu_aquaplanet_dyamond_ws_2process"
toml: [toml/longrun_aquaplanet_dyamond.toml]
2 changes: 1 addition & 1 deletion config/gpu_configs/gpu_aquaplanet_dyamond_ws_4process.yml
@@ -18,6 +18,6 @@ approximate_linear_solve_iters: 2
surface_setup: "DefaultMoninObukhov"
rayleigh_sponge: true
dt: "100secs"
-t_end: "12hours"
+t_end: "1days"
job_id: "gpu_aquaplanet_dyamond_ws_4process"
toml: [toml/longrun_aquaplanet_dyamond.toml]
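
As a rough illustration of what the `t_end` change means for these jobs, the sketch below counts time steps before and after, given `dt: 100secs`. The helper `to_seconds` is hypothetical and written only for this example; it is not the ClimaAtmos `time_to_seconds` function and handles only the unit suffixes that appear in these configs.

```julia
# Hypothetical helper for this sketch only. It is NOT ClimaAtmos'
# `time_to_seconds`; it handles just the suffixes seen in the configs above.
function to_seconds(s::AbstractString)
    units = ("secs" => 1, "hours" => 3600, "days" => 86400)
    for (suffix, factor) in units
        if endswith(s, suffix)
            # Everything before the unit suffix is the numeric value.
            value = parse(Float64, first(s, length(s) - length(suffix)))
            return value * factor
        end
    end
    error("unrecognized time string: $s")
end

dt = to_seconds("100secs")

# Number of time steps implied by the old and new `t_end` values.
steps_before = Int(to_seconds("12hours") / dt)
steps_after = Int(to_seconds("1days") / dt)

println("steps with t_end = 12hours: ", steps_before)
println("steps with t_end = 1days:   ", steps_after)
```

Under these assumptions each job takes twice as many steps as before (864 instead of 432), so wall-clock time per job should be expected to grow accordingly.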
3 changes: 3 additions & 0 deletions perf/flame.jl
@@ -1,5 +1,6 @@
redirect_stderr(IOContext(stderr, :stacktrace_types_limited => Ref(false)))
import Random
+import PProf
Random.seed!(1234)
import ClimaAtmos as CA

@@ -28,6 +29,7 @@ mkpath(output_dir)
Profile.clear()
prof = Profile.@profile SciMLBase.step!(integrator)
results = Profile.fetch()
+PProf.pprof(results, web = false)
Profile.clear()

ProfileCanvas.html_file(joinpath(output_dir, "flame.html"), results)
@@ -79,6 +81,7 @@ Profile.Allocs.@profile sample_rate = sampling_rate SciMLBase.step!(integrator)
results = Profile.Allocs.fetch()
Profile.Allocs.clear()
profile = ProfileCanvas.view_allocs(results)
+PProf.Allocs.pprof(results, web = false)
ProfileCanvas.html_file(joinpath(output_dir, "allocs.html"), profile)

# We're grouping allocation tests here for convenience.
43 changes: 43 additions & 0 deletions perf/longer_flame.jl
@@ -0,0 +1,43 @@
redirect_stderr(IOContext(stderr, :stacktrace_types_limited => Ref(false)))
import Random
Random.seed!(1234)
import ClimaAtmos as CA
import Profile, ProfileCanvas
import SciMLBase

length(ARGS) != 1 && error("Usage: longer_flame.jl <config_file>")
config_file = ARGS[1]
config = CA.AtmosConfig(config_file)
FT = eltype(config)
# 1 day is a good compromise: it exercises most of the functions and diagnostics at an
# almost realistic cadence. Production runs will probably write diagnostics monthly,
# so this is a worst case to optimize for.
config.parsed_args["t_end"] = "1days"
simulation = CA.get_simulation(config)

# First, we run the simulation for a few timesteps to compile most of the functions
# (including top-level functions such as timed_solve!)

simulation_dt = FT(CA.time_to_seconds(config.parsed_args["dt"]))
# Change the final time to 5dt
SciMLBase.reinit!(simulation.integrator, tf = 5simulation_dt)

# We force the compilation of all callbacks to also include radiation
CA.call_all_callbacks!(simulation.integrator)
# Solve for a little bit
CA.timed_solve!(simulation.integrator)

# Change the final time to 1 day
SciMLBase.add_tstop!(simulation.integrator, FT(86400))

@info "$simulation"

@info "Collecting profile"
Profile.init(n = 10^7, delay = 0.1)
prof = Profile.@profile CA.timed_solve!(simulation.integrator)
results = Profile.fetch()

flame_path = joinpath(simulation.output_dir, "flame.html")

ProfileCanvas.html_file(flame_path, results)
@info "Flame saved in $flame_path"
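
The warm-up-then-profile pattern used in `longer_flame.jl` can be sketched with the `Profile` standard library alone. This is a minimal illustration, not the PR's code: `toy_step!` and `toy_solve!` are hypothetical stand-ins for the model step and for `CA.timed_solve!`, and the buffer size and sampling delay are smaller than the script's `n = 10^7, delay = 0.1` because the toy workload is short.

```julia
import Profile

# Hypothetical stand-in for one model step; not part of ClimaAtmos.
function toy_step!(state::Vector{Float64})
    for i in eachindex(state)
        state[i] = sin(state[i]) + 1.0
    end
    return state
end

# Hypothetical stand-in for a solve: `nsteps` calls to `toy_step!`.
function toy_solve!(state::Vector{Float64}, nsteps::Integer)
    for _ in 1:nsteps
        toy_step!(state)
    end
    return state
end

state = rand(10_000)

# Warm-up: run a few steps first so that compilation happens outside
# the profiled region.
toy_solve!(state, 5)

# Drop any earlier samples, then set the sample buffer size and the
# sampling interval (in seconds) before profiling the longer run.
Profile.clear()
Profile.init(n = 10^6, delay = 0.01)
Profile.@profile toy_solve!(state, 2_000)

# Raw samples; a viewer such as ProfileCanvas would consume these.
samples = Profile.fetch()

# Calling `Profile.init()` with no arguments returns the current settings.
_, current_delay = Profile.init()
@info "Profile collected" n_entries = length(samples) delay = current_delay
```

Separating the short warm-up run from the profiled run keeps compilation time out of the flame graph, which is the same reason the script solves for `5dt` before adding the 1-day stop time.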