

Merge pull request #874 from NREL-SIIP/dt/partitioned-simulations
Add ability to split simulations into partitions and run them in parallel
jd-lara authored Aug 12, 2022
2 parents 25dafb9 + 3ff2564 commit 4c01cb2
Showing 13 changed files with 1,145 additions and 49 deletions.
1 change: 1 addition & 0 deletions Project.toml
@@ -8,6 +8,7 @@
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
DocStringExtensions = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
InfrastructureSystems = "2cd47ed4-ca9b-11e9-27f2-ab636a7671f1"
196 changes: 196 additions & 0 deletions docs/src/man/parallel_simulations_hpc.md
@@ -0,0 +1,196 @@
## Run a Simulation in Parallel on an HPC

This page describes how to split a simulation into partitions, run each partition in parallel
on HPC compute nodes, and then join the results.

These steps can be used on a local computer or any HPC supported by the submission software.
Some steps may be specific to NREL's HPC `Eagle` cluster.

*Note*: Some instructions are preliminary and will change if functionality is moved
to a new Julia package.

### Setup

1. Create a conda environment and install the Python package `NREL-jade`:
https://nrel.github.io/jade/installation.html. The rest of this page assumes that
the environment is called `jade`.
2. Activate the environment with `conda activate jade`.
3. Locate the path to that conda environment. It will likely be `~/.conda-envs/jade` or
`~/.conda/envs/jade`.
4. Load the Julia environment that you use to run simulations. Add the packages `Conda` and
`PyCall`.
5. Set up Conda.jl to use the existing `jade` environment by running these commands:

```
julia> run(`conda create -n conda_jl python conda`)
julia> ENV["CONDA_JL_HOME"] = joinpath(ENV["HOME"], ".conda-envs", "jade") # change this to your path
pkg> build Conda
```

6. Copy the code below into a Julia file called `configure_parallel_simulation.jl`.
It provides an interface to Jade through PyCall and is used to create a Jade configuration.
(It may eventually move to a separate package.)

```
using PowerSimulations  # SimulationPartitions, get_num_partitions
using PyCall            # pyimport

function configure_parallel_simulation(
    script::AbstractString,
    num_steps::Integer,
    num_period_steps::Integer;
    num_overlap_steps::Integer=0,
    project_path=nothing,
    simulation_name="simulation",
    config_file="config.json",
    force=false,  # currently unused by this function
)
    partitions = SimulationPartitions(num_steps, num_period_steps, num_overlap_steps)
    jgc = pyimport("jade.extensions.generic_command")
    julia_cmd = isnothing(project_path) ? "julia" : "julia --project=$project_path"

    # Runs once before the partition jobs and once after all of them finish.
    setup_command = "$julia_cmd $script setup --simulation-name=$simulation_name " *
        "--num-steps=$num_steps --num-period-steps=$num_period_steps " *
        "--num-overlap-steps=$num_overlap_steps"
    teardown_command = "$julia_cmd $script join --simulation-name=$simulation_name"
    config = jgc.GenericCommandConfiguration(
        setup_command=setup_command,
        teardown_command=teardown_command,
    )

    # One Jade job per partition.
    for i in 1:get_num_partitions(partitions)
        cmd = "$julia_cmd $script execute --simulation-name=$simulation_name --index=$i"
        job = jgc.GenericCommandParameters(command=cmd, name="execute-$i")
        config.add_job(job)
    end

    config.dump(config_file, indent=2)
    println("Created Jade configuration in $config_file. " *
        "Run 'jade submit-jobs [options] $config_file' to execute them.")
end
```

7. Create a Julia script to build and run simulations. It must meet the requirements below.
A full example is in the PowerSimulations repository in `test/run_partitioned_simulation.jl`.

- Call `using PowerSimulations`.

- Implement a build function that matches the signature below.
It must construct a `Simulation`, call `build!`, and then return the `Simulation` instance.
It must throw an exception if the build fails.

```
function build_simulation(
    output_dir::AbstractString,
    simulation_name::AbstractString,
    partitions::SimulationPartitions,
    index::Union{Nothing, Integer}=nothing,
)
```

Here is example code to construct the `Simulation` with these parameters:

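```
# `models` and `sequence` must be defined earlier in the function (not shown).
# `PSI` assumes the alias `const PSI = PowerSimulations` is defined in your script.
sim = Simulation(
    name=simulation_name,
    steps=partitions.num_steps,
    models=models,
    sequence=sequence,
    simulation_folder=output_dir,
)
status = build!(sim; partitions=partitions, index=index, serialize=isnothing(index))
if status != PSI.BuildStatus.BUILT
    error("Failed to build simulation: status=$status")
end
return sim
```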

- Implement an execute function that matches the signature below. It must throw an exception
if execution fails.

```
# `PSI` assumes the alias `const PSI = PowerSimulations` is defined in your script.
function execute_simulation(sim, args...; kwargs...)
    status = execute!(sim)
    if status != PSI.RunStatus.SUCCESSFUL
        error("Simulation failed to execute: status=$status")
    end
end
```

- Make the script runnable as a CLI command by including the following code at the bottom of the
file.

```
function main()
    process_simulation_partition_cli_args(build_simulation, execute_simulation, ARGS...)
end

if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
```
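The Jade jobs created by `configure_parallel_simulation` invoke the script as `julia <script> setup|execute|join --option=value ...`, and `process_simulation_partition_cli_args` handles that dispatch for you. To make the argument format concrete, here is a minimal sketch of a parser for it. `parse_partition_args` is hypothetical and is not part of PowerSimulations; the real function may parse these arguments differently.

```julia
# Hypothetical helper: split Jade-style CLI arguments such as
#   execute --simulation-name=simulation --index=3
# into a subcommand and a Dict of option values.
# This is NOT PowerSimulations' implementation; it only illustrates the format.
function parse_partition_args(args::AbstractVector{<:AbstractString})
    isempty(args) && error("expected a subcommand: setup, execute, or join")
    command = String(args[1])
    command in ("setup", "execute", "join") || error("unknown subcommand: $command")
    options = Dict{String, String}()
    for arg in args[2:end]
        # Each option must look like --name=value.
        m = match(r"^--([A-Za-z0-9-]+)=(.*)$", arg)
        m === nothing && error("malformed option: $arg")
        options[m.captures[1]] = m.captures[2]
    end
    return command, options
end

command, options = parse_partition_args(
    ["execute", "--simulation-name=simulation", "--index=3"],
)
index = parse(Int, options["index"])  # partition index as an integer
```

In a real script the input would be `ARGS`, which is a `Vector{String}` and so matches the accepted argument type.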

### Execution

1. Create a Jade configuration that defines the partitioned simulation jobs. Load your Julia
environment.

This example splits a year-long simulation into weekly partitions for a total of 53 individual
jobs.

```
julia> include("configure_parallel_simulation.jl")
julia> num_steps = 365
julia> period = 7
julia> num_overlap_steps = 1
julia> configure_parallel_simulation(
           "my_simulation.jl",  # this is your build/execute script
           num_steps,
           period,
           num_overlap_steps=num_overlap_steps,
           project_path=".",  # optionally specifies the Julia project environment to load
       )
Created Jade configuration in config.json. Run 'jade submit-jobs [options] config.json' to execute them.
```

Exit Julia.

2. Review the configuration to verify that it is correct.

```
$ jade config show config.json
```

3. Start an interactive session on a debug node. *Do not submit the jobs on a login node!* The submission
step runs a full build of the simulation, which may consume more CPU and memory than is appropriate
for a login node.

```
$ salloc -t 01:00:00 -N1 --account=<your-account> --partition=debug
```

4. Follow the instructions at https://nrel.github.io/jade/tutorial.html to submit the jobs.
The example below will configure Jade to run each partition on its own compute node. Depending on
the compute and memory constraints of your simulation, you may be able to pack more jobs on each
node.

Adjust the walltime as necessary.

```
$ jade config hpc -c hpc_config.toml -t slurm --walltime=04:00:00 -a <your-account>
$ jade submit-jobs config.json --per-node-batch-size=1 -o output
```

If you are unsure how much memory and CPU your simulation consumes, add these options:

```
$ jade submit-jobs config.json --per-node-batch-size=1 -o output --resource-monitor-type periodic --resource-monitor-interval 3
```

Jade will create HTML plots of the resource utilization in `output/stats`. You may be able to customize
`--per-node-batch-size` and `--num-processes` to finish the simulations more quickly.

5. Jade will run a final command to join the simulation partitions into one unified file. You can load the
results as you normally would.

```
julia> results = SimulationResults("<output-dir>/job-outputs/<simulation-name>")
```

Note that the log files and results for each partition are located in
`<output-dir>/job-outputs/<simulation-name>/simulation_partitions`.
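The 53-job count in the first execution step is the ceiling of 365 / 7: 52 full weeks plus one leftover day. The sketch below enumerates the step range each partition would cover. How `SimulationPartitions` applies `num_overlap_steps` is an assumption here: each partition after the first is assumed to start `num_overlap_steps` steps early so it overlaps the end of the previous partition. `num_partitions` and `partition_step_range` are hypothetical helpers, not PowerSimulations functions.

```julia
# Number of partitions: ceiling division of total steps by the period.
num_partitions(num_steps, period) = cld(num_steps, period)

# Hypothetical helper: absolute step range covered by partition `index`.
# Assumption: every partition after the first starts `num_overlap_steps` earlier.
function partition_step_range(num_steps, period, num_overlap_steps, index)
    n = num_partitions(num_steps, period)
    1 <= index <= n || throw(ArgumentError("index must be in 1:$n"))
    start_step = period * (index - 1) + 1
    # The last partition absorbs any remainder (here, a single day).
    end_step = index < n ? start_step + period - 1 : num_steps
    if index > 1
        start_step -= num_overlap_steps
    end
    return start_step:end_step
end

ranges = [partition_step_range(365, 7, 1, i) for i in 1:num_partitions(365, 7)]
```

Under these assumptions the first partition covers steps `1:7`, the second `7:14`, and the 53rd covers only `364:365`.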
80 changes: 80 additions & 0 deletions docs/src/man/parallel_simulations_local.md
@@ -0,0 +1,80 @@
## Run a Simulation in Parallel on a Local Computer

This page describes how to split a simulation into partitions, run each partition in parallel,
and then join the results.

### Setup

Create a Julia script to build and run simulations. It must meet the requirements below.
A full example is in the PowerSimulations repository in `test/run_partitioned_simulation.jl`.

- Call `using PowerSimulations`.

- Implement a build function that matches the signature below.
It must construct a `Simulation`, call `build!`, and then return the `Simulation` instance.
It must throw an exception if the build fails.

```
function build_simulation(
    output_dir::AbstractString,
    simulation_name::AbstractString,
    partitions::SimulationPartitions,
    index::Union{Nothing, Integer}=nothing,
)
```

Here is example code to construct the `Simulation` with these parameters:

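```
# `models` and `sequence` must be defined earlier in the function (not shown).
# `PSI` assumes the alias `const PSI = PowerSimulations` is defined in your script.
sim = Simulation(
    name=simulation_name,
    steps=partitions.num_steps,
    models=models,
    sequence=sequence,
    simulation_folder=output_dir,
)
status = build!(sim; partitions=partitions, index=index, serialize=isnothing(index))
if status != PSI.BuildStatus.BUILT
    error("Failed to build simulation: status=$status")
end
return sim
```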

- Implement an execute function that matches the signature below. It must throw an exception
if execution fails.

```
# `PSI` assumes the alias `const PSI = PowerSimulations` is defined in your script.
function execute_simulation(sim, args...; kwargs...)
    status = execute!(sim)
    if status != PSI.RunStatus.SUCCESSFUL
        error("Simulation failed to execute: status=$status")
    end
end
```
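Both functions signal failure by throwing rather than by returning a status, so whatever drives the partitions can detect a failed partition with ordinary exception handling. The sketch below illustrates that contract with stand-in functions. `fake_build`, `fake_execute`, and `run_partition` are hypothetical and do not call PowerSimulations; partition 3 is made to fail on purpose.

```julia
# Stand-in for build_simulation: returns a value on success, throws on failure.
# The real function constructs and builds a PowerSimulations `Simulation`.
function fake_build(index::Integer)
    index == 3 && error("Failed to build simulation: index=$index")
    return (index = index,)
end

# Stand-in for execute_simulation; always succeeds here.
fake_execute(sim) = nothing

# Hypothetical driver: run one partition and report success or the error text.
function run_partition(build, execute, index)
    try
        sim = build(index)
        execute(sim)
        return (index = index, ok = true, message = "")
    catch e
        return (index = index, ok = false, message = sprint(showerror, e))
    end
end

outcomes = [run_partition(fake_build, fake_execute, i) for i in 1:4]
failed = [o.index for o in outcomes if !o.ok]
```

If the build function returned a failure status instead of throwing, a driver like this would report the partition as successful and continue with an unbuilt simulation.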

### Execution

After loading your script, call the function `run_parallel_simulation` as shown below.

This example splits a year-long simulation into weekly partitions for a total of 53 individual
jobs and then runs them four at a time.

```
julia> include("my_simulation.jl")
julia> run_parallel_simulation(
           build_simulation,
           execute_simulation,
           script="my_simulation.jl",
           output_dir="my_simulation_output",
           name="my_simulation",
           num_steps=365,
           period=7,
           num_overlap_steps=1,
           num_parallel_processes=4,
           exeflags="--project=<path-to-your-julia-environment>",
       )
```

The final results will be in `./my_simulation_output/my_simulation`.

Note that the log files and results for each partition are located in
`./my_simulation_output/my_simulation/simulation_partitions`.
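This commit adds the `Distributed` standard library as a dependency, presumably for `run_parallel_simulation`. The sketch below shows the general `Distributed` pattern for running 53 partition jobs at most four at a time with `addprocs` and `pmap`. It is an illustration under that assumption, not the function's actual implementation. `run_partition_stub` is hypothetical and stands in for building and executing one partition.

```julia
using Distributed

# Start four worker processes, matching num_parallel_processes=4.
# addprocs also accepts an exeflags keyword, e.g. addprocs(4; exeflags="--project=.").
worker_ids = addprocs(4)

# Hypothetical stand-in for building and executing one partition.
# @everywhere defines it on the main process and on every worker.
@everywhere function run_partition_stub(index::Int)
    return (index = index, worker = myid())
end

# pmap hands each worker one index at a time, so at most four partitions
# run concurrently; results come back in the same order as the inputs.
results = pmap(run_partition_stub, 1:53)

rmprocs(worker_ids)  # shut down the workers when finished
```

A real driver would also need each worker to load the simulation script before calling the build and execute functions, which is likely why `run_parallel_simulation` takes a `script` argument.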
9 changes: 9 additions & 0 deletions src/PowerSimulations.jl
@@ -14,6 +14,8 @@ export InitialCondition
export SimulationModels
export SimulationSequence
export SimulationResults
export SimulationPartitions
export SimulationPartitionResults

# Network Relevant Exports
export NetworkModel
@@ -120,6 +122,7 @@ export run!
## Sim Model Exports
export execute!
export get_simulation_model
export run_parallel_simulation
## Template Exports
export template_economic_dispatch
export template_unit_commitment
@@ -199,6 +202,7 @@ export show_recorder_events
export list_simulation_events
export show_simulation_events
export export_realized_results
export get_num_partitions

## Enums
export BuildStatus
@@ -377,6 +381,7 @@ export get_resolution
import PowerModels
import TimerOutputs
import ProgressMeter
import Distributed

# Base Imports
import Base.getindex
@@ -408,6 +413,8 @@ export SOCWRConicPowerModel
export QCRMPowerModel
export QCLSPowerModel

export process_simulation_partition_cli_args

################################################################################

# Type Alias From other Packages
@@ -508,6 +515,8 @@ include("simulation/simulation_problem_results.jl")
include("simulation/realized_meta.jl")
include("simulation/decision_model_simulation_results.jl")
include("simulation/emulation_model_simulation_results.jl")
include("simulation/simulation_partitions.jl")
include("simulation/simulation_partition_results.jl")
include("simulation/simulation_sequence.jl")
include("simulation/simulation_internal.jl")
include("simulation/simulation.jl")
1 change: 1 addition & 0 deletions src/core/definitions.jl
@@ -71,6 +71,7 @@ const KNOWN_SIMULATION_PATHS = [
"recorder",
"results",
"simulation_files",
"simulation_partitions",
]
const RESULTS_DIR = "results"
