Data Model for Shared STATE (supports model modularity) RFC #112

rburghol · 2023-06-02T18:27:12Z

Currently, hsp2 features domain-specific (i.e. a single RCHERS, PERLND, etc) simulation and data access, with with only the current segment data, i.e. the ts Dict, parameters, and internal calculation state loaded, passed to the functional routine. To support the legacy SPEC-ACTIONS (#90) and potential new enhanced model modularity features, a robust data model is needed to facilitate passing STATE across multiple domains (segments) and amongst the various functions. This issue attempts to outline a proposed data structure schema to facilitate this, including performance considerations, and functional considerations. (see also: #126 )

Goals

Sharing STATE should allow calculations to be done on any publicly modifiable state variable (individual code/model domains must opt-in to STATE sharing).
State can be shared across spatial (segment) domain (i.e., from RCHRES1 to RCHRES2, etc)
State can be shared across model functional (operation) domains (i.e. HYDR, PERLND, PQUAL, GQUAL, ...)
State schema should discourage name collisions
Schema should be high performance (i.e., compatible with numba/@njit
Modules should be able to opt-in for state sharing, i.e., just because something is set in the shared STATE data structure does not guarantee that it modifies model behavior.

Benefits

STATE dependency ordering integrates well with Special Actions type behavior, as STATE is designed based on the nodular paradigm introduced by MASSLINKS and SPECACT.
Because STATE provides cross-domain data access, implementing modules can reduce data access and argument overhead.
STATE components/primitives can enable very concise runtime code chunks.
Execution tracing through special actions and other operations in STATE system is simplified.
Because STATE functions/objects are dependency ordered, code amendments are possible in a highly modular fashion, and labyrinthine code blocks can be reduced.
STATE primitives can be optimized, and thus, when meta-functions are assembled out of STATE primitives, code optimization is persistent.

Draft Data Model

Integer keyed STATE

state_paths is 1-d, string keyed, paths based on the hierarchical paths stored in the hdf5. Its keys are full path to hdf5 STATE, and its values point to the integer key for use in all other runtime Dicts. The key is generated automatically at the beginning of model loading and is static throughout the simulation.
- Ex:
  - The STATE for RCHRES 1, IVOL has an hdf5 path of /STATE/RCHRES_R001/IVOL
  - /STATE/RCHRES_R001/IVOL is the 25th item added to the STATE during model loading, and thus it's integer index is 25
  - This integer index for this /STATE/RCHRES_R001/IVOL is found in state_paths, such that state_paths['/STATE/RCHRES_R001/IVOL'] = 25
state_ix: Dict that holds scalar numeric state values. It is an integer keyed, version of hdf5.
- Ex:
  - At end of a given timestep RCHRES 1, IVOL state value is 1803.5
  - The integer index value for IVOL in RCHRES 1 variable is 25
  - Therefore state_ix[25] = 1803.5
dict_ix is integer keyed Dict to store values of array/matrix objects.
ts_ix - Dict of timeseries data (TBD: may be redundant to dict_ix, given that all ts data can be keyed via an hdf5 path)

state_paths = Dict.empty(key_type=types.unicode_type, value_type=types.int64)
state_ix = Dict.empty(key_type=types.int64, value_type=types.float64)
dict_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:,:])
ts_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:])

Concepts

Data Namespace Implement hdf5 path type notation is ideal schema for cross-domain sharing & compatibility with new hsp2 adoption of hdf5
- Discourages name collisions
- Ex: Inflow to RCHRES 1 at any timestep is stored in /STATE/RCHRES_R001/IVOL
- Ex: Inflow to RCHRES 2 at any timestep is stored in /STATE/RCHRES_R002/IVOL
Coupling Tightly coupled supporting modules should write to state at the end of every timestep, and read from state after executing coupled operational models.
- Current hsp2 model structure executes all timesteps for each spatial domain at one time
- Tight coupling would mean all spatial domains should execute in sequence at a given timestep
- Complete coupling would mean all spatial domains and functional domains would execute using "dependency ordered" method.
- This will require substantial rewrite of hsp2 routines, however, base functions are fairly amenable to extracting timestep loop code.
Performance
- Early testing of using character-keyed STATE Dict showed a large performance hit over using integer-keyed STATE Dict. This may be due to the way in which the character-keyed Dict was set up. Testing code to be included in this issue soon.

Implementation

numba compatibility requires separate storage for
- Scalar values
- Vector/Array-type values
Integer indexed Dict may perform faster than Character indexed Dict, therefore, this should be optimized. (see example below).
- Character keyed Dicts described below in *Data Structure Option 1
- Integer keyed Dicts described below in *Data Structure Option 2
Implemented with Data Structure Option 2 (see below)
See also module STATE sub-module refactor #126

Data Structure Option 1 - Character keyed STATE

Data Structures
- state_ix: Holds numeric state values. It is a single-dimension, hdf5 path keyed Dict.
- dict_ix Holds values of array/matrix objects. It is multi-dimensional hdf5 path keyed Dict.
- ts_ix - Holds values of timeseries dat, a (TBD: may be redundant to dict_ix, given that all ts data can be keyed via an hdf5 path)

state_ix = Dict.empty(key_type=types.int64, value_type=types.float64)
dict_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:,:])
ts_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:])

Data Structure Option 2

See above Draft Data Model

The text was updated successfully, but these errors were encountered:

rburghol · 2023-06-05T14:39:05Z

Performance test. Integer-keyed versus character-keyed STATE Dict. Tests performed for a 50-year, 1-hour timestep simulation.

Character-keyed timeseries (after compile) run times between 0.1693 and 0.2163 seconds
Integer-keyed timeseries (after initial compile) run times between 0.0866 and 0.1266 seconds
- Consistently 30% faster than character-keyed
Character-keyed STATE (after compile) run times between 0.0428 and 0.0434 seconds
Integer-keyed STATE (after initial compile) run times between 0.0866 and 0.1266 seconds
- Consistently 70% faster than character-keyed

Set Up State and Time Series Dicts

import time
import numpy as np
from numba import njit, jit
from math import sin, cos
from numba import types
from numpy import zeros, any, full, nan, array, int64
from numba.typed import Dict

steps  = 365 * 24 * 50# 50 year hourly simulation
# Character-indexed
ts = Dict.empty(key_type=types.unicode_type, value_type=types.float64[:])
state = Dict.empty(key_type=types.unicode_type, value_type=types.float64)
# set up some base data
ts['/RESULTS/RCHRES_001/SPECL/Qin'] = zeros(steps)
ts['/RESULTS/RCHRES_001/SPECL/Qlocal'] = zeros(steps)
ts['/RESULTS/RCHRES_001/SPECL/Qout'] = zeros(steps)
state['/RESULTS/RCHRES_001/SPECL/Qin'] = 0.0
state['/RESULTS/RCHRES_001/SPECL/Qlocal'] = 0.0
state['/RESULTS/RCHRES_001/SPECL/Qout'] = 0.0

# Integer-indexed
ts_ix = Dict.empty(key_type=types.int64, value_type=types.float64[:])
state_ix = Dict.empty(key_type=types.int64, value_type=types.float64)
# set up some base data
ts_ix[1] = zeros(steps)
ts_ix[2] = zeros(steps)
ts_ix[3] = zeros(steps)
state_ix[1] = 0.0
state_ix[2] = 0.0
state_ix[3] = 0.0

Timeseries Variable Setting Only


@njit
def iterate_specl_ts(ts, step):
    ts['/RESULTS/RCHRES_001/SPECL/Qlocal'][step] = sin(step)
    ts['/RESULTS/RCHRES_001/SPECL/Qin'][step] = abs(cos(5 * step))
    ts['/RESULTS/RCHRES_001/SPECL/Qout'][step] = ts['/RESULTS/RCHRES_001/SPECL/Qlocal'][step] + ts['/RESULTS/RCHRES_001/SPECL/Qin'][step]
    return

# Force a Compile
iterate_specl_ts(ts, 0)

@njit
def iterate_specl_ts_ix(ts_ix, step):
    ts_ix[1][step] = sin(step)
    ts_ix[2][step] = abs(cos(5 * step))
    ts_ix[3][step] = ts_ix[1][step] + ts_ix[2][step]
    return

# Force a Compile
iterate_specl_ts_ix(ts_ix, 0)

@njit
def run_ts(ts, steps):
    for step in range(steps):
        iterate_specl_ts(ts, step)

start = time.time()
run_ts(ts, steps)
end = time.time()
print(end - start, "seconds")
# Run times between 0.1693 and 0.2163 seconds

@njit
def run_ts_ix(ts_ix, steps):
    for step in range(steps):
        iterate_specl_ts_ix(ts_ix, step)

run_ts_ix(ts_ix, 1):

start = time.time()
run_ts_ix(ts_ix, steps)
end = time.time()
print(end - start, "seconds")
# Run times between 0.0866 and 0.1266 seconds

State Variable Setting Only

@njit
def iterate_specl_state(state, step):
    state['/RESULTS/RCHRES_001/SPECL/Qlocal'] = sin(step)
    state['/RESULTS/RCHRES_001/SPECL/Qin'] = abs(cos(5 * step))
    state['/RESULTS/RCHRES_001/SPECL/Qout'] = state['/RESULTS/RCHRES_001/SPECL/Qlocal'] + state['/RESULTS/RCHRES_001/SPECL/Qin']
    return

# Force a Compile
iterate_specl_state(state, 0)

@njit
def iterate_specl_state_ix(state_ix, step):
    state_ix[1] = sin(step)
    state_ix[2] = abs(cos(5 * step))
    state_ix[3] = state_ix[1] + state_ix[2]
    return

# Force a Compile
iterate_specl_state_ix(state_ix, 0)

@njit
def run_state(state, steps):
    for step in range(steps):
        iterate_specl_state(state, step)

run_state(state, 1)

start = time.time()
run_state(state, steps)
end = time.time()
print(end - start, "seconds")
# Run times between 0.1407 and 0.1451 seconds

@njit
def run_state_ix(state_ix, steps):
    for step in range(steps):
        iterate_specl_state_ix(state_ix, step)

run_state_ix(state_ix, 1)

start = time.time()
run_state_ix(state_ix, steps)
end = time.time()
print(end - start, "seconds")
# Run times between 0.0428 and 0.0434 seconds

rburghol · 2023-12-15T21:45:50Z

Done. Need to move to documentation. See module #126

This was referenced Jun 21, 2023

Specl and Objects hdf5 data model HARPgroup/HSPsquared#20

Open

Dynamic python code loading for hsp2 runtime customization #113

Closed

aufdenkampe mentioned this issue Oct 24, 2023

Enhancements for custom python SPECL, small GENER fixes #120

Merged

rburghol mentioned this issue Nov 20, 2023

specl: ACTIONS (parser and code) #105

Open

12 tasks

rburghol mentioned this issue Dec 15, 2023

STATE sub-module refactor #126

Open

3 tasks

rburghol closed this as completed Dec 15, 2023

rburghol mentioned this issue Dec 20, 2023

SPEC-ACTIONS in hsp2 #90

Open

26 tasks

aufdenkampe added this to the v0.12.0: Improved packaging, etc. milestone May 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Data Model for Shared STATE (supports model modularity) RFC #112

Data Model for Shared STATE (supports model modularity) RFC #112

rburghol commented Jun 2, 2023 •

edited

Loading

rburghol commented Jun 5, 2023 •

edited

Loading

rburghol commented Dec 15, 2023

Data Model for Shared STATE (supports model modularity) RFC #112

Data Model for Shared STATE (supports model modularity) RFC #112

Comments

rburghol commented Jun 2, 2023 • edited Loading

Goals

Benefits

Draft Data Model

Concepts

Implementation

Data Structure Option 1 - Character keyed STATE

Data Structure Option 2

rburghol commented Jun 5, 2023 • edited Loading

Set Up State and Time Series Dicts

Timeseries Variable Setting Only

State Variable Setting Only

rburghol commented Dec 15, 2023

rburghol commented Jun 2, 2023 •

edited

Loading

rburghol commented Jun 5, 2023 •

edited

Loading