Multi Processing
This is all just prototype stuff for now, but the general idea is to implement multiprocessing in as non-invasive a way as possible, at least at first.
So we keep the run list, but we annotate it using a second structure (multiprocess_steps) that indicates how to intervene.
multiprocess_steps is an array of dicts. Each multiprocess step consists of one or more models that can be run in sequence, either as a single process or multiprocessed, with each process handling a subset of the model data.
label is a string used for logging and tagging output files.
Each step represents a set of model steps identified by the 'begin' key, which names the first model step. To avoid redundancy, the last model in the set is implicit: the set runs up to, but not including, the first model of the next entry in multiprocess_steps (or through the rest of the models, for the last step).
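The implicit-end rule can be sketched in a few lines of Python. This is only an illustration, not the prototype's code: the function name resolve_step_models is hypothetical, the run list is an illustrative one, and the check that steps appear in run-list order is an assumption.

```python
def resolve_step_models(run_list, multiprocess_steps):
    """Expand each step's 'begin' key into the explicit list of models it covers."""
    # Position of each step's first model in the run list.
    starts = [run_list.index(step["begin"]) for step in multiprocess_steps]

    # Assumption: steps must be listed in run-list order, with distinct begins.
    if starts != sorted(set(starts)):
        raise ValueError("multiprocess_steps must follow run list order")

    # Each step ends where the next one begins; the last step takes the rest.
    ends = starts[1:] + [len(run_list)]
    return {
        step["label"]: run_list[start:end]
        for step, start, end in zip(multiprocess_steps, starts, ends)
    }


# Illustrative run list and steps (model names chosen for the example only).
run_list = [
    "initialize_landuse",
    "compute_accessibility",
    "school_location",
    "workplace_location",
    "write_data_dictionary",
    "write_tables",
]
steps = [
    {"label": "mp_initialize", "begin": "initialize_landuse"},
    {"label": "mp_households", "begin": "school_location"},
    {"label": "mp_summarize", "begin": "write_data_dictionary"},
]
step_models = resolve_step_models(run_list, steps)
```

Here mp_households covers school_location and workplace_location, because the next step begins at write_data_dictionary.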
slice implicitly identifies a step as multiprocess. It contains instructions on how to slice the model data so that the different segments can be processed independently. Usually, this would be segmentation by household (all persons in a household must appear in the same segment because of intra-household dependencies). However, other segmentations are possible, the most obvious being segmentation by zone for the accessibility calculation. Since the mtctm1 accessibility calculation is fast, we don't segment it in the example. The slice.tables entry contains a list of the tables used to segment the data, the first entry being the primary slicer, followed by additional cascading dependencies (e.g. persons segmentation depends on households), following standard activitysim index_name/referring_column conventions. There is also the option of specifying a slice.except list to exclude tables from segmentation (e.g. to avoid slicing the land_use table when calculating accessibility).
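To make the cascading dependency concrete, here is a minimal pure-Python sketch of household segmentation. The helper name slice_households and the round-robin assignment are assumptions made for illustration; the text does not say how households are distributed across processes, only that persons must follow their household.

```python
def slice_households(household_ids, persons, num_processes):
    """Split households into segments and cascade the split to persons.

    persons is a list of (person_id, household_id) pairs, where household_id
    plays the role of the referring column.
    """
    # Assumption: round-robin assignment of households to segments.
    segment_of = {hh: i % num_processes for i, hh in enumerate(household_ids)}

    segments = [{"households": [], "persons": []} for _ in range(num_processes)]
    for hh in household_ids:
        segments[segment_of[hh]]["households"].append(hh)

    # Persons go wherever their household went, so that intra-household
    # dependencies stay within a single segment.
    for person_id, hh in persons:
        segments[segment_of[hh]]["persons"].append(person_id)
    return segments


households = [10, 11, 12, 13]
persons = [(1, 10), (2, 10), (3, 11), (4, 12), (5, 13), (6, 13)]
segments = slice_households(households, persons, num_processes=2)
```

A table named in slice.except would simply be passed to every process whole rather than being split this way.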
num_processes indicates the number of processors to devote to the step. It is an error for a single-process step to specify more than 1 processor, or for a multi-process step to specify fewer than 2. If not specified, the default value is 1 for single-process steps, and cpu_count for multi-process steps.
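These rules could be checked along the following lines. This is a sketch under stated assumptions: resolve_num_processes is a hypothetical helper, a step counts as multi-process when it has a slice key, and cpu_count is taken from os.cpu_count() unless passed in explicitly.

```python
import os


def resolve_num_processes(step, cpu_count=None):
    """Return the number of processes for a step, applying defaults and checks."""
    is_multiprocess = "slice" in step  # slice implicitly marks a multi-process step
    if cpu_count is None:
        # Assumption: cpu_count means the machine's logical CPU count.
        cpu_count = os.cpu_count() or 1

    default = cpu_count if is_multiprocess else 1
    num_processes = step.get("num_processes", default)

    if is_multiprocess and num_processes < 2:
        raise ValueError(f"multi-process step {step['label']} needs at least 2 processes")
    if not is_multiprocess and num_processes > 1:
        raise ValueError(f"single-process step {step['label']} cannot use more than 1 process")
    return num_processes


single = {"label": "mp_initialize", "begin": "initialize_landuse"}
multi = {"label": "mp_households", "num_processes": 3,
         "slice": {"tables": ["households", "persons"]}}
```

Note that under these rules a multi-process step with no explicit num_processes would fail on a single-CPU machine, since the default would be 1.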
chunk_size specifies a custom chunk size for the step. If not specified, the global chunk size is used, but for multiprocess steps it is divided by the number of processes, so that the chunk sizes across processes add up to the global chunk_size.
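The chunk size arithmetic might look like this. It is a rough sketch: resolve_chunk_size is a hypothetical name, integer division is an assumption, and a custom chunk_size is returned unchanged because the text does not say whether it is per process or a total.

```python
def resolve_chunk_size(step, global_chunk_size, num_processes):
    """Return the chunk size a step's processes should use."""
    if "chunk_size" in step:
        # Custom value for this step, used as given (assumption).
        return step["chunk_size"]
    if "slice" in step:
        # Multiprocess step: split the global budget across processes.
        return global_chunk_size // num_processes
    # Single-process step: the whole global budget.
    return global_chunk_size


GLOBAL_CHUNK_SIZE = 4_000_000_000
sliced_step = {"label": "mp_households", "slice": {"tables": ["households", "persons"]}}
per_process = resolve_chunk_size(sliced_step, GLOBAL_CHUNK_SIZE, num_processes=4)
```

With a global chunk size of 4000000000 and 4 processes, each process gets 1000000000.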
chunk_size: 4000000000
multiprocess_steps:
  - label: mp_initialize
    begin: initialize_landuse
  - label: mp_households
    begin: _school_location_sample
    num_processes: 3
    chunk_size: 1000000000
    slice:
      tables:
        - households
        - persons
  - label: mp_summarize
    begin: write_data_dictionary